An Open Provenance Model for Scientific Workflows. Professor Luc Moreau, University of Southampton

1 An Open Provenance Model for Scientific Workflows. Professor Luc Moreau, University of Southampton

2 Provenance & PASOA Teams. University of Southampton: Luc Moreau, Paul Groth, Simon Miles, Victor Tan, Miguel Branco, Sofia Tsasakou, Sheng Jiang, Steve Munroe, Zheng Chen. IBM UK (EU Project Coordinator): John Ibbotson, Neil Hardman, Alexis Biller. University of Wales, Cardiff: Omer Rana, Arnaud Contes, Vikas Deora, Ian Wootten, Shrija Rajbhandari. Universitat Politècnica de Catalunya (UPC): Steven Willmott, Javier Vazquez. SZTAKI: Laszlo Varga, Arpad Andics, Tamas Kifor. German Aerospace: Andreas Schreiber, Guy Kloss, Frank Danneman.

3 Contents: Motivation; Provenance Concept Map; Process documentation in a concrete bioinformatics application; Conclusions

4 Motivation

5 Peer Review/Audit: Accounting; Banking; Healthcare; Academic publishing

6 e-Science datasets: How to undertake peer review and validation of e-Science results?

7 Current Solutions: Proprietary, Monolithic, Silos, Closed. Do not interoperate with other applications. Not adaptable to new regulations.

8 Provenance. Oxford English Dictionary: (1) the fact of coming from some particular source or quarter; origin, derivation; (2) the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the passage of an item through its various owners. Concept vs representation.

9 Application Drivers. Aerospace engineering: maintain a historical record of design processes, up to 99 years. Organ transplant management: tracking of previous decisions, crucial to maximise the efficiency in matching and the recovery rate of patients. High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN). Bioinformatics: verification and auditing of “experiments” (e.g. for drug approval).

10 Provenance Concept Map

11 [Concept map diagram. Nodes: Application, Services, Provenance (concept), Data product, Process Documentation, P-structure, P-assertions, Process, Provenance (representation), Provenance Query. Edge labels: is an execution of; produces; has a structure; operates over; consists of; contains; assert; documents; is defined as a past; is represented by; is obtained by; has.]

12 Making Applications Provenance-Aware (diagram: Application, Data Product, Provenance Store). Assert p-assertions and record them as Process Documentation. Obtain the provenance of data by issuing provenance queries.

13 Process Documentation (diagram: messages M1, M2, M3, M4; functions f1, f2). Interaction p-assertions: I received M1, M4; I sent M2, M3. Relationship p-assertions: M3 = f1(M1); M2 = f2(M1, M4); M2 is in reply to M1. Service state p-assertions: I received M1 at time t; I used algorithm x.y.z.

14 Data flow: Interaction p-assertions allow us to specify a flow of data between services. Relationship p-assertions allow us to characterise the flow of data “inside” a service. The overall data flow (internal + external) constitutes a DAG, which characterises the process that led to a result.
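
The data-flow DAG can be illustrated with a minimal Python sketch. It is not the PASOA implementation: the dictionary encoding and the function name `provenance_of` are assumptions made for illustration, and the edges reuse only the two derivations given on the Process Documentation slide (M3 = f1(M1), M2 = f2(M1, M4)).

```python
from collections import deque

# Derivations taken from the slide example; each output maps to the inputs
# it was computed from. The dict encoding itself is a hypothetical choice.
DERIVED_FROM = {
    "M3": ["M1"],        # M3 = f1(M1)
    "M2": ["M1", "M4"],  # M2 = f2(M1, M4)
}


def provenance_of(item, derived_from):
    """Return every item that `item` transitively depends on (its DAG ancestors)."""
    seen = set()
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(provenance_of("M2", DERIVED_FROM)))
```

Walking the edges backwards from a result is one simple way to picture a provenance query; a real store would return the p-assertions themselves rather than bare identifiers.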

15 Process Documentation in a Concrete Bioinformatics Application

16 Biology: How do protein sequences fold into a 3D structure? The structure of protein sequences may help to answer this question. Structure can be quantified by textual compressibility. Which amino acid groupings maximize compressibility?
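
The idea of measuring compressibility under an amino acid grouping can be sketched as follows. This is an illustration under stated assumptions, not the experiment's actual code: `zlib` stands in for whatever compressor the application used, and the three-class grouping, the sample sequence, and the function names `recode` and `compressibility` are hypothetical.

```python
import zlib

# Hypothetical grouping of the 20 amino acids into three classes (illustration only).
GROUPING = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "charged": "DEKRH",
}


def recode(sequence, grouping):
    """Replace each residue with a one-letter symbol for its group ('x' if unknown)."""
    symbol_of = {}
    for index, residues in enumerate(grouping.values()):
        for residue in residues:
            symbol_of[residue] = chr(ord("a") + index)
    return "".join(symbol_of.get(residue, "x") for residue in sequence)


def compressibility(text):
    """Compressed size divided by original size; lower means more compressible."""
    raw = text.encode("ascii")
    if not raw:
        raise ValueError("text must not be empty")
    return len(zlib.compress(raw, 9)) / len(raw)


# Made-up sequence, repeated so the compressor has something to work with.
sample = "MKVLAAGDEKRSTNQ" * 50
print(compressibility(sample), compressibility(recode(sample, GROUPING)))
```

Comparing the ratio across candidate groupings is the kind of search the slide describes; each such run is a process whose inputs and algorithm versions would be worth documenting.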

17 Collaboration Diagram

18 Actual Call DAG

19 The P-Structure: The logical structure of a provenance store

20 Interaction Record: The set of p-assertions pertaining to a given interaction (i.e., message exchange between a sender and a receiver)

21 Interaction Key: A unique identifier for an interaction (sender identity, receiver identity, local id)
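
An interaction key with these three parts can be modelled as a small immutable value. The sketch below is an assumption-laden illustration, not the PASOA schema: the class and field names are hypothetical, and the local id is assumed to distinguish interactions between the same sender and receiver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionKey:
    sender: str    # identity of the sending actor
    receiver: str  # identity of the receiving actor
    local_id: str  # assumed to distinguish interactions between the same pair


# Frozen dataclasses are hashable, so a key can index a store directly.
key_a = InteractionKey("client", "blast-wrapper", "1")
key_b = InteractionKey("client", "blast-wrapper", "1")
key_c = InteractionKey("client", "blast-wrapper", "2")

store = {key_a: ["I sent M1"]}
print(key_a == key_b, key_a == key_c, store[key_b])
```

Making the key a value object means sender and receiver can each construct it independently and still refer to the same interaction.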

22 View: The set of p-assertions created by an asserter involved in an interaction (sender or receiver view)

23 Asserter: The identity of an asserter
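
How an interaction record, its two views, and the asserter fit together can be sketched in Python. The layout is a guess made for illustration only: the class names, the `record` helper, and the actor name "client" are hypothetical, and a plain tuple is used for the interaction key to keep the block self-contained.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # (sender identity, receiver identity, local id)


@dataclass
class View:
    asserter: str  # identity of the actor that made these assertions
    p_assertions: List[str] = field(default_factory=list)


@dataclass
class InteractionRecord:
    key: Key
    sender_view: Optional[View] = None
    receiver_view: Optional[View] = None


def record(store: Dict[Key, InteractionRecord], key: Key, role: str,
           asserter: str, p_assertion: str) -> None:
    """Add a p-assertion to the sender or receiver view of an interaction."""
    if role not in ("sender", "receiver"):
        raise ValueError("role must be 'sender' or 'receiver'")
    rec = store.setdefault(key, InteractionRecord(key))
    attr = role + "_view"
    view = getattr(rec, attr)
    if view is None:
        view = View(asserter)
        setattr(rec, attr, view)
    view.p_assertions.append(p_assertion)


store: Dict[Key, InteractionRecord] = {}
key = ("client", "blast-wrapper", "1")
record(store, key, "sender", "client", "I sent M1")
record(store, key, "receiver", "blast-wrapper", "I received M1")
print(store[key])
```

Each side writes only to its own view, which mirrors the point that actors can document an interaction independently of one another.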

24 Interaction P-Assertion: An assertion of the contents of a message by an actor that has sent or received that message

25 Interaction P-Assertion Content: The content of an interaction p-assertion: here, the invocation of blast (through a wrapper)

26 Interaction Content: Provenance-related information passed in application messages

27 Actor State P-Assertion: An assertion made by an actor about its internal state in the context of a specific interaction

28 Relationship P-Assertion: With respect to an interaction, a relationship p-assertion is an assertion, made by an actor, that describes how the actor obtained output data or the whole message sent in that interaction by applying some function to input data or messages from other interactions.

29 Subject Id: The identity of the subject of a relationship

30 Object Id: The identity of the object of a relationship
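
A relationship p-assertion with its subject and object identifiers can be sketched as a record type. This is an illustrative reading rather than the published schema: the class, field, and function names are hypothetical, and the sketch assumes the subject is the output and the objects are the inputs it was derived from. The two assertions encode the derivations from the Process Documentation slide.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RelationshipPAssertion:
    asserter: str                # actor making the assertion
    subject_id: str              # assumed: the output data or message
    relation: str                # name of the function that was applied
    object_ids: Tuple[str, ...]  # assumed: the inputs the output came from


ASSERTIONS: List[RelationshipPAssertion] = [
    RelationshipPAssertion("service", "M3", "f1", ("M1",)),       # M3 = f1(M1)
    RelationshipPAssertion("service", "M2", "f2", ("M1", "M4")),  # M2 = f2(M1, M4)
]


def inputs_of(subject_id, assertions):
    """Collect the object ids of every relationship asserted about `subject_id`."""
    found = []
    for assertion in assertions:
        if assertion.subject_id == subject_id:
            found.extend(assertion.object_ids)
    return found


print(inputs_of("M2", ASSERTIONS))
```

Keeping the function name alongside the identifiers records not just that M2 depends on M1 and M4 but how it was obtained from them.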

31 Process Documentation Characteristics: Common logical structure of the provenance store shared by all asserting and querying actors; can be produced autonomously and asynchronously by the different application components; open, extensible model, for which we are producing a public specification; tools can operate on it (e.g. visualisation, reasoning).

32 Performance (HPDC’05)

33 Standardisation Philosophy. Thin layer common between systems: extensible data model. Model can be extended for specific technologies (WS, Web, …) or application domains (Bio, Healthcare, Desktop, …). Service interfaces.

34 Proposed List of Specifications [diagram]. Specifications: WS-Prov-Intro, WS-Prov-DM, WS-Prov-Glo, WS-Prov-Rec, WS-Prov-Query, WS-Prov-DM-Link, WS-Prov-DM-Infer, WS-Prov-DM-DS, WS-Prov-SOAP, WS-Prov-DM-Sec, WS-Prov-WWW, WS-Prov-DM-Rel, WS-Prov-Primer. Group labels: Generic Profiles; Domain Specific Profiles; Technology Bindings.

35 Conclusions

36 To Sum Up [diagram; slide from John Ibbotson]: Standardising the documentation of Business Processes. Diagram labels: Provenance Store; Record; Query; Compliance check; Rerun/Reproduce; Analyse; Provenance Architecture; Methodology; Apply. Sectors: Healthcare, Distribution, Finance, Aerospace, Automobile, Pharmaceutical.

37 Conclusions: Crucial topic for many applications; full architectural specification; implementation available for download; methodology to make applications provenance-aware; draft standardisation proposal to be released.

38 Provenance Challenge: Provenance Challenge Workshop at OGF18, Washington, September 11-14

39 Questions
