Architecture Tutorial Provenance: overview Professor Luc Moreau University of Southampton

1 Architecture Tutorial Provenance: overview Professor Luc Moreau L.Moreau@ecs.soton.ac.uk University of Southampton www.ecs.soton.ac.uk/~lavm

2 Architecture Tutorial Provenance & PASOA Teams
University of Southampton
– Luc Moreau, Paul Groth, Simon Miles, Victor Tan, Miguel Branco, Sofia Tsasakou, Sheng Jiang, Steve Munroe, Zheng Chen
IBM UK (EU Project Coordinator)
– John Ibbotson, Neil Hardman, Alexis Biller
University of Wales, Cardiff
– Omer Rana, Arnaud Contes, Vikas Deora, Ian Wootten, Shrija Rajbhandari
Universitat Politècnica de Catalunya (UPC)
– Steven Willmott, Javier Vazquez
SZTAKI
– Laszlo Varga, Arpad Andics, Tamas Kifor
German Aerospace
– Andreas Schreiber, Guy Kloss, Frank Danneman

3 Architecture Tutorial Contents Motivation Provenance Concepts Provenance Architecture Standardisation Conclusions

4 Architecture Tutorial Motivation

5 Architecture Tutorial Scientific Research Academic Peer Review

6 Architecture Tutorial Business Regulations Audit (Sarbanes-Oxley) Audit (Basel II) Accounting Banking

7 Architecture Tutorial Health Care Management European Recommendation R(97)5: on the protection of medical data

8 Architecture Tutorial e-Science datasets How to undertake peer-reviewing and validation of e-Scientific results?

9 Architecture Tutorial Compliance with Regulations
The “next-compliance” problem
– Can we be certain that by ensuring compliance with a new regulation, we do not break previous compliance?

10 Architecture Tutorial Current Solutions Proprietary, Monolithic Silos, Closed Do not inter-operate with other applications Not adaptable to new regulations

11 Architecture Tutorial Provenance
Oxford English Dictionary:
– the fact of coming from some particular source or quarter; origin, derivation;
– the history or pedigree of a work of art, manuscript, rare book, etc.;
– concretely, a record of the passage of an item through its various owners.
Concept vs representation

12 Architecture Tutorial Provenance in Computer Systems
Our definition of provenance, in the context of applications for which process matters to end users: the provenance of a piece of data is the process that led to that piece of data.
Our aim is to conceive a computer-based representation of provenance that allows us to perform useful analysis and reasoning to support our use cases.

13 Architecture Tutorial Our Approach
– Define core concepts pertaining to provenance
– Specify functionality required to become “provenance-aware”
– Define open data models and protocols that allow systems to inter-operate
– Standardise data models and protocols
– Provide a reference implementation
– Provide reasoning capability

14 Architecture Tutorial Context (1) Aerospace engineering: maintain a historical record of design processes, up to 99 years. Organ transplant management: tracking of previous decisions, crucial to maximise the efficiency in matching and recovery rate of patients

15 Architecture Tutorial Context (2) High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN) Bioinformatics: verification and auditing of “experiments” (e.g. for drug approval)

16 Architecture Tutorial Provenance Concepts

17 Architecture Tutorial Provenance “Lifecycle”
[Diagram labels: Application, Data, Results, Provenance Store]
Core Interfaces to Provenance Store:
– Record Documentation of Execution
– Query and Reason over Provenance of Data
– Administer Store and its contents

18 Architecture Tutorial Nature of Documentation
We represent the provenance of some data by documenting the process that led to the data:
– documentation can be complete or partial;
– it can be accurate or inaccurate;
– it can present conflicting or consensual views of the actors involved;
– it can provide operational details of execution or it can be abstract.

19 Architecture Tutorial p-assertion
A given element of process documentation will be referred to as a p-assertion.
– p-assertion: an assertion that is made by an actor and pertains to a process.

20 Architecture Tutorial Service Oriented Architecture
– A service is broadly defined as a component that takes some inputs and produces some outputs.
– Services are brought together to solve a given problem, typically via a workflow definition that specifies their composition.
– Interactions with services take place through messages that are constructed according to the services' interface specifications.
– The term actor denotes either a client or a service in a SOA.
– A process is defined as the execution of a workflow.

21 Architecture Tutorial Process Documentation (1)
[Diagram: Actor 1 and Actor 2 exchanging messages M1, M2, M3, M4]
Actor 1: I received M1, M4. I sent M2, M3.
Actor 2: I received M3. I sent M4.
From these p-assertions, we can derive that M3 was sent by Actor 1 and received by Actor 2 (and likewise for M4).
If actors are black boxes, these assertions are not very useful because we do not know the dependencies between messages.
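
The derivation on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not PASOA code: the `InteractionPAssertion` record and the `derive_exchanges` helper are hypothetical names introduced here, and each "I sent / I received" statement is split into one record per message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionPAssertion:
    """Hypothetical record: one actor's claim about one message."""
    asserter: str  # actor making the assertion
    role: str      # "sent" or "received"
    message: str   # message identifier, e.g. "M3"


# The statements from the slide, one record per message.
assertions = [
    InteractionPAssertion("Actor 1", "received", "M1"),
    InteractionPAssertion("Actor 1", "received", "M4"),
    InteractionPAssertion("Actor 1", "sent", "M2"),
    InteractionPAssertion("Actor 1", "sent", "M3"),
    InteractionPAssertion("Actor 2", "received", "M3"),
    InteractionPAssertion("Actor 2", "sent", "M4"),
]


def derive_exchanges(records):
    """Pair each 'sent' assertion with a 'received' assertion on the same message."""
    senders = {r.message: r.asserter for r in records if r.role == "sent"}
    receivers = {r.message: r.asserter for r in records if r.role == "received"}
    # Only messages documented on both sides yield a (sender, receiver) pair.
    return {m: (senders[m], receivers[m]) for m in senders if m in receivers}


exchanges = derive_exchanges(assertions)
```

M1 and M2 do not appear in `exchanges` because only one side of each exchange is documented, which is exactly why interaction p-assertions alone say nothing about how the messages depend on each other.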

22 Architecture Tutorial Process Documentation (2)
[Diagram: Actor 1 and Actor 2 exchanging messages M1, M2, M3, M4]
– M2 is in reply to M1
– M3 is caused by M1
– M2 is caused by M4
– M4 is in reply to M3
These assertions help identify the order of messages, but not how data was computed.
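
As a rough illustration of how these four assertions fix the order of messages, the sketch below feeds them to a topological sort. It assumes Python 3.9 or later (for the standard `graphlib` module) and treats both "in reply to" and "caused by" as plain "depends on" edges, which is a simplification made for this example.

```python
from graphlib import TopologicalSorter

# Each message maps to the set of messages it depends on, as asserted on the slide.
depends_on = {
    "M2": {"M1", "M4"},  # M2 is in reply to M1 and is caused by M4
    "M3": {"M1"},        # M3 is caused by M1
    "M4": {"M3"},        # M4 is in reply to M3
}

# A topological order lists every message after the messages it depends on.
message_order = list(TopologicalSorter(depends_on).static_order())
print(message_order)
```

With these dependencies there is only one valid order: M1, then M3, then M4, then M2.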

23 Architecture Tutorial Process Documentation (3)
[Diagram: Actor 1 (functions f1, f2) and Actor 2 (function f) exchanging messages M1, M2, M3, M4]
– M3 = f1(M1)
– M2 = f2(M1,M4)
– M4 = f(M3)
These assertions help identify how data is computed, but provide no information about non-functional characteristics of the computation (time, resources used, etc.).

24 Architecture Tutorial Process Documentation (4)
[Diagram: Actor 1 and Actor 2 exchanging messages M1, M2, M3, M4]
– I used 386 cluster
– Request sat in queue for 6 min
– I used sparc processor
– I used algorithm x version x.y.z

25 Architecture Tutorial Types of p-assertions (1)
– Interaction p-assertion: an assertion of the contents of a message by an actor that has sent or received that message.
Examples: I received M1, M4. I sent M2, M3.

26 Architecture Tutorial Types of p-assertions (2)
– Relationship p-assertion: an assertion, made by an actor, that describes how the actor obtained an output message sent in an interaction by applying some function to input messages from other interactions (likewise for data).
Examples: M2 is in reply to M1. M3 is caused by M1. M2 is caused by M4. M3 = f1(M1). M2 = f2(M1,M4).

27 Architecture Tutorial Types of p-assertions (3)
– Actor state p-assertion: an assertion made by an actor about its internal state in the context of a specific interaction.
Examples: I used sparc processor. I used algorithm x version x.y.z.
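
The three kinds of p-assertion could be modelled as three small record types. The sketch below is only one possible shape: the class names, field names, and the `kind` helper are assumptions made for illustration and do not reproduce the actual PASOA schemas.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class InteractionPAssertion:
    asserter: str     # actor that sent or received the message
    interaction: str  # the message exchange being documented
    content: str      # asserted message contents


@dataclass(frozen=True)
class RelationshipPAssertion:
    asserter: str
    output: str              # output message (or data item)
    relation: str            # e.g. "caused-by", or a function name such as "f1"
    inputs: Tuple[str, ...]  # input messages the output was obtained from


@dataclass(frozen=True)
class ActorStatePAssertion:
    asserter: str
    interaction: str  # interaction in whose context the state is asserted
    state: str        # e.g. "I used sparc processor"


PAssertion = Union[InteractionPAssertion, RelationshipPAssertion, ActorStatePAssertion]


def kind(p: PAssertion) -> str:
    """Name the category of a p-assertion."""
    if isinstance(p, InteractionPAssertion):
        return "interaction"
    if isinstance(p, RelationshipPAssertion):
        return "relationship"
    return "actor state"


# One example of each kind, taken from the slides.
examples = [
    InteractionPAssertion("Actor 1", "M3", "I sent M3"),
    RelationshipPAssertion("Actor 1", "M3", "f1", ("M1",)),
    ActorStatePAssertion("Actor 2", "M4", "I used sparc processor"),
]
kinds = [kind(p) for p in examples]
```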

28 Architecture Tutorial Data flow
– Interaction p-assertions allow us to specify a flow of data between actors.
– Relationship p-assertions allow us to characterise the flow of data “inside” an actor.
– The overall data flow (internal + external) constitutes a directed acyclic graph (DAG), which characterises the process that led to a result.
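
To make the DAG view concrete, here is a minimal sketch that walks the data-flow graph backwards from a result. The `derived_from` mapping encodes the functional relationships from the earlier slides (M3 = f1(M1), M4 = f(M3), M2 = f2(M1,M4)); using messages directly as graph nodes and the `provenance` helper itself are simplifying assumptions for this example.

```python
# Edges point from a data item to the items it was derived from.
derived_from = {
    "M3": ["M1"],        # M3 = f1(M1), inside Actor 1
    "M4": ["M3"],        # M4 = f(M3), inside Actor 2
    "M2": ["M1", "M4"],  # M2 = f2(M1, M4), inside Actor 1
}


def provenance(item, graph):
    """Return every item that `item` transitively depends on."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


ancestors_of_m2 = provenance("M2", derived_from)
ancestors_of_m3 = provenance("M3", derived_from)
```

Here `ancestors_of_m2` contains M1, M3 and M4, while `ancestors_of_m3` contains only M1.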

29 Architecture Tutorial Provenance Architecture

30 Architecture Tutorial Interfaces to Provenance Store
[Diagram labels: Application, Results, Provenance Store]
– Record Documentation of Execution
– Query and Reason over Provenance of Data
– Administer Store and its contents

31 Architecture Tutorial

32 P-Assertion schemas

33 Architecture Tutorial The p-structure
The p-structure is a common logical structure of the provenance store shared by all asserting and querying actors.
– Hierarchical
– Indexed by interactions (interaction = 1 message exchange)
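
A hierarchical structure indexed by interactions might be sketched as nested dictionaries. Everything below is an assumption for illustration: the slide does not specify the levels of the hierarchy, so the split into a "sender" and a "receiver" view under each interaction, and the `new_p_structure` and `record` helpers, are hypothetical.

```python
from collections import defaultdict


def new_p_structure():
    """interaction key -> view ("sender" or "receiver") -> list of p-assertions."""
    return defaultdict(lambda: {"sender": [], "receiver": []})


def record(p_structure, interaction, view, p_assertion):
    """File a p-assertion under the interaction it documents."""
    p_structure[interaction][view].append(p_assertion)


store = new_p_structure()
record(store, "M3", "sender",
       {"asserter": "Actor 1", "kind": "interaction", "content": "I sent M3"})
record(store, "M3", "sender",
       {"asserter": "Actor 1", "kind": "relationship", "content": "M3 is caused by M1"})
record(store, "M3", "receiver",
       {"asserter": "Actor 2", "kind": "interaction", "content": "I received M3"})

# Everything known about one message exchange is reachable from its key.
m3_sender_kinds = [p["kind"] for p in store["M3"]["sender"]]
```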

34 Architecture Tutorial Recording Protocol (Groth04-06)
Abstract machines
DS Properties
– Termination
– Liveness
– Safety
– Statelessness
Documentation Properties
– Immutability
– Attribution
– Datatype safety
Foundation for adding necessary cryptographic techniques
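
Two of the documentation properties, immutability and attribution, can be illustrated with an append-only store. This is a loose sketch of what those two words could mean in practice, not the recording protocol itself: the `ProvenanceStoreSketch` class, its methods, and the choice of key are all hypothetical.

```python
class ProvenanceStoreSketch:
    """Append-only store: p-assertions can be added but never changed."""

    def __init__(self):
        self._records = {}

    def record(self, asserter, interaction, local_id, content):
        # Attribution: the asserter's identity is part of every key.
        key = (asserter, interaction, local_id)
        if key in self._records:
            # Immutability: an existing p-assertion is never overwritten.
            raise ValueError(f"p-assertion {key} is already recorded")
        self._records[key] = content
        return key

    def lookup(self, asserter, interaction, local_id):
        return self._records[(asserter, interaction, local_id)]


sketch_store = ProvenanceStoreSketch()
first_key = sketch_store.record("Actor 1", "M3", 1, "I sent M3")

try:
    sketch_store.record("Actor 1", "M3", 1, "a different claim")
    overwrite_rejected = False
except ValueError:
    overwrite_rejected = True
```

A real store would need more than this, for example the cryptographic techniques the slide mentions, to make attribution verifiable rather than merely declared.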

35 Architecture Tutorial Querying Functionality (Miles06)
Process Documentation Query Interface: allows for “navigation” of the documentation of execution
– Allows us to view the provenance store (i.e. the p-structure) as if containing XML data structures
– Independent of technology used for running application and internal store representation
– Seamless navigation of application dependent and application independent process documentation

36 Architecture Tutorial Querying Functionality (Miles06)
Provenance Query Interface: allows us to obtain the provenance of some specific data.
A recognition that there is not “one” provenance for a piece of data: there may be several, depending on the end-user’s interest.
Hence, provenance is seen as the result of a query:
– Identify a piece of data at a specific execution point
– Scope of the process of interest: filter in/out p-assertions according to actors, process, types of relationships, etc.
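
The idea that provenance is the result of a query, parameterised by a data item and a scope, can be sketched as follows. This is not the Provenance Query Interface itself: the `Relationship` record, the `provenance_query` function, and the representation of scope as a predicate are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    asserter: str
    output: str
    relation: str
    source: str


# Relationship p-assertions from the earlier slides.
DOCUMENTATION = [
    Relationship("Actor 1", "M3", "caused-by", "M1"),
    Relationship("Actor 2", "M4", "in-reply-to", "M3"),
    Relationship("Actor 1", "M2", "caused-by", "M4"),
    Relationship("Actor 1", "M2", "in-reply-to", "M1"),
]


def provenance_query(data_item, documentation, in_scope=lambda r: True):
    """Collect the in-scope relationships reachable backwards from data_item."""
    result = []
    frontier = [data_item]
    visited = {data_item}
    while frontier:
        current = frontier.pop()
        for r in documentation:
            if r.output == current and in_scope(r):
                result.append(r)
                if r.source not in visited:
                    visited.add(r.source)
                    frontier.append(r.source)
    return result


# Same data item, two different scopes, two different provenances.
full = provenance_query("M2", DOCUMENTATION)
causal_only = provenance_query(
    "M2", DOCUMENTATION, in_scope=lambda r: r.relation == "caused-by"
)
```

With no filter, the query returns all four relationships; restricted to "caused-by" relationships it stops after the single step from M2 to M4, because the next link (M4 is in reply to M3) is out of scope.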

37 Architecture Tutorial Standardisation

38 Architecture Tutorial Standardisation Options
– APIs: programmatic inter-op
– Recording and querying interfaces: service inter-op
– Provenance model: data inter-op

39 Architecture Tutorial Purpose of Standardisation
[Diagram labels: Application, Application, Provenance Stores, Record Documentation of Execution]
Allow for multiple applications to document their execution. Applications may be running in different institutions.

40 Architecture Tutorial Purpose of Standardisation
[Diagram labels: Application, Provenance Store (three stores), Record Documentation of Execution]
Allow for multiple stores from multiple IT providers.

41 Architecture Tutorial Purpose of Standardisation
[Diagram labels: Provenance Store (two stores), Query Provenance of Data]
Allow for multiple stores from multiple IT providers.

42 Architecture Tutorial Purpose of Standardisation
Allow for legacy, monolithic applications to expose their contents (according to a standard schema).
Convert into a standard data format.

43 Architecture Tutorial Purpose of Standardisation
[Diagram labels: Application, Provenance Store]
Allow third parties to host provenance stores, which are trusted by both application owners and auditors.

44 Architecture Tutorial Compliance Oriented Architectures
– Separate execution documentation from compliance verification
– Allows for multiple compliance verifications
– Allows for validation to take place across multiple applications, possibly run by different institutions (in particular, allows for outsourcing and subcontracting)
– Approach is suitable for e-scientific peer-reviewing and business compliance verification
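
The separation between documenting execution and verifying compliance can be sketched as a fixed body of documentation checked by independent, pluggable functions. The two checks below are invented for illustration only and do not correspond to any real regulation; the record layout and the `verify` helper are likewise hypothetical.

```python
# Process documentation, recorded once and independently of any regulation.
documentation = [
    {"asserter": "Actor 1", "kind": "interaction", "message": "M3", "role": "sent"},
    {"asserter": "Actor 2", "kind": "interaction", "message": "M3", "role": "received"},
    {"asserter": "Actor 2", "kind": "interaction", "message": "M4", "role": "sent"},
    {"asserter": "Actor 1", "kind": "interaction", "message": "M4", "role": "received"},
    {"asserter": "Actor 2", "kind": "actor-state", "message": "M4",
     "state": "I used sparc processor"},
]


def both_sides_documented(doc):
    """Hypothetical check: every sent message is also documented as received."""
    sent = {p["message"] for p in doc if p.get("role") == "sent"}
    received = {p["message"] for p in doc if p.get("role") == "received"}
    return sent == received


def sender_state_documented(doc):
    """Hypothetical check: every sender also asserted its state for that message."""
    states = {(p["asserter"], p["message"]) for p in doc if p["kind"] == "actor-state"}
    return all(
        (p["asserter"], p["message"]) in states
        for p in doc
        if p.get("role") == "sent"
    )


def verify(doc, checks):
    """Run any number of compliance checks over the same documentation."""
    return {check.__name__: check(doc) for check in checks}


report = verify(documentation, [both_sides_documented, sender_state_documented])
```

Adding a check for a new regulation means adding one more function to the list; the documentation and the existing checks are untouched, which is the point of keeping the two concerns apart.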

45 Architecture Tutorial Organ Transplant Scenario Hospital Electronic Healthcare Management Service Testing Lab

46 Architecture Tutorial Hospital Actors User Interface Donor Data Collector Brain Death Manager

47 Architecture Tutorial What’s on the CD
PReServ (Paul Groth & Simon Miles)
– Offers recording and querying interfaces
– Available from www.pasoa.org
– An OGSA-DAI based version will soon be available from www.gridprovenance.org
– Is being used in a bioinformatics application (cf. HPDC’05, ISWC’05)

48 Architecture Tutorial Conclusions

49 Architecture Tutorial To Sum Up
Standardising the documentation of Business Processes
[Diagram labels: Provenance Store, Record, Query, Compliance check, Rerun/Reproduce, Analyse]
Provenance
– Architecture
– Methodology
Apply: Healthcare, Distribution, Finance, Aerospace, Automobile, Pharmaceutical
Slide from John Ibbotson

50 Architecture Tutorial Overview of Today’s Talks
Provenance Data Structures
Recording and Querying Provenance
– Break (30 minutes)
Distribution and Scalability
Security
Methodology

51 Architecture Tutorial Questions

