Presentation on theme: "PrIMe PrIMe : Provenance Incorporating Methodology Steve Munroe The EU Grid Provenance Project University of Southampton UK"— Presentation transcript:
PrIMe PrIMe : Provenance Incorporating Methodology Steve Munroe (firstname.lastname@example.org) The EU Grid Provenance Project University of Southampton UK www.gridprovenance.org
PrIMe 2 EU Grid Provenance Consortium University of Southampton –Luc Moreau, Steve Munroe, Sheng Jiang, Paul Groth, Simon Miles IBM UK (Project Coordinator) –John Ibbotson, Neil Hardman, Alexis Biller University of Wales, Cardiff –Omer Rana, Arnaud Contes, Vikas Deora Universitat Politècnica de Catalunya (UPC) –Steven Willmott, Javier Vazquez SZTAKI –Laszlo Varga, Arpad Andics German Aerospace –Andreas Schreiber, Guy Kloss, Frank Danneman
PrIMe 3 Overview of Talk Introducing Provenance Introducing PrIMe Stepping through PrIMe –Step 1. Provenance use cases –Step 2. Information items –Step 3. Identifying actors –Step 4. Actor interactions –Step 5. Knowledgeable actors –Step 6. Adaptations Evaluation Summary Conclusions
PrIMe 4 Provenance: dictionary definition Oxford English Dictionary: –the fact of coming from some particular source or quarter; origin, derivation –the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the ultimate derivation and passage of an item through its various owners.
PrIMe 5 Provenance Definition Our definition of provenance in the context of applications for which process matters to end users: The provenance of a piece of data is the process that led to that piece of data. Our aim is to conceive a computer-based representation of provenance that allows us to perform useful analysis and reasoning to support our use cases. We use the notion of Process Documentation, which is composed of p-assertions.
PrIMe 6 Provenance Applications Aerospace engineering: maintain a historical record of design processes, up to 99 years. Organ transplant management: tracking of previous decisions, crucial to maximise the efficiency in matching and recovery rate of patients
PrIMe 7 Provenance (2) High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN) Bioinformatics: verification and auditing of “experiments” (e.g. for drug approval)
PrIMe 8 Types of p-assertions (1) –Interaction p-assertion: an assertion of the contents of a message by an actor that has sent or received that message. Examples: "I received M1, M4"; "I sent M2, M3".
PrIMe 9 Types of p-assertions (2) –Relationship p-assertion: an assertion, made by an actor, that describes how the actor obtained output data or the whole message sent in an interaction by applying some function to input data or messages from other interactions. Examples: "M2 is in reply to M1"; "M3 is caused by M1"; "M2 is caused by M4"; M3 = f1(M1); M2 = f2(M1, M4).
PrIMe 10 Types of p-assertions (3) –Actor state p-assertion: an assertion made by an actor about its internal state in the context of a specific interaction. Examples: "I used a SPARC processor"; "I used algorithm x version x.y.z".
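The three kinds of p-assertion can be pictured as three record types. The following is a minimal sketch only: the class names, field names, and the `causes_of` helper are illustrative assumptions, not the project's actual schema or API. It encodes the slide examples (M3 = f1(M1), M2 = f2(M1, M4)) and shows how relationship p-assertions let a reader walk back from a message to its causes.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical record types; the real p-assertion schema is not shown in the slides.
@dataclass(frozen=True)
class InteractionPAssertion:
    asserter: str    # actor making the assertion
    message_id: str  # message the assertion is about, e.g. "M1"
    role: str        # "sender" or "receiver"
    content: str     # the asserted message contents


@dataclass(frozen=True)
class RelationshipPAssertion:
    asserter: str
    effect: str              # message or data that was produced, e.g. "M3"
    causes: Tuple[str, ...]  # messages or data it was derived from
    relation: str            # e.g. "f1" or "in reply to"


@dataclass(frozen=True)
class ActorStatePAssertion:
    asserter: str
    message_id: str  # interaction that gives the state its context
    state: str       # e.g. "algorithm x version x.y.z"


PAssertion = Union[InteractionPAssertion, RelationshipPAssertion, ActorStatePAssertion]


def causes_of(documentation: List[PAssertion], effect: str) -> List[str]:
    """Collect everything asserted to have led to `effect`, sorted by name."""
    found = set()
    for p in documentation:
        if isinstance(p, RelationshipPAssertion) and p.effect == effect:
            found.update(p.causes)
    return sorted(found)


# The examples from the three slides, asserted by a single actor.
documentation = [
    InteractionPAssertion("actor", "M1", "receiver", "contents of M1"),
    InteractionPAssertion("actor", "M4", "receiver", "contents of M4"),
    InteractionPAssertion("actor", "M2", "sender", "contents of M2"),
    InteractionPAssertion("actor", "M3", "sender", "contents of M3"),
    RelationshipPAssertion("actor", "M3", ("M1",), "f1"),
    RelationshipPAssertion("actor", "M2", ("M1", "M4"), "f2"),
    ActorStatePAssertion("actor", "M2", "algorithm x version x.y.z"),
]

print(causes_of(documentation, "M2"))  # ['M1', 'M4']
```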
PrIMe 11 The Provenance Middleware Architecture
PrIMe 12 Introducing PrIMe A software engineering methodology for making applications provenance-aware
PrIMe 13 Introducing PrIMe: Key aims Provide software engineering guidelines for: identifying and expressing provenance use cases; identifying the kinds of information items that are required to satisfy use cases; identifying actors and the interactions between them in order to effect the recording of process documentation; identifying the set of adaptations that integrate the provenance architecture with the application to expose the right kinds of information.
PrIMe 14 Introducing PrIMe: Overview Step 6. Adaptations Step 5. Knowledgeable Actors Application Structure Step 3. Actors Step 4. Interactions Step 1. Use Cases Step 2. Information Items
PrIMe 15 Step 1: Provenance Use Cases Application Structure Step 1. Use Cases
PrIMe 16 Step 1: Provenance Use Cases We distinguish two types of provenance use case –A core provenance use case is a use case known when PrIMe is applied. –A future provenance use case is a use case that is not considered until after a process in the application is enacted but uses documentation of that process
PrIMe 17 Step 1: Provenance Use Cases Gathering Use Cases It is not always obvious to users what use cases they could expect the provenance middleware architecture to support. We provide a simple requirements elicitation process to help designers collect the core provenance use cases –Give definitions of provenance –Give examples of the general questions that can be answered using the architecture
PrIMe 18 Step 1: Provenance Use Cases Definition of Provenance The provenance of a result is the process that produced that result.
PrIMe 19 Step 1: Provenance Use Cases The OTM Application In the OTM application, use case questions relate to specific objects within the application, e.g.: –Recipients of organs –Organs –Organ Donors –Decisions
PrIMe 20 Step 1: Provenance Use Cases OTM Use Case Questions Below are questions that have been taken from the OTM application. –Retrieve data linked to all actions / events associated with a patient (recipient or donor) –What decisions were made for a particular case? –What is the medical analysis tree for a given organ? –Determine if any deviation took place from the standard workflow for a given organ
PrIMe 21 Step 1: Provenance Use Cases Eliciting Use Cases: We are looking to elicit use cases of the form: –(1) Actor A does something. –(2) Actor B does something else etc. –(3) Actor C determines the answer to a question about the provenance of data (such as a specific example of one of those above).
PrIMe 22 Step 1: Provenance Use Cases Elicitation Steps Three important steps: –Step (1) Describe something that already happens in the application. –Step (2) Describe a specific provenance-related use case question that cannot (easily) be answered at present, but that the provenance functionality could help to answer. –Step (3) Identify the relevant services required for answering the use case.
PrIMe 23 Step 1: Provenance Use Cases Example use case Donor A’s organs are screened for potential donation. What is the provenance of the donor organ’s diagnosis?
PrIMe 24 Step 1: Provenance Use Cases Form-Based Capture Donor A’s organs are screened for potential donation. What is the provenance of the donor’s organ diagnosis?
Application description: Donor A’s organs are screened for potential donation
Use case question: What is the provenance of the donor’s organ diagnosis?
Relevant services: UI, Donor data collector, EHCR, Testing Lab
Relevant information items: PID, Patient records, Blood analysis result, Analysis decision
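The capture form maps naturally onto a small record with one attribute per row. The sketch below is illustrative: `UseCaseForm` and `missing_fields` are assumed names rather than part of PrIMe. It shows how a designer might check that an elicited use case has every row filled in before moving on to Step 2.

```python
from dataclasses import dataclass, field, fields
from typing import List


# Hypothetical representation of the form; the attribute names mirror the form rows.
@dataclass
class UseCaseForm:
    application_description: str = ""
    use_case_question: str = ""
    relevant_services: List[str] = field(default_factory=list)
    relevant_information_items: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of rows that are still empty, in form order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# The OTM example from the slide.
donor_form = UseCaseForm(
    application_description="Donor A's organs are screened for potential donation",
    use_case_question="What is the provenance of the donor's organ diagnosis?",
    relevant_services=["UI", "Donor data collector", "EHCR", "Testing Lab"],
    relevant_information_items=[
        "PID", "Patient records", "Blood analysis result", "Analysis decision",
    ],
)

# A half-finished form, as it might look after elicitation step (1) only.
draft_form = UseCaseForm(
    application_description="Donor A's organs are screened for potential donation",
)

print(donor_form.missing_fields())
print(draft_form.missing_fields())
```

A completed form reports no missing rows, while the draft lists the question, services, and information items that still need to be elicited.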
PrIMe 25 Step 2: Information items Application Structure Step 1. Use Cases Step 2. Information Items
PrIMe 26 Step 2: Information Items Overview The kinds of information that will answer your use case May be one piece or many pieces of information –e.g. a given result, or a sequence of decisions For each use case, identify the information items required to satisfy the use case.
PrIMe 27 Step 2: Information Items Examples Information items may be : –Data items, i.e. the result of some calculation, decision (found in interactions or actor state). –Whole or part processes, e.g. the sequence of decisions that led to a donor’s organ being rejected for donation. –Relationships, e.g. what were the causal determinants of a given decision.
PrIMe 28 Step 2: Information Items Capture Information items are to be captured by process documentation, i.e. p-assertions –Data items: Interaction or actor state p-assertions –Processes: Interaction and relationship p-assertions
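The capture rule can be written down as a lookup from the kind of information item to the kinds of p-assertion that may document it. This is a sketch under assumptions: the slide only gives the rows for data items and processes, so the row for relationships is a guess, and the classification of the two example items is illustrative.

```python
from typing import Iterable, List, Tuple

# Kind of information item -> kinds of p-assertion that can capture it.
P_ASSERTION_KINDS = {
    "data item": {"interaction", "actor state"},
    "process": {"interaction", "relationship"},
    "relationship": {"relationship"},  # assumption: not stated on the slide
}


def p_assertions_needed(information_items: Iterable[Tuple[str, str]]) -> List[str]:
    """Union of p-assertion kinds that may be needed for (name, kind) pairs."""
    needed = set()
    for _name, kind in information_items:
        needed |= P_ASSERTION_KINDS[kind]
    return sorted(needed)


# Illustrative classification of two items from the OTM example.
items = [
    ("Blood analysis result", "data item"),
    ("Sequence of decisions that led to rejection", "process"),
]

print(p_assertions_needed(items))  # ['actor state', 'interaction', 'relationship']
```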
PrIMe 29 Step 3: Actors Step 3. Actors Step 1. Use Cases Step 2. Information Items Application Structure
PrIMe 30 Step 3: Actors Description An actor is an entity within the application that performs actions, e.g. Web Services, components, machines, people etc. and interacts with other actors. –One actor may be seen as being composed of other actors.
PrIMe 31 Step 3: Actors Roles in a provenance architecture Asserting Actors – assert p-assertions Recording Actors – record p-assertions Querying Actors – retrieve p-assertions Managing Actors – maintain provenance stores
PrIMe 32 Step 3: Actors Identification Heuristics Identify the components that receive information. E.g. a component/service in a workflow, a script command, the GUI/desktop application into which a user enters information. Identify the components that provide the information in each interaction. These could be, for example, a workflow engine, a script executor, a user.
PrIMe 33 Step 3: Actors OTM Example Actors: User interface, Donor data collector, Electronic health care records, Testing laboratory. Messages: Get donor info (M1), Request p ID (M2), Return p ID (M3), Request blood test (M4), Return result (M5), Return result (M6).
PrIMe 34 Step 4: Interactions Application Structure Step 3. Actors Step 4. Interactions Step 1. Use Cases Step 2. Information Items
PrIMe 35 Step 4: Interactions Information exchange
PrIMe 36 Step 3: Actors Information
Actor: Donor data collector
Sending
Message ID   Data item   Receiver ID
M2           Q1          EHCR
M4           Pid         Testing Lab
M6           r1          UI
Receiving
Message ID   Data item   Sender ID
M1           q1          UI
M3           pid         EHCR
M5           R1          Testing lab
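The per-actor sending and receiving tables can be derived mechanically from a single list of messages. The sketch below assumes a `Message` tuple and two helper functions that are purely illustrative; the message IDs, data items, senders, and receivers are those given for the Donor data collector.

```python
from typing import List, NamedTuple, Tuple


class Message(NamedTuple):
    message_id: str
    data_item: str
    sender: str
    receiver: str


# The six OTM messages, as seen from the Donor data collector's tables.
MESSAGES = [
    Message("M1", "q1", "UI", "Donor data collector"),
    Message("M2", "Q1", "Donor data collector", "EHCR"),
    Message("M3", "pid", "EHCR", "Donor data collector"),
    Message("M4", "Pid", "Donor data collector", "Testing lab"),
    Message("M5", "R1", "Testing lab", "Donor data collector"),
    Message("M6", "r1", "Donor data collector", "UI"),
]


def sending_table(actor: str) -> List[Tuple[str, str, str]]:
    """Rows of (message ID, data item, receiver ID) for messages the actor sends."""
    return [(m.message_id, m.data_item, m.receiver) for m in MESSAGES if m.sender == actor]


def receiving_table(actor: str) -> List[Tuple[str, str, str]]:
    """Rows of (message ID, data item, sender ID) for messages the actor receives."""
    return [(m.message_id, m.data_item, m.sender) for m in MESSAGES if m.receiver == actor]


for row in sending_table("Donor data collector"):
    print("sends   ", row)
for row in receiving_table("Donor data collector"):
    print("receives", row)
```

Keeping one message list and deriving both views avoids the two tables drifting apart when an interaction is added or extended later.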
PrIMe 37 Step 4: Interactions Tracking processes A common information item required for provenance use cases is the process to which documentation refers. Means of tracking it: interaction p-assertions, relationship p-assertions, tracers.
PrIMe 38 Step 4: Interactions Session Tracer
PrIMe 39 Step 4: Interactions Tracer terminology A computational activity –Actors cooperating on some work Superiors –Any actor sending requests to other actors Inferiors –Any actor receiving requests from other actors Tasks –An independent computation within an actor, delimited by a request to the actor and a subsequent response from the actor
PrIMe 40 Step 4: Interactions Session Tracer Semantics Generation rule –An actor must generate a new session tracer at the start of each task and add the tracer to all requests within that task Propagation rule (to inferior) –An actor must add any session tracers received from a superior to all requests it makes to inferiors within the task started by the superior’s request Propagation rule (to superior) –An inferior must add the session tracers supplied by its superior to its response to its superior
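The generation and propagation rules can be sketched as a small `Task` object that stamps tracers onto outgoing messages. This is a minimal illustration under assumptions: the `Message` and `Task` classes, the tracer naming scheme, and the three-actor chain are hypothetical and are not taken from the provenance architecture itself.

```python
import itertools
from dataclasses import dataclass
from typing import FrozenSet, Optional

_ids = itertools.count(1)


def new_tracer() -> str:
    """Return a fresh session tracer (illustrative naming scheme)."""
    return f"session-{next(_ids)}"


@dataclass(frozen=True)
class Message:
    kind: str  # "request" or "response"
    sender: str
    receiver: str
    tracers: FrozenSet[str]


class Task:
    """One task in an actor, started by a superior's request or by the actor itself."""

    def __init__(self, actor: str, incoming: Optional[Message] = None):
        self.actor = actor
        self.incoming = incoming
        # Generation rule: a new session tracer at the start of each task.
        self.own_tracer = new_tracer()
        # Tracers received from the superior, if this task was started by a request.
        self.inherited = incoming.tracers if incoming else frozenset()

    def request(self, inferior: str) -> Message:
        # Generation rule: own tracer on every request within the task.
        # Propagation rule (to inferior): the superior's tracers are added too.
        return Message("request", self.actor, inferior, self.inherited | {self.own_tracer})

    def respond(self) -> Message:
        # Propagation rule (to superior): return the tracers the superior supplied.
        if self.incoming is None:
            raise ValueError("this task was not started by a request")
        return Message("response", self.actor, self.incoming.sender, self.incoming.tracers)


# UI -> Donor data collector -> EHCR, then the responses travel back.
ui_task = Task("UI")
ui_request = ui_task.request("Donor data collector")
ddc_task = Task("Donor data collector", ui_request)
ddc_request = ddc_task.request("EHCR")
ehcr_task = Task("EHCR", ddc_request)
ehcr_response = ehcr_task.respond()
ddc_response = ddc_task.respond()

print(sorted(ddc_request.tracers))
print(sorted(ehcr_response.tracers))
```

Because the UI's tracer appears on every message in the chain, a query for that tracer would gather all interactions belonging to the UI's task.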
PrIMe 41 Step 4: Interactions Other Tracers Other application specific tracers possible –e.g. In the medical domain, a tracer could be used to identify all interactions belonging to a particular case.
PrIMe 42 Step 5: Knowledgeable Actors Step 5. Knowledgeable Actors Application Structure Step 3. Actors Step 4. Interactions Step 1. Use Cases Step 2. Information Items
PrIMe 43 Step 5: Knowledgeable Actors Knowledgeable actors have access to information items. Sometimes, for a given information item, a knowledgeable actor cannot be found: further decomposition might be necessary, or new actors may need to be introduced (Step 6).
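Finding knowledgeable actors amounts to a lookup over a "who knows what" table, and the items with no knowledgeable actor are the ones that need decomposition or adaptation. In the sketch below the actor-to-item assignments are invented for illustration (the slides do not list them); only the actor and information item names come from the OTM example.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical "who knows what" table for the OTM example.
KNOWLEDGE: Dict[str, Set[str]] = {
    "UI": {"PID"},
    "Donor data collector": {"PID", "Blood analysis result"},
    "EHCR": {"PID", "Patient records"},
    "Testing lab": {"Blood analysis result"},
}


def knowledgeable_actors(item: str, knowledge: Dict[str, Set[str]]) -> List[str]:
    """Actors that have access to the given information item, sorted by name."""
    return sorted(actor for actor, items in knowledge.items() if item in items)


def items_without_knowledgeable_actor(
    items: Iterable[str], knowledge: Dict[str, Set[str]]
) -> List[str]:
    """Information items for which Step 5 finds no knowledgeable actor."""
    return [item for item in items if not knowledgeable_actors(item, knowledge)]


REQUIRED_ITEMS = ["PID", "Patient records", "Blood analysis result", "Analysis decision"]

print(knowledgeable_actors("PID", KNOWLEDGE))
print(items_without_knowledgeable_actor(REQUIRED_ITEMS, KNOWLEDGE))
```

With these assumed assignments, "Analysis decision" has no knowledgeable actor, so it would be carried forward to Step 6.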
PrIMe 44 Step 5: Knowledgeable Actors Who knows what? Hospital EHCRS Testing lab
PrIMe 45 Step 5: Knowledgeable Actors OTM Example User interface Donor data collector Electronic health care records Testing laboratory Hospital
PrIMe 48 Step 6: Adaptations Step 6. Adaptations Step 5. Knowledgeable Actors Application Structure Step 3. Actors Step 4. Interactions Step 1. Use Cases Step 2. Information Items
PrIMe 49 Step 6: Adaptations Modifying actors A non-knowledgeable actor may be modified so that it gains access to information items not currently available to itself or other actors in the system.
PrIMe 50 Step 6: Adaptations Actor Introduction A new actor can be introduced to the application to help in the answering of use cases
PrIMe 51 Step 6: Adaptations Interaction Extension An interaction in the application can be extended to exchange more information between a knowledgeable actor and a non-knowledgeable actor, making the latter knowledgeable. Actor before: a. Actor after: a, b.
PrIMe 52 Step 6: Adaptations Interaction Introduction A new interaction between actors can be introduced into the application in which a knowledgeable actor sends the information item to another actor, which then becomes knowledgeable. Actor before: (none). Actor after: b.
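Interaction extension and interaction introduction differ only in whether the interaction already exists; in both cases a knowledgeable sender makes the receiver knowledgeable. The sketch below models this with a set of (sender, receiver) pairs and a knowledge table. All function names and the actor assignments are illustrative assumptions; the items "a" and "b" follow the before/after labels on the slides.

```python
from typing import Dict, Set, Tuple

Interactions = Set[Tuple[str, str]]
Knowledge = Dict[str, Set[str]]


def _transfer(knowledge: Knowledge, sender: str, receiver: str, item: str) -> None:
    """Make the receiver knowledgeable about an item the sender already knows."""
    if item not in knowledge.get(sender, set()):
        raise ValueError(f"{sender} is not knowledgeable about {item!r}")
    knowledge.setdefault(receiver, set()).add(item)


def extend_interaction(
    interactions: Interactions, knowledge: Knowledge, sender: str, receiver: str, item: str
) -> None:
    """Interaction extension: an existing interaction carries one more item."""
    if (sender, receiver) not in interactions:
        raise ValueError("there is no existing interaction to extend")
    _transfer(knowledge, sender, receiver, item)


def introduce_interaction(
    interactions: Interactions, knowledge: Knowledge, sender: str, receiver: str, item: str
) -> None:
    """Interaction introduction: a new interaction is added to carry the item."""
    _transfer(knowledge, sender, receiver, item)
    interactions.add((sender, receiver))


interactions: Interactions = {("Testing lab", "Donor data collector")}
knowledge: Knowledge = {
    "Testing lab": {"a", "b"},
    "Donor data collector": {"a"},
    "Hospital": set(),
}

extend_interaction(interactions, knowledge, "Testing lab", "Donor data collector", "b")
introduce_interaction(interactions, knowledge, "Testing lab", "Hospital", "b")

print(sorted(knowledge["Donor data collector"]))  # ['a', 'b']
print(sorted(knowledge["Hospital"]))              # ['b']
```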
PrIMe 53 Step 6: Adaptations Provenance Functionality The provenance wrapper exposes an actor’s input and output data, relationships and aspects of the actor’s state.
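One way to picture a provenance wrapper is as a decorator around an actor's function that notes inputs, outputs, the relationship between them, and selected state. The sketch below is an assumption-laden illustration: `provenance_wrapper`, the `RECORDED` list (an in-memory stand-in for a provenance store), and `analyse_blood` are all hypothetical names and do not reflect the project's actual wrapper or client side library.

```python
import functools
from typing import Any, Callable, Dict, List, Optional

# In-memory stand-in for a provenance store (assumption for illustration).
RECORDED: List[Dict[str, Any]] = []


def provenance_wrapper(actor: str, state: Optional[Dict[str, str]] = None) -> Callable:
    """Wrap an actor's function so each call exposes inputs, output and state."""

    def decorate(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapped(*args: Any, **kwargs: Any) -> Any:
            result = func(*args, **kwargs)
            RECORDED.append({
                "actor": actor,
                "inputs": {"args": args, "kwargs": kwargs},  # input data
                "output": result,                            # output data
                "relationship": f"output = {func.__name__}(inputs)",
                "state": dict(state or {}),                  # aspects of actor state
            })
            return result

        return wrapped

    return decorate


@provenance_wrapper("Testing lab", state={"algorithm": "x", "version": "x.y.z"})
def analyse_blood(sample: str) -> str:
    # Placeholder computation standing in for the real analysis.
    return f"analysis of {sample}"


analyse_blood("sample-1")
print(RECORDED[-1]["relationship"])  # output = analyse_blood(inputs)
```

The wrapped function's behaviour is unchanged for its callers, which is the point of a wrapper-style adaptation: the application logic stays as it was while the documentation is produced on the side.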
PrIMe 54 Step 6: Adaptations The Client Side Library A collection of functions –To allow provenance-aware applications to communicate with provenance store services –An implementation of the client side library should contain at least one of the query library, the record library and the management library –Helps application developers follow architecture rules
PrIMe 55 Step 6: Adaptations CSL Layered Approach Client Side Library Applications Provenance Store Server Application API Utilities Server API
PrIMe 56 Recording Provenance
PrIMe 57 Evaluation Protein compressibility experiment 10% overhead for asynchronous recording
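Asynchronous recording keeps the cost seen by the application low because the application only enqueues a p-assertion and a background worker does the storing. The sketch below shows that general pattern with Python's standard `queue` and `threading` modules; the `AsyncRecorder` class is a hypothetical illustration, not the mechanism used in the evaluated system, and it makes no claim about the measured overhead.

```python
import queue
import threading
from typing import Any, List


class AsyncRecorder:
    """Queue p-assertions and store them on a background thread."""

    _STOP = object()  # sentinel telling the worker to finish

    def __init__(self) -> None:
        self.store: List[Any] = []  # stand-in for a provenance store
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, p_assertion: Any) -> None:
        # The application thread only pays for the enqueue.
        self._queue.put(p_assertion)

    def _drain(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self.store.append(item)

    def close(self) -> None:
        """Flush everything queued so far and stop the worker."""
        self._queue.put(self._STOP)
        self._worker.join()


recorder = AsyncRecorder()
for message_id in ["M1", "M2", "M3"]:
    recorder.record({"message": message_id})
recorder.close()

print(len(recorder.store))  # 3
```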
PrIMe 58 Summarising PrIMe Step 1: Identify the provenance use cases Step 2: Identify relevant information items –Step 3: Identify actors –Step 4: Identify interactions –Step 5: Identify knowledgeable actors Step 6: Make necessary adaptations Granularity
PrIMe 59 Conclusions PrIMe provides a clear and easy guide to make applications provenance-aware Crucial in the adoption of the Provenance Middleware Architecture
PrIMe 60 Questions? Steve Munroe email@example.com