A Proof of Concept: Provenance in a Service Oriented Architecture Liming Chen, Victor Tan, Fenglian Xu, Alexis Biller, Paul Groth, Simon Miles, John Ibbotson, Michael Luck and Luc Moreau
Purpose Asking questions about the provenance of something, i.e. the process by which it came to be as it is, is essential in many domains We are working with bioinformaticians, medics, aerospace engineers, physicists and have found a wide range of questions they wish to ask A simple example application can: –Clarify the requirements on software to aid answering those questions –Be used to explain the issues involved to non-domain experts –Be extended in controlled ways to explore issues that arise in real applications
EU Provenance and PASOA Recent work of the EU Provenance project: –Developed a logical architecture for software to aid answering provenance-related questions, along with other research on security, scalability and user tool support. –Now being applied to two project applications: organ transport management (UPC, Spain) and aerospace engineering (DLR, Germany) –The logical architecture document should be released next week: keep an eye on www.gridprovenance.org Recent work of the PASOA project: –Has focused on e-Science applications and has gathered requirements, developed protocols and software –EU Provenance used PASOA software for the work described in this talk –PASOA will be discussed in the following two presentations
Outline –The example application –Asking provenance-related questions –The example as a service-oriented process –Recording documentation of a process –What does the example show us? –What are the limits of the example? –Conclusions
Baking a Victoria Sponge INGREDIENTS –110g (4oz) Butter, 110g (4oz) Caster Sugar, 110g (4oz) Self-raising Flour, 2 Eggs, Vanilla Essence or 1 tsp Grated Lemon Rind RECIPE –Preheat oven to 190°C: 375°F: Gas 5. Whisk together the butter and sugar until light and creamy. Add the beaten eggs gradually with a little of the flour. Fold in the remaining sieved flour and add the flavouring. Divide equally between two 15cm (6 inch) sandwich tins. Bake for 20 - 25 minutes. Turn out on to a wire rack to cool. This is not so contrived an example! www.thefoody.com
Step 1: whisk 20g sugar and 20g butter together to get mixture 1
Step 2: beat 2 eggs for 2 minutes, then mix the beaten eggs with mixture 1 to obtain mixture 2
Step 3: fold 100g flour together with mixture 2 to get mixture 3
Step 4: put mixture 3 into the oven, set baking time to 30min and baking temperature to 180˚C, and obtain a cake
After Baking Some questions can be asked after baking a cake. Answers to the questions can be found if we record details of the baking process during its execution. Details of the baking process are what we call the provenance of a cake
What went wrong? Questions Did we follow the recipe accurately? –Did we use the correct ingredients at the right time? –Did we provide the correct quantities? Correct units? –Did we perform actions for the right duration? We need to keep a record of all actions performed with all their parameters (such as the number of eggs used) Organ transplant example: Did the medics follow the correct procedure? Bioinformatics example: Did I analyse an amino acid sequence using tools that actually only apply to nucleotide sequences?
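As a minimal sketch of how such a record could answer "did we follow the recipe accurately?", assume each action is stored with its parameters as a plain dictionary. The record layout, the parameter names (with units in the names) and the `deviations` helper are all hypothetical illustrations, not the format used by the PASOA software; the quantities are those of the example steps.

```python
# Hypothetical record layout: one dict per action, in the order performed.
RECIPE = [
    {"action": "whisk", "params": {"sugar_g": 20, "butter_g": 20}},
    {"action": "beat_and_mix", "params": {"eggs": 2, "beating_min": 2}},
    {"action": "fold", "params": {"flour_g": 100}},
    {"action": "oven_bake", "params": {"baking_min": 30, "temperature_c": 180}},
]


def deviations(recipe, recorded):
    """Return readable differences between the recipe and the recorded actions."""
    problems = []
    if len(recorded) != len(recipe):
        problems.append(f"expected {len(recipe)} actions, recorded {len(recorded)}")
    for step, (expected, actual) in enumerate(zip(recipe, recorded), start=1):
        if expected["action"] != actual["action"]:
            problems.append(
                f"step {step}: expected {expected['action']}, got {actual['action']}"
            )
            continue
        # Compare every parameter the recipe prescribes for this action.
        for name, want in expected["params"].items():
            got = actual["params"].get(name)
            if got != want:
                problems.append(
                    f"step {step} ({expected['action']}): {name} should be {want}, was {got}"
                )
    return problems


# A recorded run in which three eggs were used instead of two.
recorded = [
    {"action": "whisk", "params": {"sugar_g": 20, "butter_g": 20}},
    {"action": "beat_and_mix", "params": {"eggs": 3, "beating_min": 2}},
    {"action": "fold", "params": {"flour_g": 100}},
    {"action": "oven_bake", "params": {"baking_min": 30, "temperature_c": 180}},
]
report = deviations(RECIPE, recorded)
```

The check is only possible because the parameters were recorded along with the actions; a record of action names alone could not reveal the extra egg.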
What went wrong? Questions Other factors can affect the baking process: –Amount of flour required varies with altitude –Oven is broken and baked at a different temperature We need to know the internal state of the different entities participating in the baking process (such as actual oven temperature or oven altitude) Organ transplant example: By what criteria did a team decide to accept or reject an organ? Bioinformatics example: What script was used by the services to perform each stage of the experiment?
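A rough sketch of why internal state matters: if the oven records its actual temperature alongside the request it received, a broken oven becomes detectable. The field names, the example values and the 5 degree tolerance are assumptions made for illustration only.

```python
def temperature_discrepancy(requested_c, actor_state, tolerance_c=5):
    """Return (actual - requested) in degrees C if outside the tolerance, else None."""
    diff = actor_state["actual_temperature_c"] - requested_c
    return diff if abs(diff) > tolerance_c else None


# What the oven was asked to do (parameters of the interaction).
interaction = {"action": "oven_bake", "params": {"temperature_c": 180, "baking_min": 30}}

# What the oven recorded about itself while baking (hypothetical values).
oven_state = {"actual_temperature_c": 150, "altitude_m": 1200}

diff = temperature_discrepancy(interaction["params"]["temperature_c"], oven_state)
```

Here the request alone looks correct; only the recorded actor state shows that the cake was baked 30 degrees too cool.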
Process Analysis Questions Did we use the same amount of ingredients for baking cake 1 and cake 2? Or in the same proportion? What was the longest step in the execution of a recipe? Why did we not finish the process? Where did we stop? The process that led to a given cake should be delimited and analysable Organ transplant example: Which patient's death led to the organ now being transplanted? Bioinformatics example: What samples led to the final analysis result?
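Two of these questions can be sketched over a delimited process record. The timestamps, the second cake and the helper names below are hypothetical, and the proportion check assumes positive integer quantities.

```python
from fractions import Fraction


def longest_step(steps):
    """Return the action whose recorded duration was greatest."""
    return max(steps, key=lambda s: s["end_min"] - s["start_min"])["action"]


def same_proportions(a, b):
    """True if both cakes use the same ingredients in the same ratios."""
    if a.keys() != b.keys():
        return False
    # Every ingredient must be scaled by one common factor.
    ratios = {Fraction(a[name], b[name]) for name in a}
    return len(ratios) == 1


# Hypothetical start and end times (minutes since the process began).
steps = [
    {"action": "whisk", "start_min": 0, "end_min": 5},
    {"action": "beat_and_mix", "start_min": 5, "end_min": 8},
    {"action": "fold", "start_min": 8, "end_min": 10},
    {"action": "oven_bake", "start_min": 10, "end_min": 40},
]

cake_1 = {"sugar_g": 20, "butter_g": 20, "flour_g": 100, "eggs": 2}
cake_2 = {"sugar_g": 40, "butter_g": 40, "flour_g": 200, "eggs": 4}

slowest = longest_step(steps)
same_amounts = cake_1 == cake_2
same_ratio = same_proportions(cake_1, cake_2)
```

Both functions only work if the record says which steps belong to which cake, which is the sense in which the process must be delimited.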
What Did Parties Do? Questions Did the baker follow the user's instructions (regardless of any claim from the baker)? Did each step of the baking process follow the user's instructions? –Did they receive the correct instructions? –Did they follow the received instructions? All entities should document their view of a process because it may vary Organ transplant example: Were there differing opinions on the suitability of an organ for transplant? Bioinformatics example: I claim I used a database in my experiments whose license allows me to patent my results: does the database owner confirm this?
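If both parties to an interaction document their own view of it, a disagreement can be found by comparing the two. The following is an illustrative sketch only: the dictionaries and the `view_mismatches` helper are hypothetical.

```python
def view_mismatches(sender_view, receiver_view):
    """Return {parameter: (sender value, receiver value)} where the views differ."""
    names = sender_view.keys() | receiver_view.keys()
    return {
        name: (sender_view.get(name), receiver_view.get(name))
        for name in names
        if sender_view.get(name) != receiver_view.get(name)
    }


# The user's record of the instructions sent to the baker.
user_view = {"temperature_c": 180, "beating_min": 2}

# The baker's record of the instructions it says it received.
baker_view = {"temperature_c": 200, "beating_min": 2}

mismatches = view_mismatches(user_view, baker_view)
```

A mismatch does not say which party is right; it only shows that the two views differ, which is why each party's documentation is kept as its own assertion.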
Implementation We implemented the application as a set of Web Services, and then implemented clients that answered the provenance-related questions by querying the provenance store. This involved mapping the scenario onto a service-oriented architecture
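Service-Oriented Process Actors: User, Baker, Whisk, Beat & Mix, Fold, Oven Bake. Messages shown in the diagram: –Baker: Sugar + Flour + Beating Time + Temperature in, Cake out –Whisk: Butter + Sugar in, Mixture 1 out –Beat & Mix: Mixture 1 + Eggs + Beating Time in, Mixture 2 out –Fold: Flour + Mixture 2 in, Mixture 3 out –Oven Bake: Mixture 3 + Temperature + Baking Time in, Cake out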
Recording Actors: User, Baker, Whisk, Beat & Mix, Fold, Oven Bake, Provenance Store. After baking, the provenance store contains a trace of the different activities that were involved in the production of a cake. The provenance of a cake is the documentation of the process that led to that cake. Messages recorded: –Baker (Sugar, Flour, Beating Time, Temperature) –Whisk (Butter, Sugar) –WhiskReturn (Mixture 1) –Beat&Mix (Mixture 1, Eggs, Beating Time) –Beat&MixReturn (Mixture 2) –Fold (Flour, Mixture 2) –FoldReturn (Mixture 3) –OvenBake (Mixture 3, Temperature, Baking Time) –OvenBakeReturn (Cake) –BakerReturn (Cake)
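The recording step can be pictured with a small in-memory stand-in for the provenance store. This is a sketch under stated assumptions: `ProvenanceStore`, `invoke` and `bake` are hypothetical names, the services are not really called, and nothing here reflects the actual PASOA interfaces. Only the message names and contents are taken from the slide.

```python
class ProvenanceStore:
    """In-memory stand-in for a provenance store (hypothetical, not the PASOA API)."""

    def __init__(self):
        self.trace = []

    def record(self, message, contents):
        # Append one (message name, message contents) pair to the trace.
        self.trace.append((message, contents))


def invoke(store, service, *inputs, result):
    """Record a service invocation and its return, then hand back the result."""
    store.record(service, inputs)
    store.record(service + "Return", (result,))
    return result


def bake(store):
    """Simulate the baker's process, documenting every message exchanged."""
    store.record("Baker", ("Sugar", "Flour", "Beating Time", "Temperature"))
    m1 = invoke(store, "Whisk", "Butter", "Sugar", result="Mixture 1")
    m2 = invoke(store, "Beat&Mix", m1, "Eggs", "Beating Time", result="Mixture 2")
    m3 = invoke(store, "Fold", "Flour", m2, result="Mixture 3")
    cake = invoke(store, "OvenBake", m3, "Temperature", "Baking Time", result="Cake")
    store.record("BakerReturn", (cake,))
    return cake


store = ProvenanceStore()
bake(store)
messages = [message for message, _ in store.trace]
```

After the run, `store.trace` holds the ten messages in the order they were exchanged, which is the raw material that later queries work from.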
Process Documentation and Provenance We distinguish –process documentation (the documentation recorded into a provenance store about a process) –provenance (the information retrieved from a provenance store about a process) This is because we have found there to be different requirements on each [Diagram labels: Process documentation, Processing, Provenance]
Process documentation Should allow questions about the provenance of entities to be answered Should follow a consistent, application-independent structure so that independent parties can record documentation that is easily combined –e.g. oven may be owned by someone other than the user, but their documentation is combined to answer whether the requested temperature was used Should state exactly what those recording it know to have happened, not confuse it with what they guessed or inferred had happened –e.g. baker states that it put the cake in the oven, not that the cake was successfully baked, because the oven may have been broken
Provenance Should give the client asking for the provenance of something control over the scope of the answer –e.g. whether the process that produced the flour is included in the provenance of the cake Should be/provide the information relevant to answering a client's/user's questions (not swamp them with detail) –e.g. report how much flour was used rather than giving XML structure sent between application components May (in order to achieve the above) include inferred information –e.g. infer from baker putting mixture in oven and getting cake out that the cake was successfully baked from the mixture
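Scope control can be sketched as a backwards traversal that the client can cut short. The `DERIVED_FROM` table and the `provenance` function are hypothetical; the "Wheat" entry in particular is invented to stand for the process that produced the flour, which the slides mention but do not describe.

```python
# What each item was derived from, as could be inferred from the recorded messages.
DERIVED_FROM = {
    "Cake": ["Mixture 3"],
    "Mixture 3": ["Flour", "Mixture 2"],
    "Mixture 2": ["Mixture 1", "Eggs"],
    "Mixture 1": ["Butter", "Sugar"],
    "Flour": ["Wheat"],  # hypothetical upstream process, not in the slides
}


def provenance(item, derived_from, stop_at=frozenset()):
    """Collect everything `item` derives from, without expanding items in `stop_at`."""
    found = []
    stack = [item]
    while stack:
        current = stack.pop()
        if current in stop_at:
            continue  # the client asked not to look behind this item
        for source in derived_from.get(current, []):
            if source not in found:
                found.append(source)
                stack.append(source)
    return found


full = set(provenance("Cake", DERIVED_FROM))
scoped = set(provenance("Cake", DERIVED_FROM, stop_at={"Flour"}))
```

With `stop_at={"Flour"}` the flour still appears in the answer, but the process that produced it does not, so the same documentation serves clients who want different scopes.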
Provenance architectures Should allow different parties to record independent documentation if they want to –e.g. user and baker can record independently, allowing discrepancies to be noticed Should have no dependence on any one workflow engine/language, and no requirement for (explicit) workflows to be used at all –e.g. our example application was written in Java, and baking in reality follows a plan in someone's head Should have independence from any one product of a process: should not be necessary to store process documentation with any one result of a process –e.g. the provenance of the cake, the provenance of the ingredients and the provenance of the intermediate mixtures overlap, so cannot claim it belongs to any
Limitations and Strengths The current example has limitations: –Physical world treated as if it mapped directly to the electronic world: how does a baker record documentation in a provenance store Web Service? Through a GUI? What if the GUI goes wrong or they use the GUI wrongly: do we still have sound process documentation? –None of the objects in the process have constituent parts that we may want to independently find the provenance of –Assumes a single provenance store that every service happily submits documentation to …but the strength of the example is that it can be simply extended to remove these limitations
Conclusions The simple example allows us to determine the requirements on software to record process documentation and make it available to users We have used it as a testbed, extending it to explore other aspects of provenance (along with other applications) It is rich enough to continue extending to mirror, in a controlled way, issues discovered in the future
EU Provenance Partners –IBM United Kingdom Limited –University of Southampton –University of Wales, Cardiff –Deutsches Zentrum für Luft- und Raumfahrt e.V. –Universitat Politècnica de Catalunya –Magyar Tudományos Akadémia Számítástechnikai és Automatizálási Kutató Intézet