Provenance Challenge, Sept. 2006
Slide 1: Modeling Provenance through User Views
Sarah Cohen-Boulakia, Shirley Cohen, Susan Davidson, Thunyarat (Bam) Amornpetchkul, Olivier Biton
Database group, University of Pennsylvania
Slide 2: Our approach
- Model of provenance
  - Based on study of user requirements (CIPRES)
  - Based on careful studies of workflow systems (Kepler, MyGrid, Chimera)
  - Minimal information to reason about provenance
  - No workflow system is proposed
- User views
  - Capability of workflow systems to group steps (forming boxes) and to zoom into boxes
  - Multi-granularity levels of provenance
- Implemented in Oracle 10g and Java
  - Relational framework augmented with transitive closure
  - Java/Spring/JDBC: object layer and user interface
Slide 3: Workflow Representation
Terminology
- Step-classes (static)
- An execution of a workflow generates a partial order of steps (dynamic)
  - Steps are instances of step-classes
- Each step has input and output data
[Figure: example labeling "8.reslice" as a step and "reslice" as a step-class, with its input data and output data]
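The terminology above can be sketched as a small in-memory model. This is only an illustration, not the authors' Java object layer: the class names `StepClass` and `Step`, the helper `precedes`, the step numbered 4, and the data names are all hypothetical. Only the step "8.reslice" and its step-class "reslice" come from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StepClass:
    """Static part of a workflow: a kind of processing, e.g. 'reslice'."""
    name: str


@dataclass
class Step:
    """Dynamic part: one instance of a step-class created by an execution."""
    step_id: int
    step_class: StepClass
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)


def precedes(a: Step, b: Step) -> bool:
    """a directly precedes b when b consumes at least one item a produced.

    The transitive closure of this relation gives the partial order of steps.
    """
    return bool(a.outputs & b.inputs)


# Hypothetical data names and upstream step; only "8.reslice" is from the slide.
align = Step(4, StepClass("align_warp"), {"anatomy image"}, {"warp params"})
reslice = Step(8, StepClass("reslice"), {"warp params"}, {"resliced image"})

print(precedes(align, reslice), precedes(reslice, align))
```

Deriving the order from shared data items, rather than storing it explicitly, mirrors the idea of keeping only minimal information to reason about provenance.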
Slide 4: Provenance Trace
Base tables
- Data(dataid, name, type), DataAttributes(dataid, attribute, value)
  - Example: Data(1, Anatomy Image1, Anatomy Image) and DataAttributes(1, center, UChicago), i.e. Center=UChicago
- InstanceOf(Step, Step-Class, ts), StepParams(step, attribute, value), StageInstance(step, stage)
- Input(stepId, dataId, ts) / Output(stepId, dataId, ts): stepId takes as input / produces dataId at time ts
Views
- Process(stepId, stepClass, input, output, time)
- …
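To make the relational layout concrete, here is a minimal sketch using SQLite from Python instead of Oracle 10g. The table and column names follow the slide, and the two example rows for `Data` and `DataAttributes` are the ones shown there. The body of the `Process` view is an assumption (the slide gives only its signature), and the step row and second data item are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data(dataid INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE DataAttributes(dataid INTEGER, attribute TEXT, value TEXT);
CREATE TABLE InstanceOf(step INTEGER, stepClass TEXT, ts INTEGER);
CREATE TABLE Input(stepId INTEGER, dataId INTEGER, ts INTEGER);
CREATE TABLE Output(stepId INTEGER, dataId INTEGER, ts INTEGER);

-- Assumed definition: one row per (input, output) pair of each step.
CREATE VIEW Process AS
SELECT i.stepId    AS stepId,
       c.stepClass AS stepClass,
       i.dataId    AS input,
       o.dataId    AS output,
       o.ts        AS time
FROM Input i
JOIN Output o     ON o.stepId = i.stepId
JOIN InstanceOf c ON c.step   = i.stepId;
""")

# Rows taken from the slide.
conn.execute("INSERT INTO Data VALUES (1, 'Anatomy Image1', 'Anatomy Image')")
conn.execute("INSERT INTO DataAttributes VALUES (1, 'center', 'UChicago')")

# Hypothetical step 1 of class align_warp: reads data 1, writes data 2.
conn.execute("INSERT INTO Data VALUES (2, 'Warp Params1', 'Warp Params')")
conn.execute("INSERT INTO InstanceOf VALUES (1, 'align_warp', 10)")
conn.execute("INSERT INTO Input VALUES (1, 1, 10)")
conn.execute("INSERT INTO Output VALUES (1, 2, 11)")

process_rows = conn.execute(
    "SELECT stepId, stepClass, input, output, time FROM Process"
).fetchall()
print(process_rows)
```

Flattening each step into (input, output) pairs is what lets a recursive query follow data dependencies edge by edge.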
Slide 5: Provenance Queries
Q1: Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is
- Implements transitive closure.
- Necessary to return all the data used to (recursively) compute Atlas X Graphic.

    SELECT DISTINCT step, step-class, input, output
    FROM Process
    START WITH output = ( SELECT ID FROM DataID WHERE name = 'Atlas X Graphic' )
    CONNECT BY PRIOR input = output
    ORDER BY step;
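Outside Oracle, the same backward traversal can be approximated with a recursive common table expression. The sketch below uses SQLite from Python. It is a stand-in for `CONNECT BY`, not the authors' code. The table contents are a hypothetical, heavily simplified chain of steps, and `process_that_led_to` is an illustrative helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Data(dataid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Process(step INTEGER, stepClass TEXT, input INTEGER, output INTEGER);

-- Hypothetical toy trace: a single chain plus one unrelated step (step 6).
INSERT INTO Data VALUES
  (1, 'Anatomy Image1'), (2, 'Warp Params1'), (3, 'Resliced Image1'),
  (4, 'Atlas Image'), (5, 'Atlas X Slice'), (6, 'Atlas X Graphic'),
  (7, 'Unrelated Image'), (8, 'Unrelated Result');
INSERT INTO Process VALUES
  (1, 'align_warp', 1, 2), (2, 'reslice', 2, 3), (3, 'softmean', 3, 4),
  (4, 'slicer', 4, 5), (5, 'convert', 5, 6), (6, 'convert', 7, 8);
""")

ANCESTORS = """
WITH RECURSIVE trace(step, stepClass, input, output) AS (
    -- START WITH: rows that output the requested data item
    SELECT p.step, p.stepClass, p.input, p.output
    FROM Process p JOIN Data d ON d.dataid = p.output
    WHERE d.name = ?
  UNION
    -- CONNECT BY PRIOR input = output: rows that produced an earlier input
    SELECT p.step, p.stepClass, p.input, p.output
    FROM Process p JOIN trace t ON p.output = t.input
)
SELECT step, stepClass, input, output FROM trace ORDER BY step
"""


def process_that_led_to(name):
    """Return every Process row on a path leading to the named data item."""
    return conn.execute(ANCESTORS, (name,)).fetchall()


rows = process_that_led_to("Atlas X Graphic")
for row in rows:
    print(row)
```

Using `UNION` rather than `UNION ALL` removes duplicate rows, which plays the role of `SELECT DISTINCT` in the Oracle query and also stops the recursion from revisiting a row.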
Slide 6: Provenance Queries (Cont.)
- All the queries can be answered by our system
  - Code available on TWiki
- Using SQL
  - Connect by operators
  - Joins with several tables (e.g. Parameters, DataAttribute)
  - Minus and Union operators
- The generalization of Q7 (difference between workflows) is currently not answerable
Slide 7: Workflow Variant: User Views
[Figure: workflow with steps grouped into Box1 and Box2, as seen by users UBio, UBlackBox, and UAdmin; UAdmin can see everything]
What are User views?
- Level of detail the user wishes to track
- Permissions given to the user
- Ability of the user to see / know the sub-steps (distributed computation)
Why use User Views?
- Throw away unimportant intermediate results
- Better understanding of the workflow
- Reduce the amount of work to be redone
Slide 8: Querying within User Views
Need information from Workflow: Step-class containment and user views
- Cinput(sid,idid,tsi), Coutput(sid,idid,tso)
- View UProcess(usr, step, step-class, input, output)
Query: What are all the data items used to produce "Resliced Image1"?

    SELECT *
    FROM UProcess upc
    WHERE usr = :userName
    START WITH outputName = 'Resliced Image1'
    CONNECT BY PRIOR upc.input = upc.output;

Answers per user:
- UAdmin: Anatomy Header 1, Anatomy Image1, Reference Image, Reference Header, Wrap param1
- UBio: Anatomy Header 1, Anatomy Image1, Reference Image, Reference Header
- UBlackBox: empty answer!
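The three different answers follow from how each user's boxes hide intermediate data. The sketch below reproduces that behavior in plain Python under stated assumptions. The assignment of steps to boxes in `VIEWS`, the three-step trace in `STEPS`, and the rule that a data item is hidden when it is produced and consumed only inside one box are all hypothetical, not taken from the authors' system. Data names follow the slide.

```python
from collections import defaultdict

# Hypothetical base trace: step -> (inputs, outputs).
STEPS = {
    "align_warp": ({"Anatomy Header 1", "Anatomy Image1",
                    "Reference Image", "Reference Header"}, {"Wrap param1"}),
    "reslice": ({"Wrap param1"}, {"Resliced Image1"}),
    "softmean": ({"Resliced Image1"}, {"Atlas Image"}),
}

# Hypothetical user views: step -> box in which the user sees that step.
VIEWS = {
    "UAdmin": {"align_warp": "align_warp", "reslice": "reslice", "softmean": "softmean"},
    "UBio": {"align_warp": "Box1", "reslice": "Box1", "softmean": "softmean"},
    "UBlackBox": {"align_warp": "Box", "reslice": "Box", "softmean": "Box"},
}


def uprocess(user):
    """Collapse steps into the user's boxes: box -> (visible inputs, visible outputs)."""
    box_of = VIEWS[user]
    producer = {d: s for s, (_, outs) in STEPS.items() for d in outs}
    consumers = defaultdict(set)
    for s, (ins, _) in STEPS.items():
        for d in ins:
            consumers[d].add(s)
    view = defaultdict(lambda: (set(), set()))
    for s, (ins, outs) in STEPS.items():
        box = box_of[s]
        for d in ins:
            # Visible input: not produced by a step inside the same box.
            if d not in producer or box_of[producer[d]] != box:
                view[box][0].add(d)
        for d in outs:
            # Visible output: a final result, or consumed outside the box.
            users = consumers.get(d, set())
            if not users or any(box_of[c] != box for c in users):
                view[box][1].add(d)
    return dict(view)


def used_to_produce(user, target):
    """All data items that (recursively) fed into target, as seen by user."""
    view = uprocess(user)
    found, frontier = set(), [target]
    while frontier:
        item = frontier.pop()
        for ins, outs in view.values():
            if item in outs:
                new = ins - found
                found |= new
                frontier.extend(new)
    return found


for user in VIEWS:
    print(user, sorted(used_to_produce(user, "Resliced Image1")))
```

With these assumed boxes, UBio loses "Wrap param1" because it never leaves Box1, and UBlackBox gets an empty answer because "Resliced Image1" itself is internal to the single box.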
Slide 9: Conclusion, Perspectives
- Able to answer the queries, including
  - Data and Step provenance
  - Immediate and Deep (recursive) provenance
- Variation of the workflow and queries considering user views
  - Multi-granularity levels of provenance
  - Only visible and necessary data are kept
- Open questions
  - What is the meaning of "stage" in a workflow (with respect to user views)?
  - What are we expecting as an answer to the difference between two workflows (cf. query 7)?
  - Are all the procedures of the workflow "biologically significant" (cf. user views)?
Slide 10: Acknowledgements
- Kepler Group: Shawn Bowers, Bertram Ludascher, Timothy McPhillips
- Biologists from the CIPRES project
- Members from the Database group, University of Pennsylvania
- This work is supported by NSF grants 0513778, 0415810, and 0612177
Slide 11: User interface