
1 Provenance Framework in Kepler
7th Biennial Ptolemy Miniconference, Berkeley, CA, February 13, 2007
Ilkay Altintas, Norbert Podhorszki
Contributors: S. Bowers, B. Ludäscher, T. McPhillips (UC Davis), O. Barney (U Utah), E. Jäger-Frank (SDSC)

2 Outline
- Provenance: what is it?
- Framework in Kepler to record provenance data
- RWS: a provenance model suitable for Kepler's different computational models
- Possible applications of provenance

3 What to track and why
- Do we need some tracking of what is happening?
- Recreate results and rebuild workflows using the evolution information (see repeatable experiments)
- Associate the workflow with the results it produced
- Create links between generated data in different runs, and compare different runs
- Recover from a system failure
- Checkpoint a workflow
- Debug and explain results (via lineage tracing, …)
- Smart reruns: avoid re-generating the same data all the time

4 Model of Provenance
- Core feature: capture the processing history (trace) leading to a data product
- Model of Computation (MoC)
  - Well defined in terms of input/output relations and the (partial) order of actions
  - M : P × I → O, that is, MoC(Program, Input) → Output
  - Examples: DAG, SDF, DDF, PN, etc.
  - Different ways of specification
    - see Ptolemy-related papers, the Kahn-MacQueen paper, etc.
    - give abstract/high-level pseudo code
  - In practice it is defined through the implementation of the execution system (including the scheduling). In Kepler/Ptolemy this is the Director.
  - There are legal (possible) runs under a given MoC

5 Model of Provenance
- Model of Provenance (MoP)
  - The starting point is a MoC and its particular implementation
  - Observables: e.g. a single fired(x, A, y), or reads, writes and actions separately
  - Trace: recorded assertions (about observable events) during a legal run
  - A MoP is a MoC, except that "legal run" is replaced with "legal trace"
- There is a default MoP for a MoC: the total trace of all observable events
  - Turing machine: moves of the head, data read and written
- A MoP may add other information or omit some ("T = R − I + M")
  - Trace = Run − Ignored things + Modelled additional things
  - M: add real timestamps of actions, execution host information
  - I: omit the input for each action if this can be inferred unambiguously later (DAG)
  - What to add and what to omit depends on the application of the trace T
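
The relation "T = R − I + M" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` record, the `make_trace` function, and the `host` value are hypothetical names that do not come from Kepler, and a step index stands in for the real timestamps mentioned on the slide so that the example stays deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One observable event of a run (hypothetical record, not a Kepler type)."""
    kind: str   # "read", "write" or "fire"
    actor: str
    data: str


def make_trace(run, ignored_kinds, model):
    """Build T = R - I + M from a run R.

    ignored_kinds plays the role of I (event kinds the MoP omits);
    model plays the role of M (extra information attached to each kept event).
    """
    trace = []
    for step, event in enumerate(run):
        if event.kind in ignored_kinds:
            continue  # I: omitted because it can be inferred later
        trace.append({"event": event, **model(step, event)})  # M: additions
    return trace


# A run R of a single actor A: it reads x, fires, and writes y.
run = [
    Event("read", "A", "x"),
    Event("fire", "A", ""),
    Event("write", "A", "y"),
]

# A DAG-style MoP: inputs are not recorded; step and host are added.
trace = make_trace(
    run,
    ignored_kinds={"read"},
    model=lambda step, event: {"step": step, "host": "localhost"},
)
```

Choosing a different `ignored_kinds` set or `model` function gives a different MoP over the same run, which is the point of the "depends on the application" remark.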

6 MoP Examples
- DAG workflow
  - Record: output data generated by the actions
  - Inference: the execution of actions and the inputs to them can be inferred from the DAG itself
- Smart-rerun
  - Record: the output of an action and the parameters for that action
  - Inference: if an action's parameters are unchanged and the actions on which this action depends (inferred from the workflow graph) are also unchanged, the action's output will be the same in a future run
- Kitchen definition
  - A MoP is "good" if it can handle the intended questions and use cases
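
The smart-rerun inference can be sketched as a simple propagation over the workflow graph: an action must be re-executed if its own parameters changed or if anything it depends on must be re-executed. The function name `actions_to_rerun` and the four-action workflow are hypothetical and chosen only for illustration; this is not the Kepler implementation.

```python
def actions_to_rerun(depends_on, changed):
    """Return the actions whose output may differ from the recorded run.

    depends_on maps each action to the actions it directly depends on;
    changed is the set of actions whose parameters were modified.
    """
    dirty = set(changed)
    grew = True
    while grew:  # repeat until no further action becomes dirty
        grew = False
        for action, upstream in depends_on.items():
            if action not in dirty and dirty.intersection(upstream):
                dirty.add(action)
                grew = True
    return dirty


# Hypothetical workflow: load feeds filter and stats, filter feeds plot.
workflow = {
    "load": [],
    "filter": ["load"],
    "plot": ["filter"],
    "stats": ["load"],
}

rerun = actions_to_rerun(workflow, changed={"filter"})
reuse = set(workflow) - rerun  # recorded outputs of these can be reused
```

Here only `filter` and `plot` are re-executed, while the recorded outputs of `load` and `stats` are reused.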

7 MoP Examples: Kepler streaming actors
- Stateful actors: an output depends on all inputs in the past, e.g. AddSubtract
- Stateless actors: an output depends only on inputs read in the current firing, e.g. Expression, RecordAssembler
- Non-conformist actors: Filter, Running average, Daily average (an output depends on some of the past inputs)
- How do you determine correctly which inputs a given output depends on?

8 MoP Examples: Kepler data-dependent routing (branches and loops)
- The firing history of the actors cannot be inferred from the static workflow graph
- Something must be recorded (e.g. firings)

9 RWS: A Model of Provenance for Kepler Directors

10 RWS: Read − Write − State-reset
- What about actor state? What about "real" dependencies?
- State-reset event s!
  - defines when an actor "cuts off" dependencies
  - a semantic notion, known to the actor [developer] (or part of a higher-order scheme)
- Observed read/write trace: r, r, …, r, w, w, …, w, r, …, r, w, …, w, …
- Trace with state-reset events: r, r, …, r, w, w, …, w, s!, r, …, r, w, …, w, …
- Reference: IPAW'06, Bowers et al.
[Figure: actor A reading (r … r) and writing (w … w) tokens along a time axis, grouped into firings]
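
One way to read the "cuts off dependencies" idea is that every written token depends on the tokens read since the most recent state-reset. The sketch below implements that reading for the event stream of a single actor. It is an interpretation of the slide, not the formal definition from the IPAW'06 paper, and the function name and token labels are hypothetical.

```python
def write_dependencies(events):
    """Map each written token to the tokens read since the last state-reset.

    events is a list of (kind, token) pairs with kind in {"r", "w", "s"};
    the token of an "s" event is ignored.
    """
    window = []  # tokens read since the last state-reset
    deps = {}
    for kind, token in events:
        if kind == "r":
            window.append(token)
        elif kind == "w":
            deps[token] = list(window)
        elif kind == "s":
            window.clear()  # later writes no longer depend on earlier reads
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return deps


# r, r, w, s!, r, w
events = [
    ("r", "a1"), ("r", "a2"), ("w", "b1"),
    ("s", None),
    ("r", "a3"), ("w", "b2"),
]
deps = write_dependencies(events)
```

Without the `s` event, `b2` would be reported as depending on `a1`, `a2` and `a3`, which is the behaviour of a stateful actor.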

11 RWS trace of some actors
(x+ means one or more x, x? means an optional x, x* means zero or more x)
- Stateless actor: (r+ w+ s)*, i.e. r … r w … w s r … r w … w s …
- Stateful actor: (r+ w+)*
- Simple filter actor (the condition depends only on the current token): (r w? s)*, i.e. it either emits a token or not
- Daily average of hourly measurements: ((r w)^24 s)*
- Generally: an RWS firing is defined in terms of r and w events
  - r+ w+ defines one RWS firing (most Kepler actors behave similarly)
- More general: definition of the RWS firing round
  - (r+ w+)* s: dependencies among several firings …
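
Because these trace shapes are regular expressions over the alphabet {r, w, s}, a recorded trace can be checked against them directly. The following sketch uses Python's `re` module, with each event encoded as a single character. The dictionary keys and the `conforms` helper are hypothetical names used only for illustration.

```python
import re

# Trace shapes from the slide, one character per event.
PATTERNS = {
    "stateless": re.compile(r"(r+w+s)*"),
    "stateful": re.compile(r"(r+w+)*"),
    "simple_filter": re.compile(r"(rw?s)*"),
    "daily_average": re.compile(r"((rw){24}s)*"),
}


def conforms(actor_kind, trace):
    """Return True if the whole trace matches the pattern of actor_kind."""
    return PATTERNS[actor_kind].fullmatch(trace) is not None


stateless_ok = conforms("stateless", "rrwsrws")      # two complete firings
stateless_bad = conforms("stateless", "rrw")          # missing state-reset
filter_ok = conforms("simple_filter", "rsrws")        # one token dropped, one emitted
one_day = conforms("daily_average", "rw" * 24 + "s")  # 24 hourly steps, then reset
```

A check like this could flag an actor whose observed behaviour does not match the trace shape its developer declared.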

12 Kepler Provenance Framework

13 Provenance Framework in Kepler
- Modeled as a separate concern in the system
- Optional drag-and-drop feature
- Listens to the execution and saves information (customizable):
  - Context: the who, what, where, and when associated with the run
  - Input data and its associated metadata
  - Workflow outputs and intermediate data products
  - Workflow definition (entities, parameters, connections): a specification of what exists in the workflow, which can have a context of its own
  - Information about the workflow evolution (the workflow trail)

14 Kepler System Architecture
[Figure: Kepler system architecture. Labeled components: Vergil GUI, Kepler GUI Extensions, Authentication, SMS, Actor & Data Search, Type System Extensions, Provenance Recorder, Kepler Object Manager, Documentation, Smart Re-run / Failure Recovery, Kepler Core Extensions, Ptolemy. Source: IPAW'06, Altintas et al.]

15 Kepler Provenance Recorder (IPAW'06, Altintas et al.)
- Parametric and customizable
  - Different report formats
  - Variable levels of verbosity: all, some, medium, on error
  - Multiple cache destinations
- Saves information on
  - User name, date, run, etc.

16 Implementation details
- The Provenance Recorder
  - Extends the Ptolemy AbstractSettableAttribute
  - Listens to the Director for
    - Changes in the workflow graph
    - Initialization, workflow execution and stop
    - Actor firing
  - Listens to all IOPorts for
    - Token emissions on output ports, to record output data
- That is, we could say it is a Ptolemy Provenance Framework

17 Implementation details
- Builds an internal representation of the workflow graph
  - Ptolemy's DirectedGraph
  - Nodes: IOPorts; edges: port connections
- Used for
  - Recording the workflow structure (dependencies among ports)
  - Subscribing at all ports (listening for input/output)

18 Application: smart-rerun

19 Implementation of RWS in Kepler
- Data model, i.e. the observables in all MoC implementations in Kepler
  - Port-actor relationship: portTable(Port, Actor, type)
    - type is a for atomic and c for composite actors (transparent)
  - Token-object relationship: tokenTable(Token, Object)
  - Object-value relationship: objectTable(Object, Value, Type)
    - Type is currently not recorded
  - RWS trace: traceTable(Port, Event, Token, FiringCounter)
    - Event: r for read, w for write, or s for state-reset
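
The four tables can be tried out with an in-memory SQLite database. Only the table and column names come from the slide. The column types, the sample rows, the assumption that FiringCounter is an integer counted per actor, and the lineage query are all illustrative guesses rather than the Kepler schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE portTable   (Port TEXT, Actor TEXT, type TEXT);
CREATE TABLE tokenTable  (Token TEXT, Object TEXT);
CREATE TABLE objectTable (Object TEXT, Value TEXT, Type TEXT);
CREATE TABLE traceTable  (Port TEXT, Event TEXT, Token TEXT, FiringCounter INTEGER);
""")

# Hypothetical sample data: atomic actor A reads t1 and writes t2 in firing 1.
conn.executemany("INSERT INTO portTable VALUES (?, ?, ?)",
                 [("A.in", "A", "a"), ("A.out", "A", "a")])
conn.executemany("INSERT INTO tokenTable VALUES (?, ?)",
                 [("t1", "o1"), ("t2", "o2")])
conn.executemany("INSERT INTO objectTable VALUES (?, ?, ?)",
                 [("o1", "3", None), ("o2", "9", None)])  # Type not recorded
conn.executemany("INSERT INTO traceTable VALUES (?, ?, ?, ?)",
                 [("A.in", "r", "t1", 1), ("A.out", "w", "t2", 1)])


def values_read_in_same_firing(written_token):
    """Values the writing actor read in the firing that wrote written_token."""
    rows = conn.execute("""
        SELECT o.Value
        FROM traceTable AS w
        JOIN portTable   AS wp ON wp.Port = w.Port
        JOIN portTable   AS rp ON rp.Actor = wp.Actor
        JOIN traceTable  AS r  ON r.Port = rp.Port
                              AND r.Event = 'r'
                              AND r.FiringCounter = w.FiringCounter
        JOIN tokenTable  AS t  ON t.Token = r.Token
        JOIN objectTable AS o  ON o.Object = t.Object
        WHERE w.Event = 'w' AND w.Token = ?
        ORDER BY o.Value
    """, (written_token,)).fetchall()
    return [value for (value,) in rows]


inputs_of_t2 = values_read_in_same_firing("t2")
```

The query covers only the stateless case, where a write depends on reads of the same firing. Following dependencies across firings would also have to take the s events into account.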

20 Extending the framework
1. Initialization ( initialize() )
   - Framework traverses the workflow graph (ports and connections)
   - RWS: generates specific data structures (port, actor and connection details)
2. Just before start ( validate() )
   - Framework subscribes the event listeners
   - RWS: subscribes an additional listener, TokenGetEvent

21 Extending the framework
3. When the workflow is modified ( changeExecuted() )
   - Framework traverses the workflow graph (ports and connections)
   - RWS: re-generates its data structures
4. During execution, when an event occurs
   - The TokenSendEvent() and TokenGetEvent() listeners are extended to generate RWS trace events
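
The four steps above can be mirrored in a small, self-contained Python sketch. It does not use the Ptolemy or Kepler Java API. The class `RwsRecorderSketch` and its snake_case method names are hypothetical stand-ins for the callbacks named on the slides, and re-subscribing after a workflow change is an assumption that the slides do not state.

```python
class RwsRecorderSketch:
    """Toy recorder that follows the four lifecycle steps (not Kepler code)."""

    def __init__(self, connections):
        self.connections = list(connections)  # (source_port, destination_port)
        self.ports = set()
        self.subscribed = set()
        self.trace = []                        # (port, event, token)

    def initialize(self):
        """Step 1: traverse the workflow graph and build the port structures."""
        self.ports = {port for pair in self.connections for port in pair}

    def validate(self):
        """Step 2: subscribe to every known port just before execution."""
        self.subscribed = set(self.ports)

    def change_executed(self, connections):
        """Step 3: the workflow changed, so re-generate the data structures."""
        self.connections = list(connections)
        self.initialize()
        self.validate()  # assumption: new ports are subscribed as well

    def token_send_event(self, port, token):
        """Step 4: a token was sent on an output port, recorded as a write."""
        if port in self.subscribed:
            self.trace.append((port, "w", token))

    def token_get_event(self, port, token):
        """Step 4: a token was taken from an input port, recorded as a read."""
        if port in self.subscribed:
            self.trace.append((port, "r", token))


recorder = RwsRecorderSketch([("Source.out", "Sink.in")])
recorder.initialize()
recorder.validate()
recorder.token_send_event("Source.out", "t1")
recorder.token_get_event("Sink.in", "t1")
```

After these calls `recorder.trace` holds one write and one read of token `t1`, in the order in which the events occurred.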

22 Possible applications of Provenance
- Smart-rerun
- Monitoring/debugging of a workflow
  - see the LiDAR poster today by Efrat Jäger-Frank
- Answering questions about processing history and data
  - Participated in the First Provenance Challenge with Kepler-RWS
  - http://twiki.ipaw.info/bin/view/Challenge/RWS
- Reporting/documentation of workflows and data products
  - "Generate my publication"

23 Acknowledgement
- RWS model
  - Shawn Bowers and Timothy McPhillips, UC Davis
- Formalization of the MoPs
  - Bertram Ludäscher, UC Davis
- Kepler Provenance Framework implementation
  - Oscar Barney, Univ. of Utah, Salt Lake City
  - Efrat Jäger-Frank, SDSC, San Diego

24 References
- RWS model
  - S. Bowers, T. McPhillips, B. Ludäscher, S. Cohen and S. B. Davidson. A Model for User-Oriented Data Provenance in Pipelined Scientific Workflows. Intl. Provenance and Annotation Workshop (IPAW), Chicago, 2006.
  - B. Ludäscher, N. Podhorszki, I. Altintas, S. Bowers, T. McPhillips. From Computation Models to Models of Provenance and the RWS Model. To appear in 2007 in Journal of Concurrency and Computation: Practice and Experience.
- Provenance framework
  - I. Altintas, O. Barney, E. Jäger-Frank. Provenance Collection Support in the Kepler Scientific Workflow System. Intl. Provenance and Annotation Workshop (IPAW), Chicago, 2006.

