7th Biennial Ptolemy Miniconference, Berkeley, CA, February 13, 2007. Provenance Framework in Kepler. Ilkay Altintas, Norbert Podhorszki.


7th Biennial Ptolemy Miniconference
Berkeley, CA, February 13, 2007
Provenance Framework in Kepler
Ilkay Altintas, Norbert Podhorszki
Contributors: S. Bowers, B. Ludäscher, T. McPhillips (UC Davis), O. Barney (U Utah), E. Jäger-Frank (SDSC)

Outline
- Provenance: what is it?
- Framework in Kepler to record provenance data
- RWS: a provenance model suitable for Kepler's different computational models
- Possible applications of provenance

What to track and why
Do we need some tracking of what is happening?
- Recreate results and rebuild workflows using the evolution information (see repeatable experiments)
- Associate the workflow with the results it produced
- Create links between generated data in different runs, and compare different runs
- Recover from a system failure
- Checkpoint a workflow
- Debug and explain results (via lineage tracing, …)
- Smart reruns: avoid re-generating the same data all the time

Model of Provenance
- Core feature: capture the processing history (trace) leading to a data product
- Model of Computation (MoC)
  - Well-defined in terms of input/output relations and the (partial) order of actions
  - M_P : I → O
  - MoC(Program, Input) → Output
  - Examples: DAG, SDF, DDF, PN, etc.
- Different ways of specification
  - See Ptolemy-related papers, the Kahn-MacQueen paper, etc.
  - Give abstract/high-level pseudo code
  - Practically, a MoC is defined through the implementation of the execution system (including the scheduling). In Kepler/Ptolemy this is the Director.
- There are legal (possible) runs under a given MoC

Model of Provenance (MoP)
- The starting point is a MoC and its particular implementation
- Observables: e.g. a single fired(x, A, y), or reads, writes and actions separately
- Trace: recorded assertions (about observable events) during a legal run
- A MoP is a MoC, except that "legal run" is replaced with "legal trace"
- There is a default MoP for a MoC: the total trace of all observable events
  - Turing machine: moves of the head, data read and written
- A MoP may add further information or omit some ("T = R - I + M")
  - Trace = Run - Ignored things + Modelled additional things
  - M: add real timestamps of actions, execution host information
  - I: omit the input for each action if it can be inferred unambiguously later (DAG)
  - What to add or omit depends on the application of the trace T
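The relation T = R - I + M can be made concrete with a small sketch. This is only an illustration under assumptions: the `Event` record, the `make_trace` function, and the host name `node-1` are hypothetical names introduced here, not part of Kepler or Ptolemy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One observable of a run (hypothetical structure for illustration)."""
    kind: str   # "read" or "write"
    actor: str
    data: str


def make_trace(run, ignored_kinds, annotate):
    """Build a trace T from a run R: drop ignored observables (I),
    then attach modelled extra information (M) to what remains."""
    trace = []
    for event in run:
        if event.kind in ignored_kinds:
            continue  # I: omitted because it can be inferred later
        trace.append((event, annotate(event)))  # M: added information
    return trace


# R: the full run of a two-actor pipeline A -> B
run = [
    Event("read", "A", "x"),
    Event("write", "A", "y"),
    Event("read", "B", "y"),
    Event("write", "B", "z"),
]

# DAG-style MoP: inputs are inferable from the graph, so reads are ignored;
# the execution host is recorded as modelled extra information.
trace = make_trace(run, ignored_kinds={"read"},
                   annotate=lambda e: {"host": "node-1"})
```

Here the trace keeps only the two write events, each paired with the added host annotation.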

MoP Examples
- DAG workflow
  - Record: output data generated by the actions
  - Inference: execution of actions and their inputs can be inferred from the DAG itself
- Smart-rerun
  - Record: the output of an action and the parameters for that action
  - Inference: if an action's parameters are unchanged and the actions on which it depends (inferred from the workflow graph) are also unchanged, the action's output will be the same in a future run
- Kitchen definition: a MoP is "good" if it can handle the intended questions and use cases
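The smart-rerun inference can be sketched as a recursive check over the workflow graph. The function name `actions_to_rerun`, the dictionary layout, and the example actions are assumptions made for illustration; this is not the Kepler implementation.

```python
def actions_to_rerun(upstream, old_params, new_params):
    """Return the actions whose recorded output cannot be reused.

    upstream:   action -> list of actions it depends on (a DAG)
    old_params: action -> parameters recorded in the previous run
    new_params: action -> parameters of the planned run
    """
    memo = {}

    def dirty(action):
        if action not in memo:
            changed = old_params.get(action) != new_params.get(action)
            # An action is dirty if its own parameters changed or if
            # any action it depends on is dirty.
            memo[action] = changed or any(
                dirty(u) for u in upstream.get(action, ()))
        return memo[action]

    return {a for a in new_params if dirty(a)}


# Hypothetical workflow: load -> filter -> plot, and load -> stats
upstream = {"load": [], "filter": ["load"],
            "plot": ["filter"], "stats": ["load"]}
old = {"load": {"file": "a.csv"}, "filter": {"threshold": 0.5},
       "plot": {}, "stats": {}}
new = {"load": {"file": "a.csv"}, "filter": {"threshold": 0.7},
       "plot": {}, "stats": {}}

rerun = actions_to_rerun(upstream, old, new)
```

With only the filter threshold changed, `rerun` contains `filter` and `plot`, while the recorded outputs of `load` and `stats` can be reused.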

MoP Examples
Kepler: Streaming actors
- Stateful actors: an output depends on all inputs in the past, e.g. AddSubtract
- Stateless actors: an output depends only on inputs read in the current firing, e.g. Expression, RecordAssembler
- Non-conformist actors: Filter, Running average, Daily average (an output depends on some of the past inputs)
- How do you determine correctly which inputs a given output depends on?

MoP Examples
Kepler: Data-dependent routing (branches and loops)
- The firing history of the actors cannot be inferred from the static workflow graph
- Something should be recorded (e.g. firings)

RWS: A Model of Provenance for Kepler Directors

RWS: Read, Write, State-reset
- What about actor state? What about "real" dependencies?
- The state-reset event s defines when an actor "cuts off" dependencies
  - A semantic notion, known to the actor [developer] (or part of a higher-order scheme)
- Without state-reset: r, r … r, w, w, … w, r, … r, w, … w …
- With state-reset: r, r … r, w, w, … w, s!, r, … r, w, … w, …
- Reference: IPAW'06, Bowers et al.
[Figure: actor A with read (r) and write (w) events along a time axis, grouped by firing.]

RWS trace of some actors
(Notation: x+ is one or more x, x? is an optional x, x* is zero or more x, x^24 is 24 repetitions of x.)
- Stateless actor: (r+ w+ s)* : r … r w … w s r … r w … w s …
- Stateful actor: (r+ w+)*
- Simple filter actor (the condition depends only on the current token): (r w? s)* : either it emits a token or not
- Daily average of hourly measurements: ((r w)^24 s)*
- Generally, an RWS firing is defined in terms of r and w events: r+ w+ defines one RWS firing (most Kepler actors behave similarly)
- More generally, an RWS firing round is (r+ w+)* s : dependencies among several firings …
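One way to read these traces is that each written token depends on every token the actor has read since its last state-reset. The sketch below implements that reading for a single actor; it is an assumption about how dependencies are inferred, and `rws_dependencies` and the token names are hypothetical.

```python
def rws_dependencies(events):
    """Infer token dependencies from one actor's RWS trace.

    events: list of (kind, token) pairs, kind in {"r", "w", "s"};
            the token of an "s" event is ignored.
    Returns a dict mapping each written token to the tokens it depends on,
    assuming a write depends on all reads since the last state-reset.
    """
    deps = {}
    reads_since_reset = []
    for kind, token in events:
        if kind == "r":
            reads_since_reset.append(token)
        elif kind == "w":
            deps[token] = list(reads_since_reset)
        elif kind == "s":
            reads_since_reset = []  # the actor cuts off its dependencies
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return deps


# Stateless actor, (r+ w+ s)*: each output depends on the current input only.
stateless = rws_dependencies(
    [("r", "a"), ("w", "x"), ("s", None),
     ("r", "b"), ("w", "y"), ("s", None)])

# Stateful actor, (r+ w+)*: with no reset, outputs depend on all past inputs.
stateful = rws_dependencies(
    [("r", "a"), ("w", "x"), ("r", "b"), ("w", "y")])
```

In the stateless trace `y` depends only on `b`; in the stateful trace, which has no reset, `y` depends on both `a` and `b`.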

Kepler Provenance Framework

Provenance Framework in Kepler
- Modeled as a separate concern in the system
- Optional drag-and-drop feature
- Listens to the execution and saves information (customizable):
  - Context: the who, what, where, and when associated with the run
  - Input data and its associated metadata
  - Workflow outputs and intermediate data products
  - Workflow definition (entities, parameters, connections): a specification of what exists in the workflow, which can have a context of its own
  - Information about the workflow evolution (workflow trail)

Kepler System Architecture
[Architecture diagram. Component labels: Authentication, GUI, Vergil, SMS, Kepler Core Extensions, Ptolemy, Kepler GUI Extensions, Actor & Data Search, Type System Ext, Provenance Recorder, Kepler Object Manager, Documentation, Smart Re-run / Failure Recovery.]
Source: IPAW'06, Altintas et al.

Kepler Provenance Recorder (IPAW'06, Altintas et al.)
- Parametric and customizable
  - Different report formats
  - Variable levels of verbosity: all, some, medium, on error
  - Multiple cache destinations
- Saves information on user name, date, run, etc.

Implementation details
The Provenance Recorder
- Extends the Ptolemy AbstractSettableAttribute
- Listens to the Director for
  - Changes in the workflow graph
  - Initialization, workflow execution and stop
  - Actor firing
- Listens to all IOPorts for
  - Token emissions on output ports, to record output data
- That is, we could say it is a Ptolemy Provenance Framework

Implementation details
- Builds an internal representation of the workflow graph
  - Ptolemy's DirectedGraph
  - Nodes: IOPorts; edges: port connections
- Used for
  - Recording workflow structure (dependencies among ports)
  - Subscribing at all ports (listening for input/output)

Application: smart-rerun

Implementation of RWS in Kepler
Data model, i.e. the observables in all MoC implementations in Kepler
- Port-actor relationship: portTable(Port, Actor, type)
  - type is a for atomic and c for composite actors (transparent)
- Token-object relationship: tokenTable(Token, Object)
- Object-value relationship: objectTable(Object, Value, Type)
  - Type is currently not recorded
- RWS trace: traceTable(Port, Event, Token, FiringCounter)
  - Event: r for read, w for write, or s for state-reset
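The four relations can be tried out as an in-memory SQLite schema. Only the table and column names come from the slide. The column types, the sample rows, and the query are assumptions for illustration, and the slide does not say which storage backend Kepler uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE portTable   (Port TEXT, Actor TEXT, type TEXT);
CREATE TABLE tokenTable  (Token TEXT, Object TEXT);
CREATE TABLE objectTable (Object TEXT, Value TEXT, Type TEXT);
CREATE TABLE traceTable  (Port TEXT, Event TEXT, Token TEXT,
                          FiringCounter INTEGER);
""")

# Hypothetical sample data: atomic actor A reads t1 and writes t2.
conn.executemany("INSERT INTO portTable VALUES (?, ?, ?)",
                 [("A.in", "A", "a"), ("A.out", "A", "a")])
conn.executemany("INSERT INTO tokenTable VALUES (?, ?)",
                 [("t1", "o1"), ("t2", "o2")])
# Type is left NULL because it is currently not recorded.
conn.executemany("INSERT INTO objectTable VALUES (?, ?, ?)",
                 [("o1", "3", None), ("o2", "9", None)])
conn.executemany("INSERT INTO traceTable VALUES (?, ?, ?, ?)",
                 [("A.in", "r", "t1", 1), ("A.out", "w", "t2", 1)])

# Which values did each actor write, in firing order?
rows = conn.execute("""
    SELECT p.Actor, o.Value
    FROM traceTable t
    JOIN portTable   p ON p.Port   = t.Port
    JOIN tokenTable  k ON k.Token  = t.Token
    JOIN objectTable o ON o.Object = k.Object
    WHERE t.Event = 'w'
    ORDER BY t.FiringCounter
""").fetchall()
```

The join follows the chain port, token, object, value, so a trace row resolves both to the actor that produced the event and to the data value it carried.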

Extending the framework
1. Initialization ( initialize() )
  - Framework: traverses the workflow graph (ports and connections)
  - RWS: generates specific data structures (port, actor and connection details)
2. Just before start ( validate() )
  - Framework: subscribes event listeners
  - RWS: subscribes an additional listener, TokenGetEvent

Extending the framework
3. When the workflow is modified ( changeExecuted() )
  - Framework: traverses the workflow graph (ports and connections)
  - RWS: re-generates its data structures
4. During execution, when an event occurs
  - The TokenSendEvent() and TokenGetEvent() listeners are extended to generate RWS trace events
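The four extension points can be mirrored in a toy recorder. This is a language-neutral sketch of the lifecycle and not the Ptolemy or Kepler API: the class `RwsRecorderSketch`, its method names, and the port names are all hypothetical.

```python
class RwsRecorderSketch:
    """Toy recorder mirroring the four lifecycle steps (hypothetical)."""

    def __init__(self, connections):
        self.connections = connections  # list of (source_port, dest_port)
        self.ports = set()
        self.listening = False
        self.trace = []  # rows of (port, event, token, firing_counter)

    def initialize(self):
        # Step 1: traverse the workflow graph and build port structures.
        self.ports = {port for edge in self.connections for port in edge}

    def validate(self):
        # Step 2: just before start, subscribe to port events.
        self.listening = True

    def change_executed(self, connections):
        # Step 3: the workflow was modified, so rebuild the structures.
        self.connections = connections
        self.initialize()

    def token_sent(self, port, token, firing):
        # Step 4: a token emitted on an output port is a write event.
        self._record(port, "w", token, firing)

    def token_got(self, port, token, firing):
        # Step 4: a token consumed on an input port is a read event.
        self._record(port, "r", token, firing)

    def _record(self, port, event, token, firing):
        if self.listening and port in self.ports:
            self.trace.append((port, event, token, firing))


rec = RwsRecorderSketch([("A.out", "B.in")])
rec.initialize()
rec.token_sent("A.out", "t0", 0)   # ignored: not yet listening
rec.validate()
rec.token_sent("A.out", "t1", 1)
rec.token_got("B.in", "t1", 1)
rec.change_executed([("A.out", "B.in"), ("B.out", "C.in")])
```

The recorded rows have the same shape as traceTable(Port, Event, Token, FiringCounter), which is why the read listener is needed in addition to the send listener.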

Possible applications of Provenance
- Smart-rerun
- Monitoring/debugging of a workflow
  - See the LiDAR poster today by Efrat Jäger-Frank
- Answering questions about processing history and data
  - Participated in the First Provenance Challenge with Kepler-RWS
- Reporting/documentation of workflows and data products
  - "Generate my publication"

Acknowledgement
- RWS model: Shawn Bowers and Timothy McPhillips, UC Davis
- Formalization of the MoPs: Bertram Ludäscher, UC Davis
- Kepler Provenance Framework implementation: Oscar Barney, Univ. of Utah, Salt Lake City; Efrat Jäger-Frank, SDSC, San Diego

References
RWS model
- S. Bowers, T. McPhillips, B. Ludäscher, S. Cohen and S. B. Davidson. A Model for User-Oriented Data Provenance in Pipelined Scientific Workflows. Intl. Provenance and Annotation Workshop (IPAW), Chicago, 2006.
- B. Ludäscher, N. Podhorszki, I. Altintas, S. Bowers, T. McPhillips. From Computation Models to Models of Provenance and the RWS Model. To appear in 2007 in Journal of Concurrency and Computation: Practice and Experience.
Provenance framework
- I. Altintas, O. Barney, E. Jäger-Frank. Provenance Collection Support in the Kepler Scientific Workflow System. Intl. Provenance and Annotation Workshop (IPAW), Chicago, 2006.