REDUX – automatic capture, efficient storage. Roger S. Barga, Microsoft Research (MSR); Luciano Digiampietri, University of Campinas, São Paulo, Brazil.


1 REDUX – automatic capture, efficient storage. Roger S. Barga, Microsoft Research (MSR); Luciano Digiampietri, University of Campinas, São Paulo, Brazil

2 What information needs to be captured?
- Which version of BLAST did I use?
- What codes (activities) did I invoke to get this result, and what were the parameters?
- What data transformations did I use to get this result?
- What machine was used to perform the alignment?
- Were any steps skipped in this experiment, or were any shims inserted?
- Did the experiment design differ between these two results? If so, where?
- Are there any branches in the workflow that have not been explored?
Additional issues to consider:
- The result of a provenance query is an executable workflow.
- Provenance storage costs can quickly grow out of hand.
Considerations:
- Allow the user to control what is shared or exposed; one size doesn't fit all.
- It may not be possible to rerun an experiment, either to validate or to recreate a result, because the original workflow is lost (activities have been updated).

3 Implementation
- Extended the enactment engine of WinOE to automatically capture the steps, during execution, that lead to a result. Provenance capture is automatic and transparent.
- Store provenance in an RDBMS (SQL Server), and use previous traces to significantly reduce storage costs.
- The current query interface is SQL; a forms-based interface is planned.
- Version and lock the executables. Updating any activity changes the workflow version number, resulting in a new version.
- The user can rerun an experiment by invoking the workflow through the fully specified reference found in the provenance record.
- A multilayer model represents result provenance: Abstract Workflow, Service Instantiation, Data Instantiation, Runtime.
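The versioning rule on this slide (any activity update yields a new workflow version, while older versions stay available for reruns) can be sketched as follows. This is a minimal illustration, not the REDUX implementation: the `Workflow` class, the `with_activity` method and the `name@version[...]` reference format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Hypothetical immutable workflow record with pinned activity versions."""
    name: str
    version: int
    activities: tuple  # sorted (activity_name, activity_version) pairs

    def with_activity(self, activity: str, new_version: int) -> "Workflow":
        # Updating an activity never mutates the old record; it returns a
        # new workflow whose version number is one higher.
        pinned = dict(self.activities)
        pinned[activity] = new_version
        return Workflow(self.name, self.version + 1, tuple(sorted(pinned.items())))

    def reference(self) -> str:
        # A "fully specified" reference: workflow version plus every
        # pinned activity version (format invented for this sketch).
        pinned = ",".join(f"{a}@{v}" for a, v in self.activities)
        return f"{self.name}@{self.version}[{pinned}]"


v1 = Workflow("alignment", 1, (("blast", 1), ("convert", 1)))
v2 = v1.with_activity("blast", 2)
print(v1.reference())  # the old version is still addressable for a rerun
print(v2.reference())
```

Keeping each version immutable is one simple way to make sure a reference stored in a provenance record still resolves to the exact activities that produced the result.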

4 Abstract Workflow

5 Data Model for Abstract Workflow

6 Bound to Activities (code) and Data

7 Data Model for Workflow Instance

8 Provenance Queries – Query 1
(Provenance queries 1, 4, 5, 7, 8 and 9)
Find the process that led to Atlas X Graphic, that is, everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
For the processes that generated the Atlas X Graphic, the query returns ExecutableWorkflowId (the process), ExecutionId (the id of the specific execution of the process), EventId (the event where the data was produced) and ExecutableWorkflow_ExecutableActivityId (the activity that produced the data).
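A lineage query of this kind can be sketched with a recursive SQL query. The example below uses Python's built-in `sqlite3` rather than SQL Server, and the `event` and `event_input` tables, their columns and the sample rows are assumptions made for illustration; they are not the schema shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical runtime tables: one row per event, plus the data it consumed.
CREATE TABLE event (event_id INTEGER PRIMARY KEY, execution_id INTEGER,
                    activity TEXT, output_data TEXT);
CREATE TABLE event_input (event_id INTEGER, input_data TEXT);
INSERT INTO event VALUES
  (1, 10, 'align_warp', 'warp1'),
  (2, 10, 'reslice',    'resliced1'),
  (3, 10, 'softmean',   'atlas'),
  (4, 10, 'slicer',     'atlas_x_slice'),
  (5, 10, 'convert',    'atlas_x_graphic'),
  (6, 10, 'slicer',     'atlas_y_slice');
INSERT INTO event_input VALUES
  (1, 'anatomy1'), (2, 'warp1'), (3, 'resliced1'),
  (4, 'atlas'), (5, 'atlas_x_slice'), (6, 'atlas');
""")

# Start from the event that produced the target, then repeatedly add the
# events that produced each input of an event already in the set.
LINEAGE = """
WITH RECURSIVE upstream(event_id) AS (
    SELECT event_id FROM event WHERE output_data = ?
    UNION
    SELECT producer.event_id
    FROM upstream
    JOIN event_input AS i ON i.event_id = upstream.event_id
    JOIN event AS producer ON producer.output_data = i.input_data
)
SELECT e.execution_id, e.event_id, e.activity
FROM event AS e
JOIN upstream ON upstream.event_id = e.event_id
ORDER BY e.event_id
"""
rows = conn.execute(LINEAGE, ("atlas_x_graphic",)).fetchall()
for row in rows:
    print(row)
```

With this sample data the unrelated `atlas_y_slice` event is excluded, because nothing on the path to the target consumes its output.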

9 Provenance Queries – Query 7a
(Provenance queries 1, 4, 5, 7, 8 and 9)
Our layered model allows differences to be detected in several ways.
A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.

10 Provenance Queries – Query 7b
A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
The Workflow Model captures information about the instances of the activities and the links among the ports (or activity interfaces). At this layer, our model supports provenance queries that ask, for example, which activities from Workflow 2 are not included in Workflow 1 (activities used by the second workflow but not the first).
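At the workflow-model layer, this comparison amounts to a set difference. A minimal sketch in SQL, run here through Python's `sqlite3`, assuming a hypothetical `workflow_activity` table that lists the activities of each workflow (the table and its rows are illustrative, not taken from the slides):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: which activities each workflow definition contains.
CREATE TABLE workflow_activity (workflow_id INTEGER, activity TEXT);
INSERT INTO workflow_activity VALUES
  (1, 'softmean'), (1, 'slicer'), (1, 'convert'),
  (2, 'softmean'), (2, 'slicer'), (2, 'pgmtoppm'), (2, 'pnmtojpeg');
""")

# EXCEPT keeps the rows of the first SELECT that are absent from the second.
ONLY_IN_SECOND = """
SELECT activity FROM workflow_activity WHERE workflow_id = 2
EXCEPT
SELECT activity FROM workflow_activity WHERE workflow_id = 1
ORDER BY activity
"""
only_in_second = [row[0] for row in conn.execute(ONLY_IN_SECOND)]
print(only_in_second)
```

Swapping the two `SELECT` branches would give the activities that only the first workflow uses, which in this sample data is `convert`.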

11 Provenance Queries – Query 7c
A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
The Runtime Level contains information about the execution of the workflow (produced data, timestamps, activities invoked, etc.). Here the model allows queries about produced data, data flow (see Q2 and Q3), date/time, etc. One example query that illustrates the difference between two workflows at this level is: what data was produced by the second workflow that was not produced by the first?
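At the runtime layer, the same question can be asked of the data each execution produced. The sketch below compares two runs by content digest using an anti-join; the `produced` table, its columns, the file names and the digests are hypothetical values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical runtime table: one row per data item an execution produced.
CREATE TABLE produced (execution_id INTEGER, activity TEXT,
                       data_name TEXT, digest TEXT);
INSERT INTO produced VALUES
  (1, 'slicer',    'atlas-x.pgm', 'd1'),
  (1, 'convert',   'atlas-x.gif', 'd2'),
  (2, 'slicer',    'atlas-x.pgm', 'd1'),
  (2, 'pgmtoppm',  'atlas-x.ppm', 'd3'),
  (2, 'pnmtojpeg', 'atlas-x.jpg', 'd4');
""")

# Anti-join: keep rows of run 2 that have no row in run 1 with the same digest.
NEW_IN_RUN_2 = """
SELECT r2.data_name, r2.activity
FROM produced AS r2
LEFT JOIN produced AS r1
  ON r1.execution_id = 1 AND r1.digest = r2.digest
WHERE r2.execution_id = 2 AND r1.digest IS NULL
ORDER BY r2.data_name
"""
new_data = conn.execute(NEW_IN_RUN_2).fetchall()
print(new_data)
```

Comparing by digest rather than by name means that a file with the same name but different content would also be reported as new.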

12 Efficiently Storing Provenance Data
For Provenance Query 7, the two workflows share more than 99% of the provenance data (space) and 46% of the database tuples.
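The slides do not say how the sharing is achieved. One common way to get this kind of reuse is content-addressed storage, where identical payloads are stored once and each run's trace only references them. The sketch below illustrates that idea under that assumption; `ProvenanceStore` and its methods are hypothetical, and the byte counts are made up rather than taken from the measurements above.

```python
import hashlib


class ProvenanceStore:
    """Hypothetical store: payloads kept once by digest, traces hold references."""

    def __init__(self):
        self.blobs = {}   # digest -> payload bytes, stored a single time
        self.traces = {}  # run id -> list of (step, digest)

    def record(self, run_id, step, payload: bytes) -> None:
        digest = hashlib.sha256(payload).hexdigest()
        self.blobs.setdefault(digest, payload)  # no-op if already stored
        self.traces.setdefault(run_id, []).append((step, digest))

    def shared_fraction(self, run_a, run_b) -> float:
        """Fraction of run_b's bytes that run_a had already stored."""
        seen = {digest for _, digest in self.traces[run_a]}
        total = sum(len(self.blobs[d]) for _, d in self.traces[run_b])
        shared = sum(len(self.blobs[d]) for _, d in self.traces[run_b] if d in seen)
        return shared / total


store = ProvenanceStore()
store.record(1, "slicer", b"x" * 900)
store.record(1, "convert", b"g" * 100)
store.record(2, "slicer", b"x" * 900)    # identical output: reuses the blob
store.record(2, "pgmtoppm", b"p" * 60)
store.record(2, "pnmtojpeg", b"j" * 40)
print(store.shared_fraction(1, 2))
print(len(store.blobs))
```

Here the second run adds only its two new payloads to the store; everything it has in common with the first run costs one extra trace entry.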

13 To Sum Up…
Extended Windows Workflow Foundation
- Transparently capture execution trace leading to a result
- A layered provenance model
- Relational database (SQL Server) as provenance store
- Store provenance as delta/edit over existing traces
- Initial query facility built over this provenance data
Unique aspects of our system
- Result of a provenance query is an executable workflow
- Coupled code versioning to provenance collection
An open (and interesting) data management challenge
