Workflow Provenance Bill Howe.

Presentation on theme: "Workflow Provenance Bill Howe."— Presentation transcript:

1 Workflow Provenance Bill Howe

2 Comparison
Columns: Data Model; Prog. Model; Services
GPL: *; Typing (maybe)
Workflow: dataflow; typing, provenance, Pegasus-style resource mapping, task parallelism
Relational Algebra: Relations; Select, Project, Join, Aggregate, …; optimization, physical data independence, data parallelism
MapReduce: [(key,value)]; Map, Reduce; massive data parallelism, fault tolerance
MS Dryad: IQueryable, IEnumerable; RA + Apply + Partitioning; typing, massive data parallelism, fault tolerance
MPI: Arrays/Matrices; 70+ ops; data parallelism, full control
Bill Howe, eScience Institute 11/12/2018
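The MapReduce row of the comparison can be illustrated with a single-process sketch in Python. It only shows the programming model (a list of (key, value) pairs, a map step, a reduce step); the services the slide attributes to MapReduce, massive data parallelism and fault tolerance, come from a real framework and are not modeled here. The names `map_reduce`, `emit`, `total`, and the sample `runs` data are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_reduce(records, mapper, reducer):
    """Apply mapper to each record, group the emitted (key, value)
    pairs by key, then apply reducer to each group."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical input: (task name, runtime in seconds) pairs.
runs = [("align", 12), ("reslice", 7), ("align", 5)]


def emit(record):
    """Map step: emit one (key, value) pair per input record."""
    task, seconds = record
    yield task, seconds


def total(_key, values):
    """Reduce step: sum all values that share a key."""
    return sum(values)


if __name__ == "__main__":
    print(map_reduce(runs, emit, total))
```

In relational algebra terms the same computation is an Aggregate (sum of runtime grouped by task), which is one way to read the slide's contrast between the two rows.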

3 What is Provenance?
src: David Holland

4 Example
src: David Holland

5 An Example
1. Agent messages are recorded as interactions, either by the agents or by the agent platform.
2. Agents record the internal relationships between inputs and outputs, plus extra meaningful information.
If actors are black boxes, these assertions are not very useful because we do not know dependencies between messages.
[Diagram labels: Provenance Store; Transplant Unit Interface Agent; OTM Donor Data Collector Agent; Test Lab. Interface Agent; EHCR Hospital A; EHCR Hospital B]
[Messages: TU.1 Data Collection request; TU.2 Serology Test request; TU.3 Brain Death Notification + report; TU.4 Decision request; TU.5 Decision + report; OTM.1 Donor Data request; OTM.2 Donor Data; OTM.3 Serology test request; OTM.4 Serology test result + report; HC.1 Patient Data request; HC.2 Patient Data]

6 Which is the basis for donation decision D?
[Provenance graph. Edge labels: caused by; response to; based on; justified by; authored by; contains parts of]
[Node labels: Patient Data Request HC.1; Hospital B HC.2; Data Collection Request TU.1; Donor OTM.1; Brain Death Notification TU.3; Brain Death report TU.3; Donor Data OTM.2; Serology Test Request TU.2; Serology Test Request OTM.3; Serology Test Result OTM.4; Serology report OTM.4; Decision Request TU.4; Donation Decision TU.5; Decision report TU.5; User W is logged in; User X is logged in; User Y; User Z; Author A; Author B; Author C]
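The question on this slide amounts to a backward traversal of the provenance graph, from the decision to everything it depends on. A minimal sketch in Python follows. The node names echo the slide's labels, but the transcript does not preserve which node links to which, so the edge list below is an illustrative assumption and not a reconstruction of the original figure. The function name `basis_of` is also hypothetical.

```python
from collections import deque

# Hypothetical edges as (effect, relation, cause). Which node links to
# which is an assumption; only the labels come from the slide.
EDGES = [
    ("Donation Decision TU.5", "response to", "Decision Request TU.4"),
    ("Donation Decision TU.5", "based on", "Donor Data OTM.2"),
    ("Donation Decision TU.5", "based on", "Serology Test Result OTM.4"),
    ("Donation Decision TU.5", "based on", "Brain Death Notification TU.3"),
    ("Donor Data OTM.2", "response to", "Donor Data Request OTM.1"),
    ("Donor Data Request OTM.1", "caused by", "Data Collection Request TU.1"),
    ("Serology Test Result OTM.4", "response to", "Serology Test Request OTM.3"),
    ("Serology Test Request OTM.3", "caused by", "Serology Test Request TU.2"),
]


def basis_of(node, edges):
    """Return every node reachable from `node` by following edges
    backward (effect to cause), in breadth-first order."""
    causes = {}
    for effect, _relation, cause in edges:
        causes.setdefault(effect, []).append(cause)
    seen, order, queue = {node}, [], deque([node])
    while queue:
        current = queue.popleft()
        for cause in causes.get(current, []):
            if cause not in seen:
                seen.add(cause)
                order.append(cause)
                queue.append(cause)
    return order


if __name__ == "__main__":
    for ancestor in basis_of("Donation Decision TU.5", EDGES):
        print(ancestor)
```

The traversal is only as informative as the recorded relationships: if agents are black boxes and record no internal dependencies, the graph has no edges linking an agent's outputs to its inputs, and the query returns nothing useful.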

7 Use cases
Data Quality
Audit Trail
Replication Recipes
Attribution
Informational/Communication
What else?

8 Research Questions

9 Provenance Taxonomy

10 Types of Provenance, Redux
Data Provenance: metadata + history of a data object
Workflow Provenance: metadata + history of the workflow itself
Source control

11 COMAD
Collection-oriented Modeling and Design
Susan Davidson, UPenn
Workflows may exhibit assembly line semantics
Open and close interleaved “read scopes” and “write scopes”

12 Provenance-Aware Storage System
David Holland, Harvard

13 PASS Architecture
Prov. and Storage Layer

14 VisTrails demo

15 Other Provenance Systems
Pegasus/Wings, ZOOM, ES3, SDG, Karma, JP, Mindswap, Redux, RWS, NCSCI, USC/ISI, OPA, VDL, MyGrid

16 Open Provenance Challenge
2006, First: compare expressiveness of provenance systems
2007, Second: interoperability and exchange
2008, Third: evaluation of the Open Provenance Model
2010, Fourth and last: apply the Open Provenance Model to a broad end-to-end scenario, and demonstrate novel functionality that can only be achieved by the presence of an interoperable solution for provenance
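For readers unfamiliar with the Open Provenance Model that the later challenges evaluate, the sketch below encodes its core vocabulary in Python: three node kinds (artifact, process, agent) and five dependency kinds (used, wasGeneratedBy, wasControlledBy, wasTriggeredBy, wasDerivedFrom), each pointing from effect to cause. The class and function names (`Node`, `Edge`, `make_edge`) and the sample node names are hypothetical; this is a simplified illustration, not the OPM schema or any challenge team's implementation, and it omits roles, accounts, and time annotations.

```python
from dataclasses import dataclass

# OPM dependency kinds mapped to the (effect kind, cause kind) they link.
EDGE_KINDS = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


@dataclass(frozen=True)
class Node:
    kind: str  # "artifact", "process", or "agent"
    name: str


@dataclass(frozen=True)
class Edge:
    kind: str
    effect: Node
    cause: Node


def make_edge(kind, effect, cause):
    """Build an edge, rejecting node kinds the dependency does not allow."""
    expected = EDGE_KINDS[kind]
    if (effect.kind, cause.kind) != expected:
        raise ValueError(f"{kind} must link {expected[0]} -> {expected[1]}")
    return Edge(kind, effect, cause)


if __name__ == "__main__":
    # Hypothetical workflow step: a process reads one file and writes another.
    raw = Node("artifact", "input.img")
    step = Node("process", "align")
    out = Node("artifact", "aligned.img")
    user = Node("agent", "analyst")
    graph = [
        make_edge("used", step, raw),
        make_edge("wasGeneratedBy", out, step),
        make_edge("wasControlledBy", step, user),
        make_edge("wasDerivedFrom", out, raw),
    ]
    for edge in graph:
        print(f"{edge.effect.name} --{edge.kind}--> {edge.cause.name}")
```

A shared vocabulary of this kind is what makes the interoperability goal of the second and later challenges testable: two systems can exchange graphs only if they agree on what the nodes and edges mean.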

17 First Open Provenance Challenge

18 Challenge Workflow

19 Challenge Queries

20 Challenge Queries (2)

21 Categorization of Provenance Systems
Execution Environment
Representation Technology: SQL, RDF, etc.
Query Language
Research Emphasis: Execution, Recording, Storing, Querying

22 Categorization (2)
Includes WF Representation
Data Derivation vs. Causal Events: “Nouns” or “Verbs”
Annotations
Time
Naming
Tracked Data, Granularity: files, collections, bytes, tuples
Abstraction Mechanisms: functions, etc.

23 Results

24 Results

25 Results

