Workflow Provenance Bill Howe.

Presentation transcript:

Workflow Provenance Bill Howe

Comparison
Columns: Data Model | Prog. Model | Services
GPL: * | Typing (maybe)
Workflow: dataflow | typing, provenance, Pegasus-style resource mapping, task parallelism
Relational Algebra: Relations | Select, Project, Join, Aggregate, … | optimization, physical data independence, data parallelism
MapReduce: [(key,value)] | Map, Reduce | massive data parallelism, fault tolerance
MS Dryad: IQueryable, IEnumerable | RA + Apply + Partitioning | typing, massive data parallelism, fault tolerance
MPI: Arrays/Matrices | 70+ ops | data parallelism, full control
Bill Howe, eScience Institute 11/12/2018
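The MapReduce row of the comparison (a data model of [(key,value)] pairs, with Map and Reduce as the only operators) can be illustrated with a small sketch. This is a single-process toy, not how any real MapReduce framework is implemented; the names `map_reduce`, `tokenize`, and `total` are hypothetical, and the parallelism and fault tolerance listed under Services are deliberately left out.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Apply mapper to each record, group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record yields zero or more (key, value) pairs.
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def tokenize(line):
    """Hypothetical mapper: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def total(_key, values):
    """Hypothetical reducer: sum the counts collected for one word."""
    return sum(values)


counts = map_reduce(["a b a", "b c"], tokenize, total)
```

The user supplies only the two functions; grouping by key is the framework's job, which is what lets a real system distribute the work.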

What is Provenance? (src: David Holland)

Example (src: David Holland)

An Example
1. Agent messages are recorded as interactions, either by the agents or by the agent platform.
2. Agents record the internal relationships between inputs and outputs, plus extra meaningful information.
If actors are black boxes, these assertions are not very useful because we do not know the dependencies between messages.
Diagram labels: PROVENANCE Store; EHCR Hospital A; EHCR Hospital B; Transplant Unit Interface Agent; OTM Donor Data Collector Agent; Test Lab. Interface Agent.
Messages: TU.1 Data Collection request; TU.2 Serology Test request; TU.3 Brain Death Notification + report; TU.4 Decision request; TU.5 Decision + report; OTM.1 Donor Data request; OTM.2 Donor Data; OTM.3 Serology test request; OTM.4 Serology test result + report; HC.1 Patient Data request; HC.2 Patient Data.

Which is the basis for donation decision D?
Provenance graph (diagram). Relationship labels: caused by; response to; contains parts of; based on; justified by; authored by; is logged in.
Node labels: Patient Data Request HC.1; Hospital B HC.2; Data Collection Request TU.1; Donor OTM.1; Brain Death Notification TU.3; Donor Data OTM.2; Serology Test Result OTM.4; Brain Death report TU.3; Decision Request TU.4; Donation Decision TU.5; Decision report TU.5; Serology Test Request TU.2; Serology Test Request OTM.3; Serology report OTM.4; User X; User Y; User Z; User W; Author A; Author B; Author C.
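The question on this slide ("Which is the basis for donation decision D?") amounts to a reachability query over the recorded relationships. The sketch below shows one way such a query might look. The edge list is an assumption: the transcript keeps the node and relationship labels but not which edge connects which nodes, so the wiring here is a plausible guess for illustration, and `basis_of` is a hypothetical helper rather than part of any provenance system.

```python
from collections import deque

# Assumed wiring, as (effect, relationship, cause) triples. The labels come
# from the slides; the connections between them are guessed for illustration.
EDGES = [
    ("TU.5 Donation Decision", "response to", "TU.4 Decision Request"),
    ("TU.5 Donation Decision", "based on", "TU.3 Brain Death Notification"),
    ("TU.5 Donation Decision", "based on", "OTM.2 Donor Data"),
    ("TU.5 Donation Decision", "based on", "OTM.4 Serology Test Result"),
    ("OTM.2 Donor Data", "response to", "OTM.1 Donor Data Request"),
    ("OTM.1 Donor Data Request", "caused by", "TU.1 Data Collection Request"),
    ("OTM.4 Serology Test Result", "response to", "OTM.3 Serology Test Request"),
    ("OTM.3 Serology Test Request", "caused by", "TU.2 Serology Test Request"),
]


def basis_of(item, edges):
    """Return every item reachable from `item` by following edges toward causes."""
    causes = {}
    for effect, _relationship, cause in edges:
        causes.setdefault(effect, []).append(cause)

    found = []
    visited = {item}
    queue = deque([item])
    while queue:  # breadth-first walk back through the graph
        current = queue.popleft()
        for cause in causes.get(current, []):
            if cause not in visited:
                visited.add(cause)
                found.append(cause)
                queue.append(cause)
    return found


basis = basis_of("TU.5 Donation Decision", EDGES)
```

If the agents had recorded only the messages (point 1) and not the internal relationships (point 2), the "based on" and "caused by" edges would be missing and the walk would stop at the immediate request.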

Use cases: Data Quality; Audit Trail; Replication Recipes; Attribution; Informational/Communication. What else?

Research Questions

Provenance Taxonomy

Types of Provenance, Redux
Data Provenance: Metadata + History of a Data Object
Workflow Provenance: Metadata + History of the workflow itself; Source control
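As a rough illustration of "Metadata + History of a Data Object", data provenance can be pictured as a record that travels with the object and grows as operations touch it. The sketch below is only one possible shape; `ProvenanceRecord`, its fields, and the file names are all hypothetical and do not correspond to any system named in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """Hypothetical data-provenance record: static metadata plus an ordered history."""
    object_id: str
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def record(self, operation, inputs):
        """Append one derivation step: which operation ran and what it read."""
        self.history.append({"operation": operation, "inputs": list(inputs)})


rec = ProvenanceRecord("result.csv", {"owner": "example-user"})
rec.record("filter", ["raw.csv"])
rec.record("aggregate", ["filtered.csv"])
```

Workflow provenance would attach the same kind of record to the workflow definition instead of to its outputs, which is why source control appears alongside it.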

COMAD: Collection-oriented Modeling and Design
Susan Davidson, UPenn
Workflows may exhibit assembly line semantics: open and close interleaved “read scopes” and “write scopes”

Provenance Aware Storage System (David Holland, Harvard)

PASS Architecture: Prov. and Storage Layer

VisTrails demo

Other Provenance Systems: Pegasus/Wings, ZOOM, ES3, SDG, Karma, JP, Mindswap, Redux, RWS, NCSCI, USC/ISI, OPA, VDL, MyGrid

Open Provenance Challenge
2006, First: Compare expressiveness of provenance systems
2007, Second: Interoperability and exchange
2008, Third: Evaluation of the Open Provenance Model
2010, Fourth and last: Apply the Open Provenance Model to a broad end-to-end scenario, and demonstrate novel functionality that can only be achieved by the presence of an interoperable solution for provenance

First Open Provenance Challenge

Challenge Workflow

Challenge Queries

Challenge Queries (2)

Categorization of Provenance Systems
Execution Environment
Representation Technology: SQL, RDF, etc.
Query Language
Research Emphasis: Execution, Recording, Storing, Querying

Categorization (2)
Includes WF Representation
Data Derivation vs. Causal Events: “Nouns” or “Verbs”
Annotations
Time
Naming
Tracked Data, Granularity: Files, collections, bytes, tuples
Abstraction Mechanisms: functions, etc.

Results

Results

Results