REDUX – automatic capture, efficient storage. Roger S. Barga, Microsoft Research (MSR); Luciano Digiampietri, University of Campinas, São Paulo, Brazil.


What information needs to be captured?
- Which version of BLAST did I use?
- What codes (activities) did I invoke to get this result, and what were the parameters?
- What data transformations did I use to get this result?
- What machine was used to perform the alignment?
- Were any steps skipped in this experiment, or were any shims inserted?
- Did the experiment design differ between these two results? If so, where? ...
- Are there any branches in the workflow that have not been explored?
Additional Issues to Consider…
- Result of a provenance query is an executable workflow
- Provenance storage costs can quickly grow out of hand…
Considerations
- Allow the user to control what is shared/exposed – one size doesn’t fit all
- It may not be possible to rerun an experiment, to either validate or recreate a result, because the original workflow is lost (activities have been updated).

Implementation
- Extended the enactment engine of WinOE to automatically capture the steps during execution that lead to a result
- Provenance capture is automatic & transparent
- Store provenance in an RDBMS (SQL Server); utilize previous traces to significantly reduce storage costs
- Current query interface is SQL, eventually a forms-based interface
Version and lock the executables
- Updating any activity will change the workflow version number, resulting in a new version.
- User is able to rerun an experiment by invoking the workflow using the fully-specified reference found in the provenance record.
A multilayer model for representing result provenance
- Abstract Workflow → Service Instantiation → Data Instantiation → Runtime
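The "version and lock" rule on this slide can be illustrated with a minimal Python sketch. It is not the WinOE implementation: the class `VersionedWorkflow`, its methods, and the `name@vN` reference format are all hypothetical, and hashing the activity code is only one assumed way to detect that an activity was updated.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class VersionedWorkflow:
    """Hypothetical workflow whose version number changes with its activities."""

    name: str
    activities: dict = field(default_factory=dict)  # activity name -> code hash
    version: int = 1

    def update_activity(self, activity: str, code: bytes) -> int:
        """Register code for an activity; bump the version only if it changed."""
        digest = hashlib.sha256(code).hexdigest()
        if self.activities.get(activity) != digest:
            self.activities[activity] = digest
            self.version += 1
        return self.version

    def reference(self) -> str:
        """Fully specified reference that a provenance record could store."""
        return f"{self.name}@v{self.version}"
```

A provenance record that stores the value of `reference()` identifies exactly one workflow version, so a later rerun can ask for that version rather than whatever the activities have since become.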

Abstract Workflow

Data Model for Abstract Workflow

Bound to Activities (code) and Data

Data Model for Workflow Instance

Provenance Queries – Query 1
Provenance queries 1, 4, 5, 7, 8 and 9
Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
Returns, for the processes that generated the Atlas X Graphic:
- ExecutableWorkflowId (process)
- ExecutionId (id of specific execution of the process)
- EventId (event where data was produced)
- ExecutableWorkflow_ExecutableActivityId (activity that produced the data)
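A lineage query of this kind can be sketched with a recursive SQL query. The example below uses Python's `sqlite3` instead of SQL Server so that it is self-contained. The tables `event`, `produced` and `consumed`, their columns, and the sample activity and data names are assumptions for illustration, not the REDUX schema.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory trace with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE event (event_id INTEGER PRIMARY KEY, execution_id TEXT,
                            workflow_id TEXT, activity_id TEXT);
        CREATE TABLE produced (event_id INTEGER, data TEXT);
        CREATE TABLE consumed (event_id INTEGER, data TEXT);
    """)
    conn.executemany("INSERT INTO event VALUES (?, ?, ?, ?)", [
        (1, "exec-1", "wf-1", "align_warp"),
        (2, "exec-1", "wf-1", "softmean"),
        (3, "exec-1", "wf-1", "convert"),
        (4, "exec-1", "wf-1", "unrelated"),
    ])
    conn.executemany("INSERT INTO produced VALUES (?, ?)", [
        (1, "warp1"), (2, "atlas"), (3, "Atlas X Graphic"), (4, "other"),
    ])
    conn.executemany("INSERT INTO consumed VALUES (?, ?)", [
        (1, "image1"), (2, "warp1"), (3, "atlas"), (4, "image9"),
    ])
    return conn


def lineage(conn: sqlite3.Connection, data_name: str) -> list:
    """Return every event that contributed, directly or not, to data_name."""
    sql = """
    WITH RECURSIVE upstream(data) AS (
        SELECT ?
        UNION
        -- inputs of whichever event produced an item already in the set
        SELECT c.data
        FROM upstream AS u
        JOIN produced AS p ON p.data = u.data
        JOIN consumed AS c ON c.event_id = p.event_id
    )
    SELECT e.workflow_id, e.execution_id, e.event_id, e.activity_id
    FROM upstream AS u
    JOIN produced AS p ON p.data = u.data
    JOIN event AS e ON e.event_id = p.event_id
    ORDER BY e.event_id
    """
    return conn.execute(sql, (data_name,)).fetchall()
```

Calling `lineage(build_demo_db(), "Atlas X Graphic")` walks back from the graphic through the three events that fed it and leaves out the unrelated event 4.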

Provenance Queries – Query 7a
Provenance queries 1, 4, 5, 7, 8 and 9
Our layered model allows the detection of differences in several ways.
A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.

Provenance Queries – Query 7b
A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
Activities used by the second workflow but not the first
The Workflow Model level captures information about the instances of the activities and the links among the ports (or activity interfaces). At this layer, our model allows provenance queries to ask, for example, which activities from Workflow 2 are not included in Workflow 1.
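A workflow-level difference of this kind is a set difference over activity instances. The sketch below expresses it with SQL `EXCEPT` through Python's `sqlite3`. The table `workflow_activity`, its columns, and the workflow identifiers are hypothetical stand-ins, and only `convert`, `pgmtoppm` and `pnmtojpeg` come from the query statement.

```python
import sqlite3


def build_workflow_db() -> sqlite3.Connection:
    """Create two small workflows in a hypothetical workflow_activity table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workflow_activity (workflow_id TEXT, activity TEXT)"
    )
    conn.executemany("INSERT INTO workflow_activity VALUES (?, ?)", [
        ("wf-1", "softmean"), ("wf-1", "convert"),
        ("wf-2", "softmean"), ("wf-2", "pgmtoppm"), ("wf-2", "pnmtojpeg"),
    ])
    return conn


def activities_only_in(conn: sqlite3.Connection, wf_a: str, wf_b: str) -> list:
    """Return the activities that wf_a uses and wf_b does not."""
    sql = """
    SELECT activity FROM workflow_activity WHERE workflow_id = ?
    EXCEPT
    SELECT activity FROM workflow_activity WHERE workflow_id = ?
    ORDER BY activity
    """
    return [row[0] for row in conn.execute(sql, (wf_a, wf_b))]
```

Swapping the two arguments answers the converse question, which in this sample data is the single replaced `convert` step.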

Provenance Queries – Query 7c
A user has run the workflow twice, in the second instance replacing each procedure (convert) in the final stage with two procedures: pgmtoppm, then pnmtojpeg. Find the differences between the two workflow runs. The exact level of detail in the difference that is detected by a system is up to each participant.
The Runtime level contains information about the execution of the workflow (produced data, timestamps, activities invoked, etc.). Here the model allows queries about produced data, data flow (see Q2 and Q3), date/time, etc. One example query that illustrates the difference between two workflows at this level is: what data was produced by the second workflow that was not produced by the first?
Data produced by workflow 2 that was not produced by workflow 1
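At the runtime level the same comparison can be made over the data items each execution produced. The following sketch uses an anti-join (a `LEFT JOIN` that keeps only unmatched rows) in `sqlite3`. The table `runtime_event`, its columns, the execution identifiers, and the file names are invented for illustration.

```python
import sqlite3


def build_runtime_db() -> sqlite3.Connection:
    """Create two small execution traces in a hypothetical runtime_event table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE runtime_event (event_id INTEGER PRIMARY KEY,
                                    execution_id TEXT, activity TEXT, data TEXT)
    """)
    conn.executemany("INSERT INTO runtime_event VALUES (?, ?, ?, ?)", [
        (1, "run-1", "softmean", "atlas.hdr"),
        (2, "run-1", "convert", "atlas-x.gif"),
        (3, "run-2", "softmean", "atlas.hdr"),
        (4, "run-2", "pgmtoppm", "atlas-x.ppm"),
        (5, "run-2", "pnmtojpeg", "atlas-x.jpg"),
    ])
    return conn


def data_only_in(conn: sqlite3.Connection, exec_a: str, exec_b: str) -> list:
    """Return (data, activity) pairs produced in exec_a but not in exec_b."""
    sql = """
    SELECT a.data, a.activity
    FROM runtime_event AS a
    LEFT JOIN runtime_event AS b
           ON b.data = a.data AND b.execution_id = ?
    WHERE a.execution_id = ? AND b.data IS NULL
    ORDER BY a.event_id
    """
    # Placeholders are bound in textual order: exec_b (ON), then exec_a (WHERE).
    return conn.execute(sql, (exec_b, exec_a)).fetchall()
```

Returning the producing activity alongside each data item ties the runtime difference back to the workflow-level difference.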

Efficiently Storing Provenance Data
For Provenance Query 7, the two workflows share more than 99% of the provenance data (space) and 46% of the database tuples.
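One way to picture this sharing is a store that keeps each distinct provenance tuple once and lets every trace reference it. The sketch below is an assumed illustration of that idea, not the REDUX storage layout. `TraceStore` and its methods are hypothetical, and the toy numbers are not meant to reproduce the figures on the slide.

```python
class TraceStore:
    """Hypothetical store in which identical tuples are shared across traces."""

    def __init__(self) -> None:
        self.tuple_ids = {}  # provenance tuple -> integer id
        self.traces = {}     # trace id -> set of tuple ids it references

    def add_trace(self, trace_id: str, rows: list) -> int:
        """Record a trace; return how many tuples had to be newly stored."""
        new_rows = 0
        ids = set()
        for row in rows:
            if row not in self.tuple_ids:
                self.tuple_ids[row] = len(self.tuple_ids)
                new_rows += 1
            ids.add(self.tuple_ids[row])
        self.traces[trace_id] = ids
        return new_rows

    def shared_fraction(self, a: str, b: str) -> float:
        """Fraction of the tuples referenced by either trace that both share."""
        both = self.traces[a] & self.traces[b]
        either = self.traces[a] | self.traces[b]
        return len(both) / len(either)
```

Under this scheme a second run costs only the tuples that differ from earlier runs, which is why near-identical workflows add little storage.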

To Sum Up…
Extended Windows Workflow Foundation
- Transparently capture execution trace leading to a result
- A layered provenance model
- Relational database (SQL Server) as provenance store
- Store provenance as delta/edit over existing traces
- Initial query facility built over this provenance data
Unique aspects of our system
- Result of a provenance query is an executable workflow
- Coupled code versioning to provenance collection
An open (and interesting) data management challenge