Trident Scientific Workflow Workbench eScience’08 Tutorial

Presentation transcript:

Trident Scientific Workflow Workbench eScience’08 Tutorial Nelson Araujo, Roger Barga, Dean Guo, Jared Jackson Yogesh Simmhan, Catharine van Ingen, Nitin Gautam Microsoft Research Joby Thomas and the development team Aditi Technologies

MSR (Trident) Summer ‘09 Interns: Eran Chinthaka (Indiana University), David Koop (University of Utah), Satya Sahoo (Wright State University), Matt Valerio (Ohio State University)

Overview of our presentation today
Technical Content
- Introduction
- Feature Overview and Logical Architecture
- Deep(er) dive into select features with demos
- Roadmap to delivery
Design Philosophy and Exit Strategy
- Leverage COTS WFMS, build only what is required
- Extensible and open, integrate with community tools
- Drive development from actual eScience requirements
- Deliver as open source accelerator to the community

Ocean Observatories Initiative (OOI) Formerly the NEPTUNE project Workflow for Ocean Observatories, part of an “oceanographer’s workbench” Jim Gray Collaboration with Univ. of Wash & MBARI

Pan-STARRS Workflow Requirements (Astronomy)
Pan-STARRS
- One of the largest visible light telescopes
- Four unit telescopes acting as one
- One Gigapixel per telescope
- Survey entire visible universe in 1 week
- Catalog solar system, moving objects/asteroids
- ps1sc.org: Univ. Hawaii, Johns Hopkins, …
Workflow Requirements
- Load/Merge Databases
- Execute on Clusters
- Monitor workflow execution
- Logging, Provenance, Faults

Pan-STARRS Load & Merge Workflows
Load workflow
- Start
- Sanity Check of Network Files, Manifest, Checksum
- Validate CSV File & Table Schema
- Create, Register empty LoadDB from template
- For Each CSV File in Batch
  - BULK LOAD CSV File into Table
  - Perform CSV File/Table Validation
- Perform LoadDB/Batch Validation
- End
- On fault: Detect Load Fault. Launch Recovery Operations. Notify Admin.
Merge workflow
- Start
- Determine ‘Merge Worthy’ Load DBs & Slice Cold DBs
- Determine affine Slice Cold DB for CSV Batch
- For Each Partition in Slice Cold DB
  - Switch OUT Slice partition to temp
  - UNION ALL over Slice & Load DBs into temp. Filter on partition bound.
  - Post Partition Load Validation
  - Switch IN temp to Slice partition
- Slice Column Recalculations & Updates
- Post Slice Load Validation
- End
- On fault: Detect Merge Fault. Launch Recovery Operations. Notify Admin.
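The load half of this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Pan-STARRS or Trident implementation: an in-memory SQLite database stands in for the LoadDB, and the names `load_batch`, `sha256_hex`, and `load_table` are hypothetical, as is the shape of the manifest (file name mapped to a SHA-256 digest).

```python
import csv
import hashlib
import io
import sqlite3


def sha256_hex(data):
    """Checksum helper used for the manifest check."""
    return hashlib.sha256(data).hexdigest()


def load_batch(files, manifest, columns):
    """Run a simplified load workflow and return the populated load database.

    files:    dict of file name -> raw CSV bytes (hypothetical input shape)
    manifest: dict of file name -> expected SHA-256 hex digest
    columns:  expected CSV header, also used as the table schema
    Raises ValueError on any fault so the caller can start recovery.
    """
    # Step 1: sanity check of files against the manifest and checksums.
    for name, expected in manifest.items():
        if name not in files:
            raise ValueError(f"missing file: {name}")
        if sha256_hex(files[name]) != expected:
            raise ValueError(f"checksum mismatch: {name}")

    # Step 2: validate each CSV file against the table schema.
    parsed = {}
    for name in manifest:
        rows = list(csv.reader(io.StringIO(files[name].decode("utf-8"))))
        if not rows or rows[0] != columns:
            raise ValueError(f"schema mismatch: {name}")
        if any(len(row) != len(columns) for row in rows[1:]):
            raise ValueError(f"malformed row: {name}")
        parsed[name] = rows[1:]

    # Step 3: create an empty load database (stand-in for the LoadDB template).
    db = sqlite3.connect(":memory:")
    column_sql = ", ".join(f'"{c}" TEXT' for c in columns)
    db.execute(f"CREATE TABLE load_table ({column_sql})")
    insert_sql = f"INSERT INTO load_table VALUES ({', '.join('?' for _ in columns)})"

    # Step 4: bulk load each file, then validate the per-file row count.
    expected_total = 0
    for name, rows in parsed.items():
        before = db.execute("SELECT COUNT(*) FROM load_table").fetchone()[0]
        db.executemany(insert_sql, rows)
        after = db.execute("SELECT COUNT(*) FROM load_table").fetchone()[0]
        if after - before != len(rows):
            raise ValueError(f"row count mismatch after loading {name}")
        expected_total += len(rows)

    # Step 5: batch-level validation of the whole load database.
    total = db.execute("SELECT COUNT(*) FROM load_table").fetchone()[0]
    if total != expected_total:
        raise ValueError("batch validation failed")
    db.commit()
    return db
```

A caller would wrap `load_batch` in a `try`/`except ValueError` block to play the role of "Detect Load Fault. Launch Recovery Operations. Notify Admin."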

Trident Public Website
Accessible today: http://beta.research.microsoft.com/en-us/collaboration/tools/trident.aspx
From January ‘09: http://research.microsoft.com/en-us/collaboration/tools/trident.aspx

Logical Architecture Features Building on Windows Workflow

Trident Logical Architecture Visualization Design Workflow Packages Management Studio Community Workbench Monitor Web Portal (myExperiment) Scientific Workflows Administration Archiving Desktop Windows Workflow Foundation Registry Management Browser Trident Runtime Services Publish-Subscribe Blackboard WF Execution Hosts Fault Tolerance Provenance HPC Scheduling Others Trident Registry Data Model (Data Agnostic Abstraction) Data Access SQL Server SSDS S3 Others

Trident Features
- Libraries of activities, services, and workflows
- Prepackaged activities and workflows out of the box and custom libraries
- Registry with rich sets of workflow metadata
  - Versions
  - Workflow packages
  - Social annotations (myExperiment)

Trident Features
Two programming interfaces to Trident
- Use Visual Studio to develop custom activities and workflows and import them to Trident
- Visually Compose Workflows
  - No programming and scripting is required
  - Drag and drop a workflow or an activity
  - Subsections

Execution Service
- Local or distributed execution of workflows
  - HPCS cluster
  - Cloud services
- Interactive and non-interactive execution service
- Publishes events to subscriber services, such as tracking, provenance, and monitoring.

Workflow Monitoring Remote and local monitoring Workflow processing status Input and output parameters Data products Performance

Management Studio Administration of workflows and workflow scheduling Registry management Monitoring

What is Windows Workflow?
Part of Microsoft’s .NET Framework 3.0, 3.5, and upcoming 4.0
- Activities
- Runtime
- Tooling
Diagram labels: Host Process (.exe, IIS, …); WF Runtime; Extensions (Tracking, Persistence, …); Workflow; Activity Library; Tooling (VS Designer, VS Debugger, Rehosted Designer)

Windows Workflow Base Activity Library Composite Basic

Workflow Authoring

Trident Workflow Composer An End User Application for Editing, Executing, and Monitoring Scientific Workflows

What Differentiates Scientific Workflow? Composition goes through many iterations Data flow is a first class citizen Need an easy way to publish and share Provenance Runtime Evolutionary Adaptable to different computing environments

Trident Workflow Composer Data Options & Sharing Workflow Library Composition Space Activity Library

Composer Demo

Trident Registry: Flexible Data Store And Some More

Trident Registry
Motivation: Why a new registry system?
- Single “point of truth” of the system
- Facilitates state synchronization actions
- Catalog keeps track of computing resources and state
Flexible Storage
- What is it?
  - Flexible store mechanism
  - Supports Microsoft and non-Microsoft store providers
  - Supports local, client-server and cloud architectures
- Non goals
  - Replacement for LINQ or ER Framework
Reference Catalog
- Unified view of the resources
- Stores references to internal and external resources
- Flexible provider mechanism to abstract access to external resources

Trident Registry Registry Connections

Trident Registry Registry Management

Trident Registry
Data Providers: Abstracting “What’s out there”
Storage providers
- Provide an abstraction over data structures stored in the backend
- No assumptions on how data was stored and related
- Implemented using “verbs” and “subjects” actions
  - “Store object user with these properties”
  - “Relate this user object with this service as its owner”
  - “Delete namespace object”
Data abstraction layer and code generation
- C# generated code provides shield and programming API
- C# code generator generates SQL catalog for perfect data/code match
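The “verbs” and “subjects” idea on this slide can be illustrated with a small in-memory store. This is a sketch under assumptions, not the Trident Registry API: the class `InMemoryStore` and its methods `store`, `relate`, `delete`, and `related` are hypothetical names chosen to mirror the three example sentences on the slide.

```python
class InMemoryStore:
    """Toy storage provider: objects are subjects, methods are verbs."""

    def __init__(self):
        self._objects = {}       # (kind, name) -> dict of properties
        self._relations = set()  # (subject key, relation name, target key)

    def store(self, kind, name, **properties):
        """Verb 'store': save an object of the given kind with its properties."""
        key = (kind, name)
        self._objects[key] = dict(properties)
        return key

    def relate(self, subject, relation, target):
        """Verb 'relate': link two stored objects under a relation name."""
        if subject not in self._objects or target not in self._objects:
            raise KeyError("both objects must be stored before relating them")
        self._relations.add((subject, relation, target))

    def delete(self, key):
        """Verb 'delete': remove an object and every relation that touches it."""
        del self._objects[key]
        self._relations = {
            r for r in self._relations if key not in (r[0], r[2])
        }

    def related(self, subject, relation):
        """Return the targets linked from subject under the relation name."""
        return sorted(
            t for s, r, t in self._relations if s == subject and r == relation
        )


registry = InMemoryStore()
# "Store object user with these properties"
user = registry.store("user", "alice", email="alice@example.org")
service = registry.store("service", "execution")
# "Relate this user object with this service as its owner"
registry.relate(user, "owner_of", service)
```

Because callers only see verbs applied to subjects, the backend behind `InMemoryStore` could be swapped for a database or a cloud store without changing calling code, which is the point the slide makes about storage providers.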

Trident Registry
Data Providers: Abstracting “What’s out there”
Creating new providers
Why would I create a new storage provider?
- Enable Trident to store / retrieve state from other platforms
- Enable Trident to store / retrieve state on other systems
- Enhance existing providers with new features and abstractions
What it takes to create a new provider
- Create a new assembly (or add to an existing provider assembly)
- Create a new class derived from Microsoft.Research.eResearch.Connection
- Drop the new DLL into the Trident folder
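The "derive a class, drop it in" pattern can be sketched in Python as a plug-in base class. This is an analogy only: the real base class is the .NET type Microsoft.Research.eResearch.Connection named on the slide, whose members are not shown in the deck. The Python class `Connection` below, its `save`/`load` methods, the `scheme` keyword, and `open_connection` are all hypothetical.

```python
import abc


class Connection(abc.ABC):
    """Hypothetical stand-in for a provider base class; subclasses self-register."""

    providers = {}  # scheme name -> provider class

    def __init_subclass__(cls, scheme=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if scheme is not None:
            # Registering on class creation mimics "drop the DLL into the folder".
            Connection.providers[scheme] = cls

    @abc.abstractmethod
    def save(self, key, state):
        """Persist a state dictionary under key."""

    @abc.abstractmethod
    def load(self, key):
        """Return the state dictionary stored under key."""


class MemoryConnection(Connection, scheme="memory"):
    """Example provider that keeps state in a dictionary."""

    def __init__(self):
        self._data = {}

    def save(self, key, state):
        self._data[key] = dict(state)

    def load(self, key):
        return dict(self._data[key])


def open_connection(scheme):
    """Instantiate the provider registered for scheme."""
    try:
        provider = Connection.providers[scheme]
    except KeyError:
        raise ValueError(f"no provider registered for {scheme!r}") from None
    return provider()
```

Defining another subclass with a different `scheme` is enough to make it available through `open_connection`, without touching the code that uses the connection.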

Creating a new Registry Provider DEMO

Trident Registry: Storage vs References
Use Cases
- Object Tracking
- Data and Process Discovery
- All workflow aspects are exposed in the storage schema
- Allows rich query of data, activities, parameters, etc.
Data Providers
- Abstraction layer to external references (similar to registry data storage)
- Enables user applications to benefit from unified model
- Simplifies development
- Enables fault tolerance for external resource sources
- Not every workflow needs to worry about these details
- All data provider knowledge resides in the registry
- Pluggable and flexible

Trident Registry Provider API
Managed (.NET) API
- Library of choice for interacting with Trident Registry
- Simplifies lots of data complexity
- Abstracts verbs and actions into an object model
- Access to all Trident Registry objects and relations
- No need for servers and services to operate (access the data backend directly)
- Faster, no extra hops. Direct data access.
Native API
- Useful for non-managed applications and systems integration
- Similar to Managed (.NET) API in terms of performance and requirements
- But more limited (not a 100% feature match right now)
Web Services API
- Recommended for non-Microsoft platform integration, e.g. Linux and Mac OS
- Requires an IIS web server and service configured
- Greater control over data and process, higher data security
- Only core objects and relationships are exposed right now
- Extra parsing and processing hop. Need to consider cluster and load balancing solutions for high-performance scenarios

Trident Blackboard: A Distributed Eventing Model For Workflow

The Workflow Runtime and Tracking Services
- WF workflows launch in a runtime context
- Runtime thread controls WF related threads
  - Execution thread
  - Built-in services
  - Custom services
- Built-in services track workflow execution
  - Workflow events
  - Individual activity events
  - Data updates

Trident Blackboard
A distributed Pub/Sub model for workflow eventing
Why?
- Tracking information needs to be shared across compute nodes
- Workflows are evolutionary and thus messengers require a pluggable interface
- Large message volume means that the message broker needs to be light-weight and fast

The Blackboard Message
- Titled name/value pair collection
- All values are strings
- Title and names can resolve against an ontology
Structure: ‘Collection Title’ with pairs ‘name 1’ = ‘value 1’, ‘name 2’ = ‘value 2’, ‘name 3’ = ‘value 3’
Example: ‘WF Runtime Event’ with pairs ‘Type’ = ‘Activity Started’, ‘Job ID’ = ‘{ GUID }’, ‘Activity ID’ = ‘NetCDF Reader’, ‘Event Order’ = ‘5’
Publisher: Workflow Tracker
Subscribers: Database Logging, Provenance Store
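The message shape described on this slide (a title plus name/value pairs whose values are all strings) maps naturally onto a small immutable record. The sketch below is an assumption about how such a message could be modeled, not the Trident type; `BlackboardMessage`, `create`, and `get` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlackboardMessage:
    """A titled collection of name/value pairs in which every value is a string."""

    title: str
    pairs: tuple  # ordered (name, value) pairs, values already converted to str

    @classmethod
    def create(cls, title, values):
        """Build a message from a mapping, coercing every name and value to str."""
        return cls(
            title,
            tuple((str(name), str(value)) for name, value in values.items()),
        )

    def get(self, name, default=None):
        """Return the value stored under name, or default if it is absent."""
        for key, value in self.pairs:
            if key == name:
                return value
        return default


# The example message from the slide; "{ GUID }" is the slide's own placeholder.
event = BlackboardMessage.create(
    "WF Runtime Event",
    {
        "Type": "Activity Started",
        "Job ID": "{ GUID }",
        "Activity ID": "NetCDF Reader",
        "Event Order": 5,  # stored as the string "5"
    },
)
```

Keeping every value a string means subscribers need no shared type definitions, at the cost of each subscriber parsing values such as the event order itself.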

Blackboard Architecture (diagram shown in four builds: overview, Message Routing, Subscription Information Routing, Internal Technologies)
Diagram labels: Trident Workflow Executor; WF Runtime Services; Publisher; Publisher Interface; Blackboard (Message Rerouting, Subscription Information Management, Recovery Logic, Lightweight Message Queue); Subscriber Interface; Subscriber; Messages; Subscription Information
Internal technologies: Windows Workflow (WF), Windows Communication Foundation (WCF)

Blackboard Architecture: Logging and Monitoring Example
Diagram labels: Trident Workflow Executor; WF Runtime Services; Config File; Tracking; Publisher Interface; Blackboard (Message Rerouting, Subscription Information Management, Recovery Logic, Lightweight Message Queue); Subscriber Interface; File Writer; Composer; Registry; Resources; Messages; Subscription Information
Example message: ‘WF Runtime Event’ with pairs ‘Type’ = ‘Activity Started’, ‘Job ID’ = ‘{ GUID }’, ‘Activity ID’ = ‘NetCDF Reader’, ‘Event Order’ = ‘5’
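The publish/subscribe flow in this logging example can be sketched as a tiny in-process broker. The real Blackboard is distributed and, per the slides, built on WF and WCF; the sketch below only illustrates routing by message title. The names `Blackboard`, `subscribe`, `publish`, and `make_log_writer` are hypothetical, and an `io.StringIO` stream stands in for the File Writer subscriber.

```python
import io
from collections import defaultdict


class Blackboard:
    """Minimal in-process message broker that routes messages by title."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # title -> list of callbacks

    def subscribe(self, title, callback):
        """Register callback(title, message) for messages with this title."""
        self._subscribers[title].append(callback)

    def publish(self, title, pairs):
        """Deliver a message to every subscriber of title; return the delivery count."""
        # All values are converted to strings, as on the message slide.
        message = {str(name): str(value) for name, value in pairs.items()}
        delivered = 0
        for callback in list(self._subscribers.get(title, [])):
            callback(title, message)
            delivered += 1
        return delivered


def make_log_writer(stream):
    """Return a subscriber that appends one text line per message to stream."""

    def write(title, message):
        fields = ", ".join(f"{name}={value}" for name, value in message.items())
        stream.write(f"{title}: {fields}\n")

    return write


log = io.StringIO()  # stand-in for a log file
board = Blackboard()
board.subscribe("WF Runtime Event", make_log_writer(log))
board.publish(
    "WF Runtime Event",
    {"Type": "Activity Started", "Activity ID": "NetCDF Reader", "Event Order": 5},
)
```

Adding a second subscriber, for example one that writes to a provenance store, needs no change to the publisher, which is the pluggability the Blackboard slides argue for.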

Blackboard Demo

Trident Tips and Tricks

Interoperability Story
- Silverlight execution environment
  - Web frontend for management and execution
  - Allows non-Microsoft operating systems to use and administer Trident
- Interface with other systems
  - COVE
  - myExperiment

Interface: Trident and Other Systems
Integration with UW COVE system
DEMO

Trident Tips and Tricks
Productivity Tools
- Database ready activities
  - Simplifies development of database aware workflows
  - Code generator improves development productivity
- Data visualization and charting activities
- Web Service ready activities
  - Simplifies development of web service aware workflows

Trident Roadmap to Release

Trident Road Map
Sprint 1
- Composer framework
- Registry
- Distributed execution service
Sprint 2
- Service and Tray Icon (run workflows locally and remotely)
- Workflow model
- Open and Save workflows with Workflow Model
- Subsections
- Intermediate results
- IFELSE
- Workflow over workflow
Sprint 3
- FOR-LOOP and Replicator
- Property Sheets for workflows and activities
- Monitoring (WF events, input & output parameters, performance)
- Data products (input and output)
- Blackboard
- Logging
- Pan-STARRS workflow support
Sprint 4
- Invoke Web Service and DB stored procedures
- Workflow packages
- Provenance (Pan-STARRS)
- Registry Manager
- Administration Console and workflow scheduling
- Remote monitoring
Sprint 5
- Silverlight based Composer
- Trident Portal (myExperiment)
- Deployment topologies desktop and workgroup (same domain)
- Fault Tolerance