Trident Scientific Workflow Workbench eScience08 Tutorial Nelson Araujo, Roger Barga, Dean Guo, Jared Jackson Yogesh Simmhan, Catharine van Ingen, Nitin.


1 Trident Scientific Workflow Workbench: eScience08 Tutorial. Nelson Araujo, Roger Barga, Dean Guo, Jared Jackson, Yogesh Simmhan, Catharine van Ingen, Nitin Gautam (Microsoft Research); Joby Thomas and the development team (Aditi Technologies)

2 MSR (Trident) Summer 09 Interns: Satya Sahoo (Wright State University), David Koop (University of Utah), Matt Valerio (Ohio State University), Eran Chinthaka (Indiana University)

3 Overview of our presentation today. Technical Content: Introduction; Feature Overview and Logical Architecture; Deep(er) dive into select features with demos; Roadmap to delivery. Design Philosophy and Exit Strategy: Leverage COTS WFMS, build only what is required; Extensible and open, integrate with community tools; Drive development from actual eScience requirements; Deliver as open source accelerator to the community

4 Workflow for Ocean Observatories, part of an oceanographer's workbench. Jim Gray Ocean Observing Initiative (OOI), formerly the NEPTUNE project. Collaboration with Univ. of Wash & MBARI

5

6 Pan-STARRS (Astronomy). One of the largest visible light telescopes; Four unit telescopes acting as one; One Gigapixel per telescope; Survey entire visible universe in 1 week; Catalog solar system, moving objects/asteroids; ps1sc.org: Univ. Hawaii, Johns Hopkins, … Workflow Requirements: Load/Merge Databases; Execute on Clusters; Monitor workflow execution; Logging, Provenance, Faults

7 Pan-STARRS Load & Merge Workflows. Load workflow: Start; Sanity Check of Network Files, Manifest, Checksum; Create, Register empty LoadDB from template; For Each CSV File in Batch (Validate CSV File & Table Schema; BULK LOAD CSV File into Table; Perform CSV File/Table Validation); Perform LoadDB/Batch Validation; End. On failure: Detect Load Fault. Launch Recovery Operations. Notify Admin. Merge workflow: Start; Determine Merge Worthy Load DBs & Slice Cold DBs; Determine affine Slice Cold DB for CSV Batch; For Each Partition in Slice Cold DB (Switch OUT Slice partition to temp; UNION ALL over Slice & Load DBs into temp. Filter on partition bound; Post Partition Load Validation; Switch IN temp to Slice partition); Slice Column Recalculations & Updates; Post Slice Load Validation; End. On failure: Detect Merge Fault. Launch Recovery Operations. Notify Admin.
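The load half of this workflow can be sketched in a few lines of Python. This is only an illustration of the control flow named on the slide (sanity check, per-file validation and bulk load, batch validation, fault path). The function names, the in-memory dictionary standing in for the LoadDB, and the SHA-256 checksum are all assumptions made for the sketch, not the Pan-STARRS or Trident implementation.

```python
import csv
import hashlib
import io


def checksum(text):
    """Hex SHA-256 digest, standing in for the manifest checksum (assumed)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_batch(batch, expected_columns):
    """Load a batch of CSV files into a fresh in-memory 'LoadDB'.

    batch maps file name -> (csv_text, checksum_from_manifest).
    Returns a status dictionary instead of raising, so a caller can
    route faults to recovery and notification steps.
    """
    load_db = {}  # hypothetical stand-in for the empty LoadDB from a template
    try:
        for name, (text, manifest_sum) in batch.items():
            # Sanity check: file contents must match the manifest checksum.
            if checksum(text) != manifest_sum:
                raise ValueError(f"checksum mismatch: {name}")
            reader = csv.DictReader(io.StringIO(text))
            # Validate the CSV header against the table schema.
            if reader.fieldnames != expected_columns:
                raise ValueError(f"schema mismatch: {name}")
            # "Bulk load" the file into its table.
            load_db[name] = list(reader)
            # Per-file validation: no row may be missing a column value.
            if any(None in row.values() for row in load_db[name]):
                raise ValueError(f"short row: {name}")
        # Batch validation: the load must have produced at least one row.
        total = sum(len(rows) for rows in load_db.values())
        if total == 0:
            raise ValueError("empty batch")
        return {"status": "loaded", "rows": total, "tables": load_db}
    except ValueError as fault:
        # Fault path: discard the partial LoadDB and report the reason.
        return {"status": "fault", "reason": str(fault), "tables": {}}


if __name__ == "__main__":
    data = "objid,ra,dec\n1,10.5,-3.2\n2,11.0,4.1\n"
    print(load_batch({"a.csv": (data, checksum(data))}, ["objid", "ra", "dec"])["status"])
```

Returning a status record rather than letting the exception escape mirrors the slide's separate fault branch: the caller decides how to recover and whom to notify.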

8 Trident Public Website Accessible today From January 09

9 Logical Architecture; Features; Building on Windows Workflow

10 Trident Logical Architecture Visualization Design Workflow Packages Management Studio Community Workbench Desktop Browser Windows Workflow Foundation Scientific Workflows Monitor Administration Web Portal (myExperiment) Archiving Trident Registry Data Model (Data Agnostic Abstraction) Data Access: SQL Server, SSDS, S3, Others Registry Management Trident Runtime Services Provenance Publish-Subscribe Blackboard WF Execution Hosts Others Fault Tolerance HPC Scheduling

11 Trident Features: Libraries of activities, services, and workflows – Prepackaged activities and workflows out of the box and custom libraries – Registry with rich sets of workflow metadata – Versions – Workflow packages – Social annotations (myExperiment)

12 Trident Features: Two programming interfaces to Trident. Use Visual Studio to develop custom activities and workflows and import them to Trident. Visually Compose Workflows – No programming or scripting is required – Drag and drop a workflow or an activity – Subsections

13 Execution Service Local or distributed execution of workflows – HPCS cluster – Cloud services Interactive and non-interactive execution service Publishes events to subscriber services, such as tracking, provenance, and monitoring.

14 Workflow Monitoring Remote and local monitoring – Workflow processing status – Input and output parameters – Data products – Performance

15 Management Studio Administration of workflows and workflow scheduling Registry management Monitoring

16 What is Windows Workflow? Part of Microsoft's .NET Framework 3.0, 3.5, and upcoming 4.0. Activities, Runtime, Tooling. Host Process (.exe, IIS, …) WF Runtime Extensions Tracking Persistence … … Workflow Activity Library Tooling VS Designer VS Debugger Rehosted Designer

17 Windows Workflow Base Activity Library Basic Composite

18 Workflow Authoring

19 Trident Workflow Composer: An End User Application for Editing, Executing, and Monitoring Scientific Workflows

20 What Differentiates Scientific Workflow? Composition goes through many iterations Data flow is a first class citizen Need an easy way to publish and share Provenance Runtime Evolutionary Adaptable to different computing environments

21 Trident Workflow Composer Composition Space Activity Library Workflow Library Data Options & Sharing

22 Composer Demo

23 Trident Registry: Flexible Data Store And Some More

24 Trident Registry Motivation: Why a new registry system? Single point of truth of the system – Facilitates state synchronization actions – Catalog keeps track of computing resources and state Flexible Storage – What is it? Flexible store mechanism Supports Microsoft and non-Microsoft store providers Supports local, client-server and cloud architectures – Non goals Replacement for LINQ or ER Framework Reference Catalog – Unified view of the resources – Stores references to internal and external resources – Flexible provider mechanism to abstract access to external resources

25 Trident Registry Registry Connections

26 Trident Registry Registry Management

27 Trident Registry Data Providers: Abstracting What's out there. Storage providers – Provide abstraction to data structures stored in the backend – No assumptions on how data was stored and related. Implemented using verb and subject actions, e.g. "Store object user with these properties", "Relate this user object with this service as its owner", "Delete namespace object". Data abstraction layer and code generation – C# generated code provides shield and programming API – C# code generator generates SQL catalog for perfect data code match

28 Trident Registry Data Providers: Abstracting What's out there. Creating new providers – Why would I create a new storage provider? Enable Trident to store / retrieve state from other platforms. Enable Trident to store / retrieve state on other systems. Enhance existing providers with new features and abstractions – What it takes to create a new provider: Create a new assembly (or add to an existing provider assembly). Create a new class derived from Microsoft.Research.eResearch.Connection. Drop the new DLL into the Trident folder
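To make the verb-and-subject idea concrete, here is a minimal Python sketch of a pluggable storage provider. It is not the Trident API: the slides only name the C# base class Microsoft.Research.eResearch.Connection and do not show its members, so the class names, method names, and in-memory backend below are all hypothetical and chosen to mirror the three example actions (store, relate, delete).

```python
from abc import ABC, abstractmethod


class StorageProvider(ABC):
    """Hypothetical provider contract: three verbs applied to subjects."""

    @abstractmethod
    def store(self, kind, key, properties):
        """Store an object of the given kind with these properties."""

    @abstractmethod
    def relate(self, source, target, role):
        """Relate two stored objects, e.g. a user as 'owner' of a service."""

    @abstractmethod
    def delete(self, kind, key):
        """Delete an object and any relations that mention it."""


class InMemoryProvider(StorageProvider):
    """Toy backend; a real provider would talk to SQL, a file, or a cloud store."""

    def __init__(self):
        self.objects = {}       # (kind, key) -> properties
        self.relations = set()  # (source, target, role) triples

    def store(self, kind, key, properties):
        self.objects[(kind, key)] = dict(properties)

    def relate(self, source, target, role):
        if source not in self.objects or target not in self.objects:
            raise KeyError("both objects must be stored before relating them")
        self.relations.add((source, target, role))

    def delete(self, kind, key):
        subject = (kind, key)
        self.objects.pop(subject, None)
        # Drop relations that reference the deleted subject on either side.
        self.relations = {
            r for r in self.relations if subject not in (r[0], r[1])
        }


if __name__ == "__main__":
    registry = InMemoryProvider()
    registry.store("user", "alice", {"email": "alice@example.org"})
    registry.store("service", "tracker", {"host": "node1"})
    registry.relate(("user", "alice"), ("service", "tracker"), "owner")
    registry.delete("service", "tracker")
    print(len(registry.objects), len(registry.relations))
```

Because callers only see the three verbs, a different backend can be swapped in without changing client code, which is the point the slide makes about not assuming how data is stored and related.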

29 Creating a new Registry Provider DEMO

30 Trident Registry Storage vs References. Use Cases – Object Tracking – Data and Process Discovery. All workflow aspects are exposed in the storage schema. Allows rich query of data, activities, parameters, etc. Data Providers – Abstraction layer to external references (similar to registry data storage). Enables user applications to benefit from unified model. Simplifies development. Enables fault tolerance for external resource sources. Not every workflow needs to worry about these details – All data provider knowledge resides in the registry – Pluggable and flexible

31 Trident Registry Provider API: Managed, Native, and Web Services. Managed (.NET) API – Library of choice for interacting with Trident Registry – Simplifies lots of data complexity – Abstracts verbs and actions into an object model – Access to all Trident Registry objects and relations – No need for servers and services to operate (access the data backend directly) – Faster, no extra hops. Direct data access. Native API – Useful for non-managed applications and systems integration – Similar to Managed (.NET) API in terms of performance and requirements – But more limited (not a 100% feature match right now). Web Services API – Recommended for non-Microsoft platform integration, e.g. Linux and Mac OS – Requires an IIS web server and service configured – Greater control over data and process, higher data security – Only core objects and relationships are exposed right now – Extra parsing and processing hop. Need to consider cluster and load-balancing solutions for high-performance scenarios

32 Trident Blackboard: A Distributed Eventing Model For Workflow

33 The Workflow Runtime and Tracking Services WF workflows launch in a runtime context – Runtime thread controls WF related threads Execution thread Built-in services Custom services Built-in services track workflow execution – Workflow events – Individual activity events – Data updates

34 Trident Blackboard A distributed Pub/Sub model for workflow eventing Why? – Tracking information needs to be shared across compute nodes – Workflows are evolutionary and thus messengers require a pluggable interface – Large message volume means that the message broker needs to be light-weight and fast

35 The Blackboard Message: Titled name/value pair collection – All values are strings – Title and names can resolve against an ontology. Structure: Collection Title; name 1 = value 1; name 2 = value 2; name 3 = value 3. Example: WF Runtime Event; Type = Activity Started; Job ID = { GUID }; Activity ID = NetCDF Reader; Event Order = 5

36 The Blackboard Message: Titled name/value pair collection – All values are strings – Title and names can resolve against an ontology. Structure: Collection Title; name 1 = value 1; name 2 = value 2; name 3 = value 3. Example: WF Runtime Event; Type = Activity Started; Job ID = { GUID }; Activity ID = NetCDF Reader; Event Order = 5. Publisher: Workflow Tracker. Subscribers: Database Logging, Provenance Store
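The message shape described here, a title plus a flat collection of string name/value pairs, is small enough to sketch directly. The Python class below is an illustrative assumption rather than Trident's actual message type. It only encodes the two stated rules: every message has a title, and every value is a string.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BlackboardMessage:
    """Hypothetical model of a titled name/value pair collection."""

    title: str
    values: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # Enforce the "all values are strings" rule by coercing on creation.
        self.values = {str(name): str(value) for name, value in self.values.items()}


if __name__ == "__main__":
    # The example from the slide; the job ID is left as the slide's placeholder.
    message = BlackboardMessage(
        title="WF Runtime Event",
        values={
            "Type": "Activity Started",
            "Job ID": "{ GUID }",
            "Activity ID": "NetCDF Reader",
            "Event Order": 5,  # coerced to the string "5"
        },
    )
    print(message.title, message.values["Event Order"])
```

Keeping every value a string means subscribers such as a logger or a provenance store need no shared type system; they interpret names and titles against an ontology if they need richer meaning.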

37 Blackboard Architecture Trident Workflow Executor WF Runtime Services Publisher Blackboard Subscriber Publisher Interface Subscriber Interface Message Subscription Information Lightweight Message Queue

38 Blackboard Architecture Trident Workflow Executor WF Runtime Services Publisher Blackboard Subscriber Publisher Interface Message Subscription Information Lightweight Message Queue Message Rerouting Subscription Information Management Recovery Logic Message Routing Subscriber Interface Messages

39 Blackboard Architecture Trident Workflow Executor WF Runtime Services Publisher Blackboard Subscriber Message Subscription Information Lightweight Message Queue Message Rerouting Subscription Information Management Recovery Logic Subscription Information Routing Messages Subscription Information Publisher Interface Subscriber Interface

40 Blackboard Architecture Trident Workflow Executor WF Runtime Services Publisher Blackboard Subscriber Message Subscription Information Lightweight Message Queue Message Rerouting Subscription Information Management Recovery Logic Internal Technologies Messages Subscription Information Publisher Interface Subscriber Interface Windows Workflow (WF) Windows Communication Foundation (WCF)

41 Blackboard Architecture Trident Workflow Executor WF Runtime Services Tracking Blackboard File Writer Composer Publisher Interface Message Subscription Information Lightweight Message Queue Message Rerouting Subscription Information Management Recovery Logic Logging and Monitoring Example Subscriber Interface Messages Resources Registry WF Runtime Event Activity Started { GUID } NetCDF Reader 5 Type Job ID Activity ID Event Order Config File
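The publish/subscribe flow in this diagram can be sketched as a tiny in-process broker. This is a simplified illustration under stated assumptions: the real Blackboard is distributed and built on WF and WCF, whereas the class and method names here are hypothetical, subscriptions match on message title only, and rerouting and recovery logic are left out.

```python
from collections import deque


class Blackboard:
    """Hypothetical in-process stand-in for the message broker."""

    def __init__(self):
        self._queue = deque()      # lightweight message queue
        self._subscriptions = []   # subscription information: (title, callback)

    def subscribe(self, title, callback):
        """Register a callback for every message with the given title."""
        self._subscriptions.append((title, callback))

    def publish(self, title, values):
        """Queue a message; values are stored as strings."""
        self._queue.append((title, {k: str(v) for k, v in values.items()}))

    def dispatch(self):
        """Route queued messages to matching subscribers; return delivery count."""
        delivered = 0
        while self._queue:
            title, values = self._queue.popleft()
            for wanted, callback in self._subscriptions:
                if wanted == title:
                    callback(title, values)
                    delivered += 1
        return delivered


if __name__ == "__main__":
    log_lines = []   # plays the role of the file writer subscriber
    status = {}      # plays the role of the Composer's monitoring view

    board = Blackboard()
    board.subscribe("WF Runtime Event", lambda t, v: log_lines.append(f"{t}: {v}"))
    board.subscribe("WF Runtime Event", lambda t, v: status.update({v["Activity ID"]: v["Type"]}))

    # The tracking service publishes; it knows nothing about the subscribers.
    board.publish("WF Runtime Event", {
        "Type": "Activity Started",
        "Activity ID": "NetCDF Reader",
        "Event Order": 5,
    })
    print(board.dispatch(), status)
```

The publisher and the two subscribers never reference each other, only the broker, which is what lets logging, monitoring, and provenance be added or removed without touching the workflow executor.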

42 Blackboard Demo

43 Trident Tips and Tricks

44 Interoperability Story: Silverlight execution environment – Web frontend for management and execution – Allows non-Microsoft operating systems to use and administer Trident. Interface with other systems – COVE – myExperiment

45 Interface Trident Other Systems: Integration with UW COVE system DEMO

46

47 Trident Tips and Tricks Productivity Tools – Database ready activities Simplifies development of database aware workflows Code generator improves development productivity – Data visualization and charting activities – Web Service ready activities Simplifies development of web service aware workflows Code generator improves development productivity

48 Trident Roadmap to Release

49 Trident Road Map. Sprint 1: Composer framework; Registry; Distributed execution service. Sprint 2: Service and Tray Icon (run workflows locally and remotely); Workflow model; Open and Save workflows with Workflow Model; Subsections; Intermediate results; IFELSE; Workflow over workflow. Sprint 3: FOR-LOOP and Replicator; Property Sheets for workflows and activities; Monitoring (WF events, input & output parameters, performance); Data products (input and output); Blackboard; Logging; Pan-STARRS workflow support. Sprint 4: Invoke Web Service and DB stored procedures; Workflow packages; Provenance (Pan-STARRS); Registry Manager; Administration Console and workflow scheduling; Remote monitoring. Sprint 5: Silverlight based Composer; Trident Portal (myExperiment); Deployment topologies desktop and workgroup (same domain); Fault Tolerance

