Under the Hood of a Workflow Manager. Matthew Shields, BiodiversityWorld GRID workshop, NeSC, 30 June - 1 July.


1 Under the Hood of a Workflow Manager. Matthew Shields, BiodiversityWorld GRID workshop, NeSC, 30 June - 1 July

2 Outline (Matthew Shields, Cardiff University)
- What is Workflow management?
- Why should I care?
- Current State of the Art
- Workflow Languages
- Other Projects
- Triana, Architecture & Services
- Extending Triana for BDWorld
- Conclusion

3 What is Workflow Management?
- Concept comes from the business world
- Many years of research and practice
- Process capture and reuse
- Repeatability, provenance, audit trails & accountability
- Domain expert knowledge capture
- Analysis and optimization

4 What Can a Workflow Manager do for Me?
- Scientific workflow has a different focus to business workflow
  - Large-scale data collection
  - Querying
  - Analysis
  - Visualization
- Similar goals
  - Component & workflow reuse
  - Knowledge capture
- Additional goals
  - Simplified application/experiment design
  - Environment/complexity abstraction

5 State of the Art
- Schedule workflow tasks (Grid/distributed environment)
- Monitor/control execution
- Active visualization and computational steering
- User interaction
- Pause and restart
- Data provenance
- Component and sub-workflow reuse
- Analysis and optimization

6 Workflow Languages
- No current agreed standard
- Most projects use DAG or Petri-Net
- Data vs control flow
- Dependency vs scripting language
- Many XML schema
- Business workflow standards - BPEL
  - Not good enough fit
- GGF WFM-RG
  - Attempting to solicit agreement on standards
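To make the DAG, dependency-driven style on this slide concrete, here is a minimal sketch of a data-flow workflow run in dependency order. It is not taken from any of the projects discussed; the task names (`query`, `analyse`, `visualise`) and the helper `run_dag` are hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps):
    """Run each task after all of its dependencies have produced output.

    tasks: task name -> function taking the list of its dependencies' outputs
    deps:  task name -> list of task names it depends on (the data-flow edges)
    """
    # static_order() yields every node after all of its predecessors.
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    for name in order:
        inputs = [results[d] for d in deps.get(name, [])]
        results[name] = tasks[name](inputs)
    return order, results


# Hypothetical three-step scientific workflow: query -> analyse -> visualise.
tasks = {
    "query": lambda inputs: [3, 1, 2],
    "analyse": lambda inputs: sorted(inputs[0]),
    "visualise": lambda inputs: " ".join(str(x) for x in inputs[0]),
}
deps = {"query": [], "analyse": ["query"], "visualise": ["analyse"]}

order, results = run_dag(tasks, deps)
print(order)
print(results["visualise"])
```

In this dependency style the workflow only states what each task needs; a scripting-style language would instead spell out the order of execution explicitly.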

7 Workflow Management Projects
- myGrid/Taverna - Southampton & others
  - XML/DAG based workflow language
  - Initially a WS choreography tool - now incorporates local tools/components
  - Grid integration with databases via OGSA Distributed Query Processor
  - myGrid Project main users - Bioinformatics
- Kepler - SDSC
  - Based on Ptolemy - modeling, simulation & design of real time & concurrent systems
  - Concurrent dataflow
  - Actors (components), Directors (workflow engines)
  - Local, Web Service & Grid Service actors
  - Ecology, biology, chemistry, oceanography, and the geosciences

8 WM Projects 2
- Karajan/Commodity Grid (CoG) Kit, Argonne & Berkeley
  - Scripting workflow language for Grid tasks
  - Integration with Globus Toolkit GT3 & GT4
  - Pure control flow
  - Data flow performed by data tasks - GridFTP
- And many more… See
  - http://www.gridworkflow.org/snips/gridworkflow/
  - http://www.extreme.indiana.edu/swf-survey/

9 Triana
- Cardiff University!
- PPARC funded
- Java based Scientific Workflow Tool or PSE
- Originally designed for Signal Processing
- Now domain independent
  - Bioinformatics - obviously!
  - Signal Processing - gravitational wave detection & radio astronomy
  - Design optimisation
  - Data mining
  - Medical imaging
  - Distributed Audio Processing

10 Triana Components
- Local Java components
- Service-oriented Components
  - Web services as components (WSRF coming soon)
  - Web service workflow
  - Peer 2 Peer services as components
  - Distributed service workflow
- Grid-oriented Components
  - Grid file and job primitives as components
  - Complex Grid workflow
  - Legacy code components via GridMonSteer
- Mix and Match composition

11 Workflow
- Inherently data flow based; control flow through messages
- XML/DCG workflow format
- Internally workflow language independent
- Migration to standards based language
- Simple Parent/Child relationship between tasks
- Context based implied actions
  - Local file -> local file = file copy
  - Local file -> remote file = file transfer
- Import/Export other workflow formats
  - Pegasus/EGEE read/write DAGMan format
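The context based implied actions on this slide can be read as a lookup from the kinds of the two connected endpoints to an operation. The sketch below is an illustration of that idea only, not Triana's implementation; the table `IMPLIED_ACTIONS` and the function `implied_action` are hypothetical names, and only the two cases stated on the slide are filled in.

```python
# Hypothetical table: (source kind, target kind) -> implied action.
# Only the two combinations listed on the slide are included.
IMPLIED_ACTIONS = {
    ("local", "local"): "file copy",
    ("local", "remote"): "file transfer",
}


def implied_action(source_kind, target_kind):
    """Return the action implied by connecting two file endpoints.

    Returns None for combinations the slide does not cover.
    """
    return IMPLIED_ACTIONS.get((source_kind, target_kind))


print(implied_action("local", "local"))
print(implied_action("local", "remote"))
print(implied_action("remote", "local"))
```

The point of the design is that the user only draws the connection between a parent and a child task; the operation needed to move the data is inferred from context rather than added as an explicit step.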

12 Triana Architecture
[Diagram: Triana, a graphical Grid computing environment or portal, sitting on two interfaces]
- GAP Interface - service based computing: deployment, discovery and communication with distributed services, e.g. P2P and (GSI) Web services
  - P2PS: P2PS Discovery, P2PS Pipes
  - JXTA: JXTA Discovery, JXTA Pipes
  - Web Services: UDDI, SOAP
- GAT Interface - Grid computing: job submission, file services
  - Condor, Globus, RLS, Unicore, PBS, GridLab GRMS, SGE, SSH, WSRF, LDR, .NET, GridFTP, other..
- Grid services

13 Triana in a SO World
[Diagram: Triana reaching the BabelFish service at babelfish.altavista.com across the network via GAP; the en_fr operation translates "hello" to "bonjour"]
- Service discovery: dynamic? decentralized?
- Communication
  - Message format: SOAP?
  - Transport protocol: TCP? UDP?

14 GAP Interface
- A simple service based API for
  - Service deployment
  - Service discovery
  - Pipe based communication
- Static application interface with multiple middleware bindings
  - P2PS
  - JXTA
  - Web services
[Diagram: GAP Interface above three bindings: P2PS (P2PS Discovery, P2PS Pipes), JXTA (JXTA Discovery, JXTA Pipes) and Web Services (UDDI, SOAP)]
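The idea of a static application interface with interchangeable middleware bindings can be sketched as follows. This is not the real GAP API (which is Java and not reproduced in these slides); the classes `Binding`, `InMemoryBinding` and `ServiceApi` and their method names are assumptions made for illustration, and the in-memory binding merely stands in for a real one such as P2PS, JXTA or Web services. The example service reuses the en_fr "hello" to "bonjour" translation from the previous slide.

```python
from abc import ABC, abstractmethod


class Binding(ABC):
    """Hypothetical middleware binding (deployment, discovery, communication)."""

    @abstractmethod
    def deploy(self, name, handler): ...

    @abstractmethod
    def discover(self, name): ...

    @abstractmethod
    def send(self, name, message): ...


class InMemoryBinding(Binding):
    """Stand-in for a real binding; services are plain local callables."""

    def __init__(self):
        self._services = {}

    def deploy(self, name, handler):
        self._services[name] = handler

    def discover(self, name):
        return name in self._services

    def send(self, name, message):
        return self._services[name](message)


class ServiceApi:
    """Application-facing interface that stays the same whichever binding is used."""

    def __init__(self, binding):
        self._binding = binding

    def deploy(self, name, handler):
        self._binding.deploy(name, handler)

    def call(self, name, message):
        if not self._binding.discover(name):
            raise LookupError(f"service not found: {name}")
        return self._binding.send(name, message)


api = ServiceApi(InMemoryBinding())
api.deploy("en_fr", lambda text: {"hello": "bonjour"}.get(text, text))
print(api.call("en_fr", "hello"))
```

Swapping the middleware would mean passing a different `Binding` subclass to `ServiceApi`; the application code that calls `deploy` and `call` would not change.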

15 WSPeer
- High level interface to Web Services
  - Discovery
  - Invocation
  - Deployment
  - Hosting
- Abstracts from the usual Web Service discovery and communication mechanisms (i.e. UDDI and HTTP)
  - P2PS Web Service discovery?
- Uses Apache AXIS as SOAP engine
- Extends capabilities of Apache AXIS
  - Stubless invocation (including complex types)
  - Non-standard transports (i.e. P2PS)

16 WSPeer
[Diagram: an application using WSPeer through two bindings, WSPeer - P2PS and WSPeer - HTTP/UDDI. Labels: deploy, publish, locate, invoke on each binding; UDDI; HTTP Server; deploy (launch server)]

17 Extending Triana for BDWorld
- BDWorld proxy components talk to Web Services
- Workflow Design Assistant (WfDA): selection and composition of BDWorld workflows from available services
- Uses Meta Data Repository (MDR) & Meta Data Agent (MDA)
- MDR contains mapping from proxies to resources
- WfDA captures domain knowledge in constraints
- Constraints used to limit the possible components at each stage of composition
- Simplifies valid workflow creation
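As a rough illustration of constraints limiting the possible components at each stage of composition, the sketch below filters a component list against a set of rules. It is an assumption-laden toy, not the WfDA, MDR or MDA: the component names, the `consumes`/`produces` fields and the two rules are all hypothetical.

```python
def types_match(previous, candidate):
    """Constraint: the candidate must accept what the previous step produces."""
    return previous["produces"] == candidate["consumes"]


def no_repeat(previous, candidate):
    """Constraint: do not offer the same component twice in a row."""
    return previous["name"] != candidate["name"]


def candidates(previous, components, constraints):
    """Names of components that satisfy every constraint after `previous`."""
    return [c["name"] for c in components
            if all(rule(previous, c) for rule in constraints)]


# Hypothetical component descriptions; a real system would look these up.
components = [
    {"name": "OccurrenceQuery", "consumes": "species name", "produces": "occurrence records"},
    {"name": "ClimateModel", "consumes": "occurrence records", "produces": "distribution map"},
    {"name": "RecordFilter", "consumes": "occurrence records", "produces": "occurrence records"},
    {"name": "MapViewer", "consumes": "distribution map", "produces": "image"},
]

rules = [types_match, no_repeat]
print(candidates(components[0], components, rules))  # after OccurrenceQuery
print(candidates(components[2], components, rules))  # after RecordFilter
```

Because only components passing every rule are offered, each step the user picks keeps the workflow valid, which is the simplification the slide describes.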

18 Conclusion
A workflow manager should:
- Simplify scientific experimentation
- Enable reuse at multiple levels
  - Component
  - Sub-workflow/Compound components
  - Collaboration
- Abstract component and environment complexities
  - Think of all components as a service that performs a known task
  - Implied/Context based operations - file copy/move
- Put the scientist back in control of the science, not the computing

