The design and implementation of a workflow analysis tool Vasa Curcin Department of Computing Imperial College London.


2 Scientific workflow field
Scientific workflows: a high-level programming language with an explicit graphical representation of the flow of data and/or control
Research into the automation of processes that support scientific research
Significant role in providing middleware for the UK e-Science programme: Taverna, Discovery Net, Triana
Lingua franca of service-oriented computing

3 Deluge of workflows
Systems: Meandre, Taverna, Discovery Net, Triana, Kepler, KNIME, Orange, Pentaho, Pegasus, Trident, YAWL, BPEL, LONI, GenePattern, Galaxy, VisTrails, UGENE, Wildfire
Domains: Bioinformatics, Cheminformatics, Environmental Science, Business Intelligence, Astronomy, Sensor informatics, …

4 Workflow analysis
There is a need for formal models to capitalize on the benefits of this infrastructure
  o Work evaluated on Discovery Net workflows
  o Concepts applicable to other workflow systems
Some aims
  o Minimise the cost of data movement and processing
  o Provide technology for workflow clients and warehouses (indexing, guided construction, …)
Tasks
  o Safeness
  o Instance bounds
  o Static workflow optimization
  o Establishing polymorphic type profiles of workflows

5 Underlying models
Control flow model
  o Process calculus definitions
  o Communication along named channels: fixed for atomic execution, dynamic for streaming
  o A new instance of a node's process is launched as soon as the node receives a token
  o Computation tree logic (CTL) modelling of execution states
Data flow model
  o Nodes associated with lambda calculus formulas and term graphs
  o Polymorphic type transformations
  o Rewrite rules defined for sets of nodes as term graph transformations
Embedding
  o A way of combining the control and data semantics
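The token semantics of the control flow model can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the tool's process-calculus encoding: the diamond-shaped workflow in `SUCCESSORS` is hypothetical, each node runs atomically, and a completed instance emits one token on every outgoing channel. Because every received token launches a fresh instance, node D, which has two incoming channels, is instantiated twice.

```python
from collections import Counter, deque

# Hypothetical diamond workflow: A feeds B and C, which both feed D.
SUCCESSORS = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def run(initial_tokens):
    """Process tokens in FIFO order. Each token a node receives launches a
    new instance of that node; when the instance completes (atomically, in
    this sketch) it sends one token along each outgoing channel."""
    queue = deque(initial_tokens)
    launched = Counter()  # number of instances launched per node
    trace = []            # order in which instances were launched
    while queue:
        node = queue.popleft()
        launched[node] += 1
        trace.append(node)
        queue.extend(SUCCESSORS[node])
    return launched, trace


launched, trace = run(["A"])
print(trace)  # ['A', 'B', 'C', 'D', 'D']
print(dict(launched))
```

The sketch deliberately gives D no join policy (such as waiting for both inputs), so it exposes the kind of instance multiplication that instance-bound queries can reveal.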

6 Workflow analysis tool
Similarity checker
  o Bisimilarity of processes
Process profiler
  o Deadlock/livelock detection
  o Reachability
  o Task bounds
Composability checker
  o Design-time tests
  o Type requirements
  o Polymorphic properties
Equivalence checker
  o Functional equivalence
Optimizer
  o Rewrite rules for transformations

7 Similarity checker
Based purely on the pi-calculus process model
  o Workflows are translated into the process model
  o Parallel composition of independent node processes with named channels
  o Compared in terms of internal executions (node actions) and the set of observable outputs (only the relevant outputs need to be defined)
A model checker is used to test different types of bisimilarity
  o Node executions are conveniently represented as silent actions
  o Strong bisimulation becomes a strict one-to-one mapping between workflow actions
  o Weak bisimulation ignores internal actions and communications and focuses on visible outputs
[Diagram: Workflow → Process model → Model checker]
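Strong bisimilarity between two finite labelled transition systems can be computed as a greatest fixed point: start from all pairs of states and repeatedly discard pairs where one side has a move the other cannot match. The Python sketch below is that textbook naive algorithm, not the algorithm used by ABC or by the tool; the small transition systems standing in for translated workflows, and their action names, are hypothetical.

```python
from itertools import product

# Hypothetical transition systems for three workflows, merged into one
# dictionary: state -> list of (action, next_state).
LTS = {
    "p0": [("A", "p1")], "p1": [("B", "p2")], "p2": [],
    "q0": [("A", "q1")], "q1": [("B", "q2")], "q2": [],
    # r0 may take an A-step into a state where B can never run
    "r0": [("A", "r1"), ("A", "r2")], "r1": [("B", "r3")], "r2": [], "r3": [],
}


def strong_bisimilar(lts, p0, q0):
    rel = set(product(lts, lts))  # start from the full relation

    def matched(p, q):
        # every move of p is answered by an equally labelled move of q ...
        forth = all(any(b == a and (p2, q2) in rel for b, q2 in lts[q])
                    for a, p2 in lts[p])
        # ... and every move of q is answered by p
        back = all(any(b == a and (p2, q2) in rel for b, p2 in lts[p])
                   for a, q2 in lts[q])
        return forth and back

    changed = True
    while changed:  # refine until no pair is removed
        changed = False
        for pair in list(rel):
            if not matched(*pair):
                rel.discard(pair)
                changed = True
    return (p0, q0) in rel


print(strong_bisimilar(LTS, "p0", "q0"))  # True
print(strong_bisimilar(LTS, "p0", "r0"))  # False
```

P and R have the same traces, yet R's choice to enter a state where B is blocked is observable, which is exactly the distinction a one-to-one action mapping captures.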

8 Similarity checker: example
ABC (Another Bisimilarity Checker) used to test the different types of bisimilarity
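Weak bisimilarity can be reduced to strong bisimilarity by saturating the transition system: each visible move a becomes the weak moves tau* a tau*, and tau becomes tau* (including staying put). The self-contained Python sketch below does this, using `"tau"` for node executions treated as silent actions; it illustrates the definition and is not how ABC is implemented. The two transition systems are hypothetical: Q runs an extra internal node before producing the same output as P, so the two are weakly but not strongly bisimilar.

```python
from itertools import product

TAU = "tau"  # silent action: an internal node execution

# Hypothetical workflows: P emits "out" directly; Q first runs an
# internal node (a tau step) and then emits the same output.
LTS = {
    "p0": [("out", "p1")], "p1": [],
    "q0": [(TAU, "q1")], "q1": [("out", "q2")], "q2": [],
}


def tau_closure(lts, s):
    """All states reachable from s by zero or more tau steps."""
    seen, stack = {s}, [s]
    while stack:
        for a, t in lts[stack.pop()]:
            if a == TAU and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def saturate(lts):
    """Weak transition relation: tau* a tau* for visible a, tau* for tau."""
    closure = {s: tau_closure(lts, s) for s in lts}
    weak = {}
    for s in lts:
        moves = {(TAU, t) for t in closure[s]}
        for u in closure[s]:
            for a, v in lts[u]:
                if a != TAU:
                    moves |= {(a, w) for w in closure[v]}
        weak[s] = moves
    return weak


def bisimilar(lts, p0, q0):
    """Naive greatest-fixed-point check of strong bisimilarity."""
    rel = set(product(lts, lts))

    def matched(p, q):
        forth = all(any(b == a and (p2, q2) in rel for b, q2 in lts[q])
                    for a, p2 in lts[p])
        back = all(any(b == a and (p2, q2) in rel for b, p2 in lts[p])
                   for a, q2 in lts[q])
        return forth and back

    changed = True
    while changed:
        changed = False
        for pair in list(rel):
            if not matched(*pair):
                rel.discard(pair)
                changed = True
    return (p0, q0) in rel


def weakly_bisimilar(lts, p0, q0):
    return bisimilar(saturate(lts), p0, q0)


print(bisimilar(LTS, "p0", "q0"))         # False: the tau step must be matched
print(weakly_bisimilar(LTS, "p0", "q0"))  # True: only the output matters
```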

9 Process profiling
The process algebra representation is translated into a Kripke frame
  o Enumerated states denote the number of instances of each workflow node
  o Transitions of the frame are the node executions
  o CTL formulas are used to query the frame
  o The NuSMV model checker is employed
Allows questions such as:
  o Reachability of a particular state
  o Detection of deadlocks and livelocks
  o Safety: some state is always executing
  o Bounds on the number of instances of a node
[Diagram: Workflow → Process model → Kripke frame]
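The construction of an explicit Kripke frame can be sketched in a few lines of Python. This is an assumption-laden simplification: states are tuples of instance counts for a hypothetical diamond workflow, a transition executes one pending instance of a node and hands one token to each successor, and the frame is enumerated by breadth-first search rather than symbolically as NuSMV would do. The resulting frame already answers simple reachability and instance-bound questions.

```python
from collections import deque

# Hypothetical workflow: A feeds B and C, which both feed D.
NODES = ("A", "B", "C", "D")
SUCCESSORS = {"A": ("B", "C"), "B": ("D",), "C": ("D",), "D": ()}


def step(state, node):
    """Execute one instance of `node`: remove it, give each successor a token."""
    counts = list(state)
    counts[NODES.index(node)] -= 1
    for succ in SUCCESSORS[node]:
        counts[NODES.index(succ)] += 1
    return tuple(counts)


def kripke_frame(initial):
    """BFS over reachable states; each edge is labelled by the executed node."""
    frame, queue = {}, deque([initial])
    while queue:
        s = queue.popleft()
        if s in frame:
            continue
        frame[s] = [(n, step(s, n)) for n, c in zip(NODES, s) if c > 0]
        queue.extend(t for _, t in frame[s])
    return frame


frame = kripke_frame((1, 0, 0, 0))
terminal = [s for s, moves in frame.items() if not moves]
max_d = max(s[NODES.index("D")] for s in frame)
print(len(frame))  # 9
print(terminal)    # [(0, 0, 0, 0)]
print(max_d)       # 2
```

In this simplified model every pending instance can always run, so the only terminal state is the empty one; with synchronising (join) nodes, a terminal state that still holds pending instances would signal a deadlock.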

10 Process profiling: example
Reachability
  o $\mathrm{EF}\,(F_\tau = 1)$: is there an execution that reaches one instance of F?
  o $\mathrm{AF}\,(F_\tau = 1)$: do all executions eventually reach one instance of F?
Livelocks
  o $\mathrm{AG}\,(C_\tau \rightarrow \mathrm{AG}\,\mathrm{AF}\,C_\tau)$: is there always a livelock with C?
  o $\mathrm{EF}\,(C_\tau \rightarrow \mathrm{AG}\,\mathrm{AF}\,C_\tau)$: can there be a livelock with C?
Instance bounds
  o $\max x.\ \mathrm{EF}\,(A_\tau = x)$: what is the maximum number of instances of A?
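The CTL operators in these queries can be evaluated on an explicit frame with the standard fixed-point characterisations: EF φ is the least set containing φ and closed under "some successor", AF φ the least set containing φ and closed under "all successors", and AG φ = ¬EF ¬φ. The Python sketch below applies them to a hand-written, hypothetical frame whose states record instance counts; terminal states are given self-loops so that every path is infinite, which is a modelling assumption. The livelock query used here, EF AG AF (C ≥ 1), is an illustrative formulation and not the exact formula on the slide.

```python
# Hypothetical frame: A either hands a token to F (s1) or to C (s2);
# C keeps re-triggering itself (s2 -> s2), F finishes (s1 -> s4).
SUCC = {"s0": ["s1", "s2"], "s1": ["s4"], "s2": ["s2"], "s4": []}
COUNTS = {"s0": {"A": 1}, "s1": {"F": 1}, "s2": {"C": 1}, "s4": {}}

# Make the transition relation total: terminal states loop on themselves.
SUCC = {s: ts or [s] for s, ts in SUCC.items()}
STATES = set(SUCC)


def atom(node, pred):
    """States whose instance count for `node` satisfies `pred`."""
    return {s for s in STATES if pred(COUNTS[s].get(node, 0))}


def ex(z):
    return {s for s in STATES if any(t in z for t in SUCC[s])}


def ax(z):
    return {s for s in STATES if all(t in z for t in SUCC[s])}


def lfp(base, step):
    """Least fixed point of Z = base | step(Z), computed by iteration."""
    z = set(base)
    while True:
        new = z | step(z)
        if new == z:
            return z
        z = new


def ef(phi):
    return lfp(phi, ex)


def af(phi):
    return lfp(phi, ax)


def ag(phi):
    return STATES - ef(STATES - phi)


one_f = atom("F", lambda n: n == 1)
print("s0" in ef(one_f))  # True: some execution reaches one instance of F
print("s0" in af(one_f))  # False: the path through s2 never does

running_c = atom("C", lambda n: n >= 1)
print("s0" in ef(ag(af(running_c))))  # True: a C livelock is reachable

# Instance bound: largest x with EF(F = x) in s0 (search capped at 9).
bound = max(x for x in range(10) if "s0" in ef(atom("F", lambda n, x=x: n == x)))
print(bound)  # 1
```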

11 Composability checker
Polymorphic type formulas for the workflow components/fragments
When composing:
  o The output and input of each fragment are compared in terms of free and bound type variables
  o If there are no clashes, the free variables are resolved to form the type formula of the composition
  o Inference engine developed specifically for the tool
Determines:
  o Whether a workflow fragment can be reused on a new input
  o Compatible services in the warehouse
[Diagram: Workflow → Data model → Type formulas]
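The composition step can be illustrated with first-order (Robinson) unification of type terms, as used in Hindley-Milner style inference. This Python sketch is not the tool's inference engine: type terms are tuples `(constructor, arg, ...)`, strings starting with `'` are type variables, the second fragment's variables are renamed apart before unifying, and the fragment types `Table`, `List`, `Seq` and `Fasta` are hypothetical. A successful unification resolves the free variables of the composition; a constructor mismatch is reported as a clash.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("'")


def resolve(t, subst):
    """Follow variable bindings until reaching an unbound variable or a term."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    t = resolve(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])


def unify(a, b, subst):
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if is_var(a):
        if occurs(a, b, subst):
            raise TypeError(f"infinite type: {a} occurs in {b}")
        return {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"type clash: {a} vs {b}")


def substitute(t, subst):
    """Replace every bound variable in a term by its resolved value."""
    t = resolve(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(substitute(x, subst) for x in t[1:])
    return t


def rename(t, suffix):
    """Rename variables apart so two fragments never share a variable."""
    if is_var(t):
        return t + suffix
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(x, suffix) for x in t[1:])
    return t


def compose(f, g):
    """Type of fragment f followed by fragment g; each is (input, output)."""
    f_in, f_out = f
    g_in, g_out = rename(g[0], "_g"), rename(g[1], "_g")
    subst = unify(f_out, g_in, {})
    return substitute(f_in, subst), substitute(g_out, subst)


# Hypothetical fragments: a generic "rows to list" node and a node that
# only accepts lists of sequences.
to_list = (("Table", "'a"), ("List", "'a"))
write_fasta = (("List", ("Seq",)), ("Fasta",))
print(compose(to_list, write_fasta))  # (('Table', ('Seq',)), ('Fasta',))

count_rows = (("Table", "'b"), ("Int",))
try:
    compose(to_list, count_rows)
except TypeError as err:
    print(err)  # reports the List/Table clash
```

Resolving `'a` to `Seq` tells the user that the composed fragment can only be reused on tables of sequences, which is the kind of reuse question the checker answers.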

12 Composability checker: example
Fragment of three nodes, L, M and N
  o Input q, with required attributes A, B and D
  o Two outputs, u and v
  o A is present in both outputs, B in u, and D in neither
The two outputs can be joined with O
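The attribute requirements in this example can be encoded directly as sets. This Python sketch assumes that B appears only in u and that O joins on the attributes shared by both outputs; the input check is open-ended (extra attributes such as C are accepted), mirroring a polymorphic record type. The warehouse entries and service names are hypothetical.

```python
# Attribute profile of the three-node fragment L-M-N from the example.
REQUIRED_INPUT = {"A", "B", "D"}         # attributes q must carry
OUTPUTS = {"u": {"A", "B"}, "v": {"A"}}  # A in both, B in u, D in neither


def missing_attributes(schema):
    """Required attributes absent from a candidate input; extra ones are fine."""
    return REQUIRED_INPUT - set(schema)


def join_keys(left, right):
    """Attributes a join node such as O could use to recombine two outputs."""
    return OUTPUTS[left] & OUTPUTS[right]


def compatible_producers(warehouse):
    """Services (hypothetical) whose output schema can feed the fragment."""
    return sorted(name for name, schema in warehouse.items()
                  if not missing_attributes(schema))


print(missing_attributes(["A", "B", "C", "D"]))  # set()
print(sorted(missing_attributes(["A", "C"])))    # ['B', 'D']
print(join_keys("u", "v"))                       # {'A'}
print(compatible_producers({"loader_x": {"A", "B", "C", "D"},
                            "loader_y": {"A", "B"}}))  # ['loader_x']
```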

13 Equivalence tester / optimizer
Uses a set of node equivalence rules
  o Defined for each workflow system or node subset
  o The algorithm applies the allowed transformations to reduce two workflows to the same expression
Combined with rewrite heuristics
  o Again node-specific
  o Simple example: again, the relational model
[Diagram: Workflow → Data model → Node equivalences]
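The idea of reducing two workflows to the same expression can be sketched with a normaliser over relational terms. This Python example assumes just two equivalences, merging stacked selections and treating joins as associative and commutative (flattened, then sorted), and calls two workflows equivalent when their normal forms coincide. That is sound for those rules but deliberately incomplete. The table names and predicates are hypothetical.

```python
def normalize(term):
    """Canonical form under two assumed equivalences:
    select_p(select_q(R)) == select_{p and q}(R)   (selection merging)
    joins are associative and commutative          (flatten, then sort)
    Base tables are plain strings."""
    if isinstance(term, str):
        return term
    op = term[0]
    if op == "select":
        _, preds, child = term
        child = normalize(child)
        preds = set(preds)
        if isinstance(child, tuple) and child[0] == "select":
            preds |= set(child[1])  # merge with the selection below
            child = child[2]
        return ("select", tuple(sorted(preds)), child)
    if op == "join":
        operands = []
        for side in term[1:]:
            n = normalize(side)
            if isinstance(n, tuple) and n[0] == "join":
                operands.extend(n[1:])  # flatten nested joins
            else:
                operands.append(n)
        return ("join",) + tuple(sorted(operands, key=repr))
    raise ValueError(f"unknown operator {op!r}")


def equivalent(w1, w2):
    """True means equivalent under the rules; False means not shown equivalent."""
    return normalize(w1) == normalize(w2)


# Hypothetical relational workflows over three tables.
w1 = ("select", ("age>65",),
      ("select", ("drug='X'",),
       ("join", "patients", ("join", "prescriptions", "events"))))
w2 = ("select", ("drug='X'", "age>65"),
      ("join", ("join", "events", "patients"), "prescriptions"))
w3 = ("select", ("age>65",), ("join", "patients", "prescriptions"))

print(equivalent(w1, w2))  # True
print(equivalent(w1, w3))  # False
```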

14 Equivalence tester/optimizer: example
Relational workflow searching for Adverse Drug Reactions in the GPRD database
Rewrite rules
  o Set of relational equivalences
Heuristics
  o Early projections/selections
  o Late joins
Easy scenario: a brute-force algorithm works
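A brute-force optimizer in the spirit of this example can enumerate every plan reachable through a small rule set and keep the cheapest. The Python sketch below assumes two rewrite rules (pushing a selection into the join operand that holds its attributes, and swapping adjacent selections) and a toy cost model that sums the sizes of intermediate results. The row counts, selectivities and table names are hypothetical, loosely echoing an adverse-drug-reaction query, and are not GPRD statistics.

```python
from collections import deque

# Hypothetical statistics; not taken from GPRD.
SIZE = {"patients": 1000, "prescriptions": 5000}
SELECTIVITY = {"age>65": 0.1, "drug='X'": 0.05}
PRED_TABLES = {"age>65": {"patients"}, "drug='X'": {"prescriptions"}}
JOIN_FACTOR = 0.001

# Plans: base tables are strings, ("select", pred, child), ("join", left, right).


def tables(t):
    if isinstance(t, str):
        return {t}
    if t[0] == "select":
        return tables(t[2])
    return tables(t[1]) | tables(t[2])


def card(t):
    """Estimated number of rows produced by a plan."""
    if isinstance(t, str):
        return SIZE[t]
    if t[0] == "select":
        return SELECTIVITY[t[1]] * card(t[2])
    return JOIN_FACTOR * card(t[1]) * card(t[2])


def cost(t):
    """Toy cost model: total size of all intermediate results."""
    if isinstance(t, str):
        return 0
    children = [t[2]] if t[0] == "select" else [t[1], t[2]]
    return card(t) + sum(cost(c) for c in children)


def rewrites(t):
    """Yield every plan obtained from t by one rule application anywhere."""
    if isinstance(t, str):
        return
    if t[0] == "select":
        _, p, child = t
        if isinstance(child, tuple) and child[0] == "join":
            _, left, right = child
            # Rule 1: push the selection into the operand holding its attributes.
            if PRED_TABLES[p] <= tables(left):
                yield ("join", ("select", p, left), right)
            if PRED_TABLES[p] <= tables(right):
                yield ("join", left, ("select", p, right))
        if isinstance(child, tuple) and child[0] == "select":
            # Rule 2: swap two adjacent selections.
            yield ("select", child[1], ("select", p, child[2]))
        for c in rewrites(child):
            yield ("select", p, c)
    else:
        _, left, right = t
        for c in rewrites(left):
            yield ("join", c, right)
        for c in rewrites(right):
            yield ("join", left, c)


def optimize(plan):
    """Brute force: explore every reachable plan and return the cheapest."""
    seen, queue = {plan}, deque([plan])
    while queue:
        for nxt in rewrites(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return min(seen, key=cost)


plan = ("select", "age>65", ("select", "drug='X'", ("join", "patients", "prescriptions")))
best = optimize(plan)
print(best)
# ('join', ('select', 'age>65', 'patients'), ('select', "drug='X'", 'prescriptions'))
```

Exhaustive search is only viable because the rule set and the query are tiny; larger workflows would need heuristics such as early selections and late joins to prune the search space.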

15 Related and future work
Data typing
  o COMAD for Kepler
Workflow process analysis
  o GWorkflowDL
  o YAWL
New workflow tools with relational structures
  o KNIME
  o Orange
  o Pentaho
Extensions:
  o Streaming: blocking and batching
  o Improved state reduction algorithms for the CTL model
  o Adding more type constructs for polymorphism

16 Summary
Workflow analysis is needed to improve the take-up and exploitation of workflows
  o Enterprise environments
  o Profiling resource usage, risk of failure and execution time
  o Supporting reuse and repurposing
Separating the control and data aspects allows the use of existing model checkers and familiar techniques
  o Process algebras, temporal logics, type polymorphism, term graphs
The current version works on Discovery Net/InforSense
  o KNIME and Pentaho are very similar and only require extra parsers
  o A full streaming process model for Taverna is in the works

