Composing Models of Computation in Kepler/Ptolemy II
Antoon Goderis, U of Manchester (myGrid/Taverna)
Christopher Brooks, UC Berkeley (Ptolemy II)
Ilkay Altintas, UC San Diego (Kepler)
Edward A. Lee, UC Berkeley (Ptolemy II)
Carole Goble, U of Manchester (myGrid/Taverna)
The talk
Models of Computation (MoCs)
  Use cases from science
  PtolemyII/Kepler
Composing Models of Computation
  Conditions for valid (hierarchical) compositions
  Example based on process networks and data flow
  Table of valid compositions
Use cases for different MoCs
To model scientific problems naturally:
  Biology: gene annotation pipelines [dataflow, for pipeline compositions]
  Fluid dynamics: Lattice-Boltzmann simulations [continuous-time based ordinary differential equation solvers]
Models of Computation
Thomas Kuhn, originator of the paradigm paradigm
What is a component? (ontology)
  States? Processes? Threads? Differential equations? Constraints? Objects (data + methods)?
What knowledge do components share? (epistemology)
  Time? Name spaces? Signals? State?
How do components communicate? (protocols)
  Rendezvous? Message passing? Continuous-time signals? Streams? Method calls? Events in time?
What do components communicate? (lexicon)
  Objects? Transfer of control? Data structures? ASCII text?
Exploring Models of Computation for scientific computing
Ptolemy II
Kepler
Scientific workflow design and re-use
Support design and re-use via separation of concerns:
  Structural data types
  Semantic types
  Type checking
  Structure
  Execution semantics
PtolemyII/Kepler: Actor-Oriented Design
Object orientation:
  An object has a class name, data, and methods (call, return).
  What flows through an object is sequential control.
Actor orientation:
  Actors are executable entities that communicate with one another via message passing.
  Messages (input/output data) are encapsulated in tokens.
  Messages are sent through ports.
  An actor has an actor name, data (state), parameters, and ports for input data and output data.
  What flows through an actor is streams of data.
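The actor-oriented view above can be illustrated with a minimal sketch. This is not the Ptolemy II or Kepler API: the classes `Token`, `Port`, and `Scale`, and the `connect`/`send`/`get` methods, are hypothetical names chosen for illustration. The sketch only shows messages wrapped in tokens and passed between actors through ports.

```python
from collections import deque


class Token:
    """Encapsulates one message (a piece of input/output data)."""

    def __init__(self, value):
        self.value = value


class Port:
    """A named port; tokens delivered to it are queued until they are read."""

    def __init__(self, name):
        self.name = name
        self._queue = deque()
        self._receivers = []  # downstream ports connected to this one

    def connect(self, other):
        self._receivers.append(other)

    def send(self, token):
        # Deliver the token to every connected downstream port.
        for receiver in self._receivers:
            receiver._queue.append(token)

    def has_token(self):
        return bool(self._queue)

    def get(self):
        return self._queue.popleft()


class Scale:
    """Hypothetical actor: multiplies each input token by a parameter."""

    def __init__(self, name, factor):
        self.name = name
        self.factor = factor          # parameter
        self.input = Port("input")
        self.output = Port("output")

    def fire(self):
        token = self.input.get()
        self.output.send(Token(token.value * self.factor))


# Wire a source port to the actor, and the actor to a sink port.
source_out = Port("source.output")
scale = Scale("scale", factor=3)
sink_in = Port("sink.input")
source_out.connect(scale.input)
scale.output.connect(sink_in)

for v in (1, 2, 3):
    source_out.send(Token(v))
while scale.input.has_token():
    scale.fire()

results = []
while sink_in.has_token():
    results.append(sink_in.get().value)
```

Nothing in the sketch decides when `fire()` is called or whether actors run concurrently; in the deck's terms, that decision belongs to the model of computation.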
Structure of PtolemyII/Kepler workflows
Hierarchical entities, ports, connections, and attributes.
Syntax defines the structure of a workflow, but says little about what it means.
This abstract syntax is compatible with many semantic interpretations.
Together, the concurrency model and the communication model are what we call the model of computation (MoC).
Execution semantics: Director
Implements the model of computation.
Governs the execution of an actor (workflow): scheduling, dispatching threads, etc.
Implemented Models of Computation Survival of the fittest is the only reasonable way to choose among these. Implemented Models of Computation PN – process networks SDF – synchronous dataflow DDF – dynamic dataflow FSM – finite state machines CT – continuous-time modeling DE – discrete-event systems SR - Synchronous/Reactive systems RendezVous – concurrent threads with rendezvous GR – graphics … Each of these defines a component ontology and an interaction semantics between components. There are many more possibilities! In use in Kepler Available in Kepler Realized in Ptolemy II
Use cases for composing MoCs (1)
Intra-disciplinary collaboration
  Biology: gene annotation to systems biology [dataflow + continuous time]
Inter-disciplinary collaboration
  Chem- to bio-informatics [continuous time + dataflow]
Mix software workflows with physical systems
  Sensor networks and electron microscopes [continuous time]
Performance of computation-intensive workflows
  Visualization [3D animation]
Use cases for composing MoCs (2)
Mix workflow management with running models for analysis or simulation
  Biology: selective extraction and analysis of proteins from public databases [finite state machines + dataflow]
  Fluid dynamics: dynamically adapting model control parameters of Lattice-Boltzmann simulations [finite state machines + continuous time]
Integrated provenance collection
  Include dynamic changes in the overall model as well as parameter sweeps within each model
MoC composition in chemistry
Figure labels: actor/workflow based on Synchronous Data Flow; actor/workflow based on Kahn Process Network.
How to compose MoCs (directors)?
No classification exists to determine which director combinations are valid.
We need to know two things about a director:
  What properties it exports via the composite actor in which it is placed (the inner director exports certain properties).
  What properties it requires of the actors under its control (the outer director requires certain properties).
If the properties a director exports match those another director requires, then the first director can be used inside the second.
So, what are these properties?
It turns out we can determine director compatibility based on three levels of adherence to the actor abstract semantics.
Actor Abstract Semantics
Flow of control:
  Initialization
  Execution: iterate() calls prefire(), fire(), postfire()
  Finalization
Specifications:
  prefire(): synchronizes to the environment and checks firing conditions
  fire(): generates outputs based on current inputs and states
  postfire(): updates the states for the next iteration
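The flow of control above can be sketched in a few lines. This is an illustrative Python sketch rather than the Ptolemy II Java interface: the `Accumulator` actor, its `inputs`/`outputs` lists, and the phase names `initialize()` and `wrapup()` are assumptions made for the example. The point is the division of labour: `prefire()` checks the firing condition, `fire()` produces output without committing state, and `postfire()` commits the state.

```python
class Accumulator:
    """Hypothetical actor that emits a running sum of its inputs."""

    def __init__(self):
        self.inputs = []
        self.outputs = []
        self.total = 0      # committed state
        self._pending = 0   # tentative state computed by fire()

    def initialize(self):
        self.total = 0
        self.outputs.clear()

    def prefire(self):
        # Firing condition: at least one input is available.
        return bool(self.inputs)

    def fire(self):
        # Generate an output from the current input and state,
        # without committing the new state.
        self._pending = self.total + self.inputs[0]
        self.outputs.append(self._pending)

    def postfire(self):
        # Consume the input and commit the state for the next iteration.
        self.inputs.pop(0)
        self.total = self._pending
        return True  # willing to be iterated again

    def wrapup(self):
        pass  # nothing to release in this sketch


def iterate(actor):
    """One iteration: prefire, then fire, then postfire."""
    if not actor.prefire():
        return False
    actor.fire()
    return actor.postfire()


def run(actor, data):
    actor.initialize()          # initialization
    actor.inputs = list(data)
    while iterate(actor):       # execution
        pass
    actor.wrapup()              # finalization
    return actor.outputs
```

For example, `run(Accumulator(), [1, 2, 3])` returns `[1, 3, 6]`. Keeping the state update out of `fire()` is what lets an enclosing director call `fire()` without side effects on the actor's committed state.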
Three flavours of actor semantics
                              Strict   Loose   Loosest
Implements methods?           Yes      Yes     Yes
Methods must return?          Yes      Yes     No
fire() doesn't change state?  Yes      No      No
Compatible director compositions
The exported abstract semantics should be stricter than or equal to the required abstract semantics.
Example: composing PN and SDF
Kahn Process Networks (PN)
  Asynchronous communication between processes; one thread for each actor.
  The PN director does not require that any method eventually returns. The methods run in a separate thread belonging entirely to the actor.
  PN does not guarantee that any method eventually returns.
Synchronous Data Flow (SDF)
  The director "fires" actors when input tokens are available.
  The SDF director requires that methods return. The fire() method can change state.
  The SDF director guarantees that methods return.
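The contrast between the two styles can be sketched with the Python standard library. This is an illustration only, not how the Ptolemy II directors are implemented: the `STOP` marker and the function names are hypothetical, and the end-of-stream marker exists purely so the demonstration terminates. In the PN-style version the actor owns a thread and blocks on reads; in the SDF-style version each firing consumes a token, produces a token, and returns control to the caller.

```python
import queue
import threading

STOP = object()  # hypothetical end-of-stream marker so the demo terminates


def pn_doubler(inbox, outbox):
    """PN style: the actor runs in its own thread and blocks on reads."""
    while True:
        item = inbox.get()  # blocking read
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(item * 2)


def run_pn(data):
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=pn_doubler, args=(inbox, outbox))
    worker.start()
    for x in data:
        inbox.put(x)
    inbox.put(STOP)
    out = []
    while True:
        item = outbox.get()
        if item is STOP:
            break
        out.append(item)
    worker.join()
    return out


def sdf_doubler_fire(inbox, outbox):
    """SDF style: one firing consumes one token, produces one, and returns."""
    outbox.append(inbox.pop(0) * 2)


def run_sdf(data):
    inbox, outbox = list(data), []
    while inbox:  # the director fires the actor while a token is available
        sdf_doubler_fire(inbox, outbox)
    return outbox
```

Both `run_pn` and `run_sdf` double each element of their input, but only the SDF-style actor hands control back after every firing, which is the property the SDF director relies on.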
Determining PN and SDF compatibility
The exported abstract semantics should be stricter than or equal to the required abstract semantics.
SDF inside PN example
Outer actor/workflow based on Kahn Process Network: PN requires the loosest abstract actor semantics.
Inner actor/workflow based on Synchronous Data Flow: SDF exports loose abstract actor semantics.
SDF exports loose and PN requires loosest, so it is OK to combine them.
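The compatibility rule (exported semantics stricter than or equal to required semantics) reduces to a comparison on an ordered scale. The sketch below encodes only the two directors from the example. The entries for what SDF requires and what PN exports are a reading of the PN and SDF descriptions in this deck, not values copied from the table of valid compositions, and the names `Semantics`, `DIRECTORS`, and `can_nest` are hypothetical.

```python
from enum import IntEnum


class Semantics(IntEnum):
    """Levels of adherence to the actor abstract semantics; higher is stricter."""
    LOOSEST = 0
    LOOSE = 1
    STRICT = 2


# Assumed encoding of the two directors discussed in the example.
DIRECTORS = {
    # SDF requires that methods return and guarantees that they return.
    "SDF": {"requires": Semantics.LOOSE, "exports": Semantics.LOOSE},
    # PN neither requires nor guarantees that methods return.
    "PN": {"requires": Semantics.LOOSEST, "exports": Semantics.LOOSEST},
}


def can_nest(inner, outer):
    """True if the inner director's exported semantics is stricter than
    or equal to the semantics the outer director requires."""
    return DIRECTORS[inner]["exports"] >= DIRECTORS[outer]["requires"]
```

Under this encoding, `can_nest("SDF", "PN")` is true, matching the example, while `can_nest("PN", "SDF")` is false, since PN cannot promise SDF that its methods return.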
The others
  FSM: very flexible
  CT (continuous dynamics): works well as inner director
  PN: very inflexible
Living document: http://www.mygrid.org.uk/wiki/Papers/IccsPaper2007
Summary
A need for multiple models of computation, and their composition, in e-science.
A practical table of valid compositions for models of computation in PtolemyII/Kepler.
Questions? E-mail: goderisa@cs.man.ac.uk
Xie Xie (thank you)
Bertram Ludaescher, UC Davis, Kepler
Gang Zhou and Thomas Feng, UC Berkeley, PtolemyII
John Brooke, U Manchester