REI – Recipe Execution Infrastructure
Jens Knudstrup, 2005-02-08

Purpose of REI
Main Objectives of REI
- Provide the services of a parallel Batch Queue System.
- Make it easy to control and monitor complex batches with job synchronization.
- Make it possible to distribute tasks (processing load) over a cluster of CPUs/nodes.
Not Provided in the Present Implementation
- Services for distributing data within the cluster to the nodes doing the processing (data sharing/distribution is done via a common storage area/file server).
- Services for resource management and advertising.
- Services for explicit load balancing (optimized job distribution).
- Special features for GRID applications.

Main Features
Main Features of REI
- Implemented in C++ (in-house implementation written from scratch).
- Uses an RDBMS for information sharing and task synchronization.
- Executes shell commands or executes CPL Recipes natively (no generic interfacing to shared object files).
- Provides the Pworker task execution daemon, which can take one of three roles:
  - Process Master Commands: Master Pworker.
  - Process Standard Commands: Standard Pworker.
  - Process both Master and Standard Commands.
- Command line utilities to add/remove/monitor commands and to control Pworkers.
- An API for implementing Master Command Libraries (also referred to as Recipe Planners) and Standard Command Libraries.
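
To make the three Pworker roles concrete, the following C++ sketch shows how a role could determine which kinds of command a Pworker accepts and which queue tables it would poll. The table names are the ones listed for the REI DB in the Interprocess Synchronization slide; the types PworkerRole and CommandKind and the functions canProcess and queuesToPoll are hypothetical illustrations, not part of the REI API.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the three Pworker roles; not taken from the REI sources.
enum class PworkerRole { Master, Standard, MasterAndStandard };
enum class CommandKind { Master, Standard };

// True if a Pworker with the given role may execute a command of the given kind.
bool canProcess(PworkerRole role, CommandKind kind) {
    switch (role) {
        case PworkerRole::Master:
            return kind == CommandKind::Master;
        case PworkerRole::Standard:
            return kind == CommandKind::Standard;
        case PworkerRole::MasterAndStandard:
            return true;
    }
    return false;  // only reached for an out-of-range enum value
}

// Queue tables (names as listed for the REI DB) that a Pworker with this role would poll.
std::vector<std::string> queuesToPoll(PworkerRole role) {
    std::vector<std::string> queues;
    if (canProcess(role, CommandKind::Master))
        queues.push_back("pworker_master_command_queue");
    if (canProcess(role, CommandKind::Standard))
        queues.push_back("pworker_command_queue");
    return queues;
}
```

A Pworker started in the combined role would poll both queues, whereas a Standard Pworker would only ever look at pworker_command_queue.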

Command Line Interface
Interaction with REI
- Command line interface provided:
  - addcmd: Add a Master Command to the Master Command Queue (handles ABs and SOFs, which are not part of the REI core).
  - cmdstat: Query the status of all commands or of a specific command. A tail feature is provided.
  - rmcmd: Remove the information for one command or for all commands from the Command Queues (clean-up).
  - pworker: The Pworker daemon.
  - stopworker: Stop one specific Pworker or all running Pworkers.
  - listworkers: List the Pworkers running in the system.
  - rmworker: Remove a Pworker (make it exit) or all Pworkers.
- These utilities are not part of the core REI system and should be seen as convenience tools; they are built on the REI libraries.
- Commands can also be added to the DB directly via the REI libraries, so the operation of REI can be controlled and monitored programmatically.
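
Because the utilities are thin wrappers, an external control program can also drive them directly. The C++ sketch below assembles an addcmd argument vector using only the options that appear in the DFO example later in this presentation (-name, -bg, -trigger, -waitfor, -exe, -a). The struct AddCmdRequest and the function buildAddCmdArgv are hypothetical; the slides do not explain the exact semantics of -bg or -trigger, and the sketch passes at most one -waitfor and one -a value because that is all the example shows.

```cpp
#include <string>
#include <vector>

// Hypothetical request type; option names copied from the DFO addcmd example.
struct AddCmdRequest {
    std::string name;         // -name value
    bool background = false;  // emit -bg (semantics not documented in the slides)
    std::string trigger;      // -trigger value; empty means "omit"
    std::string waitFor;      // -waitfor value; empty means "omit"
    std::string exe;          // -exe value
    std::string arg;          // -a value; empty means "omit"
};

// Builds the argument vector (argv[0] included) in the option order used in the example.
std::vector<std::string> buildAddCmdArgv(const AddCmdRequest& r) {
    std::vector<std::string> argv{"addcmd", "-name", r.name};
    if (r.background) argv.push_back("-bg");
    if (!r.trigger.empty()) {
        argv.push_back("-trigger");
        argv.push_back(r.trigger);
    }
    if (!r.waitFor.empty()) {
        argv.push_back("-waitfor");
        argv.push_back(r.waitFor);
    }
    argv.push_back("-exe");
    argv.push_back(r.exe);
    if (!r.arg.empty()) {
        argv.push_back("-a");
        argv.push_back(r.arg);
    }
    return argv;
}
```

The resulting vector could be converted to a null-terminated array of char pointers and passed to a process-spawning call such as POSIX execvp.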

Command Lifecycle
Command States
- Each submitted command is in one of 7 states indicating its current status:

Command Transitions

Interprocess Synchronization
Interprocess Synchronization/Information Sharing
- Pworkers synchronize with each other via the DB.
- The DB is also used for exchanging information between the processes in the system, using the following tables:
  - pworker_registry: Information about the Pworkers in the system (ID, node, Master and/or Standard Commands, …).
  - pworker_master_command_queue: Information about the Master Commands that are waiting to be executed, under execution, or executed.
  - pworker_master_sequencer: Information about Master Commands that are BLOCKED.
  - pworker_command_queue: Standard Commands that are waiting to be executed, under execution, or executed.
  - pworker_command_sequencer: Used to sequence Standard Commands.
  - pworker_log: Log messages from Pworker processes.
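
The key requirement when Pworkers synchronize through a shared queue table is that two Pworkers never pick up the same command. The C++ sketch below models a table such as pworker_command_queue in memory, with a std::mutex standing in for a database transaction; in an RDBMS the same effect is commonly obtained with a conditional UPDATE or row locking inside a transaction. Only the three phases named in the table descriptions (waiting, under execution, executed) are modelled, not REI's 7-state model, and the class and method names are hypothetical.

```cpp
#include <mutex>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a queue table; only three phases are modelled.
enum class RowState { Waiting, Executing, Executed };

struct QueueRow {
    int id;
    std::string command;
    RowState state;
    std::string workerId;  // Pworker that claimed the row; empty if none
};

class CommandQueueTable {
public:
    void insert(int id, std::string command) {
        std::lock_guard<std::mutex> lock(mutex_);
        rows_.push_back({id, std::move(command), RowState::Waiting, ""});
    }

    // Atomically claims the first waiting row (in insertion order) for workerId.
    // The mutex plays the role of a DB transaction: the check of the state and
    // its update happen together, so no row can be claimed twice.
    std::optional<QueueRow> claimNext(const std::string& workerId) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& row : rows_) {
            if (row.state == RowState::Waiting) {
                row.state = RowState::Executing;
                row.workerId = workerId;
                return row;
            }
        }
        return std::nullopt;  // nothing left to execute
    }

    // Marks an executing row as executed; returns false if it was not executing.
    bool complete(int id) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (auto& row : rows_) {
            if (row.id == id && row.state == RowState::Executing) {
                row.state = RowState::Executed;
                return true;
            }
        }
        return false;
    }

private:
    std::mutex mutex_;
    std::vector<QueueRow> rows_;
};
```

Keeping the "is it still waiting?" check and the state change inside one critical section is what prevents double execution; splitting them into separate DB statements outside a transaction would reintroduce the race.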

OmegaCam Demo Science Reduction Cascade/1
OmegaCam Science Demo Cascade – Example
- Used adapted WFI frames (8 extensions).
- Provided:
  - An OCAM REI Recipe Planner Plug-In to schedule the tasks for the recipes (a single, general Recipe Planner was written for all recipes).
  - REI Standard Command Library Plug-Ins to do FITS file splitting and joining.
  - A Cascade Scheduler Script to submit the Master Commands and to create the SOFs needed.
- 6 recipes executed during the cascade (6 Master Commands issued to REI).
- Total number of commands scheduled within REI for the cascade: ~100.
- Total number of intermediate/temporary and final data products: ~200.
- Number of SOFs involved: 10.
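
The split/process/join structure of the cascade can be illustrated with a small C++ sketch: a multi-extension frame (8 extensions for the adapted WFI data) is split, each extension is processed as an independent task, and the results are joined back in extension order. Here std::async stands in for Standard Commands executed by Pworkers; the types Extension and Frame and the function splitProcessJoin are hypothetical, and real FITS I/O is not shown.

```cpp
#include <functional>
#include <future>
#include <vector>

// Toy representation: one vector of pixel values per extension.
using Extension = std::vector<double>;
using Frame = std::vector<Extension>;

// Splits the frame, processes every extension concurrently, and joins the results.
Frame splitProcessJoin(const Frame& frame,
                       const std::function<Extension(const Extension&)>& process) {
    std::vector<std::future<Extension>> tasks;
    tasks.reserve(frame.size());
    for (const Extension& ext : frame) {
        // Split: one asynchronous task per extension.
        tasks.push_back(std::async(std::launch::async,
                                   [&process, &ext] { return process(ext); }));
    }
    Frame joined;
    joined.reserve(tasks.size());
    for (auto& task : tasks) {
        // Join: get() blocks until that extension's task has finished,
        // so the output keeps the original extension order.
        joined.push_back(task.get());
    }
    return joined;
}
```

The join step acts as the synchronization point: nothing downstream of it can run until every extension has been processed, which mirrors the Split/Join structure of the cascade.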

Task Synchronization
[Figure: task synchronization diagram; labels: Master, Split, BIAS, Join, Master, Split, DOME, Join, Compl]

Command Scheduling
[Figure: command scheduling diagram; labels: Frame A, Frame B, Split, Join, Recipe]

DFO Cascading
Controlling REI – DFO Environment
- Already in operational use by DFO for some time.
- DFO uses REI to schedule a UNIX shell script, which in turn controls the execution of the recipes (internally calling esorex).
- DFO exploits parallelism at the frame level; the processing of each individual frame is not parallelized.
- REI is used as a queue system: jobs are submitted, and their scheduling and execution are carried out by REI.
- Example of addcmd in the DFO environment:
    $ addcmd -name SINFO T20:25:28.895_tpl.ab -bg -trigger mflat_SINFO T20:25:28.895_tpl.ab -exe processAB -a SINFO T20:25:28.895_tpl.ab
    $ addcmd -name SINFO T19:55:07.961_tpl.ab -bg -trigger mwave_SINFO T19:55:07.961_tpl.ab -waitfor mflat_SINFO T20:25:28.895_tpl.ab -exe processAB -a SINFO T19:55:07.961_tpl.ab
- In the second command, the -waitfor value is the -trigger value of the first command, so the second command waits for the first.
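
The -waitfor/-trigger pairing in the DFO example defines a dependency graph between commands. The C++ sketch below computes an execution order that respects such dependencies using Kahn's topological sort. It assumes, as the two example commands suggest, that each command emits one trigger label and that -waitfor names a trigger emitted by another command; this is an interpretation, not REI's documented semantics, and QueuedCommand and executionOrder are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical command model: one emitted trigger, any number of awaited triggers.
struct QueuedCommand {
    std::string name;
    std::string trigger;               // label emitted on completion
    std::vector<std::string> waitFor;  // trigger labels that must be emitted first
};

// Returns command names in an order that respects all waits (Kahn's algorithm).
// Assumes each trigger label is emitted by at most one command.
std::vector<std::string> executionOrder(const std::vector<QueuedCommand>& cmds) {
    std::map<std::string, std::size_t> producer;  // trigger label -> command index
    for (std::size_t i = 0; i < cmds.size(); ++i)
        if (!cmds[i].trigger.empty()) producer[cmds[i].trigger] = i;

    std::vector<std::size_t> pending(cmds.size(), 0);  // unmet waits per command
    std::vector<std::vector<std::size_t>> dependents(cmds.size());
    for (std::size_t i = 0; i < cmds.size(); ++i) {
        for (const auto& label : cmds[i].waitFor) {
            auto it = producer.find(label);
            if (it == producer.end())
                throw std::runtime_error("no command emits trigger " + label);
            dependents[it->second].push_back(i);
            ++pending[i];
        }
    }

    std::queue<std::size_t> ready;  // commands whose waits are all satisfied
    for (std::size_t i = 0; i < cmds.size(); ++i)
        if (pending[i] == 0) ready.push(i);

    std::vector<std::string> order;
    while (!ready.empty()) {
        std::size_t i = ready.front();
        ready.pop();
        order.push_back(cmds[i].name);
        for (std::size_t d : dependents[i])
            if (--pending[d] == 0) ready.push(d);
    }
    if (order.size() != cmds.size())
        throw std::runtime_error("cyclic -waitfor dependencies");
    return order;
}
```

All commands in the ready queue at the same moment have no mutual dependencies, so a queue system can hand them to different Pworkers in parallel, which is the frame-level parallelism DFO relies on.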

Using REI
How to Integrate a Pipeline in REI (Simplified …)
- Decide how to execute the recipes:
  1. Natively, in the form of CPL Recipes.
  2. By invoking the recipe library methods/functions from within Standard Commands.
  3. Via jacket scripts/applications encapsulating the recipe.
- Define the necessary/desirable level of parallelism.
- Define execution plans for the various cascades.
- Implement a Recipe Planner, if necessary, to coordinate the command scheduling internally (and to produce the data for the Standard Commands).
- Implement a Standard Command Library with special commands that should execute internally within the REI environment (if required).
- Implement external control scripts to submit Master Commands, defining the dependencies and providing data for the command execution if necessary.
- Decide on the architecture of the processing cluster (number of Master Pworkers, Pworkers, CPUs, nodes, amount of memory per CPU, …).
- Start up the Pworkers, defining their roles and referring to the Command Plug-in Libraries provided (if any) and/or any CPL Recipe Plug-in Libraries.
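
The Recipe Planner step can be pictured as a plug-in interface that expands one Master Command into the Standard Commands REI then schedules. The C++ sketch below shows one possible shape, with an example planner that follows the split / per-extension recipe / join pattern of the OmegaCam demo. The names StandardCommand, RecipePlanner, plan and SplitJoinPlanner are hypothetical; REI's actual plug-in API is not shown in this presentation.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one Standard Command produced by a planner.
struct StandardCommand {
    std::string name;
    std::string action;                // e.g. "split", "recipe", "join"
    std::vector<std::string> waitFor;  // commands that must finish first
};

// Hypothetical plug-in interface: expands a Master Command into Standard Commands.
class RecipePlanner {
public:
    virtual ~RecipePlanner() = default;
    virtual std::vector<StandardCommand> plan(const std::string& masterName,
                                              int numExtensions) const = 0;
};

// Example planner: split the input, run the recipe per extension, then join.
class SplitJoinPlanner : public RecipePlanner {
public:
    std::vector<StandardCommand> plan(const std::string& masterName,
                                      int numExtensions) const override {
        std::vector<StandardCommand> cmds;
        const std::string split = masterName + "_split";
        cmds.push_back({split, "split", {}});
        std::vector<std::string> recipeNames;
        for (int i = 1; i <= numExtensions; ++i) {
            // Each per-extension recipe depends only on the split.
            std::string n = masterName + "_ext" + std::to_string(i);
            cmds.push_back({n, "recipe", {split}});
            recipeNames.push_back(n);
        }
        // The join waits for every per-extension recipe.
        cmds.push_back({masterName + "_join", "join", recipeNames});
        return cmds;
    }
};
```

Keeping the planner behind an abstract interface lets one general planner serve all recipes, as in the OmegaCam demo, while a pipeline with different needs can supply its own implementation.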