Presentation on theme: "GLOBUS PLUG-IN FOR WINGS WORKFLOW ENGINE Elizabeth Martí ITACA Universidad Politécnica de Valencia"— Presentation transcript:
GLOBUS PLUG-IN FOR WINGS WORKFLOW ENGINE Elizabeth Martí ITACA Universidad Politécnica de Valencia firstname.lastname@example.org
INTRODUCTION This work takes advantage of two concepts: workflow and Grid. Workflows automate processes. The Grid makes it possible to build high-performance computing systems from heterogeneous, geographically distributed resources that span multiple administrative domains. A Grid workflow can be defined as the composition of grid application services which execute on heterogeneous and distributed resources in a well-defined order to accomplish a specific goal (Rajkumar Buyya).
MOTIVATION Many different workflow initiatives have appeared: Askalon, Karajan, Kepler, K-WfGrid, Taverna, Triana, etc. They lack some important characteristics: multi-grid capability, easy extensibility to new middleware, etc. WINGS provides new features focusing on high-level definition, multi-grid and extensibility capabilities. The most significant features of WINGS are: Expressiveness to capture the specificities of grid computing. Flow control structures. Support for simple, lightweight operations. Ability to deal with different grid middlewares and versions.
WINGS CONCEPTS WINGS models a workflow with four concepts: Data sources: Communication points used to exchange data among the different executions of the workflow. Activities: Abstractions of tasks to be run on the Grid. They describe the functionality of the tasks and are defined by: The input and output parameters (simple/structured types). The list of deployments that provides the specifics for each grid middleware. Executions: Specific instances of an activity. The engine is in charge of selecting among the deployments defined for each activity, according to where it is going to be run. Operations: Simple executions performed by the workflow runtime to pre- or post-process the information available in the data sources, to be used by the next tasks. Examples: arithmetic and reduction operations, string search operations, field extraction operations, split or merge file operations, etc.
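The relation between activities, deployments and executions can be sketched as a small data model. This is only an illustration: the class names, field names and paths below are hypothetical and are not taken from the WINGS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Deployment:
    """Middleware-specific details for running an activity (hypothetical fields)."""
    middleware: str   # e.g. "globus" or "fura"
    executable: str   # location of the executable for that middleware


@dataclass
class Activity:
    """Abstract task: its parameters plus one deployment per middleware."""
    name: str
    inputs: list[str]
    outputs: list[str]
    deployments: list[Deployment] = field(default_factory=list)


@dataclass
class Execution:
    """A specific instance of an activity, bound to the middleware it runs on."""
    activity: Activity
    target: str

    def select_deployment(self) -> Deployment:
        # Pick the deployment that matches where the task is going to run.
        for deployment in self.activity.deployments:
            if deployment.middleware == self.target:
                return deployment
        raise LookupError(
            f"no {self.target} deployment for activity {self.activity.name}"
        )
```

For example, an activity with a Fura and a Globus deployment yields the Globus executable when the execution targets "globus", and raises `LookupError` for a middleware with no deployment.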
WINGS ENGINE The engine uses a pure data-flow language in which a workflow is a sequence of: DS (data source) → Execution or Operation → DS. This simplifies the workflow description and understanding, and also increases the expressiveness. The engine is in charge of providing the functionality defined in the XML file, creating an environment to launch concurrent jobs. A key issue in a multi-grid environment is moving files among the different resources of consecutive tasks, so the runtime (RT) tries to: Reduce the number of data transfers. Deal with different physical file storage systems.
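One way to read the data-flow model is that a step becomes runnable once every data source it reads has been produced. The sketch below groups steps into waves that could be launched concurrently. The function name, the dictionary layout and the step names are assumptions made for illustration, not the engine's actual scheduling logic.

```python
def launch_order(steps, available):
    """Group steps into waves; steps in the same wave could run concurrently.

    steps: dict mapping step name -> (input data sources, output data sources)
    available: data sources that already exist before the workflow starts
    """
    ready_ds = set(available)
    pending = dict(steps)
    waves = []
    while pending:
        # A step is ready when all of its input data sources are available.
        wave = sorted(
            name for name, (inputs, _) in pending.items()
            if set(inputs) <= ready_ds
        )
        if not wave:
            raise ValueError(f"steps can never run: {sorted(pending)}")
        for name in wave:
            _, outputs = pending.pop(name)
            ready_ds.update(outputs)
        waves.append(wave)
    return waves
```

With a `rigid` step reading `raw`, an `elastic` step reading the output of `rigid`, and an independent `stats` step reading `raw`, the first wave holds `rigid` and `stats` and the second holds `elastic`.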
WINGS ARCHITECTURE [Architecture diagram] WINGS Core Engine. Middleware Engines: Fura, GT2, etc. Transference Systems: Fura, IXOS, GridFTP, etc. Operations: Arithmetic, Split File, etc. Information Systems: Fura RM, MDS, etc.
EXECUTION SCHEME Core Engine: Performs the logic and control operations: prepares and selects the tasks ready to be launched and the data to use in each execution. Plug-ins: In charge of actually performing the file transfers and all the operations needed to complete the execution.
MIDDLEWARE PLUGINS Functionality is extended just by implementing a plug-in and adding it to the system. In the first version of the workflow engine, the Fura middleware plug-in was developed. Now a Globus Toolkit plug-in has been implemented to enable multi-grid tests. Globus was selected because of the large number of current infrastructures that use it as the underlying grid middleware (EGEE, EELA, etc.).
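The "implement a plug-in and add it to the system" idea can be illustrated with a minimal registry. The interface, the registry and the `submit` method are hypothetical: the slides do not show the actual WINGS plug-in API, and the Globus class below only returns a placeholder string instead of contacting any middleware.

```python
from abc import ABC, abstractmethod


class MiddlewarePlugin(ABC):
    """Interface the core engine is assumed to call (hypothetical)."""

    @abstractmethod
    def submit(self, executable: str, arguments: list[str]) -> str:
        """Launch a job and return an identifier for it."""


# Registry that maps a middleware name to its plug-in class.
PLUGINS: dict[str, type[MiddlewarePlugin]] = {}


def register(name: str):
    """Class decorator that adds a plug-in to the registry under `name`."""
    def wrap(cls):
        PLUGINS[name] = cls
        return cls
    return wrap


@register("globus")
class GlobusPlugin(MiddlewarePlugin):
    def submit(self, executable, arguments):
        # Placeholder only: a real plug-in would talk to the middleware here.
        return f"globus:{executable} {' '.join(arguments)}"
```

Under this design the core engine only needs the middleware name to look up a plug-in, so supporting another middleware means registering one more class.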
GLOBUS PLUG-IN Step 1: Prepare the activity. The workflow model is defined in the XML file. Create a valid proxy (Proxy store). Define the execution environment of the task (Globus, Fura, …). Create a working directory on the execution host (GridFTP). Create an execution directory (GridFTP). Copy the executable to the execution host (third-party copy with UrlCopy). If necessary, copy auxiliary data used by the executable (libraries, jar files, …).
GLOBUS PLUG-IN Step 2: Prepare the initial data. Obtain the information about the input data (XML file) and store it (input parameters matrix). Obtain the number of microtasks (combinations of inputs). Create an input directory. Copy the input data to the input directory (UrlCopy). Create an output directory.
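The microtask count can be illustrated under the assumption that "combination of inputs" means the Cartesian product of the values given for each input parameter. The slides do not state the exact rule, so the sketch below is one plausible reading; the function name and parameter names are hypothetical.

```python
from itertools import product


def microtasks(inputs):
    """Return one dict of parameter values per microtask.

    inputs: dict mapping parameter name -> list of candidate values
    """
    names = list(inputs)
    value_lists = [inputs[name] for name in names]
    # One microtask per element of the Cartesian product of the value lists.
    return [dict(zip(names, combo)) for combo in product(*value_lists)]
```

For instance, three images combined with a single reference study give three microtasks, each pairing one image with the reference.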
GLOBUS PLUG-IN Step 3: Execute the task. Define the RSL file for the task (executable, arguments, working directory, etc.). Create a GRAM job for each RSL file. Launch the job in batch mode (parallel execution of microtasks).
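As an illustration of the RSL definition, the sketch below assembles a job description in the pre-WS GRAM form `&(attribute=value)...` from the three items named on the slide. The helper names and the paths in the usage note are hypothetical, the quoting rule is a simplification, and the output should be checked against the GRAM RSL specification before use. Submitting the job is out of scope here.

```python
def rsl_quote(value: str) -> str:
    # RSL string literals use double quotes; an embedded quote is doubled.
    return '"' + value.replace('"', '""') + '"'


def build_rsl(executable, arguments, directory):
    """Build a single-job RSL string with executable, arguments and directory."""
    parts = [f"(executable={rsl_quote(executable)})"]
    if arguments:
        # Arguments are written as a space-separated sequence of literals.
        quoted = " ".join(rsl_quote(arg) for arg in arguments)
        parts.append(f"(arguments={quoted})")
    parts.append(f"(directory={rsl_quote(directory)})")
    return "&" + "".join(parts)
```

Calling `build_rsl("/work/rigid", ["in/s1.img", "out/s1.img"], "/work")` yields one RSL string; generating one such string per microtask matches the "one GRAM job for each RSL file" step.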
GLOBUS PLUG-IN Step 4: Get output data. Get output data from the output directory (wildcards are used to filter files). Create a replica of the results in a specified location (the path is specified in the data source definition). Clean intermediate data (a function was implemented to recursively delete directories).
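The wildcard filtering, replica creation and recursive cleanup can be sketched on a local filesystem. This is only a stand-in: the plug-in works on remote directories through GridFTP, and the function names here are hypothetical.

```python
import fnmatch
import os
import shutil


def collect_outputs(output_dir, pattern, replica_dir):
    """Copy files in output_dir whose names match pattern into replica_dir."""
    os.makedirs(replica_dir, exist_ok=True)
    matched = sorted(
        name for name in fnmatch.filter(os.listdir(output_dir), pattern)
        if os.path.isfile(os.path.join(output_dir, name))
    )
    for name in matched:
        shutil.copy(os.path.join(output_dir, name),
                    os.path.join(replica_dir, name))
    return matched


def clean(directory):
    """Recursively delete a directory and everything below it."""
    shutil.rmtree(directory)
```

A call such as `collect_outputs(out_dir, "*.img", replica_dir)` copies only the image files, after which `clean(work_dir)` removes the intermediate data.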
USE CASE The use case is a biomedical application that runs a medical image co-registration process (rigid and elastic). The co-registration processes compare every image with the base study and align the voxels of each study so that they are as similar as possible to the reference image. The input data are dynamic series of 3D magnetic resonance images acquired after the injection of a contrast bolus in the abdominal area, to study liver perfusion. The set is composed of 5 studies with 12 slices.
USE CASE Biomedical Application The workflow is composed of three steps: 1. Rigid co-registration. 2. Elastic co-registration (the most CPU-consuming step). 3. A process that transposes the co-registration results from N studies (with K slices) into K studies with N slices.
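The third step is a transposition of a studies-by-slices layout. A minimal sketch, assuming each study is represented as a list of K slice identifiers (the representation and the function name are hypothetical):

```python
def transpose_studies(studies):
    """Turn N studies of K slices into K studies of N slices."""
    if len({len(study) for study in studies}) > 1:
        raise ValueError("all studies must have the same number of slices")
    # zip(*studies) groups the k-th slice of every study together.
    return [list(group) for group in zip(*studies)]
```

With the use case's 5 studies of 12 slices, the result holds 12 groups of 5 slices, where group k collects slice k from every study.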
CONCLUSIONS We analyzed previous work; some systems have good features but do not fit our needs. WINGS has been designed in a modular way, so new components can be added to the system through a plug-in. We have implemented a Globus plug-in for the GT middleware. Currently Fura, Globus Toolkit (pre-WS services), and sub-workflow execution plug-ins have been developed, making it possible to launch cross-middleware tests with the two specified grid systems.