
1 WS-PGRADE: Supporting parameter sweep applications in workflows Péter Kacsuk, Krisztián Karóczkai, Gábor Hermann, Gergely Sipos, and József Kovács MTA SZTAKI

2 Content Motivations –Lessons learnt from P-GRADE portal –Lessons learnt from CancerGrid Workflow concept of gUSE/WS-PGRADE Parameter sweep support of gUSE –CancerGrid Executing PS nodes of gUSE workflows in desktop grids Conclusions

3 Popularity of P-GRADE portal It has been used in many EGEE and EGEE-related VOs: –GILDA, VOCE, SEE-GRID, BalticGrid, BioInfoGrid, EGRID, etc. It has been used in many national grids: –UK NGS, Grid-Ireland, Turkish Grid, Croatian Grid, Grid Malaysia, etc. It has been used as the GIN VO Resource Testing Portal. It became OSS at the beginning of January 2008: https://sourceforge.net/projects/pgportal/

4 Download of OSS P-GRADE portal 828 downloads so far

5 Lessons learnt from P-GRADE portal Popular because it provides –Easy-to-use but powerful workflow system (graphical editor, wf manager, etc.) –Easy-to-use parameter sweep concept support –Easy-to-use MPI program execution support –Grid virtualization: Multi-grid/multi-VO access mechanism for LCG-2, gLite, GT2 and GT4

6 Introducing three levels of parallelism – Parallel execution inside a workflow node: each job can be a parallel program – Parallel execution among workflow nodes: multiple jobs run in parallel – Parameter study execution of the workflow: multiple instances of the same workflow run with different data files

7 Parameter study workflow GEN: generator grid job produces the input parameter space. SEQ: parameter sweep grid jobs (this could be any workflow). COLL: collector grid job evaluates the results of the simulation.

8 3-phase PS execution in P-GRADE portal First phase: executing all the Generators once. Second phase: executing all generated eWorkflows in parallel. Last phase: executing all the Collectors once.
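
The three phases can be illustrated with a minimal Python sketch. This is not P-GRADE portal code: `run_three_phase`, its parameters and the thread pool are hypothetical stand-ins, and generators, workflows and collectors are assumed to be plain callables rather than grid jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def run_three_phase(generators, workflow, collectors, max_workers=4):
    """Hypothetical sketch of a generate / sweep / collect schedule."""
    # Phase 1: run every generator exactly once and gather the parameter sets.
    parameter_sets = []
    for generate in generators:
        parameter_sets.extend(generate())

    # Phase 2: run one workflow instance per parameter set, in parallel.
    # pool.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(workflow, parameter_sets))

    # Phase 3: run every collector exactly once over all results.
    return [collect(results) for collect in collectors]


if __name__ == "__main__":
    summary = run_three_phase(
        generators=[lambda: range(5)],   # one generator, five parameter values
        workflow=lambda x: x * x,        # stand-in for a whole eWorkflow
        collectors=[sum, max],           # two collectors over the results
    )
    print(summary)  # [30, 16]
```

The fixed ordering is visible in the code: no collector can start before every workflow instance has finished, which is the restriction the CancerGrid workflow runs into.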

9 CancerGrid workflow needs more Usage of generators and collectors at any node of the WF without any ordering restrictions. Usage of PS execution at node level at any node of the WF without any ordering restrictions.

10 CancerGrid workflow needs more [Figure: CancerGrid workflow graph with a generator job; nodes are annotated with invocation counts x1, xN and NxM] N = 30K, M = 100, so NxM = 3 million => about 0.5 year execution time

11 Solution of the problem We need an environment where the user can develop and execute such a workflow. The environment should contain a broker that decides where to execute the nodes: –MPI nodes on service grid (SG) clusters –Nodes with very short execution time on local resources –Seq. nodes with a small number of invocations at SGs –Seq. nodes called many times at desktop grids (DGs) Such an environment for SGs is: –gUSE: provides middleware based on a high-level service set –WS-PGRADE: provides a workflow user interface
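
The broker rules listed above can be sketched as a small decision function. This is an illustration only, not the gUSE broker: the `Node` fields, the two thresholds and the returned labels are assumptions, and the rules are simply applied in the order the slide lists them.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of a workflow node, for illustration only."""
    name: str
    is_mpi: bool
    est_runtime_s: float
    invocations: int


# Both thresholds are assumptions; the slide gives no concrete numbers.
SHORT_RUNTIME_S = 10.0
MANY_INVOCATIONS = 1000


def choose_resource(node: Node) -> str:
    """Apply the slide's four rules in the order they are listed."""
    if node.is_mpi:
        return "service grid cluster"
    if node.est_runtime_s < SHORT_RUNTIME_S:
        return "local resource"
    if node.invocations >= MANY_INVOCATIONS:
        return "desktop grid"
    return "service grid"


if __name__ == "__main__":
    nodes = [
        Node("mpi-sim", is_mpi=True, est_runtime_s=3600, invocations=1),
        Node("format", is_mpi=False, est_runtime_s=2, invocations=1),
        Node("prepare", is_mpi=False, est_runtime_s=600, invocations=5),
        Node("sweep", is_mpi=False, est_runtime_s=600, invocations=3_000_000),
    ]
    for node in nodes:
        print(node.name, "->", choose_resource(node))
```

How a real broker would treat a node that is both very short and invoked many times is not stated in the slides; the sketch sends it to a local resource only because that rule is checked first.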

12 gUSE and WS-PGRADE gUSE (grid User Support Environment) –is a grid virtualization environment –exposes the grid as a workflow –enables the execution of workflows simultaneously in many grids no matter what their middleware is WS-PGRADE is the user interface to support –Editing, configuring, publishing workflows (as grid applications)

13 PS workflow concept of WS-PGRADE Any node of the workflow can be: –PS job –Generator –Collector There are two kinds of relationship between input files of PS nodes: –Cross product –Dot product

14 Workflow Graph Overview in WS-PGRADE The Workflow Editor as it appears for the user. [Figure labels: Input Port; Node: job, service call (WS, legacy), wf; Output Port]

15 Configuring the Workflow Specify the number of input files on external input ports. A Generator job produces multiple data on the output port within one job submission step. Specify the Dot or Cross product relation of input ports to define the number of job submissions. Specify a job to be a Collector by defining a Gathering Input Port; the job execution will be postponed until all input files have arrived at that port. [Figure: workflow graph with file counts h, m, n, K and 1 on its ports; legend: Cross Product, Dot Product]

16 Animation: the number of generated output files The Generator job runs h times and each run generates K files on the output port. In case of cross product a separate job submission is generated for each possible input file combination. In case of dot product the job is submitted with input files having a common index number in each input port. [Figure: animated workflow graph with external inputs of h, m and n files; port labels show file counts m*n, h*K, m*n*h*K, h*K*K, 1 and S = max(m*n, h*K)]
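
The two port relations can be made concrete with a short Python sketch. The function names are hypothetical. For the dot product, the slide gives the submission count as the maximum of the port sizes; how a shorter port is padded is not stated, so reusing its last file is an assumption made here.

```python
from itertools import product


def cross_product_jobs(port_files):
    """One job submission per combination of input files across the ports."""
    return list(product(*port_files))


def dot_product_jobs(port_files):
    """One job submission per common index number across the ports.

    The count is the maximum port size, as on the slide. Reusing the last
    file of a shorter port is an assumption made for this sketch.
    """
    count = max(len(files) for files in port_files)
    return [
        tuple(files[min(i, len(files) - 1)] for files in port_files)
        for i in range(count)
    ]


if __name__ == "__main__":
    port_a = ["a0", "a1"]            # m = 2 files
    port_b = ["b0", "b1", "b2"]      # n = 3 files
    cross = cross_product_jobs([port_a, port_b])
    print(len(cross))                # 6, i.e. m*n

    generated = ["g0", "g1", "g2", "g3"]   # h*K = 4 files from a generator
    dot = dot_product_jobs([cross, generated])
    print(len(dot))                  # 6, i.e. max(m*n, h*K)
```

With m = 2, n = 3 and h*K = 4, the cross product yields 6 submissions and the following dot product also yields 6, matching S = max(m*n, h*K).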

17 The user concern I have a large workflow containing: –Sequential nodes to be executed once –Sequential nodes to be executed many times (PS) –MPI nodes to be executed once –MPI nodes to be executed many times (PS) I want to execute this workflow as fast as possible using as many resources as possible

18 [Figure: the CancerGrid workflow graph (N = 30K, NxM = 3 million, with a generator job) with its nodes mapped to four execution targets] Execution in EDGeS VO of EGEE. Execution in the private DG of CancerGrid project. Execution in a local resource. Execution as Web Service.

19 Putting everything together gUSE/WS-PGRADE provides transparent access to SGs/DGs. [Figure labels: WS-PGRADE, gUSE, Appl. Repository, Service Grid EGEE, Service Grid OSG, University DG, Volunteer DG, LocalDEG, GlobalDEG]

20 Family of P-GRADE products and their use P-GRADE –Parallelizing applications for clusters and grids P-GRADE portal –Creating simple workflow and parameter sweep applications for grids P-GRADE/GEMLCA portal –Creating workflow applications using legacy codes and community codes from repository gUSE/WS-PGRADE –Creating complex workflow and parameter sweep applications to run on clusters, service grids and desktop grids –Creating workflow applications using embedded workflows, legacy codes and community workflows from workflow repository

21 Conclusions gUSE and WS-PGRADE overcome the limitations of P-GRADE portal: –The implementation of gUSE is highly scalable and can be distributed on a cluster or even across different grid sites. –Stress tests show that it can simultaneously serve thousands of jobs (currently manages ~100,000 jobs in CancerGrid). –Its workflow concept is much more expressive than in P-GRADE portal (recursive wf, generic PS support, etc.). –WS-PGRADE provides two user interfaces: Developer (creates and exports WFs into the WF repository of gUSE) and End-user (imports and executes WFs from the WF repository). –gUSE provides grid virtualization at workflow level: nodes of a WF can be executed by Web Services, local resources, service grids and desktop grids (see EDGeS project).

