Practical Mechanisms for Managing Parallel and Interactive Jobs on Grid Environments Enol Fernández UAB.


1 Practical Mechanisms for Managing Parallel and Interactive Jobs on Grid Environments Enol Fernández UAB

2 INGRID 2008, 9th April 2008
  - Introduction
  - CrossBroker
  - Glide In
  - Parallel Job Support
  - Interactive Job Support
  - Conclusions

3 Batch execution on Grids
  [Figure labels: Job, F1, F2, O1, O2, Internet, Middleware, Services, Remote Site (x2)]

4 Parallel & Interactive Job Execution
  - Use of resources from different sites
  - Resource-sets search
  - Co-allocation & synchronization
  - Fast start-up
  - Execution in high-occupancy situations
  [Figure labels: Job, F1, F2, MPI, I/O forwarding, Internet, Middleware, Services, Remote Site (x2)]

5 CrossBroker
  - CrossBroker does automatic scheduling in Grid environments:
      - Resource discovery
      - Resource selection
      - Job execution
  - Jobs not treated by gLite:
      - Parallel jobs (MPI): run in more than one resource, in a coordinated fashion.
      - Interactive jobs: the user interacts with the application during its execution.

6 CrossBroker
  [Figure labels: Migrating Desktop, Information Index, Replica Manager; CrossBroker with Scheduling Agent, Resource Searcher, Application Launcher, Condor-G, DAGMan; EGEE/Globus, LRMS, CE, WN]
  - Outdated information
  - Dynamic changes
  - LRMS (PBS, LSF, Condor): limited external control
  - Non-cooperative LRMS
  - Local user jobs

7 Glide In
  - The idea: each batch job is encapsulated in an agent that takes control over the WN independently of its LRMS.
  - Lightweight Virtual Machines:
      - Each Worker Node is divided into 2 VMs.
      - Each VM can execute jobs independently (e.g. batch and interactive).
      - Fast startup of jobs (no need to go through Globus + LRMS).
      - NOT a full virtual machine (Xen, VMware, ...).
      - NO need for special privileges in the WN.
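
The two-slot idea on this slide can be illustrated with a minimal Python sketch. It is not CrossBroker or Condor code: the `GlideInAgent` class, its methods, and the host name `wn01.example.org` are hypothetical names chosen for illustration, and the "VMs" are modeled as plain slots in a list.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GlideInAgent:
    """Hypothetical model of an agent that holds one worker node (WN)."""

    worker_node: str
    # Two lightweight execution slots ("VMs"); None means the slot is idle.
    slots: List[Optional[str]] = field(default_factory=lambda: [None, None])

    def free_slot(self) -> Optional[int]:
        """Return the index of an idle slot, or None if both are busy."""
        for index, job in enumerate(self.slots):
            if job is None:
                return index
        return None

    def start(self, job_name: str) -> int:
        """Start a job directly on the agent, without a new LRMS submission."""
        index = self.free_slot()
        if index is None:
            raise RuntimeError(f"no idle slot on {self.worker_node}")
        self.slots[index] = job_name
        return index

    def finish(self, index: int) -> None:
        """Mark a slot as idle again so that other jobs can use it."""
        self.slots[index] = None


agent = GlideInAgent("wn01.example.org")
batch_slot = agent.start("batch-job")
interactive_slot = agent.start("interactive-job")
```

Because the agent already holds the worker node, the second job starts in the remaining slot without waiting in the local queue, which is the fast-startup property the slide describes.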

8 Glide In
  [Figure labels: Scheduling Agent, Condor-G, CrossBroker, Application Launcher, Grid Resource, LRMS, Batch Job]

9 Glide In
  [Figure labels: Scheduling Agent, Condor-G, CrossBroker, Application Launcher, Grid Resource, LRMS, Agent, VM1, VM2, Batch Job]

10 Glide In
  [Figure labels: Scheduling Agent, Condor-G, CrossBroker, Application Launcher, Grid Resource, LRMS, Agent, VM1, VM2, Batch Job]

11 Glide In
  [Figure labels: Scheduling Agent, Condor-G, CrossBroker, Application Launcher, Grid Resource, LRMS, Agent, VM1, VM2, Batch Job, "Available for other jobs"]

12 Parallel Job Support
  - Support for parallel jobs:
      - Open MPI
      - PACX-MPI
      - MPICH-P4
      - MPICH-G2
      - Plain (just the machines)
  - Takes site capabilities into account.
  - Low-level details of MPI implementations and sites are handled by starter scripts.
  - mpi-start is configured automatically and used by default.

13 Parallel Job Support: Changes in JDL
  - JOBTYPE:
      - Normal: sequential jobs, just one CPU
      - Parallel: more than one CPU
  - SUBJOBTYPE:
      - openmpi
      - pacx-mpi
      - mpich
      - mpich-g2
      - Plain
  - Plain allows easy extension to support new parallel job types.

14 Parallel Job Support
  Type = "Job";
  VirtualOrganisation = "imain";
  JobType = "Parallel";
  SubJobType = "pacx-mpi";
  NodeNumber = 5;
  Executable = "test-app";
  Arguments = "-v";
  InputSandbox = {"test-app", "inputfile"};
  OutputSandbox = {"std.out", "std.err"};
  StdErr = "std.err";
  StdOutput = "std.out";
  Rank = other.GlueHostBenchmarkSI00;
  Requirements = other.GlueCEStateStatus == "Production";

15 Parallel Job Support
  CE information seen by the CrossBroker (the figure marks each CE as MPI enabled or non-MPI enabled):

  CE   Host                FreeCPUs  Disk  AverageSI
  CE1  zeus.cyf-kr.edu.pl  2         100   2000
  CE2  aocegrid.uab.es     10        100   4000
  CE3  bee001.ific.uv.es   3         100   1000
  CE4  xgrid.icm.edu.pl    6         100   1000
  CE5  lngrid02.lip.pt     2         100   1000

  Resulting resource groups:

  [Groups with 1 CEs]
    [Rank=2000]
      aocegrid.uab.es:2119/jobmanager-pbs-workq      freeCPUs = 10
  [Groups with 2 CEs]
    [Rank=1500]
      zeus.cyf-kr.edu.pl:2119/jobmanager-pbs-workq   freeCPUs = 2
      bee001.ific.uv.es:2119/jobmanager-pbs-workq    freeCPUs = 3
    [Rank=1000]
      bee001.ific.uv.es:2119/jobmanager-pbs-workq    freeCPUs = 3
      lngrid02.lip.pt:2129/jobmanager-pbs-workq      freeCPUs = 2
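
The grouping shown on this slide can be approximated with a short Python sketch. Several things here are assumptions rather than the CrossBroker algorithm: the `CE` class and `candidate_groups` function are hypothetical, a group's rank is taken to be the mean AverageSI of its members, groups that would still be large enough after dropping a member are skipped, and xgrid.icm.edu.pl is left out on the guess that it is the non-MPI-enabled CE (it does not appear in the slide's result). Under these assumptions the sketch reproduces the slide's groups and the two-CE ranks, but not the rank of 2000 shown for the single-CE group.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class CE:
    name: str
    free_cpus: int
    average_si: int


def candidate_groups(
    ces: List[CE], node_number: int, max_size: int = 2
) -> List[Tuple[float, Tuple[CE, ...]]]:
    """Return (rank, group) pairs: smaller groups first, then higher rank first."""
    results = []
    for size in range(1, max_size + 1):
        for group in combinations(ces, size):
            total = sum(ce.free_cpus for ce in group)
            if total < node_number:
                continue  # not enough free CPUs for the job
            # Assumption: skip groups that stay large enough without one member.
            if size > 1 and any(total - ce.free_cpus >= node_number for ce in group):
                continue
            rank = sum(ce.average_si for ce in group) / size  # assumed ranking
            results.append((rank, group))
    results.sort(key=lambda item: (len(item[1]), -item[0]))
    return results


# CE data copied from the slide (xgrid.icm.edu.pl omitted, see the note above).
mpi_ces = [
    CE("zeus.cyf-kr.edu.pl", 2, 2000),
    CE("aocegrid.uab.es", 10, 4000),
    CE("bee001.ific.uv.es", 3, 1000),
    CE("lngrid02.lip.pt", 2, 1000),
]
groups = candidate_groups(mpi_ces, node_number=5)
```

Each entry of `groups` pairs a rank with a tuple of CEs whose free CPUs together cover `NodeNumber = 5`.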

16 Parallel Job Support
  [Figure labels: CrossBroker, Startup server, MPI SubTask on CE3 (bee001.ific.uv.es, FreeCPUs = 3, Disk = 100, AverageSI = 1000), MPI SubTask on CE5 (lngrid02.lip.pt, FreeCPUs = 2, Disk = 100, AverageSI = 1000)]
  1. Launch a PACX startup server.
  2. Submit the MPI subtasks.
  3. MPI-START starts each of the subtasks.
  4. The subtasks notify the startup server and start running.
  5. CrossBroker monitors the application.
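
The synchronization in step 4 can be pictured with a small Python sketch that uses threads in place of remote subtasks. This is only an analogy: the `StartupServer` class and `run_subtasks` function are hypothetical, and a `threading.Barrier` stands in for the PACX startup server, whose actual protocol is not described in the slides.

```python
import threading
from typing import List


class StartupServer:
    """Hypothetical stand-in for the startup server: releases subtasks together."""

    def __init__(self, expected_subtasks: int) -> None:
        self._barrier = threading.Barrier(expected_subtasks)

    def notify_and_wait(self, timeout: float = 5.0) -> None:
        """Called by each subtask; returns once every subtask has checked in."""
        self._barrier.wait(timeout=timeout)


def run_subtasks(sites: List[str]) -> List[str]:
    """Start one subtask per site and return the sites that started running."""
    server = StartupServer(expected_subtasks=len(sites))  # step 1
    started: List[str] = []
    lock = threading.Lock()

    def subtask(site: str) -> None:
        server.notify_and_wait()  # step 4: notify the startup server
        with lock:
            started.append(site)  # the subtask is now running

    threads = [threading.Thread(target=subtask, args=(site,)) for site in sites]
    for thread in threads:  # steps 2-3: submit and start the subtasks
        thread.start()
    for thread in threads:  # step 5: wait until the application finishes
        thread.join()
    return sorted(started)


running = run_subtasks(["bee001.ific.uv.es", "lngrid02.lip.pt"])
```

No subtask proceeds past `notify_and_wait` until all of them have reached it, which mirrors the requirement that the pieces of one MPI job begin in a coordinated way.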

17 Parallel Job Support
  - CrossBroker searches for and selects sets of resources for the jobs.
  - There is no guarantee that all tasks of the same job will start at the same time.
  - 1st choice: select only sites with free resources. The job will run immediately. Unfortunately, free resources are not always available.
  - 2nd choice: allocate a resource temporarily and wait until all other tasks show up. Timeshare the resource with a backfilling policy to avoid resource idleness.
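
The second choice can be illustrated with a toy simulation in Python. It is a sketch under simplifying assumptions, not the CrossBroker policy: time advances in discrete ticks, every backfill job takes exactly one tick, and the `co_allocate` function and the arrival times used below are hypothetical.

```python
from typing import Dict, List


def co_allocate(
    arrival_times: Dict[str, int], backfill_jobs: List[str]
) -> Dict[str, object]:
    """Simulate the tasks of one MPI job reaching their sites at different ticks.

    A site whose task has arrived, while other tasks are still missing,
    runs backfill jobs in its second slot instead of sitting idle.
    """
    start_time = max(arrival_times.values())  # all tasks start together
    pending = list(backfill_jobs)
    backfilled: Dict[str, List[str]] = {site: [] for site in arrival_times}
    for tick in range(start_time):
        for site, arrived in sorted(arrival_times.items()):
            # A site can backfill only once its agent holds the resource.
            if arrived <= tick and pending:
                backfilled[site].append(pending.pop(0))
    return {"start_time": start_time, "backfilled": backfilled}


result = co_allocate(
    {"bee001.ific.uv.es": 0, "lngrid02.lip.pt": 3},
    ["b1", "b2", "b3", "b4"],
)
```

In this run the first site holds its resource for three ticks before the second task shows up, and it spends that time on three backfill jobs rather than idling.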

18 Glide In for co-allocation
  [Figure labels: Scheduling Agent, Condor-G, CrossBroker, Grid Resource, LRMS, MPI Job]

19 Glide In for co-allocation
  [Figure labels: Scheduling Agent, Condor-G, CrossBroker, Application Launcher, Grid Resource, LRMS, Agent, VM1, VM2, MPI Job, MPI Task, "Waiting for the rest of tasks"]

20 Glide In for co-allocation
  [Figure labels: Scheduling Agent, Condor-G, CrossBroker, Application Launcher, Grid Resource, LRMS, Agent, VM1, VM2, MPI Task, Job, "Backfilling while the MPI waits"]

21 Glide In for co-allocation
  [Figure labels: Scheduling Agent, Condor-G, CrossBroker, Application Launcher, Grid Resource, LRMS, Agent, VM1, VM2, MPI Task, Job, "All tasks ready!"]

22 Interactive Job Support
  - Fast startup:
      - Cache of resources: fast matchmaking
      - Scheduling priority: use free resources or glideins
      - Fast notification of events
  - CrossBroker injects interactive agents that enable communication between user and job:
      - Transparent to the user
      - Condor Bypass & glogin agents
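
The "cache of resources" item can be sketched in Python as a small time-to-live cache. The slides do not say how CrossBroker's cache works, so everything here is an assumption for illustration: the `ResourceCache` class, the `fetch` callback that stands in for a query to the information system, and the 60-second default lifetime.

```python
import time
from typing import Callable, List, Optional


class ResourceCache:
    """Hypothetical cache of resource information with a time-to-live."""

    def __init__(
        self,
        fetch: Callable[[], List[str]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # slow query to the information system
        self._ttl = ttl_seconds      # how long a cached answer stays valid
        self._clock = clock          # injectable so the cache can be tested
        self._resources: Optional[List[str]] = None
        self._fetched_at = 0.0

    def resources(self) -> List[str]:
        """Return cached resources, refreshing them only when they are stale."""
        now = self._clock()
        if self._resources is None or now - self._fetched_at > self._ttl:
            self._resources = self._fetch()
            self._fetched_at = now
        return self._resources
```

Matchmaking for an interactive job would then read from `resources()` and pay for a full query only when the cached answer has expired. The trade-off is that a cached entry can be out of date, so a short lifetime is the safer setting.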

23 Interactive Job Support: Changes in JDL
  - INTERACTIVE: true/false. Indicates that the job is interactive and that the broker should treat it with higher priority.
  - INTERACTIVEAGENT, INTERACTIVEAGENTARGUMENTS: these attributes specify the command (and its arguments) used to communicate with the user.

24 Interactive MPI application
  Type = "Job";
  VirtualOrganisation = "imain";
  JobType = "Parallel";
  SubJobType = "openmpi";
  NodeNumber = 4;
  Interactive = TRUE;
  InteractiveAgent = "glogin";
  InteractiveAgentArguments = "-r -p 195.168.105.65:23433";
  Executable = "test-app";
  InputSandbox = {"test-app", "inputfile"};
  OutputSandbox = {"std.out", "std.err"};
  StdErr = "std.err";
  StdOutput = "std.out";
  Rank = other.GlueHostBenchmarkSI00;
  Requirements = other.GlueCEStateStatus == "Production";

25 Interactive MPI application
  [Figure labels: User's Machine, Video Stream, glogin, Remote Resource, Master, Worker (x2), MPI, "Started with mpi-start", "Started by the CrossBroker"]

26 Glide In for interactive jobs
  [Figure labels: Scheduling Agent, Condor-G, CrossBroker, Application Launcher, Grid Resource, LRMS, Agent, VM1, VM2, Batch, Int. Job]

27 Glide In for interactive jobs
  [Figure labels: Scheduling Agent, Condor-G, CrossBroker, Application Launcher, Grid Resource, LRMS, Agent, VM1, VM2, Batch, Int. Job]
  - Priority adjustment
  - Startup-time reduction
  - Only one layer involved

28 Conclusions & Future work
  - CrossBroker gives support to Parallel and Interactive jobs:
      - Automatically
      - Interoperable with EGEE
  - Glide In:
      - Fast startup of jobs
      - Co-allocation without reservation or wasting resources
  - Future work:
      - Explore more complex multiprogramming (e.g. 3 or more VMs)
      - Decentralization of the services

29 Practical Mechanisms for Managing Parallel and Interactive Jobs on Grid Environments Enol Fernández UAB

