
1 int.eu.grid: Experiences with Condor to Run Interactive and Parallel Applications on the Grid Elisa Heymann Department of Computer Architecture and Operating Systems

2 Outline (Condor Week 2008, May 2008)
- Introduction
- CrossBroker
- Parallel Job Support
- Interactive Job Support
- Conclusions

3 Introduction
- int.eu.grid environment:
  - gLite (EGEE Grid middleware)
  - Extensions: CrossBroker, Migrating Desktop
- Jobs not handled by gLite:
  - Parallel jobs (MPI): run on more than one resource
  - Interactive jobs: the user interacts with the application during its execution

4 Batch execution on Grids
[Diagram: remote sites reached over the Internet through middleware services; a job labeled with files F1, F2 and O1, O2.]

5 Parallel & Interactive Job Execution
- Use of resources from different sites
- Resource-set search
- Co-allocation and synchronization
- Fast start-up
- Execution in high-occupancy situations
[Diagram: remote sites reached over the Internet through middleware services; jobs labeled with files F1, F2, with MPI and I/O forwarding links.]

6 Architecture
[Diagram: Migrating Desktop; CrossBroker (Scheduling Agent, Resource Searcher, Application Launcher, Condor-G, DAGMan); Information Index; Replica Manager; two EGEE/Globus sites, each with a CE and WN.]

7 Architecture - CrossBroker
- Scheduling Agent
  - Receives each job and keeps it in a persistent queue.
  - Contacts the Resource Searcher and gets a list of available resources.
  - Selects resources and passes them to the Application Launcher.
- Resource Searcher
  - Given a job description (JobAd), performs the matchmaking between job needs and available resources.
  - Uses the Condor ClassAd library, originally designed to match a single job with a single resource.
  - Set matching has been developed to support matching a single job to a group of resources.
- Application Launcher
  - Responsible for providing a reliable submission service for parallel applications on the Grid.
  - Responsible for file staging at the remote site (executable and input/output files).
  - Uses the services of Condor-G.

8 Parallel Job Support
- Support for parallel jobs:
  - Open MPI
  - PACX-MPI
  - MPICH-P4
  - MPICH-G2
- Takes site capabilities into account
- Ability to define starter scripts/processes to start the parallel job
  - mpi-start is configured automatically and used by default.

9 Parallel Job Support
[Diagram: CrossBroker, a startup server, and two CEs, each running an MPI subtask.]
  CE3 = bee001.ific.uv.es: FreeCPUs = 3, Disk = 100, AverageSI = 1000
  CE5 = lngrid02.lip.pt: FreeCPUs = 2, Disk = 100, AverageSI = 1000
1. Launch a PACX startup server
2. Submit the MPI subtasks
3. mpi-start starts each of the subtasks
4. The subtasks notify the startup server and start running
5. CrossBroker monitors the application

10 Parallel Job Support
- Job Description Language file:
  - JOBTYPE:
    - Normal: sequential jobs, just one CPU
    - Parallel: more than one CPU
  - SUBJOBTYPE: openmpi, pacx-mpi, mpich, mpich-g2, plain
  - JOBSTARTER (if not defined, mpi-start)
  - JOBSTARTERARGUMENTS
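The attribute rules on this slide can be sketched as a small validation step. This is only an illustration in Python, not CrossBroker code: the function name `normalize_job`, the dictionary representation of a job description, and the choice to treat a missing JobType as "Normal" are all assumptions. Only the allowed values and the mpi-start default come from the slide.

```python
# Hypothetical helper: illustrates the JDL attribute rules from the slide.
VALID_JOB_TYPES = {"normal", "parallel"}
VALID_SUBJOB_TYPES = {"openmpi", "pacx-mpi", "mpich", "mpich-g2", "plain"}
DEFAULT_JOB_STARTER = "mpi-start"  # slide: used when JobStarter is not defined


def normalize_job(jdl: dict) -> dict:
    """Return a copy of the job description with types checked and defaults filled."""
    job = dict(jdl)
    # Assumption: a missing JobType is treated as a sequential ("Normal") job.
    job_type = str(job.get("JobType", "Normal")).strip().lower()
    if job_type not in VALID_JOB_TYPES:
        raise ValueError(f"unknown JobType: {job_type!r}")
    job["JobType"] = job_type

    if job_type == "parallel":
        sub_type = str(job.get("SubJobType", "")).strip().lower()
        if sub_type not in VALID_SUBJOB_TYPES:
            raise ValueError(f"unknown SubJobType: {sub_type!r}")
        job["SubJobType"] = sub_type
        # Fall back to mpi-start when no starter is given.
        job.setdefault("JobStarter", DEFAULT_JOB_STARTER)
        job.setdefault("JobStarterArguments", "")
    return job


example = normalize_job(
    {"JobType": "Parallel", "SubJobType": "pacx-mpi", "NodeNumber": 5}
)
print(example["JobStarter"])
```

A job that names its own JobStarter keeps it; only jobs without one receive the default.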

11 Parallel Job Support
    Type = "Job";
    VirtualOrganisation = "imain";
    JobType = "Parallel";
    SubJobType = "pacx-mpi";
    NodeNumber = 5;
    Executable = "test-app";
    Arguments = "-v";
    InputSandbox = {"test-app", "inputfile"};
    OutputSandbox = {"std.out", "std.err"};
    StdErr = "std.err";
    StdOutput = "std.out";
    Rank = other.GlueHostBenchmarkSI00;
    Requirements = other.GlueCEStateStatus == "Production";
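The last two attributes drive matchmaking: Requirements filters out resources and Rank orders the ones that remain, with higher values preferred, as in Condor ClassAd matchmaking. A minimal Python sketch of that idea follows. The resource names `ce-a`, `ce-b`, `ce-c`, their attribute values, and the function `match` are made up for illustration; the real broker evaluates ClassAd expressions rather than Python callables.

```python
def match(resources, requirements, rank):
    """Keep resources that satisfy `requirements`, best `rank` first."""
    candidates = [r for r in resources if requirements(r)]
    return sorted(candidates, key=rank, reverse=True)


# Hypothetical resource descriptions (not taken from the slides).
resources = [
    {"name": "ce-a", "GlueCEStateStatus": "Production", "GlueHostBenchmarkSI00": 1000},
    {"name": "ce-b", "GlueCEStateStatus": "Draining", "GlueHostBenchmarkSI00": 4000},
    {"name": "ce-c", "GlueCEStateStatus": "Production", "GlueHostBenchmarkSI00": 2000},
]

ranked = match(
    resources,
    # Requirements = other.GlueCEStateStatus == "Production"
    requirements=lambda other: other["GlueCEStateStatus"] == "Production",
    # Rank = other.GlueHostBenchmarkSI00
    rank=lambda other: other["GlueHostBenchmarkSI00"],
)
names = [r["name"] for r in ranked]
print(names)
```

Here `ce-b` has the highest benchmark value but is dropped because it fails the requirement, so ranking only ever applies to resources that are eligible.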

12 MPI Across Sites
- CrossBroker searches for and selects sets of resources for the jobs.
- There is no guarantee that all tasks of the same job will start at the same time.
  - 1st choice: select only sites with free resources. The job will run immediately. Unfortunately, free resources are not always available.
  - 2nd choice: allocate a resource temporarily and wait until all other tasks show up. Timeshare the resource with a backfilling policy to avoid resource idleness.

13 MPI Across Sites
[Diagram: the Resource Searcher (RS) examines five CEs; the figure distinguishes MPI-enabled CEs from non-MPI-enabled CEs.]

  CE   Host                FreeCPUs  Disk  AverageSI
  CE1  zeus.cyf-kr.edu.pl         2   100       2000
  CE2  aocegrid.uab.es           10   100       4000
  CE3  bee001.ific.uv.es          3   100       1000
  CE4  xgrid.icm.edu.pl           6   100       1000
  CE5  lngrid02.lip.pt            2   100       1000

Resource Searcher output:

  [Groups with 1 CEs]
    [Rank=2000]
      aocegrid.uab.es:2119/jobmanager-pbs-workq  freeCPUs = 10
  [Groups with 2 CEs]
    [Rank=1500]
      zeus.cyf-kr.edu.pl:2119/jobmanager-pbs-workq  freeCPUs = 2
      bee001.ific.uv.es:2119/jobmanager-pbs-workq  freeCPUs = 3
    [Rank=1000]
      bee001.ific.uv.es:2119/jobmanager-pbs-workq  freeCPUs = 3
      lngrid02.lip.pt:2129/jobmanager-pbs-workq  freeCPUs = 2
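The grouping on this slide can be reproduced with a small set-matching sketch in Python. Several things here are assumptions rather than facts from the talk: that the job needs 5 CPUs (the NodeNumber of the earlier example), that xgrid.icm.edu.pl is the non-MPI-enabled CE (it never appears in the output), and that a group is skipped when a smaller matching group is already contained in it. The sketch does not compute the Rank values, since the slides do not state how a group's rank is derived.

```python
from itertools import combinations

# (host, free CPUs, MPI-enabled). The MPI flag for xgrid is an assumption.
CES = [
    ("zeus.cyf-kr.edu.pl", 2, True),
    ("aocegrid.uab.es", 10, True),
    ("bee001.ific.uv.es", 3, True),
    ("xgrid.icm.edu.pl", 6, False),
    ("lngrid02.lip.pt", 2, True),
]


def matching_groups(ces, cpus_needed, max_group_size=2):
    """List groups of MPI-enabled CEs whose free CPUs cover the job."""
    eligible = [(host, cpus) for host, cpus, mpi in ces if mpi]
    found = []
    for size in range(1, max_group_size + 1):
        for group in combinations(eligible, size):
            hosts = frozenset(host for host, _ in group)
            # Assumed rule: ignore a group if a smaller match is inside it.
            if any(previous < hosts for previous in found):
                continue
            if sum(cpus for _, cpus in group) >= cpus_needed:
                found.append(hosts)
    return [tuple(sorted(hosts)) for hosts in found]


groups = matching_groups(CES, cpus_needed=5)
for group in groups:
    print(len(group), "CE(s):", ", ".join(group))
```

Under these assumptions the sketch yields one single-CE group (aocegrid.uab.es) and two two-CE groups (zeus with bee001, and bee001 with lngrid02), the same groups as on the slide.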

14-21 Time Sharing
[Diagram sequence, built up over eight slides: CrossBroker (Scheduling Agent, Application Launcher, Condor-G) submits an MPI job to a Grid resource managed by an LRMS. A Condor glide-in on the resource provides two virtual machines, VM1 and VM2, which hold an MPI task and a job.]
Annotations shown as the sequence progresses:
- Wait for the rest of the MPI tasks
- Backfilling while the MPI task waits
- All tasks ready!
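The timing behind this sequence can be sketched in a few lines of Python. This is an illustrative model only: the function name `backfill_windows`, the site names, and the arrival times are invented, and it assumes the MPI job starts as soon as its last task has reached its resource.

```python
def backfill_windows(task_arrivals):
    """Return the MPI start time and, per resource, the interval in which
    a backfilling job can use the slot while the MPI task waits."""
    # Assumption: the job starts once the last task has arrived.
    start = max(task_arrivals.values())
    windows = {
        resource: (arrival, start)
        for resource, arrival in task_arrivals.items()
        if arrival < start  # the last resource to arrive never waits
    }
    return start, windows


# Hypothetical arrival times (seconds) of the MPI tasks at three sites.
start, windows = backfill_windows({"site-a": 0, "site-b": 40, "site-c": 15})
print("MPI job starts at", start)
for resource, (begin, end) in sorted(windows.items()):
    print(resource, "backfills from", begin, "to", end)
```

The intervals are the time that would otherwise be idle on each resource, which is what the backfilling policy is meant to recover.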

22 Interactive Job Support
- Scheduling priority
  - Interactive jobs are sent to sites with available machines.
  - If no machines are available, time sharing is used.
- Support for interactivity in all kinds of jobs
  - Sequential jobs and all the MPI flavors
- CrossBroker injects interactive agents that enable communication between user and job
  - Transparent to the user
  - Full integration with glogin & gVid
  - Condor Bypass supported

23 Interactive Job Support
- Job Description Language file:
  - INTERACTIVE: true/false. Indicates that the job is interactive and that the broker should treat it with higher priority.
  - INTERACTIVEAGENT
  - INTERACTIVEAGENTARGUMENTS
  These two attributes specify the command (and its arguments) used to communicate with the user.
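One simple way to picture "higher priority" is a queue in which interactive jobs are always dequeued before batch jobs, with submission order preserved within each class. The Python sketch below shows that policy only; the `JobQueue` class and the two-level priority are assumptions made for illustration, and the slides do not describe how the broker's queue is actually implemented.

```python
import heapq
from itertools import count


class JobQueue:
    """Toy queue: interactive jobs first, then FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._sequence = count()  # tie-breaker that preserves arrival order

    def submit(self, name, interactive=False):
        priority = 0 if interactive else 1  # lower value is served first
        heapq.heappush(self._heap, (priority, next(self._sequence), name))

    def next_job(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = JobQueue()
queue.submit("batch-1")
queue.submit("interactive-1", interactive=True)
queue.submit("batch-2")

order = [queue.next_job() for _ in range(len(queue))]
print(order)
```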

24 Interactive Job Support
    Type = "Job";
    VirtualOrganisation = "imain";
    JobType = "Parallel";
    SubJobType = "openmpi";
    NodeNumber = 11;
    Interactive = TRUE;
    InteractiveAgent = "glogin";
    InteractiveAgentArguments = "-r -p 195.168.105.65:23433";
    Executable = "test-app";
    InputSandbox = {"test-app", "inputfile"};
    OutputSandbox = {"std.out", "std.err"};
    StdErr = "std.err";
    StdOutput = "std.out";
    Rank = other.GlueHostBenchmarkSI00;
    Requirements = other.GlueCEStateStatus == "Production";

25 Interactive Job Support
Particle trajectories in fusion devices
- By increasing the temperature of a gas, we obtain a plasma state.
- At this temperature, the fusion of light atomic nuclei becomes possible through an exothermic process:
  - The mass after the fusion process is less than the mass before it.
  - The excess mass is converted into energy.

26 Time Sharing
[Diagram: CrossBroker (Scheduling Agent, Application Launcher, Condor-G) and a Grid resource managed by an LRMS. A Condor glide-in provides VM1 and VM2, which hold a batch job and an interactive job.]

27 Time Sharing
[Diagram: as on the previous slide, but an agent takes the place of the Condor glide-in; VM1 and VM2 hold a batch job and an interactive job.]
- Startup-time reduction
- Only one layer involved

28 Conclusions
- CrossBroker supports both parallel and interactive jobs
  - Automatically
  - Interoperable with EGEE
- Glide-in
  - Fast startup of jobs
  - Co-allocation without reservation or wasting resources
- Real applications
  - Visualization of plasma in fusion devices
  - Evolution of pollution clouds in the atmosphere
  - Ultrasound Computing Tomography: reconstruction of a 3D volume
  - Fluid dynamics application

29 Questions? Elisa Heymann Department of Computer Architecture and Operating Systems

