
1 Using MPI to run parallel jobs on the Grid
Marcello Iacono Manno, Consorzio COMETA – marcello.iacono@ct.infn.it
www.consorzio-cometa.it – FESR Consorzio COMETA - Progetto PI2S2
Grid Tutorial for the Laboratori Nazionali del Sud, 26 February 2008

2 Outline
– Overview
– Requirements & settings
– How to create an MPI job
– How to submit an MPI job to the Grid

3 Overview
Parallel applications currently use "special" hardware and software, but on a Grid parallel applications are "normal": many are trivially parallelizable, and the Grid middleware already offers several parallel job types (DAG, collection).
A common solution for non-trivial parallelism is the Message Passing Interface (MPI), illustrated by the sketch after this list:
– based on send() and receive() primitives
– a "master" node starts the "slave" processes by establishing SSH sessions
– all processes can share a common workspace and/or exchange data
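To make the send()/receive() model concrete, here is a minimal MPI program in C (an illustrative sketch, not part of the original tutorial; the file name and message content are invented):

/* hello_mpi.c – minimal sketch of the master/slave send/receive model.
   Build: mpicc -o hello_mpi hello_mpi.c
   Run:   mpirun -np 2 ./hello_mpi                                       */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, dest, token;

    MPI_Init(&argc, &argv);               /* start the MPI runtime      */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* rank of this process       */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes  */

    if (rank == 0) {
        /* "master": send one integer to every "slave" rank */
        token = 42;
        for (dest = 1; dest < size; dest++)
            MPI_Send(&token, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    } else {
        /* "slave": receive the integer sent by the master */
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Process %d of %d received %d\n", rank, size, token);
    }

    MPI_Finalize();
    return 0;
}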

4 MPI & Grid
There are several MPI implementations, but only two of them are currently supported by the Grid middleware:
– MPICH
– MPICH2
Both "old" Gigabit Ethernet and "new" low-latency InfiniBand networks are supported: the COMETA infrastructure runs MPI jobs either over Gigabit Ethernet (MPICH, MPICH2) or over InfiniBand (MVAPICH, MVAPICH2).
Currently, MPI parallel jobs can only run inside a single Computing Element (CE); several projects are studying the possibility of executing parallel jobs on Worker Nodes (WNs) belonging to different CEs.

5 JDL (1/3)
From the user's point of view, an MPI job is specified by setting the JDL JobType attribute to MPICH, MPICH2, MVAPICH, or MVAPICH2, and by specifying the NodeNumber attribute as well:

JobType = "MPICH";
NodeNumber = 2;

NodeNumber defines the required number of CPU cores, i.e. Processing Elements (PEs).
Matchmaking: the Resource Broker (RB) chooses a CE (if any!) with enough free PEs, i.e. free PEs ≥ NodeNumber (otherwise the job waits).

6 JDL (2/3)
When these two attributes are included in a JDL script, the following expression is automatically added to the JDL Requirements expression in order to find the best resource where the job can be executed:

(other.GlueCEInfoTotalCPUs >= NodeNumber) &&
Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment)
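For example, combined with the user-supplied Requirements from the mpi.jdl example below (and with NodeNumber = 2), the effective matchmaking expression would be equivalent to the following sketch (my inference of how the two parts combine, not shown in the original):

Requirements = (other.GlueCEInfoLRMSType == "PBS" ||
                other.GlueCEInfoLRMSType == "LSF") &&
               (other.GlueCEInfoTotalCPUs >= 2) &&
               Member("MPICH",
                      other.GlueHostApplicationSoftwareRunTimeEnvironment);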

7 JDL (3/3)
– Executable specifies the MPI executable
– NodeNumber specifies the number of cores
– Arguments specifies the command line on the WN: Executable + Arguments form the command line executed on the WN
– mpi.pre.sh is a special script file that is sourced before launching the MPI executable (warning: it runs only on the master node)
– the actual mpirun command is issued by the middleware (… what about applications launched through a proprietary script or binary?)
– mpi.post.sh is a special script file that is sourced after the MPI executable terminates (warning: it runs only on the master node)
A sketch of such a pre/post script pair follows this list.
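As a purely illustrative sketch (the file names come from the mpi.jdl example below, but their contents are an assumption, not part of the tutorial):

# mpi.pre.sh – sourced on the master node before the middleware issues mpirun.
# Hypothetical example: set variables needed on the master node only.
echo "preprocessing script"
export MYAPP_WORKDIR=$PWD    # MYAPP_WORKDIR is an invented variable name

# mpi.post.sh – sourced on the master node after the executable terminates.
# Hypothetical example: rename the application's output file (result.dat is
# an invented name) to the name listed in the JDL OutputSandbox.
echo "postprocessing script"
mv result.dat executable.out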

8 Requirements (1/2)
In order to ensure that an MPI job can run, the following requirements MUST be satisfied:
– the MPICH/MPICH2/MVAPICH/MVAPICH2 software must be installed, and included in the PATH environment variable, on all the WNs of the CE (see the query example after this list)
– some MPI applications require a file system shared among the WNs:
  · no shared area is currently available to write user data
  · the application may access the area of the master node (this requires modifications to the application)
  · middleware solutions are also possible (as soon as required/designed/tested/deployed)
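To check which CEs actually publish the MPICH runtime tag, the information client on the UI can be queried; a sketch assuming the standard gLite lcg-info tool (option names may vary between middleware releases):

# List the CEs that advertise the MPICH tag, with their free CPU count
lcg-info --vo cometa --list-ce --query 'Tag=MPICH' --attrs 'CE,FreeCPUs'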

9 Requirements (2/2)
– the job wrapper copies all the files listed in the InputSandbox onto ALL of the "slave" nodes
– host-based SSH authentication MUST be properly configured between all the WNs
– if some environment variables are needed ONLY on the "master" node, they can be set by mpi.pre.sh
– if some environment variables are needed ON ALL the nodes, a static installation is currently required (a middleware extension is under consideration)

10 mpi.jdl

[
  Type = "Job";
  JobType = "MPICH";
  Executable = "MPIparallel_exec";
  NodeNumber = 2;
  Arguments = "arg1 arg2 arg3";
  StdOutput = "test.out";
  StdError = "test.err";
  InputSandbox = {"mpi.pre.sh", "mpi.post.sh", "MPIparallel_exec"};
  OutputSandbox = {"test.err", "test.out", "executable.out"};
  Requirements = other.GlueCEInfoLRMSType == "PBS" ||
                 other.GlueCEInfoLRMSType == "LSF";
]

Note: the Requirements expression restricts the Local Resource Management System (LRMS) to PBS or LSF only; mpi.pre.sh and mpi.post.sh are the pre- and post-processing scripts sourced around the executable MPIparallel_exec.

11 GigaBit vs InfiniBand
The advantage of using a low-latency network becomes more evident as the number of nodes grows.

12 CPI Test (1/4)

[marcello@infn-ui-01 mpi-0.13]$ edg-job-submit mpi.jdl

Selected Virtual Organisation name (from proxy certificate extension): cometa
Connecting to host infn-rb-01.ct.pi2s2.it, port 7772
Logging to host infn-rb-01.ct.trigrid.it, port 9002

*************************************************************
                   JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
- https://infn-rb-01.ct.pi2s2.it:9000/vYGU1UUfRnSktGODcwEjMw
*************************************************************

13 CPI Test (2/4)

[marcello@infn-ui-01 mpi-0.13]$ edg-job-status https://infn-rb-01.ct.pi2s2.it:9000/vYGU1UUfRnSktGODcwEjMw

*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job:
https://infn-rb-01.ct.pi2s2.it:9000/vYGU1UUfRnSktGODcwEjMw
Current Status:  Done (Success)
Exit code:       0
Status Reason:   Job terminated successfully
Destination:     infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-short
reached on:      Sun Jul 1 15:08:11 2007
*************************************************************

14 CPI Test (3/4)

[marcello@infn-ui-01 mpi-0.13]$ edg-job-get-output --dir /home/marcello/JobOutput/ https://infn-rb-01.ct.pi2s2.it:9000/vYGU1UUfRnSktGODcwEjMw

Retrieving files from host: infn-rb-01.ct.pi2s2.it
( for https://infn-rb-01.ct.pi2s2.it:9000/vYGU1UUfRnSktGODcwEjMw )

*********************************************************************************
                        JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
- https://infn-rb-01.ct.pi2s2.it:9000/vYGU1UUfRnSktGODcwEjMw
have been successfully retrieved and stored in the directory:
/home/marcello/JobOutput/marcello_vYGU1UUfRnSktGODcwEjMw
*********************************************************************************

15 CPI Test (4/4)

[marcello@infn-ui-01 mpi-0.13]$ cat /home/marcello/JobOutput/marcello_vYGU1UUfRnSktGODcwEjMw/test.out
preprocessing script
-------------------------
infn-wn-01.ct.pi2s2.it
Process 0 of 4 on infn-wn-01.ct.pi2s2.it
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 10.002570
Process 1 of 4 on infn-wn-01.ct.pi2s2.it
Process 3 of 4 on infn-wn-02.ct.pi2s2.it
Process 2 of 4 on infn-wn-02.ct.pi2s2.it
TID  HOST_NAME  COMMAND_LINE     STATUS  TERMINATION_TIME
==== ========== ================ ======= ===================
0001 infn-wn-01 /opt/lsf/6.1/lin Done    07/01/2007 17:04:23
0002 infn-wn-01 /opt/lsf/6.1/lin Done    07/01/2007 17:04:23
0003 infn-wn-02 /opt/lsf/6.1/lin Done    07/01/2007 17:04:23
0004 infn-wn-02 /opt/lsf/6.1/lin Done    07/01/2007 17:04:23
P4 procgroup file is /home/cometa005/.lsf_6826_genmpi_pifile.
postprocessing script temporary
[marcello@infn-ui-01 mpi-0.13]$
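For reference, the "cpi" program used in this test is the classic example shipped with MPICH: it approximates π by numerical integration of 4/(1+x²) over [0,1], splitting the midpoint intervals among the processes. A condensed sketch in C (not the exact source used in the test):

/* cpi-like sketch: each rank integrates every size-th midpoint interval. */
#include <mpi.h>
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    int n = 1000000, rank, size, i;
    double h, sum = 0.0, x, mypi, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* agree on n */
    h = 1.0 / (double)n;
    for (i = rank + 1; i <= n; i += size) {        /* midpoint rule */
        x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    /* sum the partial results on rank 0, which prints the answer */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %.16f, Error is %.16f\n",
               pi, fabs(pi - M_PI));

    MPI_Finalize();
    return 0;
}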

16 MPI on the web
– https://edms.cern.ch/file/454439/LCG-2-UserGuide.pdf
– http://oscinfo.osc.edu/training/
– http://www.netlib.org/mpi/index.html
– http://www-unix.mcs.anl.gov/mpi/learning.html
– http://www.ncsa.uiuc.edu/UserInfo/Training

17 Questions…

