Presentation on theme: "Www.egi.eu EGI-InSPIRE RI-261323 EGI-InSPIRE www.egi.eu EGI-InSPIRE RI-261323 MPI and Parallel Code Support Alessandro Costantini, Isabel Campos, Enol."— Presentation transcript:

1 MPI and Parallel Code Support (EGI-InSPIRE RI-261323)
Alessandro Costantini, Isabel Campos, Enol Fernández, Antonio Laganà, John Walsh

2 Core Objectives
- Improved end-user documentation addressing MPI application development and job submission in ARC, gLite and UNICORE
- Quality-controlled MPI site deployment documentation
- Outreach and dissemination at major EGI events and workshops
- User community, NGI and site engagement, gathering direct input
- Participation in selected standardisation bodies

3 MPI Application Libraries
Most common application libraries:
- DL_POLY
- NAMD
- VENUS96
- GROMACS
- GAMESS
- MUFTE
- mpiBLAST
Some application libraries are tightly tied to specific MPI libraries (and may need a local re-compile).

4 MPI Status
- 119 clusters publish the MPI-START tag; very little change since last year
- However, a big change in reliability: sites are now tested every hour via SAM, and NGIs/sites must follow up on MPI failures
- The CompChem VO performed wide-scale testing: it uses a larger number of CPU cores than SAM and uses UNIPG production codes of DL_POLY
- 16 of the 25 sites support both CompChem and MPI
- Tested sequentially on one node, then in parallel on 2, 4, 8, 16, 32 and 64 nodes

5 DL_POLY as MPI test case
- The performance of the sites was obtained by running a DL_POLY test case:
  - Developed by Daresbury Laboratory for MD calculations
  - Natively parallel (SPMD schema, Replicated Data strategy)
  - Compiled using IFC, MKL and MPICH (statically compiled on the UI)
- The calculation ran sequentially on one node and in parallel on 2, 4, 8, 16, 32 and 64 nodes
- The related performance figures and statistics were evaluated
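The DL_POLY sources are not reproduced here, but the SPMD / Replicated Data pattern the test case follows, and the way elapsed time can be measured for such scaling runs, can be sketched with a minimal MPI program in C. Everything in the sketch (the array, the sum, the sizes) is an illustrative assumption, not taken from DL_POLY: every rank holds a full copy of the data, computes its own slice of the work, and the partial results are recombined on all ranks.

```c
/* Minimal SPMD / Replicated Data sketch in MPI C (illustrative only). */
#include <mpi.h>
#include <stdio.h>

#define N 1000000  /* size of the replicated data set (assumed, for illustration) */

int main(int argc, char **argv) {
    int rank, size;
    static double data[N];          /* every rank holds a full copy (Replicated Data) */
    double local = 0.0, total = 0.0, t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int i = 0; i < N; i++)     /* replicate the same data on all ranks */
        data[i] = (double)i;

    t0 = MPI_Wtime();

    /* SPMD: the same program runs on every rank; each rank processes its own slice */
    for (int i = rank; i < N; i += size)
        local += data[i] * data[i];

    /* Recombine the partial results so every rank ends up with the full answer */
    MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    t1 = MPI_Wtime();
    if (rank == 0)
        printf("%d ranks, sum = %e, elapsed = %f s\n", size, total, t1 - t0);

    MPI_Finalize();
    return 0;
}
```

Such a sketch would be built with mpicc and launched with mpirun (or through MPI-START on a grid site); the exact invocation depends on the site setup.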

6 Performance over the Grid
- SAM-MPI tests enabled
- Parallel applications run properly on 12 sites with up to 16 CPUs
- Clearly a need to isolate the outstanding issues!

7 Success rates and issues

Job status, 2 to 8 cores (percent):
                   2009  2010  2011
  Successful         53    75    80
  Unsuccessful       47    25    20

Causes of unsuccessful jobs, 2 to 8 cores (percent):
                   2009  2010  2011
  Aborted by CE      52     0    80
  Scheduler error    39   100    20
  MPI-START           9     0     0

Job status, 16 to 64 cores (percent):
                   2009  2010  2011
  Successful         21    54    62
  Unsuccessful       79    46    38

Causes of unsuccessful jobs, 16 to 64 cores (percent):
                   2009  2010  2011
  Aborted by CE      73     0    50
  Scheduler error    23    93     0
  Proxy expired       4     7    50

8 OpenMP / User-Defined Allocation Support
New middleware features requested by users:
- OpenMP support added to MPI-START
- User-defined allocation of processes per node
- Accounting issues being investigated
- Expected release in UMD 1.3
OpenMP advantages:
- Most sites now use >= 4 cores per machine
- OpenMP is lightweight, easy to use and quick
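The kind of code this enables can be illustrated with a minimal OpenMP sketch in C. It is a generic toy example, not taken from MPI-START or any EGI component: a reduction loop whose iterations are spread over the cores of a single worker node, with the thread count read from the standard OMP_NUM_THREADS environment variable at run time.

```c
/* Minimal OpenMP sketch (illustrative only): a parallel reduction loop
 * spread over the cores of one worker node.
 * Build with e.g. "gcc -fopenmp omp_sum.c -o omp_sum" (file name assumed). */
#include <omp.h>
#include <stdio.h>

int main(void) {
    const long n = 100000000L;   /* illustrative problem size */
    double sum = 0.0;

    /* Iterations are shared among the threads; the thread count is taken
     * from OMP_NUM_THREADS (or the runtime default) when the program runs. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 1; i <= n; i++)
        sum += 1.0 / (double)i;

    printf("max threads: %d, harmonic sum: %.6f\n",
           omp_get_max_threads(), sum);
    return 0;
}
```

Running this with OMP_NUM_THREADS=4 on a four-core worker node uses all cores of that one machine, which is exactly the lightweight single-node scenario the slide points to.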

9 GPGPU Support
- GPGPU is now a commodity product; high-end units offer ECC and better precision
- CUDA/OpenCL features are improving, especially for double-precision calculations
- Large-scale growth at HPC centres, as seen in the HPC Top 500
- Large set of applications across all scientific domains
- Open MPI support (must be "compiled in")

10 GPGPU: Caveat Emptor
- CUDA/OpenCL have a steep learning curve
- All user accounts have read/write access to the resource
- Most nodes are now multi-core, with N job slots per physical machine, so distinct pool accounts may try to access the same GPU; user code needs guaranteed exclusive access
- GPGPU resource schedulers: basic support in Torque 2.5.4, no support in MAUI (MOAB: yes), SLURM supports GPGPU resources
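As a small illustration of the visibility problem, the C sketch below simply enumerates the GPU devices that the OpenCL runtime exposes to the current job slot. It is a diagnostic toy, not a solution from any EGI tool: even if a device is listed, nothing here guarantees exclusive access, which has to come from the batch system (for example the GPU resource features of Torque or SLURM mentioned above).

```c
/* List the GPUs visible through OpenCL from the current job slot (illustrative).
 * Build with e.g. "gcc query_gpus.c -lOpenCL" (assumes OpenCL headers and an ICD). */
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id platforms[8];
    cl_uint nplat = 0;

    if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS || nplat == 0) {
        fprintf(stderr, "no OpenCL platforms visible to this job slot\n");
        return 1;
    }
    for (cl_uint p = 0; p < nplat; p++) {
        cl_device_id devices[8];
        cl_uint ndev = 0;

        /* Ask only for GPU devices; a worker node may expose none */
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 8, devices, &ndev) != CL_SUCCESS)
            continue;
        for (cl_uint d = 0; d < ndev; d++) {
            char name[256];
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            printf("platform %u, GPU %u: %s\n", p, d, name);
        }
    }
    return 0;
}
```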

11 Current GPU Scheduling Solutions
- Innovative mixed Grid/Cloud approach; needs Grid standardisation (number of GPU cores, etc.)
- Exploits new features in the WMS, hardware virtualization and PCI pass-through of the GPU to the VM
- A single CPU job slot handles the LRMS GPU resource allocation
- Used by the CompChem and TheoPhys VOs
- VOs may (in the future) be able to deploy their own MPI applications as VMs

12 Conclusions
- SAM-MPI tests catch the usual site problems, making it easier to detect the source of a failure
- MPI is now more reliable for large jobs
- Waiting times at sites can be prohibitive; jobs work best when there are as many free nodes at a site as the job size
- Wider deployment of UMD WMS 3.3 is needed for improved generic parallel job support
- Exciting times ahead with GPGPU and virtualisation

