MPI CUSTOMIZATION AT THE ROMA3 SITE
Antonio Budano, Federico Bitelli

MPI in ROMA3
Our CE is a CREAM CE and is also used to manage the local queues (jobs submitted locally with pbs -q).
Worker nodes are essentially of two types:
  16 blades with 8 cores each on an HP system
  8 blades with 16+ cores each on a SuperMicro system equipped with InfiniBand
The PBS nodefile is therefore composed of lines similar to:
  wn cluster.roma3 np=8 lcgpro
  wn cluster.roma3 np=16 lcgpro infiniband
Goals:
  Each MPI job must go to the InfiniBand nodes.
  Local MPI jobs should exactly meet the user's requirements (e.g. #PBS -l nodes=3:ppn=6; see the example below).
  Publish in the Grid: MPI-Infiniband, MPI-START, MPICH2, MPICH2-1.6, OPENMPI, OPENMPI-1.4.3, MPICH1.
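As an illustration of the second goal (this example is not on the slides; the script and binary names are hypothetical), a local MPI submission meeting that requirement could look like:

  #!/bin/bash
  #PBS -l nodes=3:ppn=6
  # 3 nodes x 6 cores per node = 18 MPI ranks
  cd $PBS_O_WORKDIR
  # mpiexec path as configured for MVAPICH2 on the WNs (see the MPI-START slide);
  # depending on the launcher, a machinefile option such as -f $PBS_NODEFILE may also be needed
  /usr/mpi/gcc/mvapich2-1.6/bin/mpiexec -n 18 ./my_mpi_app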

Local Jobs
We had the problem that Maui/PBS did not honour the PBS job requirements:
  when a user asked for #PBS -l nodes=3:ppn=6, the system just gave them the maximum number of slots available on a single WN (so 16 in our case).
We fixed this by upgrading Maui and PBS to newer versions (on the worker nodes too!!).
We made the upgrades by configuring and compiling both from the tar.gz files.
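The slides do not give the exact build commands; a rough sketch of the usual procedure follows (version numbers, prefixes and configure flags are placeholders, not the ones used at Roma3):

  # Torque, built and installed from the source tarball
  tar xzf torque-<version>.tar.gz
  cd torque-<version>
  ./configure --prefix=/usr/local
  make && make install
  # remember to update pbs_mom on the worker nodes as well

  # Maui, configured against the freshly installed Torque
  tar xzf maui-<version>.tar.gz
  cd maui-<version>
  ./configure --with-pbs=/usr/local
  make && make install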

Grid Jobs
We just configured the Torque client on the CE (used to submit grid jobs) to use a submit filter:

  ~]# cat /var/spool/torque/torque.cfg
  SUBMITFILTER /var/spool/pbs/submit_filter
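The slide only shows how the filter is wired in, not the filter itself. As a reminder of the mechanism: Torque passes the qsub options as arguments, feeds the job script on stdin, and submits whatever the filter writes to stdout (a non-zero exit code rejects the job). A minimal pass-through skeleton, purely illustrative:

  #!/bin/bash
  # /var/spool/pbs/submit_filter (illustrative skeleton, not the actual Roma3 filter)
  while IFS= read -r line; do
      # site-specific rewriting of #PBS directives would go here
      printf '%s\n' "$line"
  done
  exit 0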

MPI INFINIBAND
To route MPI jobs to the worker nodes with InfiniBand, we edited (on the CE) /opt/glite/bin/pbs_submit.sh and added the line:

  [ -z "$bls_opt_mpinodes" ] || echo "#PBS -q mpi_ib" >> $bls_tmp_file

That line routes MPI jobs to the queue mpi_ib.
Then we told Torque that each job in this queue must go to the InfiniBand nodes:

  set queue mpi_ib resources_default.neednodes = infiniband
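For completeness, the neednodes setting is applied with qmgr on the Torque server; the queue-creation lines below are an assumption about how the queue could be set up, only the last command appears on the slide:

  qmgr -c "create queue mpi_ib queue_type=execution"   # assumed: only needed if the queue does not exist yet
  qmgr -c "set queue mpi_ib enabled = True"
  qmgr -c "set queue mpi_ib started = True"
  qmgr -c "set queue mpi_ib resources_default.neednodes = infiniband"
  # 'infiniband' matches the node property in the nodefile lines shown on the first slide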

MPI-START PROBLEM
We wanted to use and publish our own version of MPICH2 (compiled for InfiniBand): MPI-START, MPICH2, MPICH2-1.6.
To do that, the official manual says you should edit (on the WNs) the files /etc/profile.d/mpi_grid_vars.sh (and /etc/profile.d/mpi_grid_vars.csh) and add:

  export MPI_MPICH2_MPIEXEC=/usr/mpi/gcc/mvapich2-1.6/bin/mpiexec
  export MPI_MPICH2_PATH=/usr/mpi/gcc/mvapich2-1.6/
  export MPI_MPICH2_VERSION=1.6

and similar settings in /etc/profile.d/grid-env.sh.
But jobs could not start.
After some days of troubleshooting we saw that the problem was in the i2g-mpi-start package, in particular in the file /opt/i2g/etc/mpi-start/mpich2.mpi, which contains some bugs.
The corrected version will be published on the WIKI pages as soon as possible.
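A quick sanity check on a WN (not from the slides) to confirm that the new variables are picked up and point to an existing mpiexec:

  source /etc/profile.d/mpi_grid_vars.sh
  echo "$MPI_MPICH2_PATH  $MPI_MPICH2_VERSION"
  ls -l "$MPI_MPICH2_MPIEXEC"    # should list the MVAPICH2 1.6 mpiexec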

BDII Configuration
Remember to publish the information into the BDII.
On the CE, edit /opt/glite/etc/gip/ldif/static-file-Cluster.ldif and add, in the proper place:

  GlueHostApplicationSoftwareRunTimeEnvironment: MPI-START
  GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2
  GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2-1.6
  GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI
  GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI-1.4.3
  GlueHostApplicationSoftwareRunTimeEnvironment: MPI-Infiniband

Then restart the BDII: /etc/init.d/bdii restart

MPI STATUS
Querying the top-level BDII confirms what is published for the site:

  bin]# ldapsearch -xLLL -h egee-bdii.cnaf.infn.it:2170 -b o=grid \
    '(&(objectClass=GlueHostApplicationSoftware)(GlueSubClusterUniqueID=ce-02.roma3.infn.it))' \
    GlueHostApplicationSoftwareRunTimeEnvironment | grep MPI
  GlueHostApplicationSoftwareRunTimeEnvironment: MPI-Infiniband
  GlueHostApplicationSoftwareRunTimeEnvironment: MPI-START
  GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2
  GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2-1.6
  GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI
  GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI-1.4.3