New Compute Cluster at CMS
Daniela-Maria Pusinelli
Jahreskolloquium (annual colloquium), 29.06.2010

Agenda
1. Construction and Management
2. High Availability
3. How to Use
4. Software
5. Batch System

1. Construction and Management/1
Location: Grimm-Zentrum, server room, water-cooled cabinet
Setup:
- 8 Supermicro Twin2 servers, 2 U each
- each server has 4 nodes with 2 Intel Nehalem CPUs
- each CPU has 4 cores; the nodes have InfiniBand QDR and 48 GB memory
- in total: 32 nodes, 256 cores, and memory in the terabyte range
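
The totals on this slide follow from the per-server figures. A minimal sketch of the arithmetic, assuming the 48 GB figure applies to each node (the slide does not say so explicitly); the function name is illustrative only:

```python
# Hardware figures taken from the slide.
SERVERS = 8            # Supermicro Twin2 chassis
NODES_PER_SERVER = 4
CPUS_PER_NODE = 2      # Intel Nehalem
CORES_PER_CPU = 4
MEM_PER_NODE_GB = 48   # assumption: the 48 GB figure is per node


def cluster_totals():
    """Return (nodes, cores, memory_gb) for the whole cluster."""
    nodes = SERVERS * NODES_PER_SERVER
    cores = nodes * CPUS_PER_NODE * CORES_PER_CPU
    memory_gb = nodes * MEM_PER_NODE_GB
    return nodes, cores, memory_gb


if __name__ == "__main__":
    nodes, cores, memory_gb = cluster_totals()
    print(f"{nodes} nodes, {cores} cores, {memory_gb} GB memory")
```

The node and core counts match the slide (32 and 256); the memory total depends on the per-node assumption above.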

1. Construction and Management/2
- two master nodes handle login at the cluster's address
- they manage various services (DHCP, NFS, LDAP, mail)
- they start and stop all nodes when necessary; only they communicate with the university network
- they provide development software (compilers, MPI) and application software
- they organize the batch service

1. Construction and Management/3
- two servers configured as an HA cluster provide the parallel file system (Lustre FS) for large temporary data: lulu:/work
- all nodes are equipped with a fast InfiniBand network (QDR), which carries the communication between nodes during parallel computations
- the master and file server nodes are monitored by the central Nagios of CMS
- the compute servers are monitored locally with Ganglia

1. Construction and Management/4
[Diagram of the cluster layout. Labels: servers providing /work (/work data; 1 MDT, 2 OSTs; 3 OSTs); login and failover login; 36-port InfiniBand switch; node1 to node4, node5 to node8, ..., node25 to node28, node29 to node32; /home data]

2. High Availability/1
- all servers are equipped with redundant power supplies
- master servers: the Red Hat Resource Group Manager (rgmanager) configures and monitors the services on both master nodes
- if one master fails, the other takes over its services
- file servers: the servers for the Lustre FS /work are also configured redundantly; if one fails, the other takes over the MDS (Meta Data Server) and the OSTs (Object Storage Targets), which is possible because all data are stored on a virtual SAN disk

3. How to Use/1
- all colleagues who do not have enough resources at their institute may use the system
- an account at CMS is required; the account will be enabled for the service
- log in to the master with SSH from the university network
- from the master node you can log in to the compute nodes via ssh without further authentication
- Data storage:
  - /home/ : unique user home dir
  - /afs/ : OpenAFS home dir
  - /work/ : working dir
  - /perm/ : permanent dir
  - /scratch/ : local working dir on the nodes
- Migration of data from the old cluster via the /home directory or with scp

3. How to Use/2
Data backup:
- data in /home: daily to TSM, max. 3 older versions
- data in /afs/: daily, for the whole semester
- data in /perm: daily to disk, max. 3 older versions
- data in /work: no backup
- data in /scratch: no backup
- /work and /scratch are checked against a high-water mark of 85%; when it is reached, data older than 1 month will be removed
- important data should be copied to /perm or to a home dir
- a parallel SSH (pssh) is installed for running commands on all nodes:
  pssh --help
  pssh -P -h /usr/localshare/nodes/all w | grep load
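
The clean-up rule for /work and /scratch can be illustrated with a short sketch. This is not the script used on the cluster; it only models the stated policy, assuming that "1 month" means 30 days and that a file's age is judged by its modification time. All function names are hypothetical, and the sketch only lists candidates without deleting anything.

```python
import os
import shutil
import time

HIGH_WATER_MARK = 0.85            # 85% usage, from the slide
MAX_AGE_SECONDS = 30 * 24 * 3600  # assumption: "1 month" taken as 30 days


def usage_fraction(path):
    """Fraction of the file system holding `path` that is in use."""
    total, used, _free = shutil.disk_usage(path)
    return used / total


def removal_candidates(root, now=None, max_age=MAX_AGE_SECONDS):
    """Yield files under `root` not modified for more than `max_age` seconds."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                mtime = os.lstat(full).st_mtime
            except OSError:
                continue  # file vanished while scanning
            if now - mtime > max_age:
                yield full


def files_to_remove(root, usage=None):
    """Return the removal list; it is empty below the high-water mark."""
    usage = usage_fraction(root) if usage is None else usage
    if usage < HIGH_WATER_MARK:
        return []
    return sorted(removal_candidates(root))
```

Calling `files_to_remove("/work")` would measure the current usage itself; passing `usage=` explicitly makes it possible to try the rule on a test directory.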

4. Software/1
System software
- Operating system: CentOS 5 (equivalent to Red Hat EL 5)
Development software
- GNU compilers
- Intel Compiler Suite with MKL
- Portland Group compilers
- OpenMPI
- MPICH
Application software
- Chemistry: Gaussian, Turbomole, ORCA
- Matlab, Maple
- further software can be added

4. Software/2
- all software versions must be loaded with the module command:
  module available -> lists all available modules (software)
- Development:
  module load intel-cluster-322 -> loads Intel Compiler+
  module load openmpi-14-intel -> loads OpenMPI...
- Application software:
  module load g09-a02 -> loads Gaussian09 A02
  module load orca27 -> loads ORCA 2.7
  module load matlab-10a -> loads Matlab R2010a...

5. Batch Service/1
Open-source product: Sun Grid Engine (SGE)
- the cell clous is installed
- the Qmaster daemon runs on one master node; the other node acts as a slave and becomes active if the first fails
- SGE supports parallel environments (PE)
- there is a graphical user interface, QMON
- QMON can display all configurations and carry out actions
- two parallel environments are installed: ompi (128 slots) and smp (64 slots)
- these are assigned to the parallel queues

5. Batch Service/2
- parallel jobs of up to 32 cores are allowed; in bigpar, at most 4 jobs of 32 cores each run at the same time
- in queue par, up to 8 jobs of 8 cores each are allowed; each job runs on a separate node
- ser and long are serial queues
- inter is for interactive computations (Matlab, GaussView)
- all values and configurations are preliminary and may be changed according to user requirements
- batch scripts from the old cluster cannot be used on the new one, because the batch system is new

5. Batch Service/3
Queue   Priority  Processors  Memory  Slots  PE    Runtime
bigpar  ...       ...         ... GB  128    ompi  48 h
par     ...       ...         ... GB  64     smp   24 h
long    -5        1           4 GB    32     -     96 h
short   +10       1           4 GB    32     -     6 h
inter   +15       1           1 GB    8      -     3 h
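
The queue limits on these slides are consistent with the slot counts of the parallel environments: 4 jobs of 32 cores fill the 128 ompi slots, and 8 jobs of 8 cores fill the 64 smp slots. The following sketch records only the limits stated on the slides and shows how a job's core count and runtime could be matched against them. The helper `candidate_queues` and the rule that serial jobs go only to serial queues are illustrative assumptions, not part of SGE or of the cluster configuration.

```python
# Limits as stated on the slides; the interactive queue is left out.
QUEUES = {
    "bigpar": {"pe": "ompi", "slots": 128, "max_cores": 32, "max_jobs": 4, "runtime_h": 48},
    "par":    {"pe": "smp",  "slots": 64,  "max_cores": 8,  "max_jobs": 8, "runtime_h": 24},
    "long":   {"pe": None,   "slots": 32,  "max_cores": 1,  "runtime_h": 96},
    "short":  {"pe": None,   "slots": 32,  "max_cores": 1,  "runtime_h": 6},
}


def slots_are_consistent(name):
    """Check that max_jobs * max_cores equals the slot count of a parallel queue."""
    q = QUEUES[name]
    return q["max_jobs"] * q["max_cores"] == q["slots"]


def candidate_queues(cores, hours):
    """Queues whose limits admit the job, shortest runtime limit first.

    Assumption: jobs with more than one core use only queues with a
    parallel environment, and single-core jobs use only serial queues.
    """
    parallel = cores > 1
    fits = [
        name for name, q in QUEUES.items()
        if (q["pe"] is not None) == parallel
        and cores <= q["max_cores"]
        and hours <= q["runtime_h"]
    ]
    return sorted(fits, key=lambda name: QUEUES[name]["runtime_h"])
```

For example, an 8-core job needing 10 hours fits both par and bigpar, while a 32-core job fits only bigpar.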

5. Batch Service/4
- users are collected in lists (working groups), e.g. cms, limberg, haerdle, ...
- job submission: example scripts for all applications are in /perm/skripte; they include the module calls
- e.g. a Gaussian09 computation:
  cd /work/$USER/g09
  cp /perm/skripte/g09/run_g09_smp_8 . -> make changes if needed
  qsub run_g09_smp_8
  qstat
  qmon & -> graphical user interface
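
The contents of the example scripts in /perm/skripte are not shown on the slides. As a rough idea of what an SGE batch script generally looks like, the sketch below assembles one from standard `#$` directives (`-N`, `-cwd`, `-q`, `-pe`). The function `render_job_script`, the job name, and the input and output file names are hypothetical; the real run_g09_smp_8 may differ, and on some systems the module command must first be initialised inside the script.

```python
def render_job_script(name, queue, pe, slots, modules, command):
    """Build the text of an SGE batch script (sketch, not the cluster's template)."""
    lines = [
        "#!/bin/bash",
        f"#$ -N {name}",         # job name
        "#$ -cwd",               # run in the directory the job was submitted from
        f"#$ -q {queue}",        # target queue
        f"#$ -pe {pe} {slots}",  # parallel environment and number of slots
    ]
    lines += [f"module load {module}" for module in modules]
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical Gaussian09 job on 8 slots of the smp environment.
    print(render_job_script(
        name="g09_test",
        queue="par",
        pe="smp",
        slots=8,
        modules=["g09-a02"],
        command="g09 < input.com > input.log",
    ))
```

The printed text could be saved to a file and submitted with qsub; in practice, copying and adapting the provided example script is the safer route.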

5. Batch Service/5
- developing and starting an MPI program (computation of Pi):
  cd /work/$USER/ompi
  module load intel-cluster-322
  module load openmpi-14-intel
  cp /perm/skripte/ompi/cpi.c .
  mpicc -o cpi cpi.c
  cp /perm/skripte/ompi/run_ompi_32 .
  qsub run_ompi_32
- run_ompi_32 contains the call of the MPI program:
  mpirun -np 8 -machinefile nodefile cpi
  cat nodefile -> contains the node, e.g. node8.local slots=8
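
The source of cpi.c is not reproduced on the slide. Assuming it resembles the widely distributed MPI example of the same name, it approximates Pi by integrating 4/(1+x^2) over [0, 1] with the midpoint rule, each process summing every size-th interval before the partial sums are combined. The sketch below mimics that work split serially in Python, without MPI; `partial_pi` and `estimate_pi` are illustrative names.

```python
import math


def partial_pi(rank, size, n):
    """Midpoint-rule contribution of one process: intervals rank, rank+size, ..."""
    h = 1.0 / n
    total = 0.0
    for i in range(rank, n, size):
        x = h * (i + 0.5)          # midpoint of interval i
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(size=8, n=100_000):
    """Add up the partial results, as an MPI reduction would."""
    return sum(partial_pi(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    approx = estimate_pi()
    print(f"pi ~ {approx:.12f}, error {abs(approx - math.pi):.2e}")
```

With `size=8` the loop structure corresponds to the 8 processes started by `mpirun -np 8`; in the real program each process would run its share concurrently.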

5. Batch Service/6
- important commands:
  qstat -> status of your own jobs
  qstat -u \* -> list of all jobs in the cluster
  qdel <job-id> -> removes the job
  qconf -sql -> list of all queues
  qconf -sq par -> configuration of queue par
  qconf -sul -> shows the user lists (groups)
  qconf -su cms -> users of the list/group cms
  qconf -spl -> list of the parallel environments
  qacct -> accounting information for a job
  qstat -q bigpar -f -> shows whether the queue is disabled

