New Compute-Cluster at CMS

Presentation transcript:

New Compute-Cluster at CMS
Daniela-Maria Pusinelli, Jahreskolloquium, 29.06.2010

Agenda
- Construction and Management
- High Availability
- How to Use
- Software
- Batch System

Construction and Management/1
Location: Grimm-Zentrum, server room, water-cooled cabinet
Structure:
- 8 Supermicro Twin2 servers, 2 U each
- each server holds 4 nodes with 2 Intel Nehalem CPUs
- each CPU has 4 cores; Infiniband QDR and 48 GB memory per node
- in total: 32 nodes, 256 cores, 1.536 TB memory

Construction and Management/2
- two master nodes handle logins at the address clou.cms.hu-berlin.de
- they manage various services (DHCP, NFS, LDAP, mail)
- they start and stop all nodes if necessary
- only the master nodes communicate with the University network
- they provide development software (compilers, MPI) as well as application software
- they organize the batch service

Construction and Management/3
- two servers configured as an HA cluster provide the parallel file system (Lustre FS) for large temporary data: lulu:/work
- all nodes are equipped with a fast Infiniband network (QDR), which carries the node communication during parallel computations
- the master and file server nodes are monitored by the central Nagios of CMS
- the compute servers are monitored locally with Ganglia

Construction and Management/4
[Architecture diagram: login node clou.cms.hu-berlin.de with a failover login node holding the /home data; a 36-port Infiniband switch connecting node1 through node32 in groups of four; Lustre file servers with 1 MDT and 2 + 3 OSTs providing /work]

High Availability/1
- all servers are equipped with redundant power supplies
- master servers: the RedHat Resource Group Manager (rgmanager) configures and monitors the services on both master nodes; if one master fails, the other one takes over the services of the failed node
- file servers: the servers for the Lustre FS /work are also configured redundantly; if one fails, the other takes over the MDS (Meta Data Server) and the OSTs (Object Storage Targets), which is possible because all data are stored on a virtual SAN disk

How to Use/1
- all colleagues who do not have enough resources at their institute may use the system
- a CMS account is required; the account will be enabled for the service
- login on the master clou.cms.hu-berlin.de with SSH from the University network
- from the master node you may log in to the nodes via ssh without further authentication
Data storage:
- /home/<institut>/<account>  unique user home dir
- /afs/.cms.hu-berlin.de/user/<account>  OpenAFS home dir
- /work/<account>  working dir
- /perm/<account>  permanent dir
- /scratch/<account>  local working dir on the nodes
- migration of data from the old cluster via the /home directory or with scp (see the sketch below)
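A minimal sketch of the login and migration steps described above, assuming the hostname and directory layout from this slide; the old-cluster hostname and the source directory are hypothetical placeholders, not taken from the slides:

  # log in to the new cluster from the University network
  ssh <account>@clou.cms.hu-berlin.de

  # from the master node, hop to a compute node without a further password
  ssh node1

  # copy data from the old cluster into the new working directory
  # (<oldcluster> and the source path are placeholders)
  scp -r <account>@<oldcluster>:~/results /work/<account>/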

How to Use/2
Data backup:
- data in /home: backed up daily into TSM, max. 3 older versions
- data in /afs/.cms.hu-berlin.de: daily, for the whole semester
- data in /perm: daily to disk, max. 3 older versions
- data in /work: no backup
- data in /scratch: no backup
- /work and /scratch are checked against a high-water mark of 85%; data older than 1 month will be removed
- important data should be copied to /perm or to a home dir (see the sketch below)
- a parallel SSH is installed for running commands on all nodes:
  pssh --help
  pssh -P -h /usr/localshare/nodes/all w | grep load
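A minimal sketch of securing important results before the automatic cleanup of /work, assuming the directory layout from these slides; the project subdirectory name is a hypothetical placeholder:

  # /work is cleaned at the 85% high-water mark (files older than 1 month are removed),
  # so copy results worth keeping into /perm or a home directory
  cp -r /work/$USER/g09/project1 /perm/$USER/
  # or archive them into the home directory
  tar czf /home/<institut>/$USER/project1.tar.gz -C /work/$USER/g09 project1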

Software/1
System software:
- operating system: CentOS 5 (= RedHat EL 5)
Development software:
- GNU compilers
- Intel Compiler Suite with MKL
- Portland Group compilers
- OpenMPI, MPICH
Application software:
- chemistry: Gaussian, Turbomole, ORCA
- Matlab, Maple
- further software is possible

Software/2
- all software versions must be loaded with the module command:
  module available -> lists all available modules (software)
- Development:
  module load intel-cluster-322 -> loads the Intel Compiler Suite
  module load openmpi-14-intel -> loads OpenMPI
  ...
- Application software:
  module load g09-a02 -> loads Gaussian09 A02
  module load orca27 -> loads ORCA 2.7
  module load matlab-10a -> loads Matlab R2010a

Batch Service/1
- open source product: Sun Grid Engine (SGE); the cell clous is installed
- on one master node the qmaster daemon is running; the other one works as a slave and becomes active if the first fails
- SGE supports parallel environments (PE)
- there is a graphical user interface, QMON; with QMON all configurations can be displayed and actions can be performed
- two parallel environments are installed: ompi (128 slots) and smp (64 slots); these are allocated to the parallel queues (see the sketch below)
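A minimal sketch of how the two parallel environments are requested at submission time, assuming standard SGE options; the script names are the example scripts mentioned on the later Batch Service slides:

  # request 32 slots in the ompi parallel environment (queue selection optional)
  qsub -pe ompi 32 -q bigpar run_ompi_32
  # request 8 slots in the smp parallel environment on a single node
  qsub -pe smp 8 -q par run_g09_smp_8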

Batch Service/2
- parallel jobs with up to 32 cores are allowed; at most 4 jobs of 32 cores each run at the same time in bigpar
- in queue par up to 8 jobs of 8 cores each are allowed; each job runs on a separate node
- ser and long are serial queues
- inter is for interactive computations (Matlab, GaussView); see the sketch below
- all values and configurations are preliminary and may be changed according to user requirements
- batch scripts from the old cluster cannot be used on the new one because of the new batch system
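A minimal sketch of an interactive session in the inter queue, assuming the standard SGE qlogin command (the site may prefer qrsh) and the Matlab module name from the Software slides:

  # request an interactive session in the inter queue
  qlogin -q inter
  # on the assigned node, load and start Matlab
  module load matlab-10a
  matlab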

Batch Service/3
Queue    Priority   Processors   Memory   Slots   PE     Runtime
bigpar   +5         8-32         40 GB    128     ompi   48 h
par                 4-8          20 GB    64      smp    24 h
long     -5         1            4 GB     32      -      96 h
short    +10                                             6 h
inter    +15                     1 GB     8              3 h

Batch Service/4
- users are collected into lists = working groups, e.g. cms, limberg, haerdle, ...
- submission of jobs: example scripts for all applications are in /perm/skripte; they include the module calls
- e.g. a Gaussian09 computation (a sketch of such a submit script follows below):
  cd /work/$USER/g09
  cp /perm/skripte/g09/run_g09_smp_8 .   -> adapt if necessary
  qsub run_g09_smp_8
  qstat
  qmon &   -> graphical user interface
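The content of run_g09_smp_8 is not shown on the slides; the following is a minimal sketch of what such an SGE submit script could look like, assuming standard SGE directives, the smp parallel environment and par queue from the earlier slides, and the Gaussian module name from the Software slides. The job name and input file name are hypothetical placeholders:

  #!/bin/bash
  #$ -N g09_job          # job name (placeholder)
  #$ -q par              # parallel queue for up to 8 cores on one node
  #$ -pe smp 8           # request 8 slots in the smp parallel environment
  #$ -cwd                # run in the submission directory
  #$ -j y                # merge stdout and stderr

  # load the Gaussian09 module as described on the Software slides
  module load g09-a02

  # run Gaussian on the (hypothetical) input file in the current directory
  g09 < input.com > input.log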

Batch Service/5
- developing and starting an MPI program (computation of Pi):
  cd /work/$USER/ompi
  module load intel-cluster-322
  module load openmpi-14-intel
  cp /perm/skripte/ompi/cpi.c .
  mpicc -o cpi cpi.c
  cp /perm/skripte/ompi/run_ompi_32 .
  qsub run_ompi_32
- the script contains the call of the MPI program (a sketch follows below):
  mpirun -np 8 -machinefile nodefile cpi
  cat nodefile -> contains the nodes, e.g. node8.local slots=8
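The content of run_ompi_32 is likewise not shown; a minimal sketch of such a submit script, assuming standard SGE directives and that the machine file is derived from SGE's $PE_HOSTFILE (an assumption; the slide only shows a file named nodefile, and the exact hostname form, e.g. node8.local, depends on the site configuration):

  #!/bin/bash
  #$ -N cpi_job          # job name (placeholder)
  #$ -q bigpar           # queue for large parallel jobs
  #$ -pe ompi 32         # request 32 slots in the ompi parallel environment
  #$ -cwd                # run in the submission directory
  #$ -j y                # merge stdout and stderr

  # load the same modules used for compiling (see the Software slides)
  module load intel-cluster-322
  module load openmpi-14-intel

  # build a machine file from the hosts SGE assigned to this job (assumed convention)
  awk '{print $1" slots="$2}' "$PE_HOSTFILE" > nodefile

  # start the MPI program on the assigned slots
  mpirun -np "$NSLOTS" -machinefile nodefile ./cpi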

Batch Service/6
Important commands:
  qstat              -> status of your own jobs
  qstat -u \*        -> list of all jobs in the cluster
  qdel <jobid>       -> removes the job
  qconf -sql         -> list of all queues
  qconf -sq par      -> configuration of queue par
  qconf -sul         -> shows the user lists (groups)
  qconf -su cms      -> users of the list/group cms
  qconf -spl         -> list of the parallel environments
  qacct -j <jobid>   -> accounting info for the job
  qstat -q bigpar -f -> shows whether the queue is disabled