Research Computing - The Apollo HPC Cluster

Presentation transcript:

Research Computing - The Apollo HPC Cluster. Jeremy Maris, Research Computing, IT Services, University of Sussex

High Performance Computing?
Generically, computing systems comprising multiple processors linked together in a single system, used to solve problems beyond the scope of the desktop.
High performance computing - maximising the number of cycles per second, usually parallel
High throughput computing - maximising the number of cycles per year, usually serial
Facilitating the storage, access and processing of data - coping with the massive growth in data
High performance: weather forecasting, simulations, molecular dynamics
High throughput: CERN, sequence assembly etc
Lots of data: CERN, genomics, life sciences generally

High Performance Computing
Tasks must run quickly; a single problem is split across many processors (task parallel).
Examples: weather forecasting, Markov models, image processing, 3D image reconstruction, 4D visualisation, sequence assembly, whole genome analysis.
Assembly - aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence; shotgun sequencers produce 1 TB of output per day.
Markov models - finding patterns over time (and space) in complex systems.

High throughput computing
A lot of work done over a long time frame; one program run many times, e.g. searching a large data set; loosely coupled (data parallel).
Examples: ATLAS analysis, genomics (sequence alignment, BLAST etc), virtual screening (e.g. in drug discovery), parameter exploration (simulations), statistical analysis (e.g. bootstrap analysis).
The human genome has 3.4 thousand million base pairs and 20-25k genes.

Apollo Cluster - Aims
Computing provision beyond the capability of the desktop
Priority access determined by the Advisory Group
Shared infrastructure and support from IT Services
Extension of the facility by departments: storage (adding to Lustre, access to SAN), CPU, software licences
More compute nodes will be purchased by IT Services as budgets allow

Apollo Cluster - Hardware
Total 264 cores:
14 x 12-core 2.67 GHz Intel nodes - 168 cores
2 x 48-core 2.2 GHz AMD nodes - 96 cores
(plus 17 x 8-core blades, GigE, Informatics - 136 cores)
48 GB RAM in each Intel node
256 GB RAM in each AMD node
20 TB home NFS file system (backed up)
80 TB Lustre file system (scratch, not backed up)
QDR (40 Gb/s) InfiniBand interconnect

Apollo Cluster - Hardware

Apollo Cluster - Filesystems 1
Home - 20 TB, RAID6 set of 12 x 2 TB disks
Exported via NFS
Backed up - keep your valuable data here
Easily overloaded if many cores read/write at the same time
Lustre parallel file system - /mnt/lustre/scratch
Redundant metadata server
Three object storage servers, each with 3 x 10 TB RAID6 OSTs
Data striping configured per file or directory
Can stripe across all 108 disks; aggregate data rate ~3.8 GB/s
NOT backed up - for temporary storage
Local 400 GB scratch disk per node

Apollo Cluster - Filesystems 2
The graph on this slide compares the performance of NFS, Gluster and Lustre on equivalent hardware. The results were obtained with the IOZONE benchmark using two SunFire X4500 12 TB storage servers and five SunFire X2270 servers each running 16 simultaneous threads, all connected via trunked gigabit Ethernet.
Gluster is a parallel NAS - not as good as Lustre, with less coverage/support. Not open source.
© Alces Software Limited
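As an illustration only (not from the original slides; the paths, file sizes and record size are assumptions), a single-node throughput run of this kind could be launched with iozone roughly as follows:

# 16 threads, write (-i 0) then read (-i 1) tests,
# 4 GB per file with a 1 MB record size, on the Lustre scratch area
iozone -i 0 -i 1 -t 16 -s 4g -r 1m \
       -F /mnt/lustre/scratch/$USER/ioz{1..16}.dat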

Apollo Cluster - Lustre Performance
IOZONE Benchmark Summary (c) 2010 Alces Software Ltd

Apollo Cluster - Lustre 1
Figure (c) 2009 Cray Inc

Apollo Cluster - Lustre 2
Storing a single file across multiple OSTs (striping) offers two benefits:
an increase in the bandwidth available when accessing the file
an increase in the available disk space for storing the file
Striping has disadvantages:
increased overhead due to network operations and server contention
increased risk of file damage due to hardware malfunction
Given these tradeoffs, the Lustre file system allows users to specify the striping policy for each file or directory of files using the lfs utility.
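For example (an illustrative sketch, not from the original slides; the directory name is an assumption), a directory's striping policy can be set and inspected with lfs:

# Stripe new files in this directory across 4 OSTs with a 1 MB stripe size
# (Lustre 1.8 uses -s for stripe size; newer releases use -S)
lfs setstripe -c 4 -s 1m /mnt/lustre/scratch/$USER/bigdata

# Show the striping layout of a file or directory
lfs getstripe /mnt/lustre/scratch/$USER/bigdata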

Apollo Cluster - Lustre 3
Further reading:
Lustre FAQ - http://wiki.lustre.org/index.php/Lustre_FAQ
Lustre 1.8 operations manual - http://wiki.lustre.org/manual/LustreManual18_HTML/index.html
I/O tips - Lustre striping and parallel I/O - http://www.nics.tennessee.edu/io-tips

Apollo Cluster - Software
The module system is used for defining paths and libraries; you need to load the software you require.
module avail, module add XX, module unload XX
Access to optimised maths libraries, MPI stacks, compilers etc
Latest versions of packages, e.g. python and gcc, easily installed
Intel Parallel Studio suite: C, C++, Fortran, MKL, performance analysis etc
gcc and Open64 compilers
We will compile/install software for users if asked
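A typical session (an illustrative sketch; the module names follow the parallel job script later in this talk, but the versions installed on Apollo may differ):

# List the software modules available on the cluster
module avail

# Load a compiler and an MPI stack, then check what is loaded
module add gcc/4.3.4
module add qlogic/openmpi/gcc
module list

# Remove a module when it is no longer needed
module unload gcc/4.3.4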

Apollo Cluster - Software
Compilers/tools: ant, gcc suite, Intel (C, C++, Fortran), git, jdk 1.6_024, mercurial, open64, python 2.6.6, sbcl
Libraries: acml, atlas, blas, gotoblas, cloog, fftw2, fftw3, gmp, gsl, hpl, lapack, scalapack, MVAPICH, MVAPICH2, mpfr, nag, openMPI, ppl
Programs: adf, aimpro, alberta, gadget, gap, Gaussian 09, hdf5, idl, matlab, mricron, netcdf, paraview, stata, WRF

Apollo Cluster - Queues 1
Sun Grid Engine is used for the batch system.
parallel.q - MPI and OpenMP jobs, Intel nodes; slot limit of 48 cores per user at present
serial.q - serial and OpenMP jobs, AMD nodes; slot limit of 36 cores per user
inf.q - serial or OpenMP jobs, Informatics use only
No other limits - user input is needed regarding configuration
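The queues and their current load can be inspected with the standard Grid Engine commands (an illustrative sketch):

# Summary of all cluster queues: slots used, reserved and available
qstat -g c

# Jobs currently running or waiting for your account
qstat -u $USER

# Full configuration of a particular queue, e.g. parallel.q
qconf -sq parallel.q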

Apollo Cluster - serial job script
The queue is serial.q by default.

#!/bin/sh
#$ -N sleep
#$ -S /bin/sh
#$ -cwd
#$ -M user@sussex.ac.uk
#$ -m bea
echo Starting at: `date`
sleep 60
echo Now it is: `date`
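Assuming the script above is saved as sleep.sh (the filename is an assumption), it is submitted and monitored as follows:

# Submit the job; serial.q is used by default
qsub sleep.sh

# Check its status; output and error files appear in the working directory
qstat -u $USER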

Apollo Cluster - parallel job script
For parallel jobs you must specify the parallel environment (pe) in the script.
Parallel environments: openmpi or openmp; mvapich2 is often less efficient.

#!/bin/sh
#$ -N JobName
#$ -M user@sussex.ac.uk
#$ -m bea
#$ -cwd
#$ -pe openmpi NUMBER_OF_CPUS  # eg 12-36
#$ -q parallel.q
#$ -S /bin/bash
# source modules environment:
. /etc/profile.d/modules.sh
module add gcc/4.3.4 qlogic/openmpi/gcc
mpirun -np $NSLOTS -machinefile $TMPDIR/machines /path/to/exec
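A shared-memory (OpenMP) job follows the same pattern with the openmp parallel environment. The sketch below is illustrative, not from the original slides; the job name, core count and executable path are placeholders:

#!/bin/sh
#$ -N OmpJob
#$ -cwd
#$ -pe openmp 12
#$ -q serial.q
#$ -S /bin/bash
# Match the number of OpenMP threads to the slots granted by Grid Engine
export OMP_NUM_THREADS=$NSLOTS
/path/to/openmp_exec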

Apollo Cluster - Monitoring
Ganglia statistics for apollo and feynman (the EPP cluster) are at http://feynman.hpc.susx.ac.uk

Apollo Cluster - Accounting
Simply: qacct -o
Accounting is monitored to show fair usage, etc.
Statistics may be posted on the web.
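For example (an illustrative sketch; option behaviour can vary between Grid Engine versions):

# Summarise usage (wall clock, CPU, memory) per job owner
qacct -o

# Restrict the summary to your own jobs from the last 30 days
qacct -o $USER -d 30

# Resource usage of a single completed job
qacct -j <job_id>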

Apollo Cluster - Documentation
To get an account on the cluster, email researchsupport@its.sussex.ac.uk
User guides and example scripts are available on the cluster in /cm/shared/docs/USERGUIDE
Other documentation is under the /cm/shared/docs path

Apollo Cluster - Adding nodes
C6100 chassis + four servers, each 2.67 GHz, 12 cores, 48 GB RAM, IB card, licences
R815, 48-core 2.2 GHz AMD
Departments are guaranteed 90% exclusive use of their nodes; the remaining 10% is shared with others, plus backfill of idle time.
Contact researchsupport@its.sussex.ac.uk for the latest pricing

Apollo Cluster - Adding storage
Configuration of an additional object server:
Server R510, 8 cores, 24 GB RAM + 12 x 2 TB disks
JBOD MD1200 + 12 x 2 TB disks + H800 RAID card
The minimum increment is the server, which provides about 19 TB after formatting and RAID6.
Two JBODs give another 38 TB; total storage ~57 TB in 6 OSTs.
Contact researchsupport@its.sussex.ac.uk for pricing

Questions?