
1 Research Computing: The Apollo HPC Cluster
Jeremy Maris, Research Computing, IT Services, University of Sussex

2 High Performance Computing?
Generically, computing systems comprising multiple processors linked together in a single system, used to solve problems beyond the scope of the desktop.
High performance computing – maximising the number of cycles per second, usually parallel
High throughput computing – maximising the number of cycles per year, usually serial
Facilitating the storage, access and processing of data – coping with the massive growth in data

3 High Performance Computing
High performance computing – tasks must run quickly; a single problem is split across many processors.
Examples:
– Weather forecasting
– Markov models
– Image processing (3D image reconstruction, 4D visualisation)
– Sequence assembly
– Whole genome analysis

4 High throughput computing
A lot of work done over a long time frame – one program run many times, e.g. searching a large data set. Loosely coupled.
Examples:
– ATLAS analysis
– Genomics (sequence alignment, BLAST etc.)
– Virtual screening (e.g. in drug discovery)
– Parameter exploration (simulations)
– Statistical analysis (e.g. bootstrap analysis)

5 Apollo Cluster - Aims
Computing provision beyond the capability of the desktop
Priority access determined by an Advisory Group
Shared infrastructure and support from IT Services
Extension of the facility by departments:
– Storage (adding to Lustre, access to SAN)
– CPU
– Software licences
More compute nodes will be purchased by IT Services as budgets allow

6 Apollo Cluster - Hardware
Total 264 cores:
– 14 x 12-core 2.67 GHz Intel nodes – 168 cores
– 2 x 48-core 2.2 GHz AMD nodes – 96 cores
– (plus 17 x 8-core blades, GigE, Informatics – 136 cores)
48 GB RAM per Intel node
256 GB RAM per AMD node
20 TB home NFS file system (backed up)
80 TB Lustre file system (scratch, not backed up)
QDR (40 Gb/s) InfiniBand interconnect

7 Apollo Cluster - Hardware

8 Apollo Cluster - Filesystems 1
Home – 20 TB, RAID6 set of 12 x 2 TB disks
– Exported via NFS
– Backed up; keep your valuable data here
– Easily overloaded if many cores read/write at the same time
Lustre parallel file system – /mnt/lustre/scratch
– Redundant metadata server
– Three object servers, each with 3 x 10 TB RAID6 OSTs
– Data striping configured by file or directory
– Can stripe across all 108 disks; aggregate data rate ~3.8 GB/s
– NOT backed up, for temporary storage
Local 400 GB scratch disk per node

9 Apollo Cluster - Filesystems 2 © Alces Software Limited

10 Apollo Cluster - Lustre Performance
IOzone benchmark summary. © 2010 Alces Software Ltd

11 Apollo Cluster - Lustre 1
Figure © 2009 Cray Inc

12 Apollo Cluster - Lustre 2
Storing a single file across multiple OSTs (striping) offers two benefits:
– an increase in the bandwidth available when accessing the file
– an increase in the available disk space for storing the file
Striping also has disadvantages:
– increased overhead due to network operations and server contention
– increased risk of file damage due to hardware malfunction
Given these trade-offs, Lustre lets users specify the striping policy for each file or directory of files using the lfs utility, as sketched below.
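A minimal sketch of setting and inspecting a stripe policy with lfs (the directory name is only an example, and option letters can vary slightly between Lustre releases):

mkdir -p /mnt/lustre/scratch/$USER/bigfiles
lfs setstripe -c 4 /mnt/lustre/scratch/$USER/bigfiles    # stripe new files in this directory across 4 OSTs (-c -1 = all OSTs)
lfs getstripe /mnt/lustre/scratch/$USER/bigfiles         # confirm the striping that will be applied
lfs df -h /mnt/lustre/scratch                            # show free space per OST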

13 Apollo Cluster - Lustre 3
Further reading:
– Lustre FAQ
– Lustre 1.8 operations manual
– I/O tips – Lustre striping and parallel I/O

14 Apollo Cluster - Software
The module system is used for defining paths and libraries:
– Load the software you require
– module avail, module add XX, module unload XX (see the example session below)
– Gives access to optimised maths libraries, MPI stacks, compilers etc.
– Latest versions of packages, e.g. python and gcc, are easily installed
Intel Parallel Studio suite – C, C++, Fortran, MKL, performance analysis etc.
gcc and Open64 compilers
We will compile/install software for users if asked
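A minimal sketch of a typical module session (the module names and versions are examples taken from the scripts and software list in these slides, and may differ from what is currently installed):

module avail                         # list all available modules
module add gcc/4.3.4                 # put the gcc 4.3.4 compiler on your PATH
module add qlogic/openmpi/gcc        # add an MPI stack built for that compiler
module list                          # show what is currently loaded
module unload qlogic/openmpi/gcc     # remove a module again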

15 Apollo Cluster - Software
Compilers/tools           | Libraries          | Programs
ant                       | acml               | adf
gcc suite                 | atlas              | aimpro
Intel (C, C++, Fortran)   | blas, gotoblas     | alberta
git                       | cloog              | gadget
jdk 1.6_024               | fftw2, fftw3       | gap
mercurial                 | gmp                | Gaussian 09
open64                    | gsl                | hdf5
python 2.6.6              | hpl                | idl
sbcl                      | lapack, scalapack  | matlab
MVAPICH, MVAPICH2         |                    | mricron
                          | mpfr               | netcdf
                          | nag                | paraview
openMPI                   |                    | stata
                          | ppl                | WRF

16 Apollo Cluster - Queues 1
Sun Grid Engine is used for the batch system. The queues and parallel environments can be inspected as shown below.
parallel.q for MPI and OpenMP jobs
– Intel nodes
– Slot limit of 48 cores per user at present
serial.q for serial and OpenMP jobs
– AMD nodes
– Slot limit of 36 cores per user
inf.q for serial or OpenMP jobs – Informatics use only
No other limits – we need user input regarding configuration
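A minimal sketch of inspecting the batch system from a login node, using standard Sun Grid Engine commands (output depends on the live configuration):

qconf -sql        # list the configured queues (parallel.q, serial.q, inf.q, ...)
qconf -spl        # list the parallel environments (openmpi, openmp, ...)
qstat -g c        # summary of used/available slots per queue
qstat -u $USER    # show your own pending and running jobs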

17 Apollo Cluster - Serial job script
The default queue is serial.q. An example script (submitted as shown below):

#!/bin/sh
#$ -N sleep
#$ -S /bin/sh
#$ -cwd
#$ -M
#$ -m bea
echo Starting at: `date`
sleep 60
echo Now it is: `date`
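A minimal sketch of submitting and checking the job above (the script file name sleep.sh is an assumption):

qsub sleep.sh        # submit; serial.q is used by default
qstat -u $USER       # watch the job while it is queued and running
cat sleep.o*         # stdout goes to <jobname>.o<jobid> in the submission directory (because of -cwd)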

18 Apollo Cluster - Parallel job script
For parallel jobs you must specify the parallel environment (pe) in the script. Parallel environments: openmpi or openmp; mvapich2 is also available but often less efficient. An OpenMP variant is sketched after the MPI example.

#!/bin/sh
#$ -N JobName
#$ -M
#$ -m bea
#$ -cwd
#$ -pe openmpi NUMBER_OF_CPUS   # eg
#$ -q parallel.q
#$ -S /bin/bash
# source modules environment:
. /etc/profile.d/
module add gcc/4.3.4 qlogic/openmpi/gcc
mpirun -np $NSLOTS -machinefile $TMPDIR/machines /path/to/exec
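For comparison, a minimal sketch of a shared-memory OpenMP job using the openmp parallel environment; it assumes the standard Grid Engine behaviour of exporting NSLOTS to the job, and the program name is only a placeholder:

#!/bin/bash
#$ -N omp_job
#$ -cwd
#$ -S /bin/bash
#$ -pe openmp 8                  # request 8 slots on a single node
#$ -q serial.q                   # serial.q (and parallel.q) accept OpenMP jobs
export OMP_NUM_THREADS=$NSLOTS   # run as many threads as slots granted
./my_openmp_program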

19 Apollo Cluster - Monitoring
Ganglia statistics for apollo and feynman (the EPP cluster) are available through the Ganglia web front end.

20 Apollo Cluster - Accounting
Simply: qacct -o (see the example below)
Accounting is monitored to show fair usage etc.
Statistics may be posted on the web.
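A minimal sketch of querying the Grid Engine accounting records (standard qacct options; the job name is only an example):

qacct -o             # total usage summarised per owner, for all users
qacct -o $USER       # usage for your own account only
qacct -j sleep       # accounting details for jobs named "sleep"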

21 Apollo Cluster - Documentation
To get an account on the cluster, contact Research Computing (IT Services).
User guides and example scripts are available on the cluster in /cm/shared/docs/USERGUIDE.
Other documentation is under /cm/shared/docs.

22 Apollo Cluster - Adding nodes
C6100 chassis + four servers, each 2.67 GHz, 12-core, 48 GB RAM, IB card, plus licences
R-series server – 48-core, 2.2 GHz AMD
Departments are guaranteed 90% exclusive use of their nodes; the remaining 10% is shared with others, plus backfill of idle time.
Contact Research Computing for the latest pricing.

23 Apollo Cluster - Adding storage
Configuration of an additional object server:
– Server: R510, 8-core, 24 GB RAM + 12 x 2 TB disks
– JBOD: MD-series enclosure of 12 x 2 TB disks + H800 RAID card
– The minimum increment is the server, which provides about 19 TB after formatting and RAID6
– Two JBODs give another 38 TB
– Total storage ~57 TB in 6 OSTs
Contact Research Computing for pricing.

24 Questions?
