
Slide 1: An Analysis of Node Sharing on HPC Clusters using XDMoD/TACC_Stats
Joseph P. White, Ph.D., Scientific Programmer, Center for Computational Research, University at Buffalo, SUNY
XSEDE14, July 13-18, 2014

Slide 2: Outline
- Motivation
- Overview of tools (XDMoD, tacc_stats)
- Background
- Results
- Conclusions
- Discussion

Slide 3: Co-authors
Robert L. DeLeon (UB), Thomas R. Furlani (UB), Steven M. Gallo (UB), Matthew D. Jones (UB), Amin Ghadersohi (UB), Cynthia D. Cornelius (UB), Abani K. Patra (UB), James C. Browne (UTexas), William L. Barth (TACC), John Hammond (TACC)

Slide 4: Motivation
Node sharing benefits:
- increases throughput by up to 26%
- increases energy efficiency by up to 22% (Breslow et al.)
Node sharing disadvantages:
- resource contention
The number of cores per node continues to increase, making node sharing increasingly relevant.
Ulterior motive: exercise and validate the toolset.

Reference: A. D. Breslow, L. Porter, A. Tiwari, M. Laurenzano, L. Carrington, D. M. Tullsen, and A. E. Snavely. "The case for colocation of HPC workloads." Concurrency and Computation: Practice and Experience, 2013. http://dx.doi.org/10.1002/cpe.3187

Slide 5: Tools
XDMoD
- NSF-funded, open-source tool providing a wide range of usage and performance metrics for XSEDE systems
- web-based interface
- powerful charting features
tacc_stats
- low-overhead collection of system-wide performance data
- runs on every node of a resource; collects data at job start, at job end, and periodically during the job (a minimal sampling sketch follows):
  - CPU usage
  - hardware performance counters
  - memory usage
  - I/O usage
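
For concreteness, here is a minimal sketch of the kind of periodic, node-local sampling tacc_stats performs; it is not the actual implementation (tacc_stats is a C collector with Python post-processing). The /proc paths are standard Linux interfaces, but the 10-minute interval and the print-based output are illustrative.

```python
import time

def read_cpu_counters(path="/proc/stat"):
    """Aggregate CPU time counters: user, nice, system, idle, iowait, ..."""
    with open(path) as f:
        fields = f.readline().split()  # first line summarizes all CPUs
    return [int(v) for v in fields[1:]]

def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into a dict of kB values."""
    info = {}
    with open(path) as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key] = int(value.split()[0])  # values reported in kB
    return info

def sample_forever(interval_s=600):
    """Take a timestamped sample every interval; the real collector also
    samples at job start and end, triggered by the resource manager."""
    while True:
        sample = {"time": time.time(),
                  "cpu": read_cpu_counters(),
                  "mem": read_meminfo()}
        print(sample)  # the real collector appends to a per-host raw file
        time.sleep(interval_s)
```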

Slide 6: Data flow (diagram)

Slide 7: Data flow, continued (diagram)

Slide 8: XDMoD data sources (diagram)

Slide 9: Background
CCR's HPC resource "Rush":
- 8000+ cores
- heterogeneous cluster: 8, 12, 16, or 32 cores per node
- InfiniBand interconnect
- Panasas parallel filesystem
- SLURM resource manager, with node sharing enabled by default and the cgroup plugin to isolate jobs (an illustrative configuration sketch follows)
As an academic computing center, Rush runs a higher percentage of small jobs than the large XSEDE resources.
All data cover Jan-Feb 2014 (~370,000 jobs).
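
An illustrative SLURM configuration excerpt that enables node sharing with cgroup-based job isolation; these are standard slurm.conf/cgroup.conf options, but this is a sketch, not Rush's actual configuration.

```
# slurm.conf (excerpt)
SelectType=select/cons_res           # schedule consumable resources, allowing node sharing
SelectTypeParameters=CR_Core_Memory  # allocate cores and memory rather than whole nodes
ProctrackType=proctrack/cgroup       # track job processes via cgroups
TaskPlugin=task/cgroup               # confine each job's tasks with cgroups

# cgroup.conf (excerpt)
ConstrainCores=yes                   # pin jobs to their allocated cores
ConstrainRAMSpace=yes                # cap jobs at their allocated memory
```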

Slide 10: Number of jobs by job size (chart)

Slide 11: Results
Exclusive jobs: no other job ran concurrently on the allocated node(s) (left-hand side of the plots).
Shared jobs: at least one other job was running on the allocated node(s) (right-hand side).
Metrics compared (a classification sketch follows this list):
- process memory usage
- total OS memory usage
- LLC read miss rates
- job exit status
- parallel filesystem bandwidth
- InfiniBand interconnect bandwidth
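
A minimal sketch of how jobs can be labeled exclusive or shared from accounting records; the record layout (id, node list, start/end times) is hypothetical, chosen only to make the overlap test concrete.

```python
from collections import defaultdict

def classify(jobs):
    """Label each job 'exclusive' or 'shared'.

    `jobs` is a list of dicts with hypothetical fields:
    {"id": ..., "nodes": [...], "start": t0, "end": t1}.
    A job is shared if any other job overlapped it in time
    on at least one of its allocated nodes.
    """
    by_node = defaultdict(list)  # node name -> jobs that ran on it
    for job in jobs:
        for node in job["nodes"]:
            by_node[node].append(job)

    labels = {}
    for job in jobs:
        shared = any(
            other["id"] != job["id"]
            and other["start"] < job["end"]
            and other["end"] > job["start"]
            for node in job["nodes"]
            for other in by_node[node]
        )
        labels[job["id"]] = "shared" if shared else "exclusive"
    return labels
```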

Slide 12: Memory usage per core
Computed as (MemUsed - FilePages - Slab) from /sys/devices/system/node/node0/meminfo (a parsing sketch follows).
[Histograms: memory usage per core (GB), exclusive jobs vs. shared jobs]
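
A short sketch of computing that quantity on Linux. The per-NUMA-node meminfo format ("Node 0 MemUsed: ... kB") is standard; dividing the result by the node's core count (not shown, an assumption here) would give the per-core value plotted on the slide.

```python
def node_meminfo(node=0):
    """Parse per-NUMA-node meminfo; lines look like 'Node 0 MemUsed: 123 kB'."""
    path = f"/sys/devices/system/node/node{node}/meminfo"
    info = {}
    with open(path) as f:
        for line in f:
            parts = line.split()  # e.g. ['Node', '0', 'MemUsed:', '123', 'kB']
            info[parts[2].rstrip(":")] = int(parts[3])  # values in kB
    return info

def process_memory_kb(node=0):
    """Memory used by processes: exclude page cache (FilePages) and kernel slab."""
    m = node_meminfo(node)
    return m["MemUsed"] - m["FilePages"] - m["Slab"]
```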

Slide 13: Total memory usage per core (4 GB/core nodes)
[Histograms: total memory usage per core (GB), exclusive jobs vs. shared jobs]

Slide 14: Last-level cache (LLC) read miss rate per socket
Measured as UNC_LLC_MISS:READ on the Intel Westmere uncore; gives an upper-bound estimate of DRAM bandwidth (a worked conversion follows).
[Histograms: LLC read miss rate (10^6/s), exclusive jobs vs. shared jobs]
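
The conversion behind this estimate: each LLC read miss fetches at most one cache line from DRAM, so multiplying the miss rate by the 64-byte Westmere cache-line size bounds the DRAM read bandwidth. A worked example (the miss rate below is invented for illustration):

```python
CACHE_LINE_BYTES = 64  # cache-line size on Intel Westmere

def dram_read_bw_bound(llc_read_misses_per_s):
    """Upper-bound estimate of DRAM read bandwidth: one line fill per miss."""
    return llc_read_misses_per_s * CACHE_LINE_BYTES

# A socket sustaining 100 x 10^6 LLC read misses/s reads at most
# 100e6 * 64 = 6.4e9 B/s, i.e. 6.4 GB/s, from DRAM.
print(dram_read_bw_bound(100e6) / 1e9, "GB/s")  # -> 6.4 GB/s
```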

Slide 15: Job exit status reported by SLURM (chart)

Slide 16: Panasas parallel filesystem write rate per node
[Histograms: write rate per node (B/s), exclusive jobs vs. shared jobs]

Slide 17: InfiniBand write rate per node
[Histograms: write rate, log10(B/s), exclusive jobs vs. shared jobs]
Peaks truncated: ~45,000 for exclusive jobs, ~80,000 for shared jobs.

Slide 18: Conclusions
- On average there is little difference between shared and exclusive jobs on Rush.
- The majority of jobs use far less than the maximum available resources.
- We have created data collection and processing software that makes it straightforward to evaluate system usage.

Slide 19: Discussion
Limitations of the current work:
- unable to determine the impact (if any) on job wall time
- comparisons use overall average values for jobs
- statistics for jobs on shared nodes are convolved, since the per-node counters mix contributions from all co-located jobs
- exit code is not a reliable way to determine job failure

Slide 20: Future work
- Use Application Kernels to get a detailed analysis of interference.
- Many more metrics are now available (a sketch of the ratio metrics follows this list):
  - FLOPS
  - CPU clock cycles per instruction (CPI)
  - CPU clock cycles per L1D cache load (CPLD)
- Add support for per-job metrics on shared nodes.
- Study classes of applications.
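
A minimal sketch of how the two ratio metrics are formed from hardware counter deltas over a sampling interval; the counter values below are invented for illustration.

```python
def cpi(core_cycles, instructions_retired):
    """CPU clock cycles per instruction (CPI)."""
    return core_cycles / instructions_retired

def cpld(core_cycles, l1d_loads):
    """CPU clock cycles per L1D cache load (CPLD)."""
    return core_cycles / l1d_loads

# Invented counter deltas over one sampling interval:
cycles, instructions, loads = 3.0e12, 2.4e12, 8.0e11
print(f"CPI  = {cpi(cycles, instructions):.2f}")  # 1.25
print(f"CPLD = {cpld(cycles, loads):.2f}")        # 3.75
```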

Slide 21: Questions
BOF: "XDMoD: A Tool for Comprehensive Resource Management of HPC Systems", 6:00-7:00 pm tomorrow, Room A602
XDMoD: https://xdmod.ccr.buffalo.edu/
tacc_stats: http://github.com/TACCProjects/tacc_stats
Contact: xdmod-help@ccr.buffalo.edu

Slide 22: Acknowledgments
This work is supported by the National Science Foundation under grant numbers OCI 1203560 and OCI 1025159 for the Technology Audit Service (TAS) for XSEDE.

