
1 Use of GPUs in ALICE (and elsewhere)
Thorsten Kollegger | TDOC-PG | CERN | 17.07.2013

2 GPUs for General Purpose Computing
In the last 5+ years, increased usage of GPUs (and, more generally, accelerator cards) in High Performance Computing systems.
[Chart: Top 500 list, 2013, showing systems with NVIDIA, AMD and Intel accelerators]

3 GPUs for General Purpose Computing
Driven by (theoretical) peak performance:
- GPU: O(1) TFLOP/s (NVIDIA Tesla K20: 3.2 TFLOP/s)
- CPU: O(0.1) TFLOP/s (Intel Xeon E5-2690: 243 GFLOP/s)
Can this theoretical peak performance be used efficiently for the typical HEP workload?

4 GPGPU Processing Model
Pre-conditions for effective GPU speed-up of applications:
- Computationally intensive: the time needed for computing is much larger than the time needed for data transfer to the GPU
- Massively parallel: hundreds of independent computing tasks
Few complex CPU cores vs. many simple GPU cores.
Programming languages/frameworks: CUDA, OpenCL, OpenACC, OpenMP, OpenHMPP, TBB, MPI.
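To make these pre-conditions concrete, a minimal, hypothetical CUDA sketch of the processing model (not ALICE code, all names invented): data is copied over PCIe to the card, a kernel with many independent threads does the work, and the result is copied back. The GPU only pays off when the kernel time dominates the two transfers.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread handles one element: hundreds of thousands of
// independent tasks, matching the "massively parallel" pre-condition.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);

    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);   // transfer in (PCIe)
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);        // compute on the GPU
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);   // transfer out (PCIe)

    printf("h[0] = %f\n", h[0]);
    cudaFree(d);
    free(h);
    return 0;
}
```

In CUDA terms, each of the "hundreds of independent computing tasks" becomes a thread (or a thread block), which is exactly why inherently sequential code sees little benefit from the many simple GPU cores.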

5 What to expect?
Typical success stories of GPGPU usage report >100x speedups.
However: the expected speedup depends strongly on the workload. Comparing optimized multi-core CPU versions with optimized GPU versions, speedups of ~5x are measured for most workloads.
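A rough back-of-the-envelope check, using only the peak numbers from slide 3 (the ~5x is the measured figure quoted above, not derived here):

\[ \frac{3.2\ \mathrm{TFLOP/s}\ \text{(Tesla K20)}}{0.243\ \mathrm{TFLOP/s}\ \text{(Xeon E5-2690)}} \approx 13 \]

so even the ratio of raw peak performance is only about 13x; in practice memory bandwidth, thread divergence and PCIe transfers typically reduce the gain further, consistent with the ~5x measured on optimized codes.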

6 GPGPUs in HEP
Many R&D activities ongoing in the experiments, mostly focused on trigger or High-Level-Trigger systems, where hardware decisions are easier than in heterogeneous GRID systems.
R&D projects I know of (certainly incomplete):
- ALICE, ATLAS, CMS, LHCb @ LHC, CERN
- NA62 @ SPS, CERN
- CBM, PANDA @ FAIR, GSI, Germany
- STAR @ RHIC, BNL, USA
- GEANT4, ...
The ALICE HLT has been using GPUs in production since 2010/2011.

7 ALICE HLT
Input data rate: ~1 kHz, 20 GByte/s. Event size ranging from <1 MByte (p+p) to 80 MByte (central Pb+Pb).
Full online reconstruction including tracking of TPC+ITS; (intermediate) results replace the raw data to limit storage space.
Compute nodes (CN/CNGPU) perform full event reconstruction: 32+32 nodes with NVIDIA GTX 480/580; the GTX 580s were newly installed in 2011.

8 ALICE HLT TPC Tracker
TPC tracking algorithm based on a Cellular Automaton approach, optimized for multi-core CPUs to fulfill the latency requirements.
- 2009: ported to CUDA for use on NVIDIA GTX 285 consumer cards, changed to use single precision
- 2010: ported to GTX 480
- 2011: added GTX 580, fully commissioned
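The real HLT tracker is of course far more elaborate; purely to illustrate the cellular-automaton idea in single precision, a hypothetical CUDA kernel (names and data layout invented for this sketch) that, for every hit on one TPC pad row, searches the next row for its nearest neighbour to form a link, the starting point of track-segment construction:

```cuda
#include <cuda_runtime.h>

// Hypothetical flat hit layout: hitY/hitZ hold the hit coordinates of all
// rows back to back, and rowStart (nRows + 1 entries) indexes where each
// row begins. The kernel is launched once per row, for row = 0 .. nRows-2.
__global__ void findLinks(const float *hitY, const float *hitZ,
                          const int *rowStart, int row,
                          int *link, float maxDist2) {
    int i = rowStart[row] + blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= rowStart[row + 1]) return;

    float y = hitY[i], z = hitZ[i];
    float best = maxDist2;   // single precision throughout
    int bestJ = -1;

    // Brute-force search over the next pad row; every hit is an
    // independent task, so thousands of threads run in parallel.
    for (int j = rowStart[row + 1]; j < rowStart[row + 2]; ++j) {
        float dy = hitY[j] - y, dz = hitZ[j] - z;
        float d2 = dy * dy + dz * dz;
        if (d2 < best) { best = d2; bestJ = j; }
    }
    link[i] = bestJ;   // -1 if no neighbour within maxDist2
}

// Host-side launch, e.g.:
//   int nHits = rowStart[row + 1] - rowStart[row];
//   findLinks<<<(nHits + 127) / 128, 128>>>(dY, dZ, dRows, row, dLink, 4.0f);
```

The switch to single precision mentioned above matters because consumer GPUs of that generation had far higher single- than double-precision throughput.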

9 ALICE HLT TPC Tracker Speedup
4-fold speedup compared to the optimized CPU version.
Note: this frees the CPUs on the compute nodes for other operations (tagging/trigger).

10 ALICE HLT GPU Experience
Experience quite promising; will continue/expand in Run 2.
- Allowed the system size to be reduced by a factor of 3
- Stable operation even with consumer hardware
But this comes at some cost:
- Initial porting to CUDA and change to single precision: 1.5 PhD students for 1 year
- Every new GPU generation requires re-tuning (even for the same chip)
- Two versions need to be supported (CPU for simulation, GPU)
- Fully loading the GPU requires considerable effort: currently at 67%

11 GPUs in the NA62 TDAQ system
[Diagram: two options. (a) RO board -> L0TP -> L1 PC + GPU -> L1TP -> L2 PC + GPU, with rates 1 MHz / 1 MHz / 100 kHz; (b) RO board -> L0 GPU -> L0TP, 10 MHz -> 1 MHz, max 1 ms latency]
The use of GPUs at the software levels (L1/L2) is "straightforward": put the video card in the PC; no particular changes to the hardware are needed. The main advantage is to exploit the power of GPUs to reduce the number of PCs in the L1 farms.
The use of GPUs at L0 is more challenging:
- Fixed and small latency (set by the size of the L0 buffers)
- Deterministic behavior (synchronous trigger)
- Very fast algorithms (high rate)
(Slide from Gianluca Lamanna, CERN)

12 Some recent trends
- Direct transfer of data from e.g. the network to the GPU without involving the CPU (AMD: DirectGMA, NVIDIA: GPUDirect 2)
- APUs: integrate GPU and CPU on one chip (NVIDIA Tegra: ARM+GPU, AMD Fusion: x86+GPU)
[Diagram: peer-to-peer transfers (DirectGMA) between an SDI input/output card (FPGA, SDI in/out) and the graphics card (GPU) over the PCIe bus, bypassing CPU memory]
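As a rough illustration of what the GPU-to-GPU side of this looks like from application code, a minimal CUDA sketch using the standard peer-access runtime calls; device-to-NIC or FPGA-to-GPU transfers as in the diagram require vendor-specific drivers and are not shown here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);   // can GPU 0 reach GPU 1's memory?
    if (!canAccess) { printf("no peer access between GPU 0 and GPU 1\n"); return 1; }

    const size_t bytes = 1 << 20;
    float *d0, *d1;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);            // flags must be 0
    cudaMalloc(&d0, bytes);

    cudaSetDevice(1);
    cudaMalloc(&d1, bytes);

    // Copy directly between the two cards over PCIe, without
    // staging the data in host memory.
    cudaMemcpyPeer(d0, 0, d1, 1, bytes);

    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return 0;
}
```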

13 Where we are...
- GPGPUs can provide a significant benefit today, mainly for tightly-controlled systems such as trigger & HLT: reduced infrastructure cost
- Development cost: the main issue is programming complexity & maintenance
  - Will there be a common programming language/library? Avoid vendor lock-in...
  - Do we need the ultimate performance?
- The highly-parallel programming model will also be relevant for the effective use of future many-core CPUs
- GPUs are evolving more and more into independent compute units

