Trends and Perspectives for HPC infrastructures Carlo Cavazzoni, CINECA.


1 Trends and Perspectives for HPC infrastructures Carlo Cavazzoni, CINECA

2 Outline
- HPC resources in Europe (PRACE)
- Today's HPC architectures
- Technology trends
- CINECA roadmap (toward 50 PFlops)
- EuroExa project

3 The PRACE RI provides access to distributed, persistent, pan-European, world-class HPC computing and data-management resources and services. Expertise in efficient use of the resources is available through participating centres throughout Europe. Available resources are announced for each Call for Proposals. Peer-reviewed open access: PRACE Projects (Tier-0), PRACE Preparatory (Tier-0), DECI Projects (Tier-1). [Tier pyramid: Tier-0 European, Tier-1 national, Tier-2 local.]

4 Tier-0 systems, PRACE regular calls
- CURIE (GENCI, FR): Bull cluster, Intel Xeon, Nvidia cards, Infiniband network
- FERMI (CINECA, IT) & JUQUEEN (Jülich, DE): IBM BG/Q, Power processors, custom 5D torus network
- HERMIT (HLRS, DE): Cray XE6, AMD processors, custom 3D torus network, 1 PFlops
- MARENOSTRUM (BSC, ES): IBM DataPlex, Intel Xeon nodes, Infiniband network
- SuperMUC (LRZ, DE): IBM DataPlex, Intel Xeon nodes, Infiniband network

5 Tier-1 systems, DECI calls

Site | Machine | System type | Chip | Peak (TFlops) | GPU cards
Bulgaria (NCSA) | EA "ECNIS" | IBM BG/P | PowerPC | |
Czech Republic (VSB-TUO) | Anselm | Bull Bullx | Intel Sandy Bridge-EP | 66 | 23 nVIDIA Tesla + 4 Intel Xeon Phi P5110
Finland (CSC) | Sisu | Cray XC30 | Intel Sandy Bridge | 244.9 |
France (CINES) | Jade | SGI ICE EX8200 | Intel Quad-Core E5472/X | |
France (IDRIS) | Babel | IBM BG/P | PowerPC | |
Germany (Jülich) | JuRoPA | Intel cluster | Intel Xeon X | |
Germany (RZG) | Genius | IBM BG/P | PowerPC | |
Germany (RZG) | iDataPlex | iDataPlex | Intel Sandy Bridge | 200 |
Ireland (ICHEC) | Stokes | SGI ICE 8200EX | Intel Xeon E5650 | |
Italy (CINECA) | PLX | iDataPlex | Intel Westmere | | nVIDIA Tesla M2070/M2070Q
Norway (SIGMA) | Abel | MegWare cluster | Intel Sandy Bridge | 260 |
Poland (WCSS) | Supernova | Cluster | Intel Westmere-EP | 51.58 |
Poland (PSNC) | chimera | SGI UV1000 | Intel Xeon E | |
Poland (PSNC) | cane | AMD+GPU cluster | AMD Opteron | | NVIDIA Tesla M2050
Poland (ICM) | boreasz | IBM Power 775 | IBM Power7 | 74.5 |
Poland (Cyfronet) | Zeus-gpgpu | Linux cluster | Intel Xeon X5670/E | | M2050 / 160 M2090
Spain (BSC) | MinoTauro | Bull CUDA cluster | Intel Xeon E | | nVIDIA Tesla M2090
Sweden (PDC) | Lindgren | Cray XE6 | AMD Opteron | 305 |
Switzerland (CSCS) | Monte Rosa | Cray XE6 | AMD Opteron | 402 |
The Netherlands (SARA) | Huygens | IBM pSeries 575 | Power6 | 65 |
Turkey (UYBHM) | Karadeniz | HP cluster | Intel Xeon | |
UK (EPCC) | HECToR | Cray XE6 | AMD Opteron | |
UK (ICE-CSE) | ICE Advance | IBM BG/Q | PowerPC A2 | 1250 |

6 HPC architectures: two models
Hybrid:
- Server-class processors, server-class nodes
- Special-purpose nodes
- Accelerator devices: Nvidia, Intel, AMD, FPGA
Homogeneous:
- Server-class nodes: standard processors, special-purpose nodes
- Special-purpose processors

7 Networks
Standard/switched: Infiniband
Special purpose/topology: BG/Q, Cray, Tofu (Fujitsu), TH Express-2 (Tianhe-2)

8 Programming models
Fundamental paradigms: message passing and multi-threading; consolidated standards: MPI & OpenMP; new task-based programming models emerging.
Special purpose for accelerators: CUDA, Intel offload directives, OpenACC, OpenCL, etc. — no consolidated standard.
Scripting: Python
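The two fundamental paradigms above — message passing and multi-threading — can be illustrated in miniature with Python's standard library. This is only a toy sketch of the pattern, not MPI or OpenMP themselves: worker threads stand in for MPI ranks and exchange messages through queues, the way ranks exchange messages over the network.

```python
# Toy message-passing sketch: worker "ranks" receive a chunk of work over
# a queue and send a partial sum back, mimicking an MPI scatter/reduce.
# (Illustrative only -- real HPC codes use MPI + OpenMP in C/Fortran.)
import threading
import queue

def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    # Each "rank" sums the chunk of data it receives.
    chunk = inbox.get()
    outbox.put(sum(chunk))

def parallel_sum(data, nworkers=4):
    inboxes = [queue.Queue() for _ in range(nworkers)]
    outbox = queue.Queue()
    threads = [threading.Thread(target=worker, args=(ib, outbox))
               for ib in inboxes]
    for t in threads:
        t.start()
    # "Scatter": send each rank a strided slice of the data.
    for i, ib in enumerate(inboxes):
        ib.put(data[i::nworkers])
    for t in threads:
        t.join()
    # "Reduce": combine the partial sums.
    return sum(outbox.get() for _ in range(nworkers))

print(parallel_sum(list(range(100))))  # 4950
```

The queue plays the role of the network: workers share no state directly, so the same decomposition maps naturally onto distributed-memory MPI ranks.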

9 Roadmap to Exascale (architectural trends)

10 Dennard scaling law (downscaling)

Old VLSI generation → new VLSI generation:
L' = L/2, V' = V/2, F' = 2F, D' = 1/L'^2 = 4D, P' = P

It does not hold anymore — the power crisis:
L' = L/2, V' ≈ V, F' ≈ 2F, D' = 1/L'^2 = 4D, P' = 4P

The core frequency and performance no longer grow along Moore's law; instead, the number of cores is increased to keep architecture evolution on Moore's law. The programming crisis!
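A quick numeric sketch of the two regimes above, modelling dynamic chip power as density × device capacitance × V² × F (a simplification for illustration; each generation halves the feature size L, so density grows 4×, per-device capacitance halves, and frequency doubles):

```python
# Chip power ratio between VLSI generations under Dennard-style scaling:
# P' / P = (D'/D) * (C'/C) * (V'/V)^2 * (F'/F), a simplified model.
def power_ratio(voltage_scale: float) -> float:
    density = 4.0   # D' = 1/L'^2 = 4D when L' = L/2
    cap = 0.5       # per-device capacitance shrinks with L
    freq = 2.0      # F' = 2F
    return density * cap * voltage_scale**2 * freq

print(power_ratio(0.5))  # classical Dennard, V' = V/2 -> 1.0 (P' = P)
print(power_ratio(1.0))  # post-Dennard,     V' ~ V   -> 4.0 (P' = 4P)
```

With voltage no longer scaling, the same shrink quadruples power per chip — exactly the power crisis the slide describes.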

11 Moore's Law: an economic and market law — it is all about the number of chips per Si wafer. From the WSJ: Stacy Smith, Intel's chief financial officer, gave some detail on the economic benefits of staying on the Moore's Law race. The cost per chip "is going down more than the capital intensity is going up," Smith said, suggesting Intel's profit margins should not suffer because of heavy capital spending. "This is the economic beauty of Moore's Law." And Intel has a good handle on the next production shift, shrinking circuitry to 10 nanometers; Holt said the company has test chips running on that technology. "We are projecting similar kinds of improvements in cost out to 10 nanometers," he said. So, despite the challenges, Holt could not be induced to say there's any looming end to Moore's Law, the invention race that has been a key driver of electronics innovation since first defined by Intel's co-founder in the mid-1960s.

12 But! The Si lattice constant is 0.54 nm — a 14 nm VLSI feature is only ~300 atoms! There are still 4~6 cycles (technology generations) left until we reach 11~5.5 nm technologies, at which point we reach the downscaling limit (H. Iwai, IWJT2008).

13 What about applications? In a massively parallel context, an upper limit on the scalability of parallel applications is set by the fraction of overall execution time spent in non-scalable operations (Amdahl's law). With parallel fraction P, the maximum speedup tends to 1 / (1 − P).

14 Architectural trends
- Peak performance: Moore's law
- FPU performance: Dennard's law
- Number of FPUs: Moore + Dennard
- App. parallelism: Amdahl's law

15 HPC architectures: two models — hybrid, but… homogeneous, but… Which 100 PFlops systems will we see? My guess:
- IBM (hybrid): Power8 + Nvidia GPU
- Cray (homo/hybrid): with Intel only!
- Intel (hybrid): Xeon + MIC
- ARM (homo): ARM chips only, but…
- Nvidia/ARM (hybrid): ARM + Nvidia
- Fujitsu (homo): SPARC, high density, low power
- China (homo/hybrid): with Intel only
- Room for AMD console chips

16 Chip architecture — strongly market driven (mobile, TV sets, screens, video/image processing)
- Intel: new architectures to compete with ARM; less Xeon, but Phi
- ARM: main focus on low-power mobile chips (Qualcomm, Texas Instruments, Nvidia, ST, etc.); new HPC and server markets
- NVIDIA: GPU alone will not last long; ARM+GPU, Power+GPU; embedded market
- Power: Power+GPU, the only chance for HPC
- AMD: console market; still some chance for HPC

17 CINECA Roadmaps

18 Roadmap to 50 PFlops

19 High-level system requirements (Tier-1 CINECA procurement, Q2014)
- Absorbed electrical power: 400 kW
- Physical footprint: 5 racks
- System peak performance (CPU+GPU): on the order of 1 PFlops
- System peak performance (CPU only): on the order of 300 TFlops

20 High-level system requirements (Tier-1 CINECA)
- CPU architecture: Intel Xeon Ivy Bridge
- Cores per CPU: >3 GHz, or 2.4 GHz; the choice of frequency and core count depends on the socket TDP, on system density, and on cooling capacity
- Number of servers: (peak perf = 600 × 2 sockets × 12 cores × 3 GHz × 8 Flop/clk = 345 TFlops); the server count may depend on cost or on the configuration geometry in terms of CPU-only nodes vs CPU+GPU nodes
- GPU architecture: Nvidia K40
- Number of GPUs: >500 (peak perf = 700 × 1.43 TFlops = 1 PFlops); the GPU card count may depend on cost or on the configuration geometry in terms of CPU-only nodes vs CPU+GPU nodes
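The peak-performance figures above follow directly from the quoted formulas; a quick arithmetic check using the node and GPU counts from the slide:

```python
# CPU peak: 600 servers x 2 sockets x 12 cores x 3 GHz x 8 Flop/cycle
cpu_peak = 600 * 2 * 12 * 3e9 * 8     # Flop/s
print(cpu_peak / 1e12)                # 345.6 TFlops

# GPU peak: 700 Nvidia K40 cards x 1.43 TFlops each
gpu_peak = 700 * 1.43e12              # Flop/s
print(gpu_peak / 1e15)                # ~1.0 PFlops
```

Note how the GPU partition alone delivers roughly three times the CPU partition's peak — the rationale for the hybrid configuration.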

21 High-level system requirements (Tier-1 CINECA)
- Identified vendors: IBM, Eurotech
- DRAM memory: 1 GByte/core; the option of a subset of nodes with a larger amount of memory will be requested
- Local non-volatile storage: >500 GByte SSD/HD, depending on cost and on the system configuration
- Cooling: liquid cooling with a free-cooling option
- Scratch disk space: >300 TByte (provided by CINECA)

22 Thank you

