
1 Lattice QCD and GPUs
Robert Edwards, Theory Group; Chip Watson, HPC & CIO; Jie Chen & Balint Joo, HPC
Jefferson Lab

2 Outline
Will describe how capability computing + capacity computing + SciDAC deliver science & NP milestones
Collaborative efforts involve USQCD + JLab & the DOE + NSF user communities

3 Hadronic & Nuclear Physics with LQCD
Hadronic spectroscopy
– Hadron resonance determinations
– Exotic meson spectrum (JLab 12 GeV)
Hadronic structure
– 3-D picture of hadrons from gluon & quark spin + flavor distributions
– Ground & excited E&M transition form factors (JLab 6 GeV + 12 GeV + Mainz)
– E&M polarizabilities of hadrons (Duke + CERN + Lund)
Nuclear interactions
– Nuclear processes relevant for stellar evolution
– Hyperon-hyperon scattering
– 3- & 4-nucleon interaction properties [collab. w/ LLNL] (JLab + LLNL)
Beyond the Standard Model
– Neutron decay constraints on BSM from the Ultra Cold Neutron source (LANL)

4 Bridges in Nuclear Physics (NP Exascale)

5 Spectroscopy
Spectroscopy reveals fundamental aspects of hadronic physics
– Essential degrees of freedom?
– Gluonic excitations in mesons: exotic states of matter?
Status
– Can extract excited hadron energies & identify spins
– Pursuing full QCD calculations with realistic quark masses
New spectroscopy programs world-wide
– E.g., BES III (Beijing), GSI/PANDA (Darmstadt)
– Crucial complement to the 12 GeV program at JLab
Excited nucleon spectroscopy (JLab); JLab GlueX: search for gluonic excitations

6 USQCD National Effort
US Lattice QCD effort: Jefferson Laboratory, BNL, and FNAL
– FNAL: weak matrix elements
– BNL: RHIC physics
– JLab: hadronic physics
SciDAC: R&D vehicle for software R&D
INCITE resources (~20 TF-yr) + USQCD cluster facilities (17 TF-yr): impact on DOE's High Energy & Nuclear Physics program

7 Gauge Generation: Cost Scaling
Cost for reasonable statistics, box size, and "physical" pion mass
Extrapolating in lattice spacing: 10 to 100 PF-yr
[Plot: cost in PF-years vs. lattice spacing; state of the art today ~10 TF-yr, 2011 ~100 TF-yr]

8 Computational Requirements
Gauge generation : analysis ratios for current calculations
– Weak matrix elements: 1 : 1
– Baryon spectroscopy: 1 : 10
– Nuclear structure: 1 : 4
Overall, the gauge generation : analysis balance has shifted from 10 : 1 (2005) to 1 : 3 (2010)
Core work: Dirac inverters; use GPUs

9 SciDAC Impact
Software development
– QCD-friendly APIs and libraries: enable high user productivity
– Allow rapid prototyping & optimization
– Significant software effort for GPUs
Algorithm improvements
– Operators & contractions on clusters (Distillation: PRL (2009))
– Mixed-precision Dirac solvers: INCITE + clusters + GPUs, 2-3X speedup (see the sketch below)
– Adaptive multi-grid solvers: clusters, ~8X (?)
Hardware development via USQCD Facilities
– Adding support for new hardware
– GPUs
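To make the mixed-precision bullet concrete, here is a minimal sketch of the iterative-refinement pattern such solvers use: an outer loop in double precision corrects the solution using residuals from a cheap single-precision inner solve, which is the part that maps well onto a GPU. This is a generic illustration, not the actual SciDAC/QUDA code; the small dense test matrix and the helper names (apply, cg_single) are hypothetical stand-ins for the Dirac operator and its GPU inner solver.

```cpp
// Minimal mixed-precision (iterative-refinement) solver sketch.
// Outer loop: double precision.  Inner CG solve: single precision,
// standing in for the part that would run on the GPU.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

using VecD = std::vector<double>;
using VecF = std::vector<float>;

// y = A x for a small dense matrix (stand-in for the Dirac operator).
template <typename T>
void apply(const std::vector<T>& A, const std::vector<T>& x, std::vector<T>& y) {
  const size_t n = x.size();
  for (size_t i = 0; i < n; ++i) {
    T s = 0;
    for (size_t j = 0; j < n; ++j) s += A[i * n + j] * x[j];
    y[i] = s;
  }
}

// Plain single-precision conjugate gradient: solves A x = b approximately.
void cg_single(const VecF& A, const VecF& b, VecF& x, float tol, int maxit) {
  const size_t n = b.size();
  VecF r = b, p = b, Ap(n);
  std::fill(x.begin(), x.end(), 0.0f);
  float rr = 0; for (float v : r) rr += v * v;
  for (int it = 0; it < maxit && std::sqrt(rr) > tol; ++it) {
    apply(A, p, Ap);
    float pAp = 0; for (size_t i = 0; i < n; ++i) pAp += p[i] * Ap[i];
    float alpha = rr / pAp;
    for (size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
    float rr_new = 0; for (float v : r) rr_new += v * v;
    for (size_t i = 0; i < n; ++i) p[i] = r[i] + (rr_new / rr) * p[i];
    rr = rr_new;
  }
}

int main() {
  // Tiny SPD test matrix: tridiagonal(-1, 4, -1); right-hand side of ones.
  const size_t n = 64;
  VecD A(n * n, 0.0), b(n, 1.0), x(n, 0.0), r(n), Ax(n);
  for (size_t i = 0; i < n; ++i) {
    A[i * n + i] = 4.0;
    if (i + 1 < n) { A[i * n + i + 1] = -1.0; A[(i + 1) * n + i] = -1.0; }
  }
  VecF Af(A.begin(), A.end());  // single-precision copy of the operator

  // Outer iterative refinement in double precision.
  for (int outer = 0; outer < 10; ++outer) {
    apply(A, x, Ax);
    double rnorm = 0;
    for (size_t i = 0; i < n; ++i) { r[i] = b[i] - Ax[i]; rnorm += r[i] * r[i]; }
    rnorm = std::sqrt(rnorm);
    std::printf("outer %d: |r| = %.3e\n", outer, rnorm);
    if (rnorm < 1e-12) break;                  // converged to double precision
    VecF rf(r.begin(), r.end()), ef(n, 0.0f);  // demote residual to single
    cg_single(Af, rf, ef, 1e-4f * float(rnorm), 1000);  // cheap inner solve
    for (size_t i = 0; i < n; ++i) x[i] += double(ef[i]);  // promote & update
  }
}
```

The design point is that the residual b - Ax is always recomputed in double precision, so the answer reaches full accuracy even though almost all of the arithmetic (and memory traffic) happens in single or half precision, where the GPU is fastest.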

10 Modern GPU Characteristics
Hundreds of simple cores: high flop rate
SIMD architecture (single instruction, multiple data)
Complex (high-bandwidth) memory hierarchy
Gaming cards: no memory error correction (ECC), a reliability issue
I/O bandwidth << memory bandwidth

Commodity processors:    x86 CPU               NVIDIA GT200             New Fermi GPU
#cores:                  8                     240                      480
Clock speed:             3.2 GHz               1.4 GHz                  -
Main memory bandwidth:   20 GB/s               160 GB/s (gaming card)   180 GB/s (gaming card)
I/O bandwidth:           7 GB/s (dual QDR IB)  3 GB/s                   4 GB/s
Power:                   80 watts              200 watts                250 watts
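Why the memory-bandwidth row, rather than the core count, is the one that matters for LQCD: the Dirac inverter streams gauge links and spinors through memory at roughly one flop per byte of traffic (the Wilson dslash costs on the order of 1.3 kflop per site while moving on the order of 1.4 kB per site in single precision; these are approximate, commonly quoted figures, not numbers taken from this slide). A bandwidth-bound estimate for one gaming card is then

\[
F_{\text{sustained}} \;\approx\; B_{\text{mem}} \times \frac{\text{flops}}{\text{byte}} \;\approx\; 160~\text{GB/s} \times 0.9~\tfrac{\text{flop}}{\text{byte}} \;\approx\; 150~\text{Gflops (single precision)},
\]

and roughly twice that in half precision, since half as many bytes move per flop. Multiplied over the ~530 cards described on slide 13, this is consistent with the ~100 Tflops aggregate sustained inverter figure quoted on slide 14.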

11 Inverter Strong Scaling: V = 32^3 x 256
Local volume on each GPU becomes too small (I/O bottleneck)
[Plot annotation: ~3 Tflops]
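A rough illustration of why the strong-scaling curve flattens, assuming for simplicity that the lattice is split across N GPUs along the time direction only (the actual decomposition used in the benchmark is not stated on the slide): each GPU then holds a 32^3 x (256/N) slab, and the fraction of its sites sitting on the two faces it must exchange every dslash application is

\[
\frac{2 \cdot 32^3}{32^3 \cdot (256/N)} \;=\; \frac{N}{128},
\]

i.e. 25% of the local sites already at N = 32. That face traffic crosses the few-GB/s I/O links rather than the ~160 GB/s on-card memory, so beyond a modest number of GPUs communication, not arithmetic, sets the pace and the aggregate rate saturates near the quoted ~3 Tflops.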

12 Science / Dollar for (Some) LQCD Capacity Apps

13 Hardware: ARRA GPU Clusters
GPU clusters: ~530 cards
Quads: 2.4 GHz Nehalem, 48 GB memory / node, 117 nodes x 4 GPUs -> 468 GPUs
Singles: 2.4 GHz Nehalem, 24 GB memory / node, 64 nodes x 1 GPU -> 64 GPUs

14 A Large Capacity Resource: 530 GPUs at Jefferson Lab (July)
– 200,000 cores (1,600 million core-hours / year)
– 600 Tflops peak single precision
– 100 Tflops aggregate sustained in the inverter (mixed half / single precision)
– Significant increase in dedicated USQCD resources
All this for only $1M with hosts, networking, etc.
Disclaimer: to exploit this performance, code has to run on the GPUs, not the CPU (an Amdahl's Law problem; see the worked example below)
SciDAC-2 (& 3) software effort: move more inverters & other code to the GPU
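To see the Amdahl's Law point in numbers, a hedged illustration (the 90% fraction and the 10x kernel speedup below are assumptions chosen for the example, not measurements from these slides): if the inverter accounts for a fraction f = 0.9 of the runtime and the GPU speeds that kernel up by s = 10 while the remaining 10% of the work stays on the CPU, the whole application gains only

\[
S \;=\; \frac{1}{(1-f) + f/s} \;=\; \frac{1}{0.1 + 0.09} \;\approx\; 5.3\times,
\]

far short of the 10x kernel speedup. That is why the SciDAC software effort aims to move the contractions and the remaining solver and I/O work onto the GPU as well.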

15 New Science Reach in 2010-2011: QCD Spectrum
Gauge generation (next dataset)
– INCITE: Crays & BG/Ps, ~16K-24K cores
– Double precision
Analysis (existing dataset): two classes
– Propagators (Dirac matrix inversions): a few GPUs, single + half precision, no memory error correction
– Contractions: clusters, a few cores, double precision + large memory footprint
Cost: new dataset ~10 TF-yr, old dataset ~1 TF-yr

16 Isovector Meson Spectrum

17 Isovector Meson Spectrum (arXiv:1004.4930)

18 Exotic matter. Exotics: world summary.

19 Exotic matter: first GPU results
– Suggests (many) exotics within range of JLab Hall D
– Previous work: photoproduction rates are high
– Current GPU work: (strong) decays, an important experimental input

20 Nucleon & Delta Spectrum
First results from GPUs: < 2% error bars
[Plot labels: [56,2+] D-wave and [70,1-] P-wave multiplets]
Discern structure: wave-function overlaps. Change at light quark mass? Decays!
Suggests a spectrum at least as dense as the quark model

21 Extending science reach
USQCD
– Next calculations at physical quark masses: 100 TF-yr to 1 PF-yr
– New INCITE + Early Science applications (ANL + ORNL + NERSC)
– NSF Blue Waters petascale (PRAC)
Need SciDAC-3
– Significant software effort for next-generation GPUs & heterogeneous environments
– Participate in emerging ASCR Exascale initiatives
INCITE + LQCD synergy
– ARRA GPU system well matched to current leadership facilities

22 Path to Exascale
Enabled by some hybrid GPU system? Cray + NVIDIA??
NSF GaTech: Tier 2 (experimental facility)
– Phase 1: HP cluster + GPU (NVIDIA Tesla)
– Phase 2: hybrid GPU+
ASCR Exascale facility
– Case studies for science, software + runtime, hardware
– ASCR call for proposals: Exascale Co-Design Center
An exascale capacity resource will be needed

23 Summary
Capability + capacity + SciDAC deliver science & HEP + NP milestones
Petascale (leadership) + petascale (capacity) + SciDAC-3
– Spectrum + decays
– First contact with experimental resolution
Exascale (leadership) + exascale (capacity) + SciDAC-3
– Full resolution
– Spectrum + transitions
– Nuclear structure
Collaborative efforts: USQCD + JLab user communities

24 Backup slides. The end.

25 JLab ARRA: Phase 1

26 JLab ARRA: Phase 2

