
1 SciDAC Software Infrastructure for Lattice Gauge Theory Richard C. Brower & Robert Edwards June 24, 2003

2 K. Wilson (1989 Capri): "One lesson is that lattice gauge theory could also require a 10^8 increase in computer power AND spectacular algorithmic advances before useful interactions with experiment..."
ab initio Chemistry vs. ab initio QCD:
1. 1930 + 50 = 1980 vs. 1980 + 50 = 2030?*
2. 0.1 flops → 10 Mflops vs. 10 Mflops → 1000 Tflops
3. Gaussian basis functions vs. clever collective variable?
* Hopefully sooner, but need $1/Mflops → $1/Gflops!

3 SciDAC: Scientific Discovery through Advanced Computing http://www.lqcd.org/scidac

4 QCD Infrastructure Project, Funded 2001-2003 (2005?)
HARDWARE:
– 10+ Tflops each at BNL, FNAL & JLab: QCDOC @ BNL (2004), Clusters @ FNAL/JLab (2005-2006)
SOFTWARE:
– Enable US lattice physicists to use the QCDOC @ BNL and Clusters @ FNAL & JLab
PHYSICS:
– Provide crucial lattice "data" that now dominate some tests of the Standard Model
– Deeper understanding of field theory (and even string theory!)

5 Software Infrastructure Goals: Create a unified software environment that will enable the US lattice community to achieve very high efficiency on diverse multi-terascale hardware.
TASKS / LIBRARIES:
I. QCD Data Parallel API → QDP
II. Optimize Message Passing → QMP
III. Optimize QCD Linear Algebra → QLA
IV. I/O, Data Files and Data Grid → QIO
V. Optimized Physics Codes → CPS/MILC/Chroma/etc.
VI. Execution Environment → unify BNL/FNAL/JLab

6 Participants in Software Project (partial list) * Software Coordinating Committee

7 Lattice QCD – extremely uniform
– Periodic or very simple boundary conditions
– SPMD: identical sublattices per processor
– Lattice operator: the Dirac operator (equation shown on the slide; a standard form is reproduced below)
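For reference, a standard form of the Wilson-Dirac lattice operator (a hedged reconstruction – the slide's equation image is not preserved here and its exact conventions may differ), with hopping parameter \kappa, gauge links U_\mu(x), and Dirac matrices \gamma_\mu:

M(x,y) \;=\; \delta_{x,y} \;-\; \kappa \sum_{\mu=1}^{4} \Big[ (1-\gamma_\mu)\, U_\mu(x)\, \delta_{x+\hat\mu,\,y} \;+\; (1+\gamma_\mu)\, U_\mu^\dagger(x-\hat\mu)\, \delta_{x-\hat\mu,\,y} \Big]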

8 SciDAC Software Structure
Level 3: Optimised Dirac operators, inverters – optimised for P4 and QCDOC
Level 2: QDP (QCD Data Parallel) – lattice-wide operations, data shifts; exists in C/C++
Level 1: QMP (QCD Message Passing) – exists in C/C++, implemented over MPI, GM, QCDOC, gigE; QLA (QCD Linear Algebra)
QIO: XML I/O, DIME

9 Overlapping communications and computations
Data layout over processors. Example: C(x) = A(x) * shift(B, +mu):
– Send face forward non-blocking to neighboring node.
– Receive face into pre-allocated buffer.
– Meanwhile do A*B on interior sites.
– "Wait" on receive to perform A*B on the face.
Lazy evaluation (C style): Shift(tmp, B, +mu); Mult(C, A, tmp);
A sketch of this overlap pattern is given below.
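Below is a minimal C sketch of this overlap pattern, assuming the simplified two-argument QMP call forms quoted on these slides (the production QMP API also takes additional direction/priority arguments); the kernels multiply_on_interior() and multiply_on_face() are hypothetical placeholders, not part of QMP or QDP:

#include <qmp.h>

/* Hypothetical compute kernels (placeholders, not part of QMP/QDP): */
void multiply_on_interior(double *C, const double *A, const double *B);
void multiply_on_face(double *C, const double *A, const char *face_recv);

void mult_with_shift(double *C, const double *A, const double *B,
                     char *face_send, char *face_recv, int face_bytes,
                     int mu)                    /* shift direction */
{
    QMP_msgmem_t    send_mem = QMP_declare_msgmem(face_send, face_bytes);
    QMP_msgmem_t    recv_mem = QMP_declare_msgmem(face_recv, face_bytes);
    QMP_msghandle_t send_mh  = QMP_declare_send_relative(send_mem, +mu);
    QMP_msghandle_t recv_mh  = QMP_declare_receive_from(recv_mem, -mu);

    QMP_start(recv_mh);                 /* receive face into pre-allocated buffer */
    QMP_start(send_mh);                 /* send face forward, non-blocking        */

    multiply_on_interior(C, A, B);      /* overlap: A*B on interior sites         */

    QMP_wait(recv_mh);                  /* "wait" on receive ...                  */
    multiply_on_face(C, A, face_recv);  /* ... then A*B on the face               */
    QMP_wait(send_mh);
}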

10 QCDOC 1.5 Tflops (Fall 2003): performance of Dirac inverters (% of peak), as determined by the ASIC simulator with native SciDAC message passing (QMP)
– Clover Wilson (assembly): 2^4 → 56%, 4^4 → 59%
– Naive staggered (MILC): 2^4 → 14%, 4^4 → 22% (4^4 assembly 38%)
– Asqtad force (MILC): 2^4 → 3%, 4^4 → 7%
– Asqtad force (1st attempt at optimization): 4^4 → 16%

11 Cluster Performance: 2002

12 Future Software Goals – critical needs:
– Ongoing optimization, testing and hardening of the SciDAC software infrastructure
– Leverage the SciDAC QCD infrastructure through collaborative efforts with the ILDG and SciParC projects
– Develop a mechanism to maintain distributed software libraries
– Foster an international (Linux-style?) development of application code

13 Message Passing: QMP
Philosophy: a subset of MPI capability appropriate to QCD
– Broadcasts, global reductions, barrier
– Minimal copying / DMA where possible
– Channel-oriented / asynchronous communication
– Multidirection sends/receives for QCDOC
– Grid and switch models for node layout
Implemented on GM and MPI; gigE nearly complete.

14 QMP Simple Example (sending node):

char buf[size];
QMP_msgmem_t mm;
QMP_msghandle_t mh;

mm = QMP_declare_msgmem(buf, size);
mh = QMP_declare_send_relative(mm, +x);
QMP_start(mh);
// Do computations
QMP_wait(mh);

The receiving node follows the same steps, except:
mh = QMP_declare_receive_from(mm, -x);
Multiple calls. A receive-side sketch follows.
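For completeness, a minimal sketch of the receive side, mirroring the slide's simplified two-argument call forms (the production QMP API also takes additional direction/priority arguments):

char buf[size];                         /* pre-allocated receive buffer     */
QMP_msgmem_t mm;
QMP_msghandle_t mh;

mm = QMP_declare_msgmem(buf, size);     /* register the buffer with QMP     */
mh = QMP_declare_receive_from(mm, -x);  /* receive from the -x neighbor     */
QMP_start(mh);                          /* post the receive                 */
// Do computations that do not touch buf
QMP_wait(mh);                           /* buf now holds the incoming face  */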

15 Data Parallel QDP/C, C++ API
– Hides architecture and layout
– Operates on lattice fields across sites
– Linear algebra tailored for QCD
– Shifts and permutation maps across sites
– Reductions
– Subsets
– Entry/exit – attach to existing codes

16 Data-parallel Operations
– Unary and binary: -a; a-b; …
– Unary functions: adj(a), cos(a), sin(a), …
– Random numbers (platform independent): random(a), gaussian(a)
– Comparisons (booleans): a <= b, …
– Broadcasts: a = 0, …
– Reductions: sum(a), …
Fields have various types (indices) – table shown on the slide. A short illustrative sketch follows.
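A brief illustration of how these operations read in QDP++ code (an informal sketch, not taken from the slides; the types and functions used – LatticeReal, LatticeColorMatrix, LatticeBoolean, random, gaussian, adj, sum – are standard QDP++ names):

#include "qdp.h"
using namespace QDP;

void demo()
{
  LatticeReal a, b;
  LatticeColorMatrix u, v;

  random(a);                    // platform-independent random numbers
  gaussian(b);                  // Gaussian-distributed field
  b = sin(a) - a;               // unary function and binary operation, site by site
  v = adj(u);                   // adjoint of a color-matrix field
  LatticeBoolean mask = a <= b; // comparison produces a boolean field
  a = zero;                     // broadcast (the slide writes a = 0)
  Double s = sum(b);            // global reduction over all sites
}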

17 QDP Expressions
Can create expressions. QDP/C++ code:

multi1d<LatticeColorMatrix> u(Nd);
LatticeDiracFermion b, c, d;
int mu;
c[even] = u[mu] * shift(b, mu) + 2 * d;

PETE: Portable Expression Template Engine – temporaries eliminated, expressions optimised

18 Linear Algebra Implementation
– Naïve ops involve lattice temporaries – inefficient
– Eliminate lattice temporaries – PETE
– Allows further combining of operations (adj(x)*y)
– Overlap communications/computations

// Lattice operation
A = adj(B) + 2 * C;

// Lattice temporaries (naïve evaluation)
t1 = 2 * C;
t2 = adj(B);
t3 = t2 + t1;
A = t3;

// Merged lattice loop (what PETE generates)
for (i = ...; ...; ...) {
  A[i] = adj(B[i]) + 2 * C[i];
}

A minimal expression-template sketch illustrating this idea follows.
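To make the temporary-elimination idea concrete, here is a minimal, self-contained expression-template sketch in the spirit of PETE (an illustration of the technique only, not PETE's actual classes): the + operators build lightweight expression objects, and only the assignment runs a single merged loop over sites, so no field-sized temporary is ever created.

#include <cstddef>
#include <vector>

struct Field;  // a toy "lattice field": just a vector of doubles

// Expression node representing elementwise l + r; holds references only.
template <typename L, typename R>
struct AddExpr {
  const L& l;
  const R& r;
  double operator[](std::size_t i) const { return l[i] + r[i]; }
  std::size_t size() const { return l.size(); }
};

struct Field {
  std::vector<double> d;
  explicit Field(std::size_t n) : d(n) {}
  double  operator[](std::size_t i) const { return d[i]; }
  double& operator[](std::size_t i)       { return d[i]; }
  std::size_t size() const { return d.size(); }

  // Assigning from any expression runs ONE merged loop over sites.
  template <typename E>
  Field& operator=(const E& e) {
    for (std::size_t i = 0; i < d.size(); ++i) d[i] = e[i];
    return *this;
  }
};

// Build expression nodes instead of computing temporaries immediately.
inline AddExpr<Field, Field> operator+(const Field& a, const Field& b) {
  return AddExpr<Field, Field>{a, b};
}
template <typename L, typename R>
inline AddExpr<AddExpr<L, R>, Field> operator+(const AddExpr<L, R>& e, const Field& b) {
  return AddExpr<AddExpr<L, R>, Field>{e, b};
}

int main() {
  Field A(8), B(8), C(8), D(8);
  // B + C + D builds nested AddExpr objects; the single merged loop runs here:
  A = B + C + D;
  return 0;
}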

19 Binary File/Interchange Formats
– Metadata – data describing data, e.g. physics parameters; use XML for metadata
File formats:
– Files are mixed mode – XML (ascii) + binary
– DIME (similar to e-mail MIME) used for packaging
– BinX (Edinburgh) used to describe the binary
– Replica-catalog web-archive repositories

20 Current Status
– Releases and documentation: http://www.lqcd.org/scidac
– QMP, QDP/C, C++ in first release
– Performance improvements/testing underway
– Porting & development of physics codes over QDP ongoing
– QIO/XML support near completion
– Cluster/QCDOC run-time environment in development

21 SciDAC Prototype Clusters
Myrinet + Pentium 4:
– 48 duals, 2.0 GHz P4 @ FNAL (Spring 2002)
– 128 singles, 2.0 GHz P4 @ JLab (Summer 2002)
– 128 duals, 2.4 GHz P4 @ FNAL (Fall 2002)
Gigabit Ethernet mesh + Pentium 4:
– 256 (8x8x4) singles, 2.8 GHz P4 @ JLab (Summer 2003)
– FPGA NIC for gigabit Ethernet @ FNAL (Summer 2003)
– 256 (?) @ FNAL (Fall 2003?)

22 Cast of Characters
Software Committee*: R. Brower (chair), C. DeTar, R. Edwards, D. Holmgren, R. Mawhinney, C. Mendes, C. Watson
Additional software: J. Chen, E. Gregory, J. Hetrick, B. Joó, C. Jung, J. Osborn, K. Petrov, A. Pochinsky, J. Simone et al.
(* Minutes and working documents: http://physics.bu.edu/~brower/SciDAC/scc.html)
Executive Committee: R. Brower, N. Christ, M. Creutz, P. Mackenzie, J. Negele, C. Rebbi, S. Sharpe, R. Sugar (chair), C. Watson

