
1 QDP++ and Chroma. Robert Edwards, Jefferson Lab. Collaborator: Balint Joo

2 Lattice QCD – extremely uniform
Periodic or very simple boundary conditions
SPMD: identical sublattices per processor
Lattice operator: Dirac operator (equation shown on slide)
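The slide's equation did not survive the transcript; for reference, the conventional Wilson form of the lattice Dirac operator (a standard textbook expression, not necessarily the exact one shown) is:

M(x,y) = \delta_{x,y}
       - \kappa \sum_{\mu=1}^{4} \Big[ (1-\gamma_\mu)\, U_\mu(x)\, \delta_{x+\hat\mu,\,y}
                                     + (1+\gamma_\mu)\, U_\mu^\dagger(x-\hat\mu)\, \delta_{x-\hat\mu,\,y} \Big]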

3 Software Infrastructure
Goals: Create a unified software environment that will enable the US lattice community to achieve very high efficiency on diverse multi-terascale hardware.
TASKS / LIBRARIES:
I. QCD Data Parallel API → QDP
II. Optimized Message Passing → QMP
III. Optimized QCD Linear Algebra → QLA
IV. I/O, Data Files and Data Grid → QIO
V. Optimized Physics Codes → CPS/MILC/Chroma/etc.
VI. Execution Environment → unify BNL/FNAL/JLab

4 Data layout over processors
Overlapping communications and computations, e.g. C(x) = A(x) * shift(B, +mu):
– Send face forward non-blocking to neighboring node.
– Receive face into pre-allocated buffer.
– Meanwhile do A*B on interior sites.
– “Wait” on receive to perform A*B on the face.
Lazy evaluation (C style), as sketched below:
Shift(tmp, B, +mu);
Mult(C, A, tmp);
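A minimal sketch of that overlap pattern. The QMP calls (and their simplified signatures) are the ones from the QMP example slide; face packing, buffer sizes, and the interior/boundary site loops are schematic placeholders for what QDP++ actually generates.

#include "qmp.h"

// Sketch of C(x) = A(x) * shift(B, +mu) with communication/computation overlap.
void shifted_multiply(int mu, int face_bytes)
{
  char* send_buf = new char[face_bytes];
  char* recv_buf = new char[face_bytes];

  QMP_msgmem_t    send_mm = QMP_declare_msgmem(send_buf, face_bytes);
  QMP_msgmem_t    recv_mm = QMP_declare_msgmem(recv_buf, face_bytes);
  QMP_msghandle_t send_mh = QMP_declare_send_relative(send_mm, +mu);
  QMP_msghandle_t recv_mh = QMP_declare_receive_from(recv_mm, -mu);

  // ... pack the boundary sites of B into send_buf (placeholder) ...
  QMP_start(recv_mh);   // post the non-blocking receive first
  QMP_start(send_mh);   // send the face forward, non-blocking

  // ... do A*B on interior sites while the face is in flight ...

  QMP_wait(send_mh);
  QMP_wait(recv_mh);    // face data has now arrived in recv_buf
  // ... do A*B on the boundary sites using recv_buf ...

  delete[] send_buf;
  delete[] recv_buf;
}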

5 SciDAC Software Structure
Level 3: Optimised Dirac operators, inverters. (Wilson op, DWF inverter for Pentium 4; Wilson and staggered op for QCDOC.)
Level 2: QDP (QCD Data Parallel) – lattice-wide operations, data shifts. (Exists in C, C++, scalar and MPP using QMP.)
Level 1: QMP (QCD Message Passing); QLA (QCD Linear Algebra); QIO – XML I/O, LIME. (QMP exists, implemented in MPI, GM, gigE and QCDOC.)

6 QMP Simple Example
char buf[size];
QMP_msgmem_t mm;
QMP_msghandle_t mh;
mm = QMP_declare_msgmem(buf, size);
mh = QMP_declare_send_relative(mm, +x);
QMP_start(mh);
// Do computations
QMP_wait(mh);
The receiving node goes through the same steps, except:
mh = QMP_declare_receive_from(mm, -x);
Multiple calls are possible.

7 Data Parallel QDP/C,C++ API
Hides architecture and layout
Operates on lattice fields across sites
Linear algebra tailored for QCD
Shifts and permutation maps across sites
Reductions
Subsets
Entry/exit – attach to existing codes

8 QDP++ Type Structure
Lattice fields have various kinds of indices:
Color: U_{ab}(x)
Spin: γ^μ_{αβ}
Mixed: ψ_{aα}(x), Q_{ab}^{αβ}(x)
Tensor product of indices forms the type:

               Lattice    Color        Spin         Complexity
Gauge fields:  Lattice    Matrix(Nc)   Scalar       Complex
Fermions:      Lattice    Vector(Nc)   Vector(Ns)   Complex
Scalars:       Scalar     Scalar       Scalar       Complex
Propagators:   Lattice    Matrix(Nc)   Matrix(Ns)   Complex
Gamma:         Scalar     Scalar       Matrix(Ns)   Complex

QDP++ forms these types via nested C++ templating
Formation of new types (e.g., half fermion) possible
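As an illustration of the nesting, the composite types look roughly like the following (simplified from the actual QDP++ headers; REAL, Nc and Ns are the usual QDP++ word type and color/spin dimensions):

// Outermost index: lattice; then spin, then color, then reality (complex).
typedef OLattice< PScalar< PColorMatrix< RComplex<REAL>, Nc > > >          LatticeColorMatrix; // gauge field
typedef OLattice< PSpinVector< PColorVector< RComplex<REAL>, Nc >, Ns > >  LatticeFermion;     // fermion
typedef OLattice< PSpinMatrix< PColorMatrix< RComplex<REAL>, Nc >, Ns > >  LatticePropagator;  // propagator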

9 Data-parallel Operations
Unary and binary: -a; a-b; …
Unary functions: adj(a), cos(a), sin(a), …
Random numbers: random(a), gaussian(a) // platform independent
Comparisons (booleans): a <= b, …
Broadcasts: a = 0, …
Reductions: sum(a), …
Fields have various types (indices): tensor product
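A short usage sketch of these operations, assuming the standard QDP++ entry points named in the list above and an already-initialized QDP lattice:

#include "qdp.h"
using namespace QDP;

void demo()
{
  LatticeReal a, b, c;
  LatticeBoolean lb;
  random(a);              // uniform random field, platform independent
  gaussian(b);            // gaussian random field
  c = cos(a) + sin(b);    // site-wise unary functions in an expression
  lb = (a <= b);          // site-wise comparison -> boolean field
  c = zero;               // broadcast to every site (the slide writes a = 0)
  Real s = sum(b);        // global reduction over the lattice
}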

10 QDP Expressions
Can create expressions. QDP/C++ code:
multi1d<LatticeColorMatrix> u(Nd);
LatticeDiracFermion b, c, d;
int mu;
c[even] = u[mu] * shift(b, mu) + 2 * d;
PETE: Portable Expression Template Engine
Temporaries eliminated, expressions optimised

11 Linear Algebra Implementation
Naïve ops involve lattice temporaries – inefficient
Eliminate lattice temporaries – PETE
Allows further combining of operations (adj(x)*y)
Overlap communications/computations
Full performance – expressions at site level
// Lattice operation
A = adj(B) + 2 * C;
// Lattice temporaries
t1 = 2 * C;
t2 = adj(B);
t3 = t2 + t1;
A = t3;
// Merged lattice loop
for (i = ... ; ... ; ...) {
  A[i] = adj(B[i]) + 2 * C[i];
}
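The idea behind PETE in miniature: a self-contained sketch with made-up types (Vec, Scale, AddExpr are illustrative, not the real PETE classes) showing how A = B + 2*C evaluates in one merged loop with no lattice-sized temporaries.

#include <cstddef>
#include <vector>

struct Vec; // stand-in for a lattice-wide field

// Node representing "scalar * field"; evaluated lazily, per site.
struct Scale {
  double s; const Vec& v;
  double operator[](std::size_t i) const;
};

// Node representing "left + right" for any two site-evaluable operands.
template <class L, class R>
struct AddExpr {
  const L& l; const R& r;
  double operator[](std::size_t i) const { return l[i] + r[i]; }
};

struct Vec {
  std::vector<double> d;
  explicit Vec(std::size_t n) : d(n) {}
  double operator[](std::size_t i) const { return d[i]; }
  // Assignment is the only place a loop runs: the expression tree is
  // walked site by site, so no intermediate fields are ever allocated.
  template <class E>
  Vec& operator=(const E& e) {
    for (std::size_t i = 0; i < d.size(); ++i) d[i] = e[i];
    return *this;
  }
};

inline double Scale::operator[](std::size_t i) const { return s * v.d[i]; }

inline Scale operator*(double s, const Vec& v) { return {s, v}; }
template <class L, class R>
inline AddExpr<L, R> operator+(const L& l, const R& r) { return {l, r}; }

// Usage: A = B + 2.0 * C; builds the type AddExpr<Vec, Scale> at compile
// time, then Vec::operator= runs the single merged loop.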

12 QDP++ Optimization
Optimizations “under the hood”:
Select numerically intensive operations through template specialization.
PETE recognises expression templates like z = a * x + y from type information at compile time.
Calls a machine-specific optimised routine (axpyz).
Optimized routine can use assembler, reorganize loops, etc.
Optimized routines can be selected at configuration time; unoptimized fallback routines exist for portability.
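A compact, self-contained illustration of that dispatch (illustrative names again; QDP++ does this by specializing PETE's evaluate() on the expression's type rather than with a plain overload):

#include <cstddef>
#include <vector>

struct Vec { std::vector<double> d; };

// The node types that writing "a * x + y" produces.
struct Scaled { double a; const Vec& x; };
struct Axpy   { double a; const Vec& x; const Vec& y; };

inline Scaled operator*(double a, const Vec& x) { return {a, x}; }
inline Axpy   operator+(const Scaled& s, const Vec& y) { return {s.a, s.x, y}; }

// Stand-in for a hand-tuned kernel (assembler, unrolled loops, ...).
inline void axpyz(double a, const double* x, const double* y,
                  double* z, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) z[i] = a * x[i] + y[i];
}

// At "=": the expression's type is matched at compile time and the
// optimized routine is called instead of a generic site loop.
inline void assign(Vec& z, const Axpy& e) {
  axpyz(e.a, e.x.d.data(), e.y.d.data(), z.d.data(), z.d.size());
}
// Usage: assign(z, 2.0 * x + y);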

13 Performance Test Case - Wilson Conjugate Gradient
LatticeFermion psi, p, r, mp, mmp;
Real c, cp, a, b, d;
Subset s;
for(int k = 1; k <= MaxCG; ++k)
{
  // c = | r[k-1] |**2
  c = cp;
  // a[k] := | r[k-1] |**2 / | M p[k] |**2
  // Mp = M(u) * p
  M(mp, p, PLUS);       // Dslash
  // d = | mp |**2
  d = norm2(mp, s);     // norm square
  a = c / d;
  // psi[k] += a[k] p[k]
  psi[s] += a * p;      // VAXPY operation
  // r[k] -= a[k] M^dag.M.p[k]
  M(mmp, mp, MINUS);
  r[s] -= a * mmp;      // VAXPY operation
  cp = norm2(r, s);     // norm square
  if ( cp <= rsd_sq ) return;
  // b[k+1] := |r[k]|**2 / |r[k-1]|**2
  b = cp / c;
  // p[k+1] := r[k] + b[k+1] p[k]
  p[s] = r + b * p;     // VAXPY operation
}
In C++ there is significant room for performance degradation. The performance limits are in the linear-algebra ops (VAXPY) and the norm squares.
Optimization: functions return a container holding the function type and operands; at “=”, the expression is replaced with optimized code by template specialization.
Performance: QDP overhead ~1% of peak. Wilson: QCDOC 283 Mflops/node @ 350 MHz, 4^4 sites/node.

14 Chroma
A lattice QCD toolkit/library built on top of QDP++
Library is a module – can be linked with other codes
Features: utility libraries (gluonic measurements, smearing, etc.); fermion support (DWF, Overlap, Wilson, Asqtad)
Applications: spectroscopy, propagators & 3-pt functions, eigenvalues, heatbath, HMC
Optimization hooks – level 3 Wilson-Dslash for Pentium, QCDOC, BG/L, IBM SP-like nodes (via Bagel)
E.g., McNeile: computes propagators with CPS, measures pions with Chroma, all in the same code

15 Software Map (slide shows the directory structure)

16 Chroma Lib Structure
Chroma Lattice Field Theory library
Support for gauge and fermion actions
– Boson action support
– Fermion action support
  – Fermion actions
  – Fermion boundary conditions
  – Inverters
  – Fermion linear operators
  – Quark propagator solution routines
– Gauge action support
  – Gauge actions
  – Gauge boundary conditions
IO routines
– Enums
Measurement routines
– Eigenvalue measurements
– Gauge fixing routines
– Gluonic observables
– Hadronic observables
– Inline measurements
  – Eigenvalue measurements
  – Glue measurements
  – Hadron measurements
  – Smear measurements
– Psibar-psi measurements
– Schroedinger functional
– Smearing routines
– Trace-log support
Gauge field update routines
– Heatbath
– Molecular dynamics support
  – Hamiltonian systems
  – HMC trajectories
  – HMD integrators
  – HMC monomials
  – HMC linear system solver initial guess
Utility routines
– Fermion manipulation routines
– Fourier transform support
– Utility routines for manipulating color matrices
– Info utilities

17 Fermion Actions
Actions are factory objects (foundries)
Do not hold gauge fields – only params
Factory/creation functions with gauge field argument:
– Takes a gauge field – creates a State & applies fermion BC
– Takes a State – creates a Linear Operator (dslash)
– Takes a State – creates quark propagator solvers
Linear Ops are function objects
E.g., class Foo { int operator()(int x); } fred; // int z = fred(1);
Argument to CG, MR, etc. – simple functions
Created with XML
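A rough sketch of that layering with hypothetical class names (the real Chroma interfaces are richer; this only shows the shape: the action holds parameters, the state holds the gauge field with BCs applied, and the linear operator is a function object handed to solvers):

#include <memory>

struct GaugeField {};   // stand-ins for the QDP++/Chroma field types
struct Fermion {};

// State: a gauge field with fermion boundary conditions applied.
struct FermState {
  GaugeField u;
};

// Linear operator (dslash-like) as a function object.
struct LinearOp {
  std::shared_ptr<FermState> state;
  void operator()(Fermion& chi, const Fermion& psi) const {
    // apply M(state->u) to psi, result in chi (omitted)
  }
};

// The action: holds only parameters and acts as a factory.
struct WilsonFermActSketch {
  double kappa;  // parameters only, no gauge field

  std::shared_ptr<FermState> createState(const GaugeField& u) const {
    auto s = std::make_shared<FermState>();
    s->u = u;    // would also apply the fermion BCs here
    return s;
  }
  LinearOp linOp(std::shared_ptr<FermState> s) const { return {s}; }
};

// Usage: a solver such as CG just calls the function object:
//   LinearOp M = act.linOp(act.createState(u));
//   M(chi, psi);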

18 Fermion Actions - XML
Tag FermAct is the key in a lookup map of constructors
During construction, the action reads XML
FermBC tag invokes another lookup
<FermionAction>
  <FermAct>WILSON</FermAct>
  <Kappa>0.11</Kappa>
  <FermionBC>
    <FermBC>SIMPLE_FERMBC</FermBC>
    <boundary>1 1 1 -1</boundary>
  </FermionBC>
  <AnisoParam>
    <anisoP>false</anisoP>
    <t_dir>3</t_dir>
    <xi_0>1.0</xi_0>
  </AnisoParam>
</FermionAction>
XPath used in chroma/mainprogs/main/propagator.cc:
/propagator/Params/FermionAction/FermAct
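Reading such a file follows the QDP++ XMLReader pattern, roughly as below (a sketch; the file name is hypothetical, the XPath is the one from the slide):

#include "qdp.h"
#include <string>
using namespace QDP;

void read_ferm_act()
{
  XMLReader xml_in("propagator_input.xml");  // hypothetical input file
  std::string ferm_act;
  read(xml_in, "/propagator/Params/FermionAction/FermAct", ferm_act);
  // ferm_act == "WILSON" selects the constructor in the lookup map
}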

19 HMC and Monomials
HMC built on Monomials
Monomials define N_f, gauge, etc.
Only provide Mom → deriv(U) and S(U). Pseudofermions not visible.
Have N_f=2 and rational N_f=1
Both 4D and 5D versions
<Monomial>
  <Name>TWO_FLAVOR_WILSON_FERM_MONOMIAL</Name>
  <FermionAction>
    <FermAct>WILSON</FermAct>
    ...
  </FermionAction>
  <InvertParam>
    <invType>CG_INVERTER</invType>
    <RsdCG>1.0e-7</RsdCG>
    <MaxCG>1000</MaxCG>
  </InvertParam>
  <ChronologicalPredictor>
    <Name>MINIMAL_RESIDUAL_EXTRAPOLATION_4D_PREDICTOR</Name>
    <MaxChrono>7</MaxChrono>
  </ChronologicalPredictor>
  ...
</Monomial>

20 Gauge Monomials
Gauge monomials: plaquette, rectangle, parallelogram
Monomial constructor will invoke the constructor for Name in GaugeAction
...
<Monomial>
  <Name>WILSON_GAUGEACT_MONOMIAL</Name>
  <GaugeAction>
    <Name>WILSON_GAUGEACT</Name>
    <beta>5.7</beta>
    <GaugeBC>
      <Name>PERIODIC_GAUGEBC</Name>
    </GaugeBC>
  </GaugeAction>
</Monomial>

21 Chroma – Inline Measurements
HMC has inline measurements. Chroma.cc is an inline-only code.
Former mainprogs are now inline measurements.
Measurements are registered with a constructor call.
Measurements are given the gauge field – no return value.
They communicate with each other only via disk (maybe memory buffers?).
<InlineMeasurements>
  <elem>
    <Name>MAKE_SOURCE</Name>
    ...
    <source_file>./source_0</source_file>
    <source_volfmt>MULTIFILE</source_volfmt>
  </elem>
  <elem>
    <Name>PROPAGATOR</Name>
    ...
    <source_file>./source_0</source_file>
    <prop_file>./propagator_0</prop_file>
    <prop_volfmt>MULTIFILE</prop_volfmt>
  </elem>
  ...
</InlineMeasurements>

22 Binary File/Interchange Formats
Metadata – data describing data; e.g., physics params
Use XML for metadata
File formats:
– Files are mixed mode – XML ascii + binary
– Using DIME (similar to e-mail MIME) to package
– Use BinX (Edinburgh) to describe binary
Replica-catalog web-archive repositories

23 For More Information
U.S. Lattice QCD Home Page: http://www.usqcd.org/
The JLab Lattice Portal: http://lqcd.jlab.org/
High Performance Computing at JLab: http://www.jlab.org/hpc/

