BU SciDAC Meeting Balint Joo Jefferson Lab
Anisotropic Clover: Why do it?
- Anisotropy -> fine temporal lattice spacing at moderate cost
- Combine with group-theoretical baryon operators -> access to excited states
- Nice preliminary results already, with just Wilson:
  - excited states
  - states with spin 5/2+
Anisotropic Clover: Why do it?
- Part of the JLab three-prong lattice QCD programme:
  - Prong 1: dynamical anisotropic Clover
  - Prong 2: DWF on a staggered sea (MILC configs)
  - Prong 3: large-scale dynamical DWF
- This programme was specially commended by the DOE at our recent Science and Technology Review
- Anisotropic Clover is a major part of the INCITE proposal (for the XT3 and BG/? machines)
Anisotropic Clover
- Level 2 Clover term, its inverse, and the force term wired into Chroma -> provides HMC/RHMC
- Our choice of gauge action: plaquette + rectangle + adjoint term
- Fermion action: anisotropic Clover + stout smearing (stout force recursion)
- Usual barrage of DF techniques:
  - Hasenbusch + chronology for the 2 flavours
  - RHMC for the +1 flavour
  - multi-timescale integrators
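The multi-timescale idea can be sketched as a nested (Sexton-Weingarten style) leapfrog: the expensive force (e.g. the fermion force) is evaluated on a coarse outer step, the cheap force (e.g. the gauge force) on a finer inner step. This is only a 1-D toy illustration, not Chroma's integrator; all names here are hypothetical:

```python
def leapfrog_nested(q, p, force_outer, force_inner, tau, n_outer, n_inner):
    """Nested leapfrog over trajectory length tau: the cheap inner force
    is integrated n_inner times per evaluation of the expensive outer force."""
    dt = tau / n_outer
    for _ in range(n_outer):
        p += 0.5 * dt * force_outer(q)       # half-kick with the expensive force
        dti = dt / n_inner
        for _ in range(n_inner):             # inner leapfrog with the cheap force
            p += 0.5 * dti * force_inner(q)
            q += dti * p
            p += 0.5 * dti * force_inner(q)
        p += 0.5 * dt * force_outer(q)       # closing half-kick
    return q, p

# toy check: harmonic oscillator H = p^2/2 + q^2/2, force split 0.3 / 0.7
q, p = leapfrog_nested(1.0, 0.0,
                       lambda x: -0.3 * x,   # stand-in "expensive" force
                       lambda x: -0.7 * x,   # stand-in "cheap" force
                       tau=1.0, n_outer=10, n_inner=5)
```

Because the scheme is symplectic, the toy energy is conserved to O(dt^2), which is what makes the coarse outer step for the expensive force affordable.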
CG Inverter Performance
- We only got 7.3 Tflops on 8K CPUs :(
- But we didn't work much at all on optimization
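For reference, the conjugate gradient iteration underneath these numbers is simple; all the flops live in the matrix-vector apply (the Dslash, in our case). A generic, self-contained sketch on plain Python lists (not the production solver):

```python
def cg_solve(apply_A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient for A x = b, A symmetric positive definite.
    apply_A(v) returns A@v as a list; vectors are plain Python lists."""
    x = [0.0] * len(b)
    r = b[:]                              # residual r = b - A x0, with x0 = 0
    p = r[:]
    rr = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = apply_A(p)                   # the expensive step: one mat-vec per iter
        alpha = rr / sum(pi * ai for pi, ai in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rr_new = sum(ri * ri for ri in r)
        if rr_new < tol * tol:            # converged on residual norm
            break
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

# small 2x2 SPD example: A = [[4,1],[1,3]], b = [1,2]
x = cg_solve(lambda v: [4 * v[0] + v[1], v[0] + 3 * v[1]], [1.0, 2.0])
```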
Clover Work Under SciDAC 2
- Performance is OK, but we want better...
- Optimizations:
  - Clover SSE optimizations for clusters & the XT3
  - BAGEL terms for BG/???
  - multi-mass inverter, trace terms
- Would like to optimize the actual bottleneck
  - the CG inverter is not the current bottleneck
  - help from our friends at RENCI in identifying the exact hotspots? (right now we rely on gprof)
- Algorithmic: temporal preconditioning (later)
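Until we get better tooling, even the gprof flat profile can be mined automatically. A small sketch that pulls the top self-time entries out of gprof's standard flat-profile text (the function names in the sample are hypothetical):

```python
def top_hotspots(gprof_flat_profile, n=3):
    """Parse the flat-profile section of gprof output and return the n
    entries with the largest self-time percentage, as (percent, name).
    Assumes the standard gprof flat-profile column layout."""
    rows = []
    for line in gprof_flat_profile.splitlines():
        cols = line.split()
        # data rows start with a numeric %-time column; headers do not
        if cols and cols[0].replace('.', '', 1).isdigit():
            rows.append((float(cols[0]), cols[-1]))
    rows.sort(reverse=True)
    return rows[:n]

sample = """  %   cumulative   self
 time   seconds   seconds    calls  name
 62.1      1.86     1.86     1000  dslash
 20.4      2.47     0.61      500  clover_apply
  5.0      2.62     0.15      500  linalg_axpy
"""
hot = top_hotspots(sample, 2)
```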
Thoughts at the back of my mind
- Are we actually going to get any time at ORNL?
  - We asked for a lot: I think 20M CPU hours just for the Clover stuff
- The INCITE proposal was extremely hurried
  - we had to respond very quickly
  - many small groups did not stand a chance
- How much effort should we be investing? Should we be focusing more on BlueGene/? and clusters?
CRE and ILDG
- Progress on the CRE has been slow. Why?
  - manpower reasons in SciDAC 1?
  - people are already happily running production without it?
    - in which case, is it just LOW VALUE? Where are the 'armies of new users' who need it?
- What are the issues?
  - Intimately tied to the infrastructure at each site
    - site infrastructure leverages off experiments, so it's different everywhere
  - High maintenance: PBS, LoadLeveler, NFS? dcache, anyone? upgrades of mvapich, OpenMPI, the IB fabric, etc.
  - Inherently non-portable (what about ANL/ORNL?)
CRE and ILDG
- If it has low value, no user demand, is high maintenance, and won't work outside our sites...
  - is it worth doing? Can we just drop it? PLEASE?
- Anyway, common environments are so passé and '90s. Nowadays we should think about 'interoperable grid environments' – they're IN!
ILDG Middleware
- Progress, but still on the eXist MDC
- Dumb RC: just remap the LFN to an FNAL dcache name
- Issues:
  - Where is all the markup?
  - Eventually need a more sophisticated RC?
  - Markup is NOT anisotropy-aware (future fights in the MDWG – will take time)
- Working towards interoperability
  - Meeting at JLab in December – can folks from BNL and FNAL come?
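The "dumb RC" amounts to pure string rewriting of the logical file name onto a site storage path, with no database behind it. A minimal sketch of that idea (the `site_prefix` mount point is made up, not FNAL's actual dcache layout):

```python
def lfn_to_pfn(lfn, site_prefix="/pnfs/example.org/lqcd"):
    """'Dumb' replica catalogue: remap an ILDG logical file name (LFN)
    onto a site storage path by string rewriting alone.
    site_prefix is a hypothetical dcache mount point."""
    scheme = "lfn://"
    if not lfn.startswith(scheme):
        raise ValueError("not an LFN: " + lfn)
    return site_prefix + "/" + lfn[len(scheme):]

pfn = lfn_to_pfn("lfn://ildg/ukqcd/ensemble1/cfg_100")
```

A more sophisticated RC would replace the string rewrite with a lookup that can track multiple replicas per LFN.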
Testing and Release
- Unit testing vs. end-to-end testing
- Too much existing code: we intermix QMP, QDP++, QIO, XpathReader, LIME, Chroma, Wilson Dslash or BAGEL Dslash, possibly BAGEL linear algebra, and level 3 CG-DWF
  - unit testing all of these is difficult
- End-to-end tests: compare the final result, e.g. correlation functions
  - lots of output – selective diffs?
  - QDP++ uses XML, so we do selective diffs through XMLDiff
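The selective-diff idea is: pick out only the physics numbers you care about and compare them with a tolerance, ignoring the rest of the output (timings, hostnames, etc.). A toy sketch with the standard library, not the actual XMLDiff tool:

```python
import xml.etree.ElementTree as ET

def xml_num_diff(xml_a, xml_b, paths, rtol=1e-6):
    """Selective diff: compare only the elements matched by the simple
    ElementTree XPath expressions in `paths`, treating their text as
    numbers with relative tolerance rtol. Returns the mismatches."""
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    mismatches = []
    for path in paths:
        for ea, eb in zip(a.findall(path), b.findall(path)):
            va, vb = float(ea.text), float(eb.text)
            if abs(va - vb) > rtol * max(abs(va), abs(vb), 1.0):
                mismatches.append((path, va, vb))
    return mismatches

# made-up output fragments: the correlator agrees, the timing does not
run_a = "<out><corr>1.0000001</corr><time>12.3</time></out>"
run_b = "<out><corr>1.0000002</corr><time>99.9</time></out>"
```

Diffing only `corr` passes; naively diffing the whole document would fail on the irrelevant `time` element.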
Structure
- A test consists of: an executable, input XML, expected output XML
  - a metric file decides which bits of the output we need to check
- Runner – abstracts away running:
  - trivial runner (just re-echoes your commands)
  - MPIRUN runner (runs on 2 JLab IB nodes)
  - prototype YOD runner (for the XT3)
  - LoadLeveler runner (for BG/L) – yucky
- Driver scripts:
  - run interactively and check (e.g. scalar targets)
  - submit jobs to a queue and check later (for queues)
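The runner abstraction just maps "run this test binary with these arguments" onto whatever launch mechanism the platform needs. A skeleton of the idea (class and method names are illustrative, not the framework's actual API):

```python
import subprocess

class Runner:
    """Abstract away how a test executable is launched on a platform."""
    def command(self, exe, args):
        raise NotImplementedError
    def run(self, exe, args=()):
        return subprocess.run(self.command(exe, list(args)),
                              capture_output=True, text=True)

class TrivialRunner(Runner):
    """Scalar build: just run the executable directly."""
    def command(self, exe, args):
        return [exe] + args

class MpirunRunner(Runner):
    """Parallel build: wrap the command in mpirun across N processes."""
    def __init__(self, nprocs=2):
        self.nprocs = nprocs
    def command(self, exe, args):
        return ["mpirun", "-np", str(self.nprocs), exe] + args

cmd_scalar = TrivialRunner().command("chroma", ["-i", "in.xml"])
cmd_par = MpirunRunner(2).command("chroma", ["-i", "in.xml"])
```

A YOD or LoadLeveler runner would follow the same pattern, building the platform's submit command instead.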
What has testing taught us?
- We run through this regression framework nightly: gcc3, gcc4, scalar, parscalar-ib
- What runs fine with gcc 3.x on RHEL won't necessarily run fine with gcc 4.x on FC5
- Maintenance: keep up with compilers – identify problems
  - ICC – "catastrophic error: can't allocate register" (SSE inline)
  - VACPP (XLC) – "Internal compiler error: please contact IBM representative" on templates
  - PGI: no inline assembler? intrinsics?
- We really MUST focus on this issue, or will it be gcc 3.4.x forever? (seems the most stable so far)
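The nightly sweep is just the cross product of compilers and build targets, with failures collected for the morning report. A sketch of the driver logic (the real build/run step is injected as a function here, so the sketch stays self-contained):

```python
import itertools

COMPILERS = ["gcc3", "gcc4"]               # toolchains under test
TARGETS = ["scalar", "parscalar-ib"]       # build targets

def nightly_matrix(run_one):
    """Sweep every (compiler, target) pair. run_one(compiler, target)
    returns an exit code (0 = success); in real use it would configure,
    build, and run the end-to-end tests. Returns the failing pairs."""
    failures = []
    for cc, tgt in itertools.product(COMPILERS, TARGETS):
        if run_one(cc, tgt) != 0:
            failures.append((cc, tgt))
    return failures
```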
SciDAC Release Pages?
- What's the actual problem here?
- The JLab page has releases that live in the JLab CVS release directory, plus previous versions (by vox populi)
- We strive to keep the pages up to date
- Not everyone uses JLab CVS. Why?
  - do you prefer to run your own repository?
  - do you want to use Subversion?
  - do you think only sissies use version control?
- Centralizing release management is bad
  - imagine if I had to be responsible for the release of a code that I myself could only pick up from a web page
- Is it only John Kogut who is unhappy?
A possible solution to the problem (which may or may not exist)
- A SourceForge-like setup (GForge)
- Provides:
  - per-project web space and release tarball space
  - source code management modules (CVS & SVN)
    - may be able to 'proxy' for your own repo
  - mailing lists, bug tracker, news feeds, yadda yadda
  - wiki-like authentication
- Our new sysadmins are installing this at JLAB
- But all the effort is wasted if folks don't use it...