
1 DFT requirements for leadership-class computers

N. Schunck
Department of Physics & Astronomy, University of Tennessee, Knoxville, TN-37996, USA
Physics Division, Oak Ridge National Laboratory, Oak Ridge, TN-37831, USA
http://unedf.org

The 3rd LACM-EFES-JUSTIPEN Workshop, JIHIR, Oak Ridge National Laboratory, February 23-25, 2009

Collaborators: A. Baran, J. Dobaczewski, J. McDonnell, J. Moré, W. Nazarewicz, N. Nikolov, H. H. Nam, J. Pei, J. Sarich, J. Sheikh, A. Staszczak, M. V. Stoitsov, S. Wild

2 Nuclear DFT: Why supercomputing?

The ground state of an even-even nucleus can be computed in a matter of minutes on a standard laptop: why bother with supercomputing?

DFT is a global theory:
- Principle: average out individual degrees of freedom
- "No limit" theory: from light nuclei to the physics of neutron stars
- Rich physics, fast and reliable

Open questions:
- Treatment of correlations?
- Current lack of quantitative predictions at the ~100 keV level
- Extrapolability?

Supercomputers: DFT at full power
- Large-scale problems (LACM): fission, shape coexistence, time-dependent problems
- Systematic restoration of broken symmetries and correlations "made easy" (QRPA, GCM?, etc.)
- Optimization of extended functionals on larger sets of experimental data

3 Classes of DFT Solvers

Computational packages used and developed at ORNL, with estimates of the resources needed for a standard HFB calculation:

            1D                       2D                          3D
r-space     1 min, 1 core (HFBRAD)   5 hours, 70 cores (HFBAX)   -
HO basis    -                        2 min, 1 core (HFBTHO)      5 hours, 1 core (HFODD)

Coordinate space: direct integration of the HFB equations
- Accurate: provides the "exact" result
- Slow and CPU/memory intensive for 2D-3D geometries

Configuration space: expansion of the solutions on a basis (usually HO)
- Fast and amenable to beyond-mean-field extensions
- Truncation effects: source of divergences/renormalization issues
- Wrong asymptotics unless different bases are used (WS, PTG, Gamow, etc.)

In either case, the HFB problem is a non-linear integro-differential fixed-point problem (see the sketch below).
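To make the fixed-point structure concrete, here is a minimal self-consistent loop in Python. The density-dependent Hamiltonian, coupling, mixing parameter, and basis size are all toy assumptions, not the actual functionals solved by the codes above.

```python
import numpy as np

def build_mean_field(rho, h0, coupling):
    """Toy density-dependent one-body Hamiltonian h[rho] (assumed form)."""
    return h0 + coupling * np.diag(np.diag(rho))

def density_from(h, n_occ):
    """Fill the n_occ lowest eigenstates: rho = sum_occ |phi><phi|."""
    _, vecs = np.linalg.eigh(h)
    occ = vecs[:, :n_occ]
    return occ @ occ.T

def scf(h0, n_occ, coupling=0.5, alpha=0.4, tol=1e-8, max_iter=200):
    rho = density_from(h0, n_occ)                    # initial guess
    for it in range(max_iter):
        h = build_mean_field(rho, h0, coupling)      # field from density
        rho_new = density_from(h, n_occ)             # density from field
        if np.linalg.norm(rho_new - rho) < tol:
            return rho_new, it
        rho = (1 - alpha) * rho + alpha * rho_new    # simple linear mixing
    raise RuntimeError("no convergence")

h0 = np.diag(np.arange(10.0))                        # toy single-particle spectrum
rho, iters = scf(h0, n_occ=4)
print(f"converged in {iters} iterations")
```

Real solvers replace the linear mixing by Broyden-type updates (see slide 12) and the toy Hamiltonian by the full Skyrme HFB matrix.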

4 Recent physics achievements

- Even-even, odd-even and odd-odd mass tables
- Nuclear fission
- Systematics of odd-proton states in odd nuclei

Cf. talks by M. Stoitsov, S. Wild and J. Moré.

Online resources: http://massexplorer.org/ and http://unedf.org/

5 Petascale and beyond

Hardware constraints (see R. Lusk's and J. Vary's talks):
- Many cores (100,000+) stacked into sockets: currently 4 cores/socket, evolving toward 8 cores/socket and more
- Small memory per core (shared memory per socket)
- Short, crash-prone, expensive runtime

Consequences for the architecture of DFT solvers:
- Optimize the time of one HFB calculation: reduce the number of iterations, use symmetries smartly by improving/interfacing codes, parallelization, etc.
- Work on a parallel wrapper: load balancing, checkpoints, error-control mechanisms, etc. (a minimal master/worker sketch follows)
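A minimal sketch, in Python with mpi4py, of what such a wrapper does: a master rank farms independent HFB runs out to workers (dynamic load balancing) and collects results, which is also the natural point for checkpointing. The function run_hfb, the task list, and the message tags are illustrative assumptions, not the actual wrapper.

```python
from mpi4py import MPI

TAG_WORK, TAG_DONE, TAG_STOP = 1, 2, 3

def run_hfb(config):
    """Placeholder for one full HFB calculation for configuration (Z, N)."""
    z, n = config
    return {"Z": z, "N": n, "energy": -8.0 * (z + n)}   # fake binding energy

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    tasks = [(z, n) for z in range(20, 24) for n in range(20, 30)]
    status, results, pending = MPI.Status(), [], size - 1
    for w in range(1, size):            # prime workers (assumes enough tasks)
        comm.send(tasks.pop(), dest=w, tag=TAG_WORK)
    while pending:
        res = comm.recv(source=MPI.ANY_SOURCE, tag=TAG_DONE, status=status)
        results.append(res)             # checkpoint: flush results to disk here
        if tasks:
            comm.send(tasks.pop(), dest=status.Get_source(), tag=TAG_WORK)
        else:
            comm.send(None, dest=status.Get_source(), tag=TAG_STOP)
            pending -= 1
    print(f"collected {len(results)} results")
else:
    status = MPI.Status()
    while True:
        task = comm.recv(source=0, tag=MPI.ANY_TAG, status=status)
        if status.Get_tag() == TAG_STOP:
            break
        comm.send(run_hfb(task), dest=0, tag=TAG_DONE)
```

Because each HFB run is independent and hours long, this "bag of tasks" pattern tolerates heterogeneous runtimes and makes crash recovery a matter of replaying the unfinished task list.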

6 Optimization - HFBTHO/HFODD interface

Restarting HFODD from HFBTHO means:
- Tremendous gain in computation time
- Improved numerical stability
- Taking advantage of existing mass tables

Procedure:
- Coordinate + phase transformation (both unitary; see the sketch below)
- Modify HFODD to restart from HFB matrix elements instead of density fields on the Gauss-Hermite mesh

The two codes differ in almost every convention:
- HFBTHO: axial cylindrical coordinates, time-reversal symmetry, j-block diagonalization
- HFODD: symmetry-unrestricted Cartesian coordinates, y-simplex eigenbasis, no time-reversal symmetry, full diagonalization

Status: interface fully working for spherical HO bases (precision of the restart at 10^-4 to 10^-6); memory issue remains for deformed bases.
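The restart rests on the fact that a unitary basis change leaves the physics invariant. A sketch of the rotation of matrix elements in Python; the random unitary below is a stand-in for the actual coordinate + phase transformation between the HFBTHO and HFODD bases.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 6

# random Hermitian "HFB matrix" block, standing in for HFBTHO output
h = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
h = 0.5 * (h + h.conj().T)

# random unitary T (QR of a random complex matrix), standing in for
# the coordinate relabeling + phase convention change
t, _ = np.linalg.qr(rng.normal(size=(dim, dim))
                    + 1j * rng.normal(size=(dim, dim)))

h_new = t @ h @ t.conj().T          # matrix elements in the target basis

# unitarity preserves the spectrum: a cheap consistency check on a restart
assert np.allclose(np.linalg.eigvalsh(h), np.linalg.eigvalsh(h_new))
```

Checking that quasiparticle spectra agree before and after the rotation is one way to verify the quoted 10^-4 to 10^-6 restart precision.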

7 Optimization - HFODD profiling

Dominant memory consumers:
- Broyden routine: storage of N Broyden fields on the 3D Gauss-Hermite mesh
- Temporary array allocation for HFB matrix diagonalization

[Figure: memory profiles for neutrons and protons, with the safe memory-per-core limit on Jaguar/Franklin indicated. Calculations by J. McDonnell.]
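A back-of-the-envelope estimate shows why the Broyden history matters on small-memory-per-core machines. All numbers below are illustrative assumptions, not actual HFODD figures.

```python
# Rough memory footprint of the stored Broyden history (assumed numbers).
n_broyden = 7          # retained Broyden iterates (assumed)
n_fields  = 10         # density/current fields mixed per iterate (assumed)
mesh      = 40**3      # points of a 3D Gauss-Hermite mesh (assumed)
bytes_per = 8          # double precision

total = n_broyden * n_fields * mesh * bytes_per
print(f"Broyden history: {total / 1024**2:.0f} MB per process")
# With roughly 1-2 GB per core on machines like Jaguar/Franklin, this
# history plus the diagonalization workspace approaches the safe limit.
```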

8 Optimization - HFODD parallelization

Two levels of parallelism, handled by a simple MPI group structure (see the sketch below):
- Nuclear configuration (Z, N, interaction, {Q_λμ}, etc.)
- HFB solver

Standard PBLAS and ScaLAPACK libraries are used for distributed linear algebra; the natural splitting of the HFB matrix (OpenMP) is perhaps not scalable enough.

Splitting strategy:
- HFB matrix split into N blocks
- Eigenfunctions conserve the same N-block splitting
- Densities must be reconstructed piecewise

Challenges:
- Identify a self-contained set of all matrices required for one iteration
- Handling of conserved symmetries: each gives a different block structure
- Identify and replace all BLAS calls by their PBLAS equivalents
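The two-level group structure maps naturally onto communicator splitting. A minimal mpi4py sketch; ranks_per_solver is an assumed tuning parameter, and the resulting sub-communicator is where a distributed (ScaLAPACK-style) HFB solve would live.

```python
from mpi4py import MPI

world = MPI.COMM_WORLD
ranks_per_solver = 4                 # assumed: cores devoted to one HFB solve

# Level 1: one group per nuclear configuration (Z, N, constraints, ...)
config_id = world.Get_rank() // ranks_per_solver

# Level 2: each group becomes the communicator for one distributed solver,
# e.g. the process grid of a ScaLAPACK/PBLAS context
solver_comm = world.Split(color=config_id, key=world.Get_rank())

print(f"world rank {world.Get_rank()} -> configuration {config_id}, "
      f"solver rank {solver_comm.Get_rank()}/{solver_comm.Get_size()}")
```

Each configuration group then iterates its own HFB problem independently, while PBLAS distributes the HFB matrix in blocks inside solver_comm.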

9 Optimization - Finite-size spin instabilities

- The response of the nucleus to a perturbation with finite momentum q is studied in RPA theory
- Channels: scalar-isoscalar, scalar-isovector, vector-isoscalar, vector-isovector, etc.
- Modern Skyrme functionals are highly unstable with respect to finite-size spin perturbations!
- Warning for the next generation of functionals: stability must be assessed!

[Figure: convergence of the HFB calculations of 100 blocked states in 157-165Ba, with the region of instability marked.]

References: T. Lesinski et al., Phys. Rev. C 74, 044315 (2006); D. Davesne et al., arXiv:0906.1927 (2009).

10 Work in progress - Fission

An example of the challenges facing next-generation DFT: the microscopic description of nuclear fission.
- Degrees of freedom at the HFB level: deformation, temperature
- Potential energy surfaces depend critically on the interaction/functional and on pairing correlations

Static HFB prerequisites:
- Computational tools: Augmented Lagrangian Method (see the sketch below), Broyden method
- Precision tools: large bases, benchmarks
- Distributed computing tools: MPI wrapper, load balancing, efficient independent constrained calculations
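A toy sketch of the Augmented Lagrangian Method as used for constrained calculations: minimize an energy E(x) subject to a constraint Q(x) = q0 by minimizing L(x) = E(x) - λ(Q(x) - q0) + (c/2)(Q(x) - q0)² and updating λ. The energy, constraint, and all parameters below are illustrative stand-ins, not the HFB functional.

```python
import numpy as np
from scipy.optimize import minimize

E = lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2   # toy "energy surface"
Q = lambda x: x[0] + x[1]                         # toy "multipole moment"
q0 = 1.0                                          # requested constraint value

lam, c = 0.0, 10.0                                # multiplier and penalty
x = np.zeros(2)
for _ in range(20):
    aug = lambda y: E(y) - lam * (Q(y) - q0) + 0.5 * c * (Q(y) - q0) ** 2
    x = minimize(aug, x).x                        # inner unconstrained solve
    lam -= c * (Q(x) - q0)                        # multiplier update
    if abs(Q(x) - q0) < 1e-8:
        break

print(f"x = {x}, Q(x) = {Q(x):.6f} (target {q0})")
```

Unlike a pure quadratic penalty, the multiplier update enforces the constraint exactly without sending c to infinity, which is what makes the method robust for mapping multi-dimensional potential energy surfaces.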

11 DFT Computing Infrastructure

[Diagram: the pieces above assembled into one infrastructure: interfacing codes, parallelizing the solver, load balancing.]

12 Deliverables (Year 2-3)

Workplan Year 2-3 vs. current status:

1. Have a DFT package combining HFBTHO and HFODD available for large-scale calculations.
   Status: done (for spherical bases); large-scale calculations up to 14,112 cores (2 hours).

2. Optimize the full diagonalization of "large" (4,000 × 4,000) matrices in HFODD: take advantage of N-core architectures, increase speed for large bases (fission, heavy nuclei), overcome current memory limitations.
   Status: well on target; the parallelization of the HFODD core (PBLAS, ScaLAPACK) will solve the issues related to speed, memory and precision; change of iteration cycle: updating HFB matrix elements instead of fields.

3. Optimize the Broyden method (cf. Jorge's talk) to improve stability/convergence (a toy sketch follows).
   Status: done; numerical instabilities of large-scale calculations can be tracked down to physical instabilities built into current functionals (see Mario's talk).

4. Papers on odd nuclei: (1) methodology and theoretical models; (2) systematics and comparison with experiment.
   Status: delayed by the instability problem; Paper 1 ready to be published, Paper 2 in preparation, additional Paper 3 on finite-size spin instabilities in preparation.
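For reference, a toy Broyden iteration in Python, applied to the self-consistency residual F(x) = G(x) - x of a scalar fixed-point map. The "type II" inverse-Jacobian update below is a plain textbook variant standing in for the modified Broyden scheme used in the solvers; map, dimensions, and initial inverse Jacobian are assumptions.

```python
import numpy as np

def G(x):                        # toy fixed-point map: solves x = cos(x)
    return np.cos(x)

x = np.zeros(4)
B = -np.eye(4)                   # initial guess for the inverse Jacobian of F
F = G(x) - x
for it in range(50):
    dx = -B @ F                  # quasi-Newton step
    x_new = x + dx
    F_new = G(x_new) - x_new
    dF = F_new - F
    # Broyden (type II) rank-one update of the inverse Jacobian
    B += np.outer(dx - B @ dF, dF) / (dF @ dF)
    x, F = x_new, F_new
    if np.linalg.norm(F) < 1e-10:
        break
print(f"converged in {it} iterations: x = {x}")
```

Compared with the linear mixing of slide 3, the rank-one updates capture curvature information and typically cut the iteration count substantially, which is the point of the workplan item above.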

13 Work Plan (Year 4)

Physics:
- Optimization of DME-based functionals: genetic algorithm + Argonne optimizer (cf. Mario's talk)
- Applications of DME functionals: UNEDF-1

Computing:
- Implement DME functionals in HFODD (study of time-odd channels)
- Complete version 1.0 of the parallel HFODD core: demonstrate efficiency and scalability of the code; first applications: N-dimensional potential energy surfaces, fission pathways
- Improve the parallel interface to HFODD:
  - Optimistic: it should be a good application of ADLB ("moderately long to long" work units of 1-2 hours, little communication)
  - Realistic: remove the master and have him work like a slave (French Revolution spirit)
- Replace sequential I/O by parallel I/O for HFODD records, used as checkpoints (a sketch follows)

Remainder of the year:
- New version of HFODD: HFBTHO interface, shell correction, finite temperature, Augmented Lagrangian Method, matrix-element mixing, parallel interface, etc.
- 2 papers on odd nuclei and 1 paper on spin instabilities in preparation
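A minimal MPI-IO sketch of the parallel-checkpoint idea: every rank writes its slice of a record at a rank-dependent offset of one shared file, with no master rank funneling the data. The record layout and file name are illustrative assumptions, not the HFODD record format.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

record = np.full(1000, float(rank))      # this rank's piece of the checkpoint
offset = rank * record.nbytes            # contiguous per-rank layout (assumed)

fh = MPI.File.Open(comm, "hfodd_checkpoint.rec",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
fh.Write_at_all(offset, record)          # collective write, no serialization
fh.Close()
```

The collective Write_at_all lets the MPI library aggregate I/O across ranks, removing the sequential bottleneck that a single writer rank would impose on 100,000+ cores.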

14 Microscopic Description of Nuclear Fission: our Holy Grail

From "Nuclear Structure and Nuclear Interactions", Forefront Questions in Nuclear Science and the Role of High Performance Computing, January 26-28, 2009, Washington, D.C. (slide of December 10, 2008).

Scientific and computational challenges:
- Describe dynamics with novel energy functionals and ab initio methods: (1) adiabatic approach, (2) non-adiabatic/early stochastic, (3) full time-dependent dynamics
- Develop ultra-scale techniques for the description of fission

Summary of research direction:
- Build a nuclear energy density functional of spectroscopic precision
- Perform constrained minimization on a multi-dimensional potential energy surface
- Find the full spectrum of dense matrices with dimensions in the millions
- Develop scalable application software for time-dependent many-body dynamics

Expected scientific and computational outcomes:
- Predict half-lives, mass and kinetic-energy distributions of fission fragments, and fission cross-sections
- Analyze the fission process through visualization of the time evolution

Potential impact on Nuclear Science, via time-dependent many-body dynamics:
- Low-energy heavy-ion collisions and nucleon- and photon-induced reactions
- Neutron star quakes
- Vortex dynamics in quantum superfluids

Societal impact:
- Nuclear Energy programs
- Threat reduction
- NNSA Stockpile Stewardship Program

