1 UNEDF 2011 ANNUAL/FINAL MEETING
Progress report on the BIGSTICK configuration-interaction code
Calvin Johnson (1), Erich Ormand (2), Plamen Krastev (1,2,3)
(1) San Diego State University, (2) Lawrence Livermore Lab, (3) Harvard University
Supported by DOE Grants DE-FG02-96ER40985, DE-FC02-09ER41587, and DE-AC52-07NA27344

2 We have good news and bad news, and both are the same thing: the postdoc (Plamen Krastev) got a permanent staff position in scientific computing at Harvard.

3 BIGSTICK:
- General-purpose M-scheme configuration-interaction (CI) code
- On-the-fly calculation of the many-body Hamiltonian
- Fortran 90, MPI, and OpenMP
- 35,000+ lines in 30+ files and 200+ subroutines
- Faster set-up
- Faster Hamiltonian application
- Rewritten for “easy” parallelization
- New parallelization scheme
(Slide graphic labels: REDSTICK, BIGSTICK)
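
In the M-scheme, basis states are Slater determinants with a fixed total angular-momentum projection M, and the proton-neutron basis is made of products of proton and neutron determinants whose M values add up to M. The minimal sketch below (illustrative names, not BIGSTICK code) counts the M-scheme dimension for protons and neutrons in the sd shell, storing 2j and 2m so that all quantities stay integers.

```python
from itertools import combinations

# sd-shell orbitals as (label, 2j); angular momenta are doubled so they stay integers.
SD_SHELL = [("0d5/2", 5), ("1s1/2", 1), ("0d3/2", 3)]


def m_orbitals(shell):
    """Expand each (label, 2j) orbital into its m-states, returning their 2m values."""
    return [two_m for _, two_j in shell for two_m in range(-two_j, two_j + 1, 2)]


def count_by_m(two_ms, n_particles):
    """Count Slater determinants (sets of distinct m-states) by their total 2M."""
    counts = {}
    for occupied in combinations(range(len(two_ms)), n_particles):
        total = sum(two_ms[k] for k in occupied)
        counts[total] = counts.get(total, 0) + 1
    return counts


def mscheme_dimension(shell, n_protons, n_neutrons, two_m_total=0):
    """Pair proton and neutron determinants whose 2M values add up to two_m_total."""
    two_ms = m_orbitals(shell)
    protons = count_by_m(two_ms, n_protons)
    neutrons = count_by_m(two_ms, n_neutrons)
    return sum(n_p * neutrons.get(two_m_total - two_mp, 0)
               for two_mp, n_p in protons.items())


print(mscheme_dimension(SD_SHELL, 2, 2))  # 20Ne, M = 0: 640
```

Grouping the proton determinants by M_p, as the final sum does, hints at the block structure that the later slides refer to as proton and neutron sectors.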

4 BIGSTICK:
- Flexible truncation scheme: handles ‘no core’ ab initio Nħω truncation, valence-shell (sd- and pf-shell) orbital truncation, np-nh truncations, and more.
- Applied to ab initio calculations; valence-shell calculations (in particular level densities, random-interaction studies, and benchmarking projected HF); cold atoms; and electronic structure of atoms (benchmarking RPA and HF for atoms).
Version 6.5 is available at NERSC: unedf/lcci/BIGSTICK/v650/
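
In an Nħω (Nmax) truncation, a many-body state in a harmonic-oscillator (HO) basis is kept only if its total oscillator quanta exceed the lowest possible value by at most Nmax. The following is a minimal sketch of that rule, assuming a pure HO single-particle basis (shell N holds l = N, N-2, ..., with j = l ± 1/2) and counting only the M-scheme dimension; the function names are hypothetical, and BIGSTICK's own truncation machinery, which also covers orbital and np-nh truncations, is more general. Excitations are restricted to the parity of Nmax, as is usual for a fixed-parity calculation.

```python
from itertools import combinations


def ho_orbitals(max_shell):
    """m-states of HO shells N = 0..max_shell as (N, 2m) pairs: (N+1)(N+2) per shell."""
    orbs = []
    for shell in range(max_shell + 1):
        for l in range(shell % 2, shell + 1, 2):
            two_js = [2 * l + 1] + ([2 * l - 1] if l > 0 else [])
            for two_j in two_js:
                orbs.extend((shell, two_m) for two_m in range(-two_j, two_j + 1, 2))
    return orbs


def valence_shell(n):
    """Highest HO shell occupied when n identical nucleons fill the lowest shells."""
    shell, capacity = 0, 2
    while n > capacity:
        shell += 1
        capacity += (shell + 1) * (shell + 2)
    return shell


def bucket(orbs, n):
    """Count n-particle Slater determinants by (total quanta, total 2M)."""
    counts = {}
    for occ in combinations(orbs, n):
        key = (sum(o[0] for o in occ), sum(o[1] for o in occ))
        counts[key] = counts.get(key, 0) + 1
    return counts


def nhw_dimension(n_protons, n_neutrons, nmax, two_m_total=0):
    """M-scheme dimension of a no-core Nmax space (excitation <= nmax, parity of nmax)."""
    orbs = ho_orbitals(max(valence_shell(n_protons), valence_shell(n_neutrons)) + nmax)
    prot, neut = bucket(orbs, n_protons), bucket(orbs, n_neutrons)
    q_min = min(q for q, _ in prot) + min(q for q, _ in neut)
    dim = 0
    for (qp, mp), n_p in prot.items():
        for (qn, mn), n_n in neut.items():
            excitation = qp + qn - q_min
            if excitation <= nmax and excitation % 2 == nmax % 2 and mp + mn == two_m_total:
                dim += n_p * n_n
    return dim


print(nhw_dimension(3, 3, 0))  # 6Li, Nmax = 0, M = 0: 10
```

Brute-force enumeration like this is only practical for tiny spaces; it is meant to show the truncation rule, not how a production code builds its basis.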

5 BIGSTICK uses a factorization algorithm that reduces storage of Hamiltonian arrays.
Comparison of nonzero matrix storage with factorization:

Nuclide | Space | Basis dim | Matrix store | Factorization
56Fe | pf | 501 M | 290 GB | 0.72 GB
7Li | Nmax = 12 | 252 M | 3600 GB | 96 GB
7Li | Nmax = 14 | 1200 M | 23 TB | 624 GB
12C | Nmax = 6 | 32 M | 196 GB | 3.3 GB
12C | Nmax = 8 | 590 M | 5000 GB | 65 GB
12C | Nmax = 10 | 7800 M | 111 TB | 1.4 TB
16O | Nmax = 6 | 26 M | 142 GB | 3.0 GB
16O | Nmax = 8 | 990 M | 9700 GB | 130 GB

(TRIUMF – Feb 2011)
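
The sketch below is a simplified illustration of why factorization saves memory, under a stated assumption and not BIGSTICK's actual data layout: if a Hamiltonian term factorizes into a proton operator A and a neutron operator B, its matrix in the proton-neutron product basis is the Kronecker product A ⊗ B, and it can be applied to a vector while storing only A and B. A real nuclear Hamiltonian is a sum of proton-proton, neutron-neutron, and proton-neutron pieces, the last being a sum of many such products. All function names are hypothetical.

```python
def kron_apply(a, b, x):
    """Return y = (A ⊗ B) x without forming the Kronecker-product matrix.

    A acts on the proton index and B on the neutron index; the amplitude of
    the product state |p>|n> is stored at x[p * n_n + n].
    """
    n_p, n_n = len(a), len(b)
    # Stage 1: apply B to the neutron index, separately for each proton index.
    z = [[sum(b[no][ni] * x[p * n_n + ni] for ni in range(n_n)) for no in range(n_n)]
         for p in range(n_p)]
    # Stage 2: apply A to the proton index.
    return [sum(a[po][pi] * z[pi][no] for pi in range(n_p))
            for po in range(n_p) for no in range(n_n)]


def kron_dense(a, b):
    """Explicit Kronecker product, used here only to check kron_apply."""
    n_p, n_n = len(a), len(b)
    return [[a[po][pi] * b[no][ni] for pi in range(n_p) for ni in range(n_n)]
            for po in range(n_p) for no in range(n_n)]


def stored_entries(n_p, n_n):
    """Entries to store for the full product matrix versus its two factors."""
    return (n_p * n_n) ** 2, n_p ** 2 + n_n ** 2


a = [[1, 2], [3, 4]]   # toy proton operator
b = [[0, 1], [1, 0]]   # toy neutron operator
print(kron_apply(a, b, [1, 2, 3, 4]))  # [10, 7, 22, 15]
print(stored_entries(1000, 1000))      # (1000000000000, 2000000)
```

The two-stage application also costs fewer operations than a dense product matrix would, but the main point here is storage: the factors grow with the proton and neutron dimensions separately, while the full matrix grows with the square of their product.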

6 BIGSTICK: Micah Schuster, Physics MS project

7 BIGSTICK: Joshua Staker, Physics MS project

12 Major accomplishment as of last year: excellent scaling of the matrix-vector (mat-vec) multiply. This demonstrates that our factorization algorithm, as predicted, facilitates efficient distribution of mat-vec operations.

13 Major accomplishments after last UNEDF meeting:
- Rebalanced workload with an additional constraint on the dimension of local Lanczos vectors (Krastev)
- Fully distributed Lanczos vectors with hermiticity on (Krastev)
- Major steps towards distributing Lanczos vectors with suppressed hermiticity (Krastev)
- OpenMP implementation of the matrix-vector multiply (Ormand & Johnson)
- Significant progress in 3-body implementation (Johnson & Ormand)
- Added restart option (Johnson)
- Implemented in-lined 1-body density matrices (Johnson)

14 Highlighting accomplishments for 2010-2011:
- Add OpenMP
- Reduce memory load/node
  -- Lanczos vectors
  -- matrix information (matrix elements/jumps)
- Speed up reorthogonalization
  -- I/O is bottleneck

15 Highlighting accomplishments for 2010-2011: Add OpenMP
-- Crude 1st generation by Johnson (about 70-80% efficiency)
-- 2nd generation by Ormand (nearly 100% efficiency)
Hybrid OpenMP+MPI implemented; full testing delayed due to reorthogonalization issues.

16 Highlighting accomplishments for 2010-2011: Reduce memory load/node
-- Lanczos vectors
-- matrix information (matrix elements/jumps)
We break up the Lanczos vectors so that only part of each vector resides on each node.
Future: separate forward/backward multiplication.

20 Lanczos vectors distribution, hermiticity on: forward and backward application of H.
Each compute node needs at a minimum TWO sectors from the initial and TWO sectors from the final Lanczos vector.
(Figure: input vector Vin and output vector Vout, each split into sectors 1-4, with proton-sector and neutron-sector labels.)
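
With hermiticity on, each stored off-diagonal block H_ij (i < j) is used twice: forward (y_i += H_ij x_j) and backward (y_j += H_ij^T x_i). Only about half of the matrix information is needed, but the task holding a block must read sectors i and j of the input vector and update sectors i and j of the output vector. A minimal serial sketch, assuming a real symmetric H split into dense blocks (the function name and block layout are illustrative):

```python
def apply_upper_blocks(blocks, x_sectors):
    """Return y = H x for a real symmetric H stored only as blocks H_ij with i <= j.

    blocks maps (i, j) to a dense block (list of rows); x_sectors is the input
    vector split into sectors. Each off-diagonal block is used twice.
    """
    y = [[0.0] * len(sector) for sector in x_sectors]
    for (i, j), h in blocks.items():
        for r, row in enumerate(h):
            for c, h_rc in enumerate(row):
                y[i][r] += h_rc * x_sectors[j][c]      # forward:  y_i += H_ij x_j
                if i != j:
                    y[j][c] += h_rc * x_sectors[i][r]  # backward: y_j += H_ij^T x_i
    return y


# 4 x 4 symmetric H split into two sectors of size 2; only (0,0), (0,1), (1,1) are stored.
blocks = {
    (0, 0): [[2, 1], [1, 2]],
    (0, 1): [[0, 3], [4, 0]],
    (1, 1): [[5, 1], [1, 6]],
}
print(apply_upper_blocks(blocks, [[1, 2], [3, 4]]))  # [[16.0, 17.0], [27.0, 30.0]]
```

In a distributed run, the two writes for block (0, 1) land in different output sectors, which is why each node must hold two sectors of both vectors.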

23 Lanczos vectors distribution, hermiticity off: forward application of H on one node and backward application of H on another node.
Each compute node needs ONE sector from the initial and ONE sector from the final Lanczos vector.
(Figure: input vector Vin and output vector Vout, each split into sectors 1-2, with proton-sector and neutron-sector labels.)
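
With hermiticity suppressed, the forward and backward contributions of a block are handled by different tasks, so each task touches one input sector and one output sector. The hypothetical helpers below count the Lanczos-vector bytes a single task needs under each scheme, assuming double-precision (8-byte) amplitudes:

```python
def sectors_needed(i, j, hermiticity):
    """(input sectors, output sectors) a task holding block (i, j) must keep.

    hermiticity=True: the task applies H_ij forward and its transpose backward.
    hermiticity=False: the task applies H_ij forward only; the transposed
    contribution is assigned to another task.
    """
    if hermiticity and i != j:
        return {i, j}, {i, j}
    return {j}, {i}


def vector_bytes_per_task(sector_dims, i, j, hermiticity, bytes_per_amp=8):
    """Bytes of Lanczos-vector data (input plus output) needed by one task."""
    vin, vout = sectors_needed(i, j, hermiticity)
    return bytes_per_amp * (sum(sector_dims[s] for s in vin) + sum(sector_dims[s] for s in vout))


dims = [100, 100]  # two equal, hypothetical sector dimensions
print(vector_bytes_per_task(dims, 0, 1, True))   # 3200
print(vector_bytes_per_task(dims, 0, 1, False))  # 1600
```

For equal sectors the hermiticity-off task needs half the vector memory, consistent with the roughly factor-of-two difference between the hermiticity-on and hermiticity-off columns in the memory comparison (8.44 GB versus 4.39 GB for 12C).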

25 Comparison of memory requirements for distributing Lanczos vectors
Memory required to store 2 Lanczos vectors (double precision) on a node:

Nuclide | Space | Basis dim | Store (2 full vectors) | Hermiticity ON | Hermiticity OFF
12C | Nmax = 10 | 7800 M | 117 GB | 8.44 GB | 4.39 GB
60Zn | pf | 2300 M | 34 GB | 8.65 GB | 4.45 GB

The distribution scheme with suppressed hermiticity is the most memory-efficient; this is our scheme of choice.
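
The undistributed "Store" column can be checked with simple arithmetic: two vectors times the basis dimension times 8 bytes per double-precision amplitude. A minimal sketch, assuming binary gigabytes (GiB) and the rounded dimensions from the table (the function name is illustrative):

```python
def lanczos_store_gib(basis_dim, n_vectors=2, bytes_per_amp=8):
    """GiB needed to hold n_vectors full Lanczos vectors of double-precision amplitudes."""
    return basis_dim * n_vectors * bytes_per_amp / 2**30


for name, dim in [("12C, Nmax = 10", 7800e6), ("60Zn, pf", 2300e6)]:
    print(f"{name}: {lanczos_store_gib(dim):.1f} GiB")
# 12C, Nmax = 10: 116.2 GiB
# 60Zn, pf: 34.3 GiB
```

The results agree with the tabulated 117 GB and 34 GB to within the rounding of the basis dimensions.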

27 Highlighting accomplishments for 2010-2011: Speed up reorthogonalization
-- I/O is bottleneck
We (i.e., PK) spent time trying to make MPI-IO efficient for our needs via striping, etc. Analysis by Rebecca Hartman-Baker (ORNL) suggests our I/O is still running sequentially rather than in parallel. Now we will store all Lanczos vectors in memory, à la MFDn (this makes restarting an interrupted run difficult).
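
Reorthogonalization is what forces access to every previous Lanczos vector: each new vector is orthogonalized against all stored ones. Below is a minimal serial sketch of Lanczos with full reorthogonalization and the whole basis held in memory. The dense list-of-rows matrix and the function names are illustrative assumptions; in a production code such as BIGSTICK the vectors are distributed across nodes.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lanczos_full_reorth(h, v0, k):
    """Run up to k Lanczos steps on a real symmetric matrix h (list of rows).

    Every Lanczos vector is kept in memory so that each new vector can be
    reorthogonalized against all previous ones. Returns the tridiagonal
    coefficients (alphas, betas) and the list of Lanczos vectors.
    """
    norm = math.sqrt(dot(v0, v0))
    basis = [[c / norm for c in v0]]
    alphas, betas = [], []
    for step in range(k):
        v = basis[-1]
        w = [dot(row, v) for row in h]          # w = H v
        alpha = dot(v, w)
        alphas.append(alpha)
        w = [wi - alpha * vi for wi, vi in zip(w, v)]
        if step > 0:                            # three-term recurrence
            w = [wi - betas[-1] * ui for wi, ui in zip(w, basis[-2])]
        for u in basis:                         # full reorthogonalization
            overlap = dot(u, w)
            w = [wi - overlap * ui for wi, ui in zip(w, u)]
        beta = math.sqrt(dot(w, w))
        if step == k - 1 or beta < 1e-12:       # done, or invariant subspace found
            break
        betas.append(beta)
        basis.append([wi / beta for wi in w])
    return alphas, betas, basis


h = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0], [0, 0, 0, 4]]
alphas, betas, basis = lanczos_full_reorth(h, [1, 1, 1, 1], 4)
print(round(sum(alphas), 6))  # 10.0
```

On diag(1, 2, 3, 4) with a uniform starting vector, four steps span the whole space, so the alphas sum to the trace. Because the stored vectors exist only in RAM, they are lost if the job is interrupted, which is the restart difficulty mentioned on this slide.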

28 Next steps for remainder of project period:
- Store Lanczos vectors in RAM (end of summer)
- Write paper on factorization algorithm (drafted; finish by 9/2011)
- Fully implement MPI/OpenMP hybrid code (11/2011)
- Write up paper for publication of code (early 2012)

29 UNEDF Deliverables for BIGSTICK
- The LCCI project will complete and release final UNEDF versions of the LCCI codes, scripts, and test cases.
  Current version (6.5) at NERSC; expect final version by end of year; plans to publish in CPC or a similar venue.
- Improve the scalability of the BIGSTICK CI code up to 50,000 cores.
  Main barrier was reorthogonalization; now putting Lanczos vectors in memory to minimize I/O.
- Use the BIGSTICK code to investigate isospin breaking in the pf shell.
  Delayed due to a problem with I/O hardware on Sierra.

30 SciDAC-3 possible deliverables for BIGSTICK (end of SciDAC-2: 3-body forces on 100,000 cores)
- Run with 3-body forces on up to 1,000,000 cores on Sequoia, Nmax = 10/12 for 12C and 14C
- Add 4-body forces; investigate alpha-clustering with effective 4-body forces (via SRG or Lee-Suzuki)
- Currently interfaces with Navratil’s TRDENS to generate densities, spectroscopic factors, etc., needed for RGM reaction calculations; will improve this by developing fast post-processing with factorization
- Investigate general unitary-transform effective interactions, adding constraints to observables

31 Sample application: cold atomic gases at unitarity in a harmonic trap
Using only 1 generator (d/dr) (very much like UCOM)
Fit to A = 3 (1-, 0+) and A = 4 (0+, 1+, 2+) states
Starting rms = 2.32; final rms = 0.58
(UNEDF -- MSU June 2010)

32 Cross-fertilization of LCCI project (diagram linking BIGSTICK, MFDn, and NuShellX):
- On-the-fly construction of basis states and matrix elements
- Reorthogonalization and Lanczos vector management
- NuShellX: J-projected basis

