National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Porting, Benchmarking, and Optimizing Computational Material Science Packages on TeraGrid Resources
Dodi Heryadi
Advanced Application Support Group
Imaginations unbound
Top 15 Applications at NCSA, based on the number of users
  Gaussian, NAMD, CHARMM, VASP, ABAQUS, AMBER, ENZO, FLUENT, MRBAYES, GROMACS, MOLPRO, CACTUS, GAMESS, FLASH, ANSYS
  [data collected from Oct 1, 2006 to Dec 31, 2006]
Software Packages Commonly Used in Bio/Molecular Science and Engineering
  Gaussian, Gamess, NWChem, Molpro, ADF, Amber, Gromacs, CHARMM, NAMD, DLPOLY, LAMMPS, CPMD, VASP, Wien2k, SIESTA, Abinit, CASTEP, DMol3
Some of the Packages Used in the Material Science and Engineering Community
  VASP, CPMD, Wien2k, SIESTA, Abinit, CASTEP, DMol3
Porting, Benchmarking, and Optimizing Computational Material Science Packages on the TeraGrid Resources
  To assist users in selecting the best resources when applying for allocations
  To assist users in increasing their productivity using TeraGrid resources
First package: VASP (Vienna ab-initio simulation package)
  A package to perform ab-initio quantum-mechanical molecular dynamics (MD) using pseudopotentials and a plane wave basis set
  Large user base (over 20 research groups/PIs at NCSA alone)
  Can scale to 1,024 processors
  Restriction: licensed to individual research groups
(old) VASP Benchmarks on NCSA platforms
Pure Ni, 3x3x3 supercell, 2x2x2 kpoints, 10 electrons GGA pseudopotential
Wall Clock Time (in seconds)
  cpus   IBM p690 (copper)   Xeon Cluster (tungsten)   IA-64 Linux Cluster (mercury)
  1      7,840               6,168                     3,939
  2      4,026               3,936                     2,496
  4      1,951               2,077                     1,588
  8      1,039               1,
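
A table of wall-clock times like this one is usually easier to compare once it is converted into speedup and parallel efficiency relative to the 1-cpu run. The sketch below does that for the IBM p690 (copper) column, reading its entries as 7,840, 4,026, 1,951 and 1,039 seconds on 1, 2, 4 and 8 cpus. The function name `scaling` and the dictionary layout are illustrative choices, not part of the original benchmark.

```python
def scaling(times):
    """Return {cpus: (speedup, efficiency)} relative to the smallest cpu count.

    `times` maps a cpu count to a wall-clock time in seconds.
    """
    base_cpus = min(times)
    base_time = times[base_cpus]
    result = {}
    for cpus in sorted(times):
        speedup = base_time / times[cpus]
        # Efficiency is the speedup divided by the increase in cpu count.
        efficiency = speedup / (cpus / base_cpus)
        result[cpus] = (speedup, efficiency)
    return result


# IBM p690 (copper) column, as read from the benchmark table (assumed parsing).
copper = {1: 7840, 2: 4026, 4: 1951, 8: 1039}

for cpus, (speedup, efficiency) in scaling(copper).items():
    print(f"{cpus:>2} cpus: speedup {speedup:5.2f}, efficiency {efficiency:6.1%}")
```

The same function can be applied to the other two columns for the cpu counts that have complete entries.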
Porting and Benchmarking VASP on abe and ranger
  Compilers: Intel 10.1
  BLAS and LAPACK libraries: Intel MKL
  MPI: MVAPICH
  Internal FFTW libraries
Preliminary Results
Mn12-acetate
Wall clock time (in seconds)
  # of cores   Abe (ppn=8)                   Ranger (ppn=16)
  16           35,836                        19,
               ,547                          11,
               ,653                          7,
               ,688                          7,
               (job is still in the queue)   7,997
Work to do
  Port and benchmark VASP (and other widely used Computational Material Science packages) on other TG resources
    Next: kraken
  Optimize VASP on abe, ranger, kraken, and other TG resources
    Performance analysis/tools to identify performance bottlenecks
    Selecting appropriate compiler options for optimal performance
    Using optimized math libraries (e.g. Intel FFTW, SCALAPACK, AMD Math Libraries, etc.)
  Lonnie Crosby (NICS) and Yang Wang (PSC) will be involved in this effort
Preliminary Source Level Profiling of VASP on abe with perfsuite
Module Summary
  Samples   Self %   Total %   Module
                     79.42%    /cfs/scratch/users/dodi/vaspbench/perf/vasp
                     99.25%    /usr/local/mvapich p2patched-intel-ofed-1.2/lib/libmpich.so
                     99.85%    /usr/local/lib64/tls/libpthread so
                               /usr/local/lib64/tls/libc so
Preliminary Source Level Profiling of VASP on abe with perfsuite
File Summary
  Samples   Self %   Total %   File
                     58.69%    ??
                     75.87%    /u/ncsa/dodi/vaspnew/vasp.4.6/fft3dlib.f
                     81.12%    /u/ncsa/dodi/vaspnew/vasp.4.6/rmm-diis.f
                     85.41%    /u/ncsa/dodi/vaspnew/vasp.4.6/nonlr.f
                     88.51%    /u/ncsa/dodi/vaspnew/vasp.4.lib/dlexlib.f
                     91.56%    /u/ncsa/dodi/vaspnew/vasp.4.6/hamil.f
                     93.86%    /u/ncsa/dodi/vaspnew/vasp.4.6/fftmpi.f
                     95.90%    /u/ncsa/dodi/vaspnew/vasp.4.6/fftmpi_map.f
                     96.65%    /u/ncsa/dodi/vaspnew/vasp.4.6/dfast.f
                     97.30%    /u/ncsa/dodi/vaspnew/vasp.4.6/wave.f
                     97.90%    /u/ncsa/dodi/vaspnew/vasp.4.6/mpi.f
                     98.50%    /u/ncsa/dodi/vaspnew/vasp.4.6/subrot.f
                     98.85%    /u/ncsa/dodi/vaspnew/vasp.4.6/us.f90
Function Summary
  Samples   Self %   Total %   Function
                     10.39%    fpassm
                     20.08%    M_LOOP
                     26.77%    ipassm
                     33.22%    __intel_new_memcpy
                     38.46%    eddrmm
                     42.96%    MPIDI_CH3I_MRAILI_Get_next_vbuf
                     47.30%    MPIDI_CH3I_SMP_pull_header
                     51.50%    MPIDI_CH3I_SMP_read_progress
                     55.00%    mkl_lapack_dlaebz
                     58.09%    length
                     60.14%    raccmu
                     62.19%    fftwav
                     64.04%    hamiltmu
                     65.88%    mkl_blas_mc_zhemv_nb
                     67.68%    mkl_blas_mc_zgemm_copyac
                     69.43%    MPIDI_CH3I_SMP_write_progress
                     70.98%    rpromu
                     72.48%    map_forward
                     73.88%    AY16_Loop_M
                     75.07%    A16X8_N4_Loop_M16
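
If the Total % column is read as a running sum of the per-function shares (an assumption about this output, based on the values increasing monotonically), each function's own share can be recovered by differencing consecutive rows. The sketch below does that for the 20 functions listed and adds up the rows whose names start with `MPIDI_`, taken here as a rough proxy for time inside the MPI library. The helper name `self_percentages` and the prefix-based grouping are illustrative choices, not perfsuite features.

```python
# Cumulative "Total %" values from the Function Summary, in listed order.
cumulative = [
    ("fpassm", 10.39),
    ("M_LOOP", 20.08),
    ("ipassm", 26.77),
    ("__intel_new_memcpy", 33.22),
    ("eddrmm", 38.46),
    ("MPIDI_CH3I_MRAILI_Get_next_vbuf", 42.96),
    ("MPIDI_CH3I_SMP_pull_header", 47.30),
    ("MPIDI_CH3I_SMP_read_progress", 51.50),
    ("mkl_lapack_dlaebz", 55.00),
    ("length", 58.09),
    ("raccmu", 60.14),
    ("fftwav", 62.19),
    ("hamiltmu", 64.04),
    ("mkl_blas_mc_zhemv_nb", 65.88),
    ("mkl_blas_mc_zgemm_copyac", 67.68),
    ("MPIDI_CH3I_SMP_write_progress", 69.43),
    ("rpromu", 70.98),
    ("map_forward", 72.48),
    ("AY16_Loop_M", 73.88),
    ("A16X8_N4_Loop_M16", 75.07),
]


def self_percentages(rows):
    """Convert a running total into per-row shares by differencing."""
    shares = {}
    previous = 0.0
    for name, total in rows:
        shares[name] = round(total - previous, 2)
        previous = total
    return shares


shares = self_percentages(cumulative)

# Assumption: the MPIDI_ prefix marks functions internal to the MPI library.
mpi_share = round(
    sum(share for name, share in shares.items() if name.startswith("MPIDI_")), 2
)

for name, share in sorted(shares.items(), key=lambda item: -item[1])[:5]:
    print(f"{share:6.2f}%  {name}")
print(f"MPI-internal functions among those listed: {mpi_share:.2f}%")
```

Only the functions shown on the slide are counted, so the MPI figure is a lower bound on the MPI share of the whole profile.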
Next: More Detailed Performance Analysis
  mpiP: Lightweight, Scalable MPI Profiling (http://mpip.sourceforge.net/)
    20% of the time spent on MPI
  TAU (Tuning and Analysis Utilities)
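
To put the 20% figure in context, an Amdahl-style estimate gives the overall speedup available from tuning communication alone: if a fraction f of the run time is MPI and only that part is made k times faster, the whole run speeds up by 1 / ((1 - f) + f / k). The sketch below is a back-of-the-envelope calculation under the assumption that MPI time and compute time do not overlap and that the compute part is unchanged. The function name `overall_speedup` is hypothetical.

```python
def overall_speedup(mpi_fraction, mpi_speedup):
    """Amdahl-style estimate of whole-run speedup when only MPI time shrinks.

    mpi_fraction: share of run time spent in MPI, between 0 and 1.
    mpi_speedup:  factor by which the MPI time is reduced (float("inf")
                  models removing it entirely).
    """
    if not 0.0 <= mpi_fraction <= 1.0:
        raise ValueError("mpi_fraction must be between 0 and 1")
    if mpi_speedup <= 0:
        raise ValueError("mpi_speedup must be positive")
    return 1.0 / ((1.0 - mpi_fraction) + mpi_fraction / mpi_speedup)


for factor in (2.0, 4.0, float("inf")):
    gain = overall_speedup(0.20, factor)
    print(f"MPI time reduced {factor}x: overall speedup {gain:.3f}x")
```

With f = 0.20 the upper limit is 1.25x, which suggests that the remaining 80% of the run (FFT, linear algebra, and other compute routines) deserves at least as much attention as the communication.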
Acknowledgement
  Rick Kufrin and Rui Liu, NCSA
  Dave McWilliams, NICS