Presentation on theme: "Andrew Canning and Lin-Wang Wang Computational Research Division LBNL"— Presentation transcript:
1Andrew Canning and Lin-Wang Wang Computational Research Division LBNL Scaling First Principles Materials and Nanoscience Codes to Thousands of Processors (and Thousands of Atoms)(Plane-wave, DFT codes)Andrew Canning and Lin-Wang WangComputational Research DivisionLBNL
2Nanostructures as a new material Definition: Nanostructure is an assembly ofnanometer scale “building blocks”.Why nanometer scale: This is the scale when theproperties of these “building blocks” become differentfrom bulk.sizeElectron WavefunctionNanostructureBoth are in nanometers
3Example: Quantum Dots (QD) CdSe Band gap increaseCdSe quantum dot (size)Single electron effectson transport (Coulombblockade).Mechanical properties,surface effects and nodislocations
4Computational challenges (larger nanostructures) atomsmoleculesnanostructuresbulkInfinite(1-10 atomsin a unit cell)1-100atoms^6atomssizeAb initio methodPARATECAb initioMethodPARATECChallenge forcomputationalnanoscience.methodEffective massmethodAb initioelementsand reliabilityNew methodologyand algorithm(ESCAN)Even largerSupercomputer
51. Plane-wave expansion for Plane-wave Pseudopotential Method in DFTSolve Kohn-Sham Equations self-consistently for electron wavefunctions1. Plane-wave expansion for2. Replace “frozen” core by a pseudopotentialDifferent parts of the Hamiltonian calculated in different spaces (fourier and real) 3d FFT used
6Parallel Algorithm Details Divide sphere of plane-waves in columns between processors (planes of grid in real space)Parallel 3d FFT used to move between real space and fourier spaceFFT requires global communications data packet size ~ 1/(# processors)^2
7Parallel 3D FFT 3D FFT done via 3 sets of 1D FFTs and 2 transposes (b)3D FFT done via 3 sets of 1D FFTs and 2 transposesMost communication in global transpose (b) to (c) little communication (d) to (e)Flops/Comms ~ logNMany FFTs done at the same time to avoid latency issuesOnly non-zero elements communicated/calculatedMuch faster than vendor supplied 3D-FFT(c)(d)(e)(f)
8How to scale up the parallel 3d FFTs Minimize the global communications partLatency problem: Use all-band method in conjunction with many FFTs at the same time to make data packets larger (new all band CG method for metals )FFT part scales as N logN while other parts scale as N (larger systems scale better)23
9Materials/Nanoscience codes Parallel Fourier techniques (many-band approach) used in the following codes:PARATEC PARAllel Total Energy Code (plane-wave pseudopotential code)PEtot (plane-wave ultrasoft pseudopotential code)ESCAN (Energy SCAN) Uses folded spectrum method for non-selfconsistent nanoscale calculations with plane-waves for larger systems.
10PARATEC (PARAllel Total Energy Code) PARATEC performs first-principles quantum mechanical total energy calculation using pseudopotentials & plane wave basis setDesigned to run on large parallel machines IBM SP etc. but also runs on PCsPARATEC uses all-band CG approach to obtain wavefunctions of electronsGenerally obtains high percentage of peak on different platformsDeveloped with Louie and Cohen’s groups (UCB, LBNL), Raczkowski (Multiple 3d FFTs Peter Haynes and Michel Cote)DFT has become standard technique in Mat Sci to calculate the structural and electronic properties of new materials with a full QUANTUM MECHANICAL treatment of the electrons.Picture shows induced current and charge density in crystalized glcine.The simulations used to better undersand nuclear magnetic resonance.
11432 Si-atom system with 5CG steps PARATEC: PerformancePIBM SP Power 3IBM SP Power4SGI AltixNEC ESCRAY X1Gflops/P%peak320.95063%2.0239%3.7162%4.7660%3.0424%640.84857%1.7333%3.2454%4.6759%2.5920%1280.73949%1.5029%4.741.9115%2560.57238%1.0821%4.1752%5120.41328%3.3942%10242.0826%432 Si-atom system with 5CG steps432 Atom Si Bulk System 5 CG stepFirst time code achieved over 1 Tflop Aggregate 2.6 TFlops for 686 Si atomPrevious best was .7 TFlop on Power3 using 1500 procsWork carried out with L. Oliker (CRD) J.T. Carter (NERSC)
12PARATEC: Performance on ES SX6 NEC SX6 ESCRAY X1Gflops/PGflops%peak645.2533566%3.7323929%1284.9563362%3.0138524%2564.59117657%5123.76192447%10242.53259132%Scaling Issues 686 atom Si% time in communications increasing (3D FFT)Fewer number of multiple 1D FFTs for each proc. (vector length drops)Matrices for BLAS3 becoming smaller (vector length drops)
13PARATEC: Scaling ES vs. Power3 ES can run the same system about 10 times faster than the IBM SP (on any number of processors)Main advantage of ES for these types of codes is the fast communication networkFast processors require less fine-grain parallelism in code to get same performance as RISC machinesQD is 309 atom CdSe Quantum Dot
14Applications: Free standing quantum dots (CdSe) TEM imageChemically synthesised (Alivisatos, UCB, LBNL)Interior atoms are in bulk crystal structureSurface atoms are passivatedDiameter ~ AA few thousand atoms, beyond ab initio method
15CdSe quantum dots as biological tags Optically more stable than dye moleculesCan have multiple colors
16Computational challenges (larger nanostructures) atomsmoleculesnanostructuresbulkInfinite(1-10 atomsin a unit cell)1-100atoms^6atomssizeAb initio methodPARATECAb initioMethodPARATECChallenge forcomputationalnanoscience.methodEffective massmethodAb initioelementsand reliabilityNew methodologyand algorithm(ESCAN)Even largerSupercomputer
17Charge patching method for larger systems (L-W. Wang) Non-selfconsistent LDAquality potential fornanotubeSelfconsistent LDAcalculation of a singlegraphite sheetGet information from smallsystem ab initio calc., then generatethe charge densities for large systems
181. Plane-wave expansion for Plane-wave Pseudopotential Method in DFTSolve Kohn-Sham Equations non-selfconsistently for electron wavefunctions in desired energy range using patched charge density (can study larger nanosystems 10,000 atoms)1. Plane-wave expansion for2. Replace “frozen” core by a pseudopotentialDifferent parts of the Hamiltonian calculated in different spaces (fourier and real) 3d FFT used
19Motif based charge patching method Error: 1%, ~20 meV eigen energy error.
26AcknowledgementsThis research was in part funded by the DOE as part of the Modeling and Simulation in Nanoscience Initiative (MICS and BES)This research used resources of NERSC at LBNL ad CCS at ORNL supported under contract No DE-AC03-76SF00098 and DE-AC05-00OR22725.The authors were in part supported by the Office of Advanced Scientific Computing Research in the DOE Office of Science under contract number DE-AC03-76SF00098A. Canning would like to thank the staff of the Earth Simulator Center, especially Dr. T. Sato, S. Kitawaki Y. Tsuda and D. Parks, J. Snyder (NEC, USA) for their assistance during his visit.