One-day Meeting, INI, September 26th, 2008
Role of spectral turbulence simulations in developing HPC systems
YOKOKAWA, Mitsuo
Next-Generation Supercomputer R&D Center, RIKEN
Background
- Experience of developing the Earth Simulator
  - A 40 Tflops vector-type distributed-memory supercomputer system
  - A simulation code for box turbulence flow was used in the final adjustment of the system.
  - A large simulation of box turbulence flow was carried out.
- A petaflops supercomputer project
Contents
- Simulations on the Earth Simulator
- A Japanese peta-scale supercomputer project
- Trends of HPC system
- Summary
Simulations on the Earth Simulator
The Earth Simulator
- It was completed in 2002.
- It achieved 35.86 Tflops sustained in the LINPACK benchmark.
- It was chosen as one of the best inventions of 2002 by "TIME."
Why did I do it?
- It is important to evaluate the performance of the Earth Simulator at the final adjustment phase.
- Suitable codes should be chosen
  - to evaluate the performance of the vector processors,
  - to measure the performance of all-to-all communication among compute nodes through the crossbar switch,
  - to make the operation of the Earth Simulator stable.
- Candidates
  - LINPACK benchmark?
  - Atmospheric general circulation model (AGCM)?
  - Any other code?
Why did I do it? (cont'd)
- Spectral turbulence simulation code
  - Intensive computational kernel and a lot of data communication
  - Simple code
  - Significance to computational science: one of the grand challenges in computational science and high-performance computing
- A new spectral code for the Earth Simulator
  - Fourier spectral method for spatial discretization
  - Mode truncation and phase shift techniques to remove aliasing errors in calculating the nonlinear terms
  - Fourth-order Runge-Kutta method for time integration
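The aliasing error mentioned on this slide can be illustrated with a small one-dimensional sketch. It is not the Earth Simulator code: the grid size `N`, the 2/3-rule cutoff `KMAX`, and the helper names `dft`, `signed_wavenumber`, and `truncate` are assumptions made for illustration, and only mode truncation is shown (the phase shift technique is omitted).

```python
import cmath
import math

def dft(samples):
    """Naive O(N^2) discrete Fourier transform, normalized by N."""
    n = len(samples)
    return [
        sum(samples[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n
        for k in range(n)
    ]

def signed_wavenumber(k, n):
    """Map DFT index k in [0, n) to a signed wavenumber in (-n/2, n/2]."""
    return k if k <= n // 2 else k - n

def truncate(coeffs, kmax):
    """Mode truncation: zero every Fourier mode with |k| > kmax."""
    n = len(coeffs)
    return [c if abs(signed_wavenumber(k, n)) <= kmax else 0j
            for k, c in enumerate(coeffs)]

N = 16          # grid points (illustrative)
KMAX = N // 3   # 2/3-rule cutoff, an assumption of this sketch
x = [2 * math.pi * j / N for j in range(N)]
u = [math.cos(5 * xj) for xj in x]    # a single retained mode, k = 5

raw = dft([uj * uj for uj in u])      # nonlinear term evaluated on the grid
dealiased = truncate(raw, KMAX)

# u*u = 1/2 + cos(10x)/2. On 16 points, wavenumber 10 is indistinguishable
# from -6, so the raw spectrum carries a spurious component at |k| = 6.
print(abs(raw[6]), abs(dealiased[6]), abs(dealiased[0]))
```

The spurious component appears at a wavenumber above the cutoff, so truncation removes it while the mean of the product is kept.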
Points of coding
- Optimization for the Earth Simulator
  - Coordinated assignment of calculations to three levels of parallelism (vector processing, micro-tasking, and MPI parallelization)
  - Higher-radix FFT
  - B/F (data transfer rate between CPU and memory vs. operation performance)
  - Removal of redundant processes and variables
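One likely reason a higher-radix FFT helps with B/F is that it needs fewer butterfly stages, and in a simple model each stage sweeps the whole array through memory once. The sketch below only counts stages under that model. The function name `fft_passes` and the transform length `N` are illustrative assumptions, not values from the talk.

```python
def fft_passes(n, radix):
    """Number of butterfly stages (full sweeps over the data) for a length-n
    FFT when n is an exact power of the radix."""
    passes = 0
    while n > 1:
        if n % radix != 0:
            raise ValueError("n must be a power of the radix")
        n //= radix
        passes += 1
    return passes

N = 4096  # points along one axis (illustrative)
for radix in (2, 4, 8):
    print(radix, fft_passes(N, radix))  # 12, 6 and 4 stages respectively
```

Going from radix 2 to radix 4 halves the number of sweeps over memory in this model, which lowers the memory traffic needed per floating-point operation.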
Calculation for one time step
[Figure: wall time (sec) for one time step vs. number of nodes (64, 128, 256, 512), log axis from 0.01 to 100. Labeled values: 30.7 sec and 3.21 sec. Annotation: 3 days by 512 PNs.]
Performance
[Figure: sustained performance (Tflops) vs. number of PNs (64, 128, 256, 512), log axis from 1 to 100. Labeled value: 16.4 Tflops, 50% of the peak (single precision, analytical FLOP count).]
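The "50% of the peak" figure can be checked with a short calculation. The per-node peak is not given in the slides; the sketch assumes the publicly quoted Earth Simulator configuration of 8 arithmetic processors per PN at 8 Gflops each, and 640 PNs for the LINPACK run. These constants and the function names are assumptions for illustration.

```python
GFLOPS_PER_AP = 8.0   # assumed peak per arithmetic processor (not in the slides)
APS_PER_PN = 8        # assumed arithmetic processors per processor node

def peak_tflops(num_pns):
    """Theoretical peak in Tflops for a given number of processor nodes."""
    return num_pns * APS_PER_PN * GFLOPS_PER_AP / 1000.0

def efficiency(sustained_tflops, num_pns):
    """Fraction of the theoretical peak that was sustained."""
    return sustained_tflops / peak_tflops(num_pns)

turbulence = efficiency(16.4, 512)   # spectral turbulence code on 512 PNs
linpack = efficiency(35.86, 640)     # LINPACK, assuming all 640 PNs were used
print(f"turbulence: {turbulence:.1%}, LINPACK: {linpack:.1%}")
```

Under these assumptions the spectral code sustains about half of the peak, while LINPACK sustains a considerably larger fraction.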
Achievement of box turbulence flow simulations
[Figure: number of grid points (log axis, 1 to 10000) vs. year (1960 to 2010). Labeled simulations: Orszag (1969), IBM 360-95; Siggia (1981), Cray-1, NCAR; Kerr (1985), Cray-1S, NCAR; Jimenez et al. (1993), Caltech Delta machine; Yamamoto (1994), Numerical Wind Tunnel; Gotoh & Fukayama (2001), VPP5000/56, NUCC; K & I & Y (2002), Earth Simulator. Grid sizes marked: 32^3, 64^3, 128^3, 240^3, 512^3, 1024^3, 2048^3, 4096^3.]
A Japanese Peta-Scale Supercomputer Project
Next-Generation Supercomputer Project
- Objectives, as one of Japan's Key Technologies of National Importance:
  - to develop the world's most advanced, high-performance supercomputer
  - to develop and deploy its usage technologies as well as application software
- Period and budget: FY2006-FY2012, ~1 billion US$ (expected)
- RIKEN (The Institute of Physical and Chemical Research) plays the central role in the project, developing the supercomputer under the law.
Goals of the project
- Development and installation of the most advanced high-performance supercomputer system, with a LINPACK performance of 10 petaflops.
- Development and deployment of application software in various science and engineering fields, written to attain the maximum capability of the system.
- Establishment of an "Advanced Computational Science and Technology Center (tentative)" as one of the Centers of Excellence for research, personnel development, and training built around the supercomputer.
Major applications for the system
- Grand Challenges
Configuration of the system
- The Next-Generation Supercomputer will be a hybrid general-purpose supercomputer that provides the optimum computing environment for a wide range of simulations.
- Calculations will be performed in processing units that are suitable for the particular simulation.
- Parallel processing in a hybrid configuration of scalar and vector units will make larger and more complex simulations possible.
Roadmap of the project
[Figure: project roadmap with the marker "We are here."]
Location of the supercomputer site: Kobe City
[Figure: map showing Tokyo and Kobe.]
- Kobe is 450 km (280 miles) west of Tokyo.
Artist's image of the building
Photos of the site (under construction)
[Figure: photos taken from the south side on June 10, 2008, July 17, 2008, and Aug. 20, 2008.]
Trends of HPC system
Trends of HPC system
- It will have a large number of processors, around 1 million or more.
- Each chip will be a multi-core (8, 16, or 32 cores) or many-core (more than 64 cores) processor.
  - low performance for each core
  - small main memory capacity for each core
  - fine-grain parallelism
- Each processor consumes little energy (low-power processor).
- Narrow bandwidth between CPU and main memory
  - bottleneck in the number of signal pins
- Bisection bandwidth among compute nodes will be narrow.
  - One-to-one connection is very expensive and power-consuming.
Impact on spectral simulations
- High performance in the LINPACK benchmark
  - The more processors there are, the higher the LINPACK performance is.
  - LINPACK performance does not necessarily reflect real-world application performance, especially for spectral simulations.
- Small memory capacity for each processor
  - fine-grain decomposition of space
  - increasing communication cost among parallel compute nodes
- Narrow memory bandwidth and narrow inter-node bisection bandwidth
  - memory wall problem and low all-to-all communication performance
  - necessity of a low-B/F algorithm in place of FFT
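The link between fine-grain decomposition and communication cost can be sketched for the all-to-all transpose inside a parallel 3D FFT. The sketch assumes a slab (one-dimensional) decomposition of an n^3 grid, which the slides do not specify. The function name `transpose_messages` and the process counts are illustrative.

```python
def transpose_messages(n, procs):
    """For an n^3 grid split into slabs over `procs` processes, return
    (messages sent per process, grid points per message) for one
    all-to-all transpose."""
    if n % procs != 0:
        raise ValueError("procs must divide n for an even slab decomposition")
    points_per_proc = n**3 // procs
    # Each process splits its slab into `procs` equal blocks and sends
    # all but one of them to the other processes.
    return procs - 1, points_per_proc // procs

for procs in (512, 1024, 4096):
    count, size = transpose_messages(4096, procs)
    print(procs, count, size)
```

In this model the number of messages per process grows linearly with the process count while each message shrinks quadratically, so message latency and bisection bandwidth come to dominate as the decomposition becomes finer.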
Impact on spectral simulations (cont'd)
- The trend does not fit 3D FFT well, i.e. box turbulence simulations are becoming difficult to perform.
- We will be able to use more and more computational resources in the near future, but finer-resolution simulation by spectral methods needs a long calculation time because of extremely slow communication among parallel compute nodes, and we might not be able to obtain the final results in a reasonable time.
Estimates for simulations larger than 4096^3
- If a sustained simulation performance of 500 Tflops is available:
  - An 8192^3 simulation needs
    - 7 seconds for one time step
    - 100 TB of total memory
    - 8 days for 100,000 steps, and 1 PB in total for a complete simulation
  - A 16384^3 simulation needs
    - 1 minute for one time step
    - 800 TB of total memory
    - 3 months for 125,000 steps, and 10 PB in total for a complete simulation
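The figures on this slide are mutually consistent, which a short calculation can confirm. The sketch makes two assumptions not stated in the talk: that 1 TB means 10^12 bytes, and that the per-step cost is dominated by FFTs and so scales like N^3 log2 N. The function names are illustrative.

```python
import math

SECONDS_PER_DAY = 86_400

def wall_days(seconds_per_step, steps):
    """Total wall time in days for a run of `steps` time steps."""
    return seconds_per_step * steps / SECONDS_PER_DAY

def bytes_per_point(total_bytes, n):
    """Memory per grid point for an n^3 simulation."""
    return total_bytes / n**3

def scaled_step_time(t_ref, n_ref, n):
    """Scale a per-step time assuming cost ~ N^3 log2 N (FFT-dominated, an assumption)."""
    return t_ref * (n**3 * math.log2(n)) / (n_ref**3 * math.log2(n_ref))

print(wall_days(7, 100_000))                   # 8192^3 case: about 8 days
print(wall_days(60, 125_000))                  # 16384^3 case: about 87 days
print(bytes_per_point(100e12, 8192))           # about 182 bytes per grid point
print(bytes_per_point(800e12, 16384))          # same value, since memory scales by 8
print(scaled_step_time(7, 8192, 16384))        # about 60 seconds
```

Doubling the resolution multiplies the memory by 8 and the per-step time by about 8.6 under the assumed scaling, which matches the step from 7 seconds to 1 minute.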
Summary
- Spectral methods are very useful algorithms for evaluating HPC systems.
- In this sense, the trend of HPC system architecture is getting worse.
- Even if the peak performance of the system is very high,
  - we cannot expect high sustained performance;
  - it may take a long time to finish a simulation because of very slow data transfer between nodes.
- Can we discard spectral methods and change the algorithm?
- Or we have to put strong pressure on the computer architecture community and consider international collaboration to develop a supercomputer system that fits turbulence research.
- I would think of an HPC system as something like a particle accelerator, as at CERN.