One-day Meeting, INI, September 26th, 2008
Role of Spectral Turbulence Simulations in Developing HPC Systems
YOKOKAWA, Mitsuo
Next-Generation Supercomputer R&D Center, RIKEN

Background
- Experience of developing the Earth Simulator
  - A 40 Tflops vector-type distributed-memory supercomputer system
  - A simulation code for box turbulence flow was used in the final adjustment of the system
  - A large simulation of box turbulence flow was carried out
- A peta-flops supercomputer project

Contents
- Simulations on the Earth Simulator
- A Japanese peta-scale supercomputer project
- Trends of HPC systems
- Summary

Simulations on the Earth Simulator

The Earth Simulator
- Completed in 2002
- A sustained performance of 35.86 Tflops was achieved in the LINPACK benchmark
- Chosen as one of the best inventions of 2002 by TIME

Why I did it
- It was important to evaluate the performance of the Earth Simulator during its final adjustment phase.
- Suitable codes had to be chosen:
  - to evaluate the performance of the vector processors,
  - to measure the performance of all-to-all communication among compute nodes through the crossbar switch,
  - to make operation of the Earth Simulator stable.
- Candidates:
  - LINPACK benchmark?
  - Atmospheric general circulation model (AGCM)?
  - Any other code?

Why I did it (cont'd)
- Spectral turbulence simulation code
  - Intensive computational kernel and a large amount of data communication
  - Simple code
  - Significant for computational science: one of the grand challenges in computational science and high-performance computing
- A new spectral code for the Earth Simulator (a minimal sketch of these ingredients follows below)
  - Fourier spectral method for spatial discretization
  - Mode truncation and phase-shift techniques to control the aliasing error when calculating the nonlinear terms
  - Fourth-order Runge-Kutta method for time integration
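The ingredients listed above can be illustrated compactly. The sketch below is only a minimal NumPy illustration, not the Earth Simulator code (which was a heavily tuned vector/MPI implementation): it shows a 2/3-rule mode-truncation mask for controlling aliasing error and a classical fourth-order Runge-Kutta step in spectral space, with a stand-in right-hand side in place of the full Navier-Stokes operator.

```python
# Minimal sketch (not the production Earth Simulator code): two of the
# ingredients named on the slide -- 2/3-rule mode truncation against
# aliasing, and a fourth-order Runge-Kutta step in spectral space.
import numpy as np

N = 64                                    # grid points per direction (toy size)
k = np.fft.fftfreq(N, d=1.0 / N)          # integer wavenumbers 0..N/2-1, -N/2..-1
KX, KY, KZ = np.meshgrid(k, k, k, indexing="ij")

# 2/3-rule truncation: zero every mode with |k| above ~N/3, which removes the
# aliasing error generated when nonlinear products are formed in physical space.
k_max = N // 3
dealias = (np.abs(KX) <= k_max) & (np.abs(KY) <= k_max) & (np.abs(KZ) <= k_max)

def rhs_hat(u_hat):
    """Illustrative right-hand side in spectral space: pure viscous decay,
    -nu * |k|^2 * u_hat, standing in for the full Navier-Stokes operator."""
    nu = 1.0e-2
    return -nu * (KX**2 + KY**2 + KZ**2) * u_hat

def rk4_step(u_hat, dt):
    """Classical fourth-order Runge-Kutta step, with dealiasing applied."""
    k1 = rhs_hat(u_hat)
    k2 = rhs_hat(u_hat + 0.5 * dt * k1)
    k3 = rhs_hat(u_hat + 0.5 * dt * k2)
    k4 = rhs_hat(u_hat + dt * k3)
    return (u_hat + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)) * dealias

# One step on a random field, transformed to and from spectral space.
u_hat = np.fft.fftn(np.random.rand(N, N, N))
u_hat = rk4_step(u_hat, dt=1.0e-3)
u = np.real(np.fft.ifftn(u_hat))
```

The phase-shift technique mentioned on the slide, which further reduces aliasing by evaluating the nonlinear term on shifted grids, is omitted here for brevity.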

Points of coding
- Optimization for the Earth Simulator
  - Coordinated assignment of the calculation to three levels of parallelism (vector processing, micro-tasking, and MPI parallelization)
  - Higher-radix FFT
  - Attention to the B/F ratio (data transfer rate between CPU and memory versus arithmetic performance); a small worked example of this reasoning follows below
  - Removal of redundant processes and variables
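As a concrete, purely illustrative way to think about the B/F point, the snippet below computes the memory bandwidth a kernel needs to sustain a given flop rate from its bytes-per-flop ratio; the kernel and the target flop rate are assumptions chosen only to show the arithmetic, not figures from the Earth Simulator.

```python
# Back-of-the-envelope B/F (bytes per flop) arithmetic -- illustrative only.

def required_bandwidth_gbs(bytes_per_flop, sustained_gflops):
    """Memory bandwidth (GB/s) needed for a kernel with the given B/F ratio
    to run at the given sustained flop rate (Gflop/s)."""
    return bytes_per_flop * sustained_gflops

# Example kernel: y[i] = a * x[i] + y[i] in double precision.
#   traffic per element: load x (8 B) + load y (8 B) + store y (8 B) = 24 B
#   work per element:    1 multiply + 1 add                          = 2 flops
axpy_bf = 24.0 / 2.0                          # B/F = 12

# Hypothetical target: 8 Gflop/s sustained on one processor (assumed figure).
print(required_bandwidth_gbs(axpy_bf, 8.0))   # -> 96.0 GB/s of memory traffic
```

Higher-radix FFTs push in the same direction: fewer passes over the data per transform means fewer bytes moved per flop, so the achievable fraction of peak is less limited by memory bandwidth.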

[Chart: wall-clock time (sec) for the calculation of one time step versus the number of nodes. One time step took 30.7 sec; the complete run used 512 PNs and took a number of days.]

[Chart: sustained performance (Tflops) versus the number of PNs — 16.4 Tflops sustained, also shown as a percentage of peak (single precision, analytical FLOP count).]

[Chart: achievements in box-turbulence flow simulations — number of grid points versus year: Orszag (1969, IBM), Siggia (1981, Cray-1, NCAR), Kerr (1985, Cray-1S, NCAR), Jimenez et al. (1993, Caltech Delta machine), Yamamoto (1994, Numerical Wind Tunnel), Gotoh & Fukayama (2001, VPP5000/56, NUCC), K & I & Y (2002, Earth Simulator).]

A Japanese Peta-Scale Supercomputer Project

Next-Generation Supercomputer Project
- Objectives, as one of Japan's Key Technologies of National Importance:
  - to develop the world's most advanced, highest-performance supercomputer;
  - to develop and deploy its usage technologies as well as application software.
- Period & budget: FY2006-FY2012, ~1 billion US$ (expected)
- RIKEN (The Institute of Physical and Chemical Research) plays the central role in the project, developing the supercomputer under the law.

Goals of the project
- Development and installation of the most advanced high-performance supercomputer system, with a LINPACK performance of 10 petaflops.
- Development and deployment of application software in various science and engineering fields, made to attain the system's maximum capability.
- Establishment of an "Advanced Computational Science and Technology Center" (tentative name) as a Center of Excellence for research, personnel development, and training built around the supercomputer.

Major applications for the system — Grand Challenges

Configuration of the system
The Next-Generation Supercomputer will be a hybrid general-purpose supercomputer that provides the optimum computing environment for a wide range of simulations. Calculations will be performed in the processing units that are best suited to the particular simulation. Parallel processing in a hybrid configuration of scalar and vector units will make larger and more complex simulations possible.

Roadmap of the project
[Schedule chart with the marker "We are here."]

Location of the supercomputer site: Kobe City
[Map: Kobe is about 450 km (280 miles) west of Tokyo.]

Artist's image of the building

Photos of the site (under construction), taken from the south side: June 10, 2008; July 17, 2008; Aug. 20, 2008.

Trends of HPC systems

Trends of HPC systems
- Systems will have a very large number of processors, around one million or more.
- Each chip will be a multi-core (8, 16, or 32 cores) or many-core (more than 64 cores) processor:
  - low performance per core,
  - small main-memory capacity per core,
  - fine-grain parallelism.
- Each processor will consume little energy: low-power processors.
- Narrow bandwidth between CPU and main memory, bottlenecked by the number of signal pins.
- Bi-sectional bandwidth among compute nodes will be narrow: one-to-one connection is very expensive and power-consuming.

Impact on spectral simulations
- High performance in the LINPACK benchmark:
  - the more processors there are, the higher the LINPACK performance;
  - LINPACK performance does not necessarily reflect real-world application performance, especially for spectral simulations.
- Small memory capacity per processor:
  - fine-grain decomposition of space,
  - increasing communication cost among parallel compute nodes.
- Narrow memory bandwidth and narrow inter-node bi-sectional bandwidth:
  - the memory-wall problem and low all-to-all communication performance,
  - the need for a low-B/F algorithm in place of the FFT.
(A rough estimator of the all-to-all cost in a distributed 3D FFT follows below.)
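To see why the bi-sectional bandwidth dominates, here is a rough, illustrative estimator (my own sketch, not from the talk) of the all-to-all data volume in a slab- or pencil-decomposed 3D FFT: each global transpose moves essentially the whole field across the network, so the time scales with the field size divided by the bisection bandwidth. The numbers plugged in at the end are assumptions, not measurements.

```python
# Rough model of the all-to-all (global transpose) cost in a distributed 3D FFT.
# Illustrative assumptions only: complex double-precision field (16 B/point),
# ~2 global transposes per 3D FFT, and a given network bisection bandwidth.

def transpose_time_seconds(n, bisection_bw_TBs, bytes_per_point=16,
                           transposes_per_fft=2):
    """Time for the global transposes of one 3D FFT of size n^3,
    assuming the whole field must cross the network bisection."""
    field_bytes = bytes_per_point * n**3
    total_bytes = transposes_per_fft * field_bytes
    return total_bytes / (bisection_bw_TBs * 1e12)

# Hypothetical example: a 4096^3 field and a 10 TB/s network bisection.
n = 4096
t = transpose_time_seconds(n, bisection_bw_TBs=10.0)
print(f"{n}^3 field: about {t:.1f} s of pure all-to-all traffic per 3D FFT")
```

Even with these generous assumed numbers, a single 4096^3 transform spends a few tenths of a second on pure network traffic, and a pseudo-spectral time step needs several forward and inverse 3D FFTs, so the cost is paid many times per step.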

Impact on spectral simulations (cont'd)
- The trend does not fit 3D FFTs well; that is, box turbulence simulations are becoming difficult to perform.
- We will be able to use more and more computational resources in the near future, but...
- a finer-resolution simulation by spectral methods will need a very long calculation time because communication among the parallel compute nodes is extremely slow, and we might not be able to obtain the final results in a reasonable time.

Estimates for larger simulations
Assuming a sustained performance of 500 Tflops can be used (a back-of-the-envelope version of this kind of estimate follows below):
- one (larger) simulation needs
  - 7 seconds for one time step,
  - 100 TB of total memory,
  - 8 days for 100,000 steps, and 1 PB for a complete simulation;
- an even larger simulation needs
  - 1 minute for one time step,
  - 800 TB of total memory,
  - 3 months for 125,000 steps, and 10 PB in total for a complete simulation.
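The kind of estimate on this slide can be reproduced in a few lines. The sketch below is a back-of-the-envelope model only: the flop count per grid point per step, the number of resident arrays, and the 8192^3 grid size are my assumptions for illustration, not the inputs behind the slide's figures, so the outputs agree only in order of magnitude.

```python
# Back-of-the-envelope run-time and memory estimate for a pseudo-spectral DNS.
# All constants (flops per point per step, resident arrays, grid size) are
# illustrative assumptions, not the figures behind the slide.
import math

def estimate(n, sustained_tflops, steps, arrays_in_memory=10):
    points = n ** 3
    # Assume the cost per step is dominated by ~10 3D FFTs,
    # each costing about 5 * log2(n^3) flops per grid point.
    flops_per_point = 10 * 5 * math.log2(points)
    seconds_per_step = points * flops_per_point / (sustained_tflops * 1e12)
    memory_tb = arrays_in_memory * 16 * points / 1e12   # complex double arrays
    return {
        "seconds_per_step": seconds_per_step,
        "total_days": seconds_per_step * steps / 86400.0,
        "memory_TB": memory_tb,
    }

# Hypothetical case: an 8192^3 grid at 500 Tflops sustained, 100,000 steps.
print(estimate(8192, sustained_tflops=500.0, steps=100_000))
```

Doubling the grid in each direction multiplies the memory by 8 and the per-step work by slightly more than 8, which is the pattern visible in the two cases above.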

Summary
- Spectral methods are a very useful algorithm for evaluating an HPC system.
- In this sense, the trend in HPC system architecture is getting worse: even if the peak performance of a system is very high,
  - we cannot expect high sustained performance, and
  - it may take a long time to finish a simulation because data transfer between nodes is very slow.
- Can we discard spectral methods and change the algorithm?
- Or do we have to put strong pressure on the computer-architecture community, and think about an international collaboration to develop a supercomputer system that fits turbulence studies?
- I would think of such an HPC system as being like a particle accelerator at CERN.