K-computer and Supercomputing Projects in Japan
Makoto Taiji
Computational Biology Research Core, RIKEN Planning Office for the Center for Computational and Quantitative Life Science
& Processor Research Team, RIKEN Advanced Institute for Computational Science
Agenda
–K-computer
–Advanced Institute for Computational Science
–High Performance Computing Infrastructure
–My own perspective on future HPC, and MDGRAPE-4 (in brief)
My Background
Physics
Special-purpose computers for scientific simulations (1986~)
–Monte Carlo simulations of spin systems (1986, m-TIS I)
–FPGA-based reconfigurable machine (1990, m-TIS II)
–Gravitational N-body problems (1992~96, GRAPE-4, 5)
–Molecular dynamics simulations (1994~, MD-GRAPE, MDM, MDGRAPE-3, 4)
–Dense matrix calculation, quasi-general-purpose machine (MACE, 2000)
Ultrafast laser spectroscopy (1987~92)
–Conjugated polymers
–Rhodopsin and bacteriorhodopsin
Learning processes as dynamical systems, multi-agent dynamics (1996~2002)
Physical random number generator (1997~2004)
World Situation of HPC (TOP500)
[Chart: TOP500 share by country]
Japan's country share has dropped to 6th position.
Next-Generation Supercomputer Project
National project to develop a leading general-purpose supercomputer in Japan
–Not for a single purpose (cf. Earth Simulator)
–Location: Kobe Port Island
–Developer: Fujitsu
–Linpack: 10 PetaFLOPS
–Partial operation: spring 2011
–Full service: autumn 2012
[Image: K computer system (CG)]
Location of K computer
[Aerial photo (June 2006): the K-computer & Advanced Institute for Computational Science site on Port Island, Kobe, about 5 km from Sannomiya (12 min. by Portliner), near Kobe Airport and the core facilities of the Kobe Medical Industry Development Project]
RIKEN Advanced Institute for Computational Science
A national center covering a wide range of fields in computational science and engineering
Formation of a Central Hub in Kobe
[Diagram: three modes of use]
–Public use: academia and industry; a registered organization selects applications and provides user support
–Strategic use: Strategic Regions
–Operation organization use: the Advanced Institute for Computational Science (Director: Dr. Kimihiko Hirao) operates and enhances the supercomputer and conducts interdisciplinary research in computational science and computer science
RIKEN Advanced Institute for Computational Science
[Organization chart]
Director; Deputy Director
–Operation Technology Division
–Research Promotion Division
–Research Division
  Computational Science Research:
    Field Theory Research Team (TL: Yoshinobu Kuramashi)
    Computational Biophysics Research Team (TL: Yuji Sugita)
    Computational Materials Science Research Team (TL: Seiji Yunoki)
    Computational Molecular Science Research Team (TL: Takahito Nakajima)
  Computer Science Research:
    System Software Research Team (TL: Yutaka Ishikawa)
    Processor Research Team (TL: Makoto Taiji)
Grand Challenge Applications
Next-Generation Integrated Nano-Science Simulation Software (2006–2011); base site: Institute for Molecular Science
–Goal: to create next-generation nano-materials (new semiconductor materials, etc.) by integrating theories (such as quantum chemistry, statistical dynamics and electron theory of solids) and simulation techniques in the fields of new-generation information functions/materials, nano-biomaterials, and energy
–Targets: next-generation energy (solar energy fixation, fuel alcohol, fuel cells, electric energy storage), next-generation nano-biomolecules, and next-generation information-function materials (optical switches, spin electronics, nano quantum devices, ultra-high-density storage devices)
Next-Generation Integrated Life-Science Simulation Software (2006–2012); base site: RIKEN Wako Institute
–Goal: to provide new tools for breakthroughs against various problems in life science by means of petaflops-class simulation technology, leading to a comprehensive understanding of biological phenomena and the development of new drugs/medical devices and diagnostic/therapeutic methods
–Scales: from the molecular scale (MD, first-principles and quantum chemistry simulations) through cells, tissues and organs to the whole body (continuum simulations of fluids, heat and structures), and brain function
[Diagram: application examples spanning both projects, from electrons and molecules to integrated systems and the whole body]
Appointment of Strategic Regions
Computational resources and budget will be allocated to the following regions; a “strategic organization” will organize the research in each.
–Region 1. Foundations for predictive life sciences, medical care, and drug design
–Region 2. Innovation of new materials and new energies
–Region 3. Prediction of global change for disaster prevention and reduction
–Region 4. Next-generation manufacturing
–Region 5. Origin and structure of matter and the universe
Schedule of Project (FY2006–FY2012)
[Gantt chart]
–Tracks: system (processing unit; front-end unit with total system software; shared file system), buildings (computer building, research building), applications (Next-Generation Integrated Nanoscience Simulation, Next-Generation Integrated Life Simulation), and research promotion (preparatory research, feasibility studies, strategic research)
–Phases shown: conceptual, basic, and detailed design; prototype and evaluation; development, production, and evaluation; production, installation, and adjustment; construction; verification; tuning and improvement
Partial operation within FY2010; full operation starts from FY2012.
Features of K computer
京 ("Kei", meaning 10^16) = "K"
–High performance: Linpack 10 PFLOPS
–Massive parallelization: > 80,000 processors, > 640,000 cores
–SPARC64 VIIIfx: processor designed for HPC, with VISIMPACT / HPC-ACE extensions
–Memory: 16 GB/node, 2 GB/core
–Power: ~20 MW
K-Computer System
–Number of nodes: > 80,000
–Number of processors: > 80,000
–Number of cores: > 640,000
–Peak performance: > 10 PFLOPS
–Memory capacity: > 1 PB (16 GB/node)
–Network: Tofu interconnect (6-dim. torus); user view: 3D torus
–Bandwidth: 5 GB/s bidirectional in each of the six directions; 4 simultaneous communications
–Bisection bandwidth: > 30 TB/s (bidirectional, nominal peak)
Node
–CPU: 128 GFLOPS (8 cores, each with SIMD (4 FMA) at 16 GFLOPS)
–L2 cache: 5 MB
–Memory: 16 GB at 64 GB/s
–3D-torus network: six links (±x, ±y, ±z), 5 GB/s bidirectional each
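A few lines of arithmetic confirm that the node and system figures above are mutually consistent. The Python sketch below multiplies out the slide's nominal numbers. The function and constant names are illustrative. The "> 80,000" node count is treated as exactly 80,000, so the totals are lower bounds. The clock rate is not stated on the slide; it is inferred from 16 GFLOPS per core at 4 FMA (8 flops) per cycle.

```python
# Back-of-envelope check of the K computer node and system figures.
# All inputs are the slide's nominal numbers; the clock rate is derived, not quoted.

FLOPS_PER_FMA = 2          # one fused multiply-add = 1 multiply + 1 add
FMA_PER_CORE_CYCLE = 4     # "SIMD (4FMA)" per core
CORE_PEAK_GFLOPS = 16.0    # per core
CORES_PER_NODE = 8
NODE_MEM_GB = 16.0
NODE_MEM_BW_GBS = 64.0     # memory bandwidth per node
NODES = 80_000             # slide says "> 80,000", so totals below are lower bounds


def k_computer_summary() -> dict:
    """Derive per-node and whole-system figures from the per-core numbers."""
    node_gflops = CORES_PER_NODE * CORE_PEAK_GFLOPS
    return {
        "implied_clock_ghz": CORE_PEAK_GFLOPS / (FMA_PER_CORE_CYCLE * FLOPS_PER_FMA),
        "node_gflops": node_gflops,
        "mem_per_core_gb": NODE_MEM_GB / CORES_PER_NODE,
        "node_bytes_per_flop": NODE_MEM_BW_GBS / node_gflops,
        "system_pflops": NODES * node_gflops / 1e6,   # GFLOPS -> PFLOPS
        "system_mem_pb": NODES * NODE_MEM_GB / 1e6,   # GB -> PB (decimal)
        "total_cores": NODES * CORES_PER_NODE,
    }


if __name__ == "__main__":
    for key, value in k_computer_summary().items():
        print(f"{key}: {value}")
```

The results are:

- an implied clock of 2 GHz;
- 128 GFLOPS and 0.5 bytes/flop of memory bandwidth per node;
- about 10.24 PFLOPS, 1.28 PB and 640,000 cores in total, consistent with the slide's lower bounds.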
Cabinet of K computer
–24 boards/cabinet
–192 CPUs
–24 TFLOPS
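The cabinet figures can be checked against the 128 GFLOPS-per-CPU node peak in the same way. This is a sketch with illustrative names. The cabinet count it derives is a lower bound implied by these nominal numbers for 80,000 CPUs, not a quoted as-built figure.

```python
import math

# Nominal figures from the slides; 128 GFLOPS per CPU comes from the node description.
CPU_PEAK_GFLOPS = 128.0
BOARDS_PER_CABINET = 24
CPUS_PER_CABINET = 192


def cabinet_summary(total_cpus: int = 80_000) -> dict:
    """Per-board and per-cabinet figures implied by the nominal numbers."""
    return {
        "cpus_per_board": CPUS_PER_CABINET // BOARDS_PER_CABINET,
        "cabinet_tflops": CPUS_PER_CABINET * CPU_PEAK_GFLOPS / 1000.0,
        # Lower bound on cabinets needed to hold total_cpus compute CPUs;
        # the as-built machine may use a different count.
        "min_cabinets": math.ceil(total_cpus / CPUS_PER_CABINET),
    }


if __name__ == "__main__":
    print(cabinet_summary())
```

Each board carries 8 CPUs, and 192 × 128 GFLOPS ≈ 24.6 TFLOPS, matching the slide's rounded "24 TFLOPS".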
What is special in K computer?
Network
–High bandwidth, low latency
Processor for HPC
–VISIMPACT: shared cache and hardware barrier, enabling multi-core parallelization of inner loops
–HPC-ACE: register extension; SIMD (2 FMA per instruction, 2 issues/cycle, i.e., 4 FMA/core/cycle); instructions for special functions (trigonometric, inverse, square root, inverse square root, etc.)
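Special-function instructions such as inverse square root matter for molecular dynamics, where force kernels evaluate 1/r for every particle pair. A common way to use such hardware is to take a low-precision estimate and refine it with Newton-Raphson iterations, each of which roughly squares the relative error. The Python sketch below illustrates that general technique. The initial guess uses the well-known float32 bit-manipulation trick as a stand-in for a hardware estimate; it is not the SPARC64 VIIIfx / HPC-ACE instruction, and the function names are hypothetical.

```python
import math
import struct


def rsqrt_estimate(x: float) -> float:
    """Low-precision guess for 1/sqrt(x), x > 0 (a few percent relative error).

    Stand-in for a hardware reciprocal-square-root estimate; NOT the HPC-ACE instruction.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]   # reinterpret float32 as uint32
    bits = 0x5F3759DF - (bits >> 1)                        # classic magic-constant guess
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def rsqrt(x: float, iterations: int = 3) -> float:
    """Refine the estimate with Newton-Raphson steps for f(y) = 1/y^2 - x."""
    y = rsqrt_estimate(x)
    for _ in range(iterations):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def inverse_distance(p, q) -> float:
    """1/|p - q| for 3-D points, the kind of term MD force kernels evaluate."""
    r2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return rsqrt(r2)


if __name__ == "__main__":
    for x in (0.25, 2.0, 1.0e4):
        print(x, rsqrt(x), 1.0 / math.sqrt(x))
    print(inverse_distance((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))  # close to 0.2
```

Starting from an error of a few percent, three Newton steps bring the result to roughly double-precision accuracy. This is why an estimate instruction plus a few FMAs can replace a slow full-precision divide and square root.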
[Figure] T. Maruyama, Proc. Hot Chips 2009.
Software
–OS: Linux
–Compiler: the Fujitsu compiler will support Fortran (2003), C (1999), C++ (2003), and GNU C/C++ extensions, with automatic vectorization for SPARC64 VIIIfx
–Parallel programming: OpenMP 3.0, MPI-2.1
–gcc may also be available; however, it cannot generate CPU-specific instructions (e.g., SIMD), so poor performance is expected
How to use it?
–Strategic use: five “Strategic Regions” have been selected. For these fields, MEXT will provide research funding, and machine time will be allocated.
–General use: a “registered organization” will control the distribution of machine time.
–Commercial use: basically, RIKEN is not responsible for the usage of the machine.
HPCI: High Performance Computing Infrastructure
A system for using academic supercomputers in Japan, from 2012
–User communities: 5 strategic regions, industrial consortia, national universities and institutes
–Computing resource providers: RIKEN AICS, university centers, national institutes
Basic Idea of HPCI
[Diagram: logical structure and physical structure of HPCI; labels: 25 organizations, 13 organizations]
Problem in the Future of HPC Hardware
If the problem can be parallelized, computing performance is cheap. However, at every level, data movement dominates costs:
–Core ↔ cache
–Cache ↔ main memory
–Node ↔ node
–Node ↔ disk
–System ↔ system/apparatus/Internet
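One standard way to quantify "data movement dominates costs" is the roofline model, which is not from this talk. Attainable performance is the lower of the compute peak and memory bandwidth × arithmetic intensity (FLOP per byte moved). The sketch below uses the K computer node figures (128 GFLOPS, 64 GB/s) as defaults. Function names are illustrative, and the model ignores caches, latency and overlap.

```python
def attainable_gflops(intensity: float, peak_gflops: float = 128.0,
                      mem_bw_gbs: float = 64.0) -> float:
    """Roofline bound: performance is capped by compute or by memory traffic.

    intensity is arithmetic intensity in FLOP per byte moved to/from memory.
    Defaults are the K computer node figures (128 GFLOPS, 64 GB/s).
    """
    return min(peak_gflops, mem_bw_gbs * intensity)


def ridge_point(peak_gflops: float = 128.0, mem_bw_gbs: float = 64.0) -> float:
    """Intensity (FLOP/byte) above which a kernel can become compute-bound."""
    return peak_gflops / mem_bw_gbs


# DAXPY (y = a*x + y) in double precision: 2 FLOP per element and
# 24 bytes of traffic (load x, load y, store y), assuming no cache reuse.
DAXPY_INTENSITY = 2 / 24

if __name__ == "__main__":
    print("ridge point (FLOP/byte):", ridge_point())
    print("DAXPY bound (GFLOPS):", attainable_gflops(DAXPY_INTENSITY))
    print("kernel at 10 FLOP/byte (GFLOPS):", attainable_gflops(10.0))
```

Comparing two kernels shows the effect:

- DAXPY's bound is about 5.3 GFLOPS, roughly 4% of the node peak.
- Kernels above 2 FLOP/byte, such as well-blocked dense matrix multiply, can approach the full 128 GFLOPS.

In other words, the arithmetic units are cheap, and the bytes are what limit performance.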
Future Processors for HPC
The gap between top-end HPC processors and commodity processors will widen.
What is needed for HPC:
–Many-core processors and accelerators for “dense problems”
–Chip stacking for bandwidth
–Network integration
The network will be the most important factor in HPC.
Future Directions (1)
Network integration is essential for both general-purpose and special-purpose machines.
Platform for accelerators:
–General-purpose processor cores
–Cache or local memory
–Fast, low-latency on-chip and off-chip networks
[Diagram: processing unit (PU) and accelerator on an on-chip network (> 100 GB/s per router); memory > 100 GB/s; off-chip network > 30 GB/s]
Future Directions (2)
High-memory-bandwidth system
–“Single-chip BlueGene/L” by system-on-chip or chip stacking with TSVs (through-silicon vias)
–B/F ~ 1
–B/F ~ 0.1 for remote nodes
[Diagram: PU > 500 GFLOPS; memory > 500 GB/s; network > 50 GB/s]
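The B/F (bytes per flop) targets follow directly from the bandwidth and compute figures in the diagram. The short sketch below takes the slide's lower bounds (">") as exact values. For comparison, it adds the K computer node figures from the system slide (64 GB/s, 128 GFLOPS). Names are illustrative.

```python
def bytes_per_flop(bandwidth_gbs: float, gflops: float) -> float:
    """B/F ratio: bytes of bandwidth available per floating-point operation."""
    return bandwidth_gbs / gflops


# (bandwidth in GB/s, compute in GFLOPS). The proposed-design values are the
# slide's lower bounds taken as exact; the K node values come from the K system slide.
DESIGNS = {
    "proposed node: local memory": (500.0, 500.0),
    "proposed node: network (remote)": (50.0, 500.0),
    "K computer node: local memory": (64.0, 128.0),
}

if __name__ == "__main__":
    for name, (bw, flops) in DESIGNS.items():
        print(f"{name}: B/F = {bytes_per_flop(bw, flops):.2f}")
```

The ratios compare as follows:

- the proposed local-memory target of B/F ≈ 1 is twice the K node's 0.5;
- the network target of 0.1 applies to data on remote nodes.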
Problem in Network
Molecular dynamics: strong scaling is important
–~50,000 FLOP/particle/step; N = 10^5 particles → 5 GFLOP/step
–5 TFLOPS effective performance: 1 msec/step = 170 nsec/day. Rather easy
–5 PFLOPS effective performance: 1 μsec/step = 200 μsec/day??? Difficult, but important
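The timing figures above follow from simple arithmetic. The sketch makes two assumptions:

- a 2 fs integration timestep, a common MD value that the slide does not state;
- N = 10^5 particles, the count implied by 5 TFLOPS × 1 ms / 50,000 FLOP per particle.

Compute is treated as the only cost, so communication is ignored. Function names are illustrative.

```python
SECONDS_PER_DAY = 86_400.0


def wall_time_per_step(n_particles: float, effective_flops: float,
                       flop_per_particle: float = 50_000.0) -> float:
    """Seconds of wall-clock time per MD step if compute is the only cost."""
    return n_particles * flop_per_particle / effective_flops


def simulated_ns_per_day(wall_seconds_per_step: float, timestep_fs: float = 2.0) -> float:
    """Simulated nanoseconds per wall-clock day (timestep_fs is an assumption)."""
    steps_per_day = SECONDS_PER_DAY / wall_seconds_per_step
    return steps_per_day * timestep_fs * 1e-6   # fs -> ns


if __name__ == "__main__":
    n = 1e5  # particles, implied by 5 TFLOPS x 1 ms / 50,000 FLOP
    for label, flops in (("5 TFLOPS", 5e12), ("5 PFLOPS", 5e15)):
        dt = wall_time_per_step(n, flops)
        print(f"{label}: {dt:.0e} s/step, {simulated_ns_per_day(dt):,.1f} ns/day")
```

Under these assumptions, the 5 TFLOPS case gives about 173 ns/day, in line with the ~170 nsec/day estimate. The 5 PFLOPS case needs only 1 μs of compute per step, which leaves essentially no budget for network latency.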
Anton (D. E. Shaw Research)
–Special-purpose pipelines + general-purpose cores + dedicated network
–By decreasing communication latency, it can achieve high sustained performance even for small systems
R. O. Dror et al., Proc. Supercomputing 2009.
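A deliberately simple strong-scaling model makes the latency argument concrete. Compute divides evenly across nodes, but each step also pays a fixed number of message latencies that cannot be hidden. All parameters below are hypothetical: the node speed, node count, number of serial messages and both latency values. They are not Anton's or the K computer's actual figures.

```python
def step_time(total_flop: float, n_nodes: int, node_flops: float,
              latency_s: float, serial_messages: int) -> float:
    """Wall time per step: perfectly divided compute plus latency-bound communication.

    serial_messages is the number of message latencies that cannot be overlapped
    with computation in one step (a hypothetical, fixed count).
    """
    compute = total_flop / (n_nodes * node_flops)
    return compute + serial_messages * latency_s


def efficiency(total_flop: float, n_nodes: int, node_flops: float,
               latency_s: float, serial_messages: int) -> float:
    """Fraction of each step spent computing."""
    compute = total_flop / (n_nodes * node_flops)
    return compute / step_time(total_flop, n_nodes, node_flops, latency_s, serial_messages)


if __name__ == "__main__":
    # Hypothetical parameters: 5 GFLOP/step, 5,000 nodes of 1 TFLOPS, 10 serial messages.
    for latency in (5e-6, 5e-7):
        t = step_time(5e9, 5_000, 1e12, latency, 10)
        e = efficiency(5e9, 5_000, 1e12, latency, 10)
        print(f"latency {latency:.0e} s: {t * 1e6:.1f} us/step, efficiency {e:.0%}")
```

With 1 μs of compute per step, cutting latency tenfold raises efficiency from about 2% to about 17%. Adding more nodes would barely help, because the latency term dominates. This is the "difficult, but important" regime from the previous slide.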
MDGRAPE-4
–Special-purpose computer for molecular dynamics simulations
–Test bed for future HPC hardware
–FY2010–FY2012
–System-on-chip: accelerator, memory, general-purpose processor, network
–~4 TFLOPS/chip