Lessons learned from the K computer project ── from K Computer to Exascale ── Yoshio Oyanagi, Kobe University

Presentation transcript:

Lessons learned from the K computer project ── from K Computer to Exascale ──
Yoshio Oyanagi, Kobe University
- Historical overview of Japanese and US HPC
- What is the difference between Japan and US trends?
- How can we go beyond Petaflops?


百亿亿 (bǎi yì yì) ── the Chinese term for 10^18, i.e. exa

[Timeline chart, generations I-V: US machines (Cray-1, Cyber 205, X-MP, ETA10, Y-MP, C90, T90, X1, CM-2, CM-5, Paragon, SP-1/2, T3D, T3E, Origin, Regatta, ILLIAC IV), Japanese machines (APU, IAP, S810, S820, S3800, SR2201/8000/11000, VP200, VP2600, VPP500, VPP300/700/5000, SX-1/2, SX-3, SX-4, SX-5, SX-6/7/8, PAX-32, QCDPAX, cp-pacs, pacs-cs, NWT, ES), and national projects (MITI Supercomputer Project, RWCP, Fifth Generation Project).]

Historical Overview of Supercomputers
- 1970s: Primordial age
- 1980s: Vector age; parallel processing started
- 1990s: Commodity parallel in the USA; Japan slowly moved to parallel
- 2000s: Commodity parallel in the mainstream; NEC active in vector
- 2010s: Petaflops age
- 2020s: ??

1970s (red for vector machines)
- USA vendors: ASC (72), STAR-100 (73), ILLIAC IV (73), Cray-1 (76), HEP (79)
  - Y. Muraoka, K. Miura and others learned at ILLIAC IV
- UK: ICL DAP (79)
- Japanese vendors: FACOM 230/75 APU (77), HITAC M180 IAP (78)
- Kyoto U (Electrical Eng.): QA-1 (74), QA-2 (VLIW)
  - Signal processing, image processing
- Kyoto U (Nuclear Eng.): PACS-9 (78) (→ U. Tsukuba)
  - Reactor simulation

1980s (Vectors)
- USA vendors: Cyber 205 (81), XMP-4 (84), Cray-2 (85), IBM 3090 VF (85), ETA-10 (87), YMP (88), Convex C1 (85), SCS-40 (86), Convex C2 (88), Supertek S1 (89)
- Japanese vendors: HITAC S810/20 (83), S820 (87); FACOM VP200 (83), VP2600 (89); NEC SX-2 (85), SX-3 (90)

1980s (US Parallel)
Parallel ventures in the US: BBN Butterfly (81), Cosmic Cube (83), Elxsi 6400 (83), Pyramid 90x (83), Balance 8000 (84), nCUBE/1 (85), Alliant FX/8 (85), Encore Multimax (86), FPS T-series (86), Meiko CS-1 (86), Thinking Machines CM-1 (86), CM-2 (87), Multiflow Trace/200 (87)

1980s (Japan Parallel)
Japanese activities (mainly for research):
- U. Tsukuba: PAX-32 (80), PAX-128 (83), PAX-32J (84), QCDPAX (89) for QCD
- Fifth Generation project (ICOT) of MITI: PIM machines for inference
- Supercomputer Project of MITI: PHI, Sigma-1 (dataflow), CAP, VPP (GaAs)
- Osaka U.: EVLIS (82) for LISP
- Keio U.: SM2 (83) for sparse matrices
- U. Tokyo: GRAPE-1 (89)

1990s (USA)
- US vectors: C90 (91), Cray-3 (93), T90 (95), SV1 (98)
- US parallel (using commodity processors):
  - CM-5 (92), KSR-2 (93, special chip), SPP (94)
  - SP1 (93), SP2 (94), ASCI Blue Pacific (97), Power3 SP (99)
  - T3D (93), T3E (96)
  - ASCI Red (97)
  - Origin 2000 (96), ASCI Blue Mountain (98)

1990s (Japan)
- Japanese vectors: S3800 (93), NWT (93), VPP500 (93), SX-4 (95), VPP300 (95), VPP5000 (99)
- Japanese parallel:
  - cp-pacs (96), SR2201 (96), SR8000 (98)
  - AP1000 (94), AP3000 (97)
  - Cenju-2 (93), Cenju-3 (94), Cenju-4 (97)
  - Some were sold as testbeds.
- RWCP project (MITI, 1992-2002): clusters connected by Myrinet; SCore middleware.

[Chart: NWT, SR2201, cp-pacs, ES, and K (10.51 PF).]

Japanese Supercomputers in Top20 [chart]

Japanese Supercomputers in Top20 [chart]

Observations on Japan (1/3)
- Until the late 1990s, Japanese vendors focused on vector machines. Users exploited the power of vectorization.
- Vendors thought parallel machines were for specialized purposes (e.g. image processing).
- Most users dared not try to harness parallel machines in the 1980s.
- Some computer scientists were interested in building parallel machines, but those were not used for practical scientific computing.

Observations on Japan (2/3)
- Practical parallel processing for scientific computing was started by application users: QCDPAX, NWT, cp-pacs, the GRAPEs, ES.
- Software:
  - Very good vectorizing compilers ── users were spoiled by them.
  - Users found message passing difficult (a brief illustration of the contrast follows below).
  - HPF efforts for the Earth Simulator.
  - OpenMP.
  - SCore middleware from RWCP.
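To make that contrast concrete, here is a minimal sketch (not from the talk) in C: the first routine is the plain sequential loop that the 1980s vectorizing compilers parallelized automatically, while the second expresses the same computation with an explicit OpenMP annotation, standing in for the explicit parallelization (OpenMP or message passing) that later machines required. The routine names are made up for illustration.

```c
/* A vector-era user wrote plain sequential loops; the compiler on an
 * S810 or VP200 vectorized them automatically with no annotation. */
void daxpy_vector_era(long n, double a, const double *x, double *y)
{
    for (long i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* On a parallel machine the same computation has to be parallelized
 * explicitly, e.g. with an OpenMP pragma (or, on distributed memory,
 * with message passing), which vector-era users found burdensome. */
void daxpy_parallel_era(long n, double a, const double *x, double *y)
{
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```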

Observations on Japan (3/3)
- Japan was at least ten years behind the US in parallel processing for scientific computing.
- Industry was reluctant to adopt parallel computing.
- Education in parallel processing is an urgent issue for the K computer 「京」 (a 10 PF machine).
- More collaboration between computer scientists and application scientists is needed.

HPC in China
- 1983: 銀河 (Galaxy)
- 1986/3: 863 Plan (鄧小平, Deng Xiaoping)
- Lenovo: DeepComp series
- Dawning (曙光): Dawning series
  - Nebulae (No. 2, June 2010)
- NUDT:
  - 銀河 2 (1992)
  - 銀河 3 (1997)
  - 天河 (Tianhe) (No. 5, 2009)
  - 天河-1A (Tianhe-1A) (No. 1, 2010)

神威藍光 (Sunway BlueLight), National Supercomputing Center in Jinan (国家済南超級計算中心)
- 申威 1600 (SW1600) chip
- The system itself: 1.07 PF


京 (K) Supercomputer of Japan
- Started in April 2006; open to users in September 2012
- Over 10 PF with LINPACK
- Site: Kobe (Port Island)
- Run by RIKEN (AICS)
- Architecture:
  - Octa-core scalar processor (SPARC64 VIIIfx)
  - Tofu interconnect (6-dimensional torus; a minimal neighbor-addressing sketch follows below)
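To give a feel for what a 6-dimensional torus interconnect means at the addressing level, here is a minimal sketch in C. It is not the actual Tofu routing or addressing scheme, and the axis extents are hypothetical; it only enumerates a node's twelve nearest neighbors with wrap-around along each axis.

```c
#include <stdio.h>

#define NDIM 6

/* Hypothetical extents of the six torus axes (not the real Tofu geometry). */
static const int extent[NDIM] = {4, 4, 4, 4, 4, 4};

/* Fill `nbr` with the coordinates of the neighbor of `coord` that lies one
 * step in direction `sign` (+1 or -1) along axis `dim`, wrapping around at
 * the ends (torus behavior). */
static void torus_neighbor(const int coord[NDIM], int dim, int sign,
                           int nbr[NDIM])
{
    for (int d = 0; d < NDIM; d++)
        nbr[d] = coord[d];
    nbr[dim] = (coord[dim] + sign + extent[dim]) % extent[dim];
}

int main(void)
{
    int node[NDIM] = {0, 3, 1, 2, 0, 3};   /* an arbitrary example node */
    int nbr[NDIM];

    /* Each node has 2 * NDIM = 12 nearest neighbors on a 6-D torus. */
    for (int d = 0; d < NDIM; d++) {
        for (int s = -1; s <= 1; s += 2) {
            torus_neighbor(node, d, s, nbr);
            printf("axis %d, step %+d -> (%d,%d,%d,%d,%d,%d)\n", d, s,
                   nbr[0], nbr[1], nbr[2], nbr[3], nbr[4], nbr[5]);
        }
    }
    return 0;
}
```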


History of the K Computer
- Preparation (2004-2007): proposed architectures and killer applications
- A hybrid machine of vector and scalar parts
- Five strategic application fields defined
- Withdrawal of the vector part in 2009
- The Government Revitalization Unit proposed to shut down the Next Generation Supercomputer Project (Nov. 13, 2009)
- HPCI (High Performance Computing Infrastructure) started

Preparations in Japan
- IT Strategic Headquarters (2001): e-Japan, but only networking was emphasized.
  - At this stage, upgrading supercomputers was to be promoted according to the needs of each field (not as a national project).
- The Earth Simulator attained 36 Tflops (2002).
- The Information Science and Technology committee of MEXT has been discussing measures to promote computational science and technology since August.

Preparations in Japan
- Recommendation of a MEXT committee (2005):
  - Promote a national project to construct a leading-edge supercomputer
  - Government decision (July 25, 2005)
- RIKEN started the project (October 2005)
- MEXT funded four projects to promote "Element Technologies for Future Supercomputers" at $40M per year (in total). Four groups were accepted:
  1. System interconnect (Kyushu U and Fujitsu)
  2. Interconnect by IP (U of Tokyo, Keio U, etc.)
  3. Low-power devices and circuits (Hitachi, U of Tokyo, U of Tsukuba)
  4. Optical connection of CPU and memory (NEC and Titech)

Killer Applications
The WG in MEXT identified killer applications:
- Life Science
- Astrophysics
- Space and Aeronautics
- Materials
- Atomic Energy
- Environment
- Disaster Prevention
- Fluid Dynamics
- Plasma (space/fusion)
- Industrial Design


Observations of the WG
- Multiphysics/multiscale simulation will be important in those fields.
- To keep high performance across different types of computing, a hybrid architecture is appropriate.
  - We estimated the required performance.
- The interconnection between the different architectures should be high speed.

Original Proposal
[Diagram: large-scale processing; scalar computer; special-purpose computer.]

Architecture
- Conceptual design of the hardware (Sept. 14, 2007). The system consisted of two parts:
  - Scalar processor part by Fujitsu
  - Vector processor part by NEC and Hitachi
- Site selection: Kobe (March 28, 2007)
- NEC withdrew from the project (May 13, 2009)
- RIKEN decided to continue joint development with Fujitsu to build a 10 Pflops machine (May 14, 2009)

SPIRE (Strategic Programs for Innovative Research)
MEXT identified five strategic fields (July 22, 2009):
1. Predictive bioscience, medical care and drug design
2. New materials and new energy
3. Earth environmental prediction for disaster prevention and mitigation
4. Next-generation manufacturing
5. Origin and structure of matter and the universe

Friday, November 13, 2009: the 3rd WG of the Government Revitalization Unit
- "Why should it be No. 1 in the world?" "Is No. 2 not enough?"
- Vote: Abolish 1, Postpone 6, Shrink the budget 5
- Conclusion: Freeze the project! ── 村田(謝)蓮舫 (Hsieh Lien Fang)

Revival
- Strong reactions from academia and industry.
- The Government decided to overturn the conclusion (Dec. 2009).
- Leading organizations for the five strategic fields were announced (Jan. 2010).
- HPCI (High Performance Computing Infrastructure) started (March 2010).

Leading Organizations for SPIRE
1. Predictive bioscience, medical care and drug design: RIKEN
2. New materials and new energy: Institute for Solid State Physics, Univ. of Tokyo
3. Earth environmental prediction for disaster prevention and mitigation: JAMSTEC
4. Next-generation manufacturing: Institute of Industrial Science, Univ. of Tokyo
5. Origin and structure of matter and the universe: Center for Computational Sciences, Univ. of Tsukuba

Nickname 京 (K)
- Proposals were solicited from the public. Final decision (July 5, 2010): 「京 (Kei)」 (the K Computer).
- 「京」= 10^16 =「億億 (亿亿 in Chinese)」
- Also means capital or big city (北京、南京、東京、京都); originally meant "big gate".
- Fujitsu disclosed the SPARC64 VIIIfx chip (July 9, 2010).
- The first eight racks were shipped to RIKEN, Kobe (Sept. 28, 2010).

[Photos: AICS, RIKEN ── aerial view and research building (south side), from the RIKEN web site.]

[Photo: surroundings showing the railway station, Kobe U. ("I'm here"), AICS (RIKEN), Hyogo Prefectural University, and FOCUS.]

No. 1 in the world
- No. 1 in the Top500 (ISC2011, June 20), attained using 80% of the system
- 10.51 Pflops at SC2011 (Seattle)
- Now tuning the system using strategic test programs in various fields
- Open to users: September 2012
- RIST (Research Organization for Information Science & Technology) is to manage K Computer users

What's next? Toward exaflops!!
- Preliminary consideration among scientists
- MEXT started a WG for future HPC (April 2011)
  - Hardware / system software / application co-design is important.
  - It should be science-driven: we identified possible breakthroughs in science and technology. Social needs are also considered.
- Linpack exaflops is not our target.
  - Limitations of budget, power, footprint ...
  - Several different architectures are being considered.

[Diagram: co-design of Architecture, Algorithm, and Application.]

Two subgroups worked
- Application Subgroup: applications, numerical libraries, algorithms, automatic tuning
- System Subgroup: CPU and architecture, compilers, system software
Final report in March:
- Executive summary, "Report on the development of future HPCI technology" (28 slides)
- Roadmap of computational sciences (158 pages)
- Roadmap of HPCI technology (108 pages)

Memory and memory bandwidth ── a big technical issue
[Chart: design points classified by memory capacity per flops (B/Flops) and memory bandwidth per flops (B/Flop), plotting ES, K, GPU, SoC, small systems, and a "standard" 0.1 design point.]

Memory and memory B/W
- The K Computer (single node):
  - 64 GB/s for 128 Gflops → 0.5 B/Flop
  - 16 GB for 128 Gflops → 0.125 B/Flops
- A "standard" exascale design:
  - 0.1 EB/s for 1 Eflops → 0.1 B/Flop
  - PB-scale memory for 1 Eflops → a much smaller B/Flops ratio
- Limitations: cost and power; programmability (the arithmetic is spelled out below)
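The byte-per-flop ratios above follow directly from the quoted figures; the short C sketch below just redoes that arithmetic for the K node and for the "standard" exascale bandwidth figure (only bandwidth is computed for the exascale case, since no capacity figure is quoted here).

```c
#include <stdio.h>

/* Bytes moved (or held) per floating-point operation: the ratio the
 * slide uses to compare designs. */
static double bytes_per_flop(double bytes, double flops)
{
    return bytes / flops;
}

int main(void)
{
    /* K computer, single node (values from the slide). */
    double k_bw    = 64e9;     /* 64 GB/s memory bandwidth */
    double k_mem   = 16e9;     /* 16 GB memory capacity    */
    double k_flops = 128e9;    /* 128 Gflops peak          */

    /* A "standard" exascale design point (bandwidth from the slide). */
    double exa_bw    = 0.1e18; /* 0.1 EB/s aggregate bandwidth */
    double exa_flops = 1e18;   /* 1 Eflops peak                */

    printf("K node:    %.3f B/Flop (bandwidth), %.3f B/Flops (capacity)\n",
           bytes_per_flop(k_bw, k_flops), bytes_per_flop(k_mem, k_flops));
    printf("Exa (std): %.3f B/Flop (bandwidth)\n",
           bytes_per_flop(exa_bw, exa_flops));
    return 0;
}
```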

Efforts in Japan
- Feasibility studies of future HPCI systems (2012-2013):
  - One application team
  - Three system design teams
- Working group to consider future HPCI policy (chair: Oyanagi):
  - National and international computer technology
  - User needs for HPCI
  - Possible scientific and social outcomes of HPCI
  - Necessary computing resources in the future
  - Possible HPCI systems
  - Necessary costs and benefits

Efforts in Japan
- Working group:
  - Interim report (May 2013)
  - Public review
  - Budget request for FY2014 (August 2013)
  - Final report (March 2014)
- Recommendations of the WG:
  - We should have a computer 100x faster than K in practical applications.
  - We have to build it with our own technology.
  - We should not aim at winning No. 1 in the Top500.

Conclusion (1/2)
- In Japan, due to the success of vector computers in the 1980s, parallel processing lagged behind the US and Europe.
- Practical parallel computers (NWT, cp-pacs, ES) were built in collaboration with application users.
- The K Computer faced strong headwinds.
- Through the success of the K Computer, we caught up in parallel processing.
- The K is very stable and used extensively.

Conclusion (2/2)
- We hope to build exascale supercomputers around 2020.
- Different applications require different architectures in terms of B/Flop and B/Flops.
- They should be science-driven, not Linpack-driven.
- The support of taxpayers is important.
- Exascale machines will be very hard to program because of the memory hierarchy.
- Applications strongly demand such machines.
