Updates of AICS and the next step for Post-Petascale Computing in Japan. Mitsuhisa Sato, University of Tsukuba; team leader of the Programming Environment Research Team, AICS, RIKEN.

Presentation transcript:

Updates of AICS and the next step for Post-Petascale Computing in Japan
Mitsuhisa Sato, University of Tsukuba
Team leader of the Programming Environment Research Team, Advanced Institute for Computational Science (AICS), RIKEN

RIKEN Advanced Institute for Computational Science (AICS)
The institute was established at the site of the K computer in Kobe (started in October 2010).
Missions:
- Run the K computer efficiently for users in a wide range of research areas
- Carry out leading-edge computational science and technology and serve as a center of excellence (COE) for computational science in Japan
- Propose future directions of HPC in Japan and carry them out
Organization:
- Operations division, to run and manage the K computer
- Research division: started with 5 computational science research teams and 3 computer science research teams; in 2012, expanded to 10 computational science research teams and 6 computer science research teams
Promoting strong collaboration between computational and computer scientists, working together with the core organizations of each field.

Divisions
- AICS Policy Planning Division
- AICS Research Support Division
- Research Division
- Operations and Computer Technologies Division

Research Division (16 teams + 3 units)
- System Software Research Team
- Programming Environment Research Team
- Processor Research Team
- Large-scale Parallel Numerical Computing Technology Research Team
- HPC Usability Research Team
- Field Theory Research Team
- Discrete Event Simulation Research Team
- Computational Molecular Science Research Team
- Computational Materials Science Research Team
- Computational Biophysics Research Team
- Particle Simulator Research Team
- Computational Climate Science Research Team
- Complex Phenomena Unified Simulation Research Team
- HPC Programming Framework Research Team
- Advanced Visualization Research Team
- Data Assimilation Research Team
- Computational Chemistry Research Unit
- Computational Disaster Mitigation and Reduction Research Unit
- Computational Structural Biology Research Unit

The status of the K computer
- The first racks of the K computer were delivered to Kobe on September 28, 2010.
- Racks: 864 (+54); compute nodes (CPUs): 82,944 (88,128); number of cores: 663,552 (705,024)
- It has already achieved its primary target of "over 10 petaflops" (10.51 PF Linpack at 12.66 MW) in November 2011.
- Installation and tuning of K were completed, and public use started at the end of September 2012.
(Photo: first delivery, Sep 28, 2010)

K computer: compute nodes and network
- Racks: 864 (+54); compute nodes (CPUs): 82,944 (88,128); number of cores: 663,552 (705,024)
- Peak performance: 10.6 PFLOPS (11.28); 10.51 PF Linpack (12.66 MW), Nov 2011
- Memory: 1.33 PB (16 GB/node)
- Compute node: SPARC64 VIIIfx, a high-performance, low-power CPU with 8 cores at 58 W (courtesy of FUJITSU Ltd.)
- High-throughput, low-latency torus network (Tofu), presented as a logical 3-dimensional torus network for programming
- Peak bandwidth: 5 GB/s x 2 for each direction of the logical 3-dimensional torus network; bisection bandwidth: 30 TB/s
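As a quick sanity check, the headline figures above follow from the per-node numbers. The sketch below recomputes them; the 128 GFLOPS/node value (8 cores x 16 GFLOPS) is an assumption taken from publicly stated SPARC64 VIIIfx specifications, not from the slide itself.

```c
/* Back-of-the-envelope check of the K computer figures quoted above.
 * 128 GFLOPS/node (8 cores x 2 GHz x 8 flops/cycle) is an assumption. */
#include <stdio.h>

int main(void) {
    const double nodes           = 82944.0;  /* compute nodes in service        */
    const double nodes_total     = 88128.0;  /* parenthesized total on the slide */
    const double gflops_per_node = 128.0;    /* assumed per-node peak            */
    const double mem_per_node_gb = 16.0;

    printf("peak (service nodes): %.2f PFLOPS\n", nodes * gflops_per_node / 1e6);
    printf("peak (all nodes)    : %.2f PFLOPS\n", nodes_total * gflops_per_node / 1e6);
    printf("total memory        : %.2f PB\n",     nodes * mem_per_node_gb / 1e6);
    printf("cores               : %.0f\n",        nodes * 8.0);
    return 0;
}
```

This reproduces the quoted 10.6 (11.28) PFLOPS peak, 1.33 PB of memory, and 663,552 cores.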

The K computer (京コンピュータ, "Kei", where 京 denotes 10^16).

Projects to organize users of the K computer
- SPIRE (Strategic Programs for Innovative Research): a committee in MEXT has identified five application areas that are expected to create breakthroughs using the K computer from a national viewpoint.
- Nationwide High Performance Computing Infrastructure project (HPCI): organizes computing resources and users nationwide, including university supercomputers and the K computer.

SPIRE (Strategic Programs for Innovative Research)
Purpose:
- Produce scientific results as soon as HPCI starts operation
- Establish several core institutes for computational science
Outline of the program:
- Identify five strategic research areas that will contribute results to scientific and social issues
- Nationwide research groups are formed by funding core organizations designated by MEXT; the groups promote R&D using the K computer and build research structures for their own areas
- 50% of the computing resources of the K computer are dedicated to this program

Five strategic areas of SPIRE
- Life science / drug manufacture: Toshio YANAGIDA (RIKEN)
- New material / energy creation: Shinji TSUNEYUKI (University of Tokyo)
- Global change prediction for disaster prevention/mitigation: Shiro IMAWAKI (JAMSTEC)
- Monodukuri (manufacturing technology): Chisachi KATO (University of Tokyo)
- The origin of matter and the universe: Shinya AOKI (University of Tsukuba)
(Figure labels from the life-science illustration: genome, protein, cell, tissue/organ, whole body, multi-level phenomena of life.)

Nationwide High Performance Computing Infrastructure project (HPCI)
Background: after re-evaluation of the project at the change of government, the Next-Generation Supercomputer (NGS) project was restarted as "Creation of the Innovative High Performance Computing Infrastructure (HPCI)".
Building HPCI (High-Performance Computing Infrastructure):
- Provide seamless access to the K computer, other supercomputers, and users' machines
- Set up a large-scale storage system for the K computer and other supercomputers
- Joint selection of proposals for K and the other supercomputers
Organizing the HPCI Consortium:
- Organize users and computer centers, and provide proposals/suggestions to the government and related organizations
- Plan and operate the HPCI system
- Promote computational sciences
The consortium has been organized and started its activity in June.
(Diagram: HPCI connecting the K computer at AICS, RIKEN, institutional/university computer centers, computational science communities, and a future supercomputing consortium.)

The conceptual view of HPCI: HPCI is a comprehensive advanced computing infrastructure in which supercomputers and large-scale storage are connected together through a high-speed network. (Diagram: consortium users, the K computer, large-scale storage, and supercomputers at universities.)

Computing resources in HPCI

Storage system in the first phase of HPCI
- Two hubs, HPCI EAST HUB and HPCI WEST HUB, providing 12 PB of storage (52 OSS) with an 87-node cluster for data analysis and 10 PB of storage (30 OSS) with an 88-node cluster for data analysis
- Participating sites include Hokkaido University, Tohoku University, University of Tokyo, University of Tsukuba, Tokyo Institute of Technology, Nagoya University, Kyoto University, Osaka University, Kyushu University, and AICS, RIKEN
- Gfarm2 is used as the global shared file system

HPCI offers to computational science users:
- Computing resources: researchers can use an appropriate amount of computing resources, on the K computer and university supercomputers, more effectively and efficiently
- Authentication: users can access these computers and storage systems with a single sign-on account
- Storage: users can share large amounts of data in the storage, and analyze or visualize results simulated by other researchers on different supercomputers

Users and jobs: about 100 active users and 1,000 jobs per day. (Chart covering period I: Sep. 2012 - Mar. 2013, period II: Apr. 2013 - Sep. 2013, period III: Oct. 2013 - Mar. 2014.)

Job properties (1/2): larger jobs (more than 5,000 nodes, i.e. over 0.5 PF) consume about 40-50% of the resources, and the used/serviced ratio reaches 80%. (Chart for periods I-III as above.)

Job properties (2/2): the average sustained performance and the job scale are gradually increasing.

Activities for future HPC development (timeline, FY2011-FY2013)
- SDHPC WGs: technical discussion by two working groups; the SDHPC white paper was published
- FS: "Feasibility Study" projects for future HPC
- In parallel: discussion on HPC policy and on the basic concept, followed by a review and a decision on future HPC R&D
- We are here now

The SDHPC white paper and the "Feasibility Study" projects
- WGs were organized to draft the white paper on the Strategic Direction/Development of HPC in Japan, written by young Japanese researchers with senior advisers
- Contents: a science roadmap until 2020 and a list of applications for the 2020s; four types of hardware architectures identified, with performance projections for 2018 estimated from present technology trends; the need for further research and development to realize the science roadmap
- For the "Feasibility Study" projects, 4 research teams were accepted:
  - Application study team led by RIKEN AICS (Tomita)
  - System study team led by U. Tokyo (Ishikawa): next-generation "general-purpose" supercomputer
  - System study team led by U. Tsukuba (Sato): study on exascale heterogeneous systems with accelerators
  - System study team led by Tohoku U. (Kobayashi)
- The projects started in July 2012 (1.5 years)

System requirement analysis for target sciences (from the SDHPC white paper)
- System performance: 800-2500 PFLOPS; memory capacity: 10 TB - 500 PB; memory bandwidth: up to 1.0 B/F
- Example applications: small capacity requirement (MD, climate, space physics, ...); small bandwidth requirement (quantum chemistry, ...); high capacity/bandwidth requirement (incompressible fluid dynamics, ...)
- Interconnection network: not enough analysis has been carried out; some applications need latency on the order of 1 us and large bisection bandwidth
- Storage: the demand is not very large
(Chart classifying applications by memory bandwidth and capacity: low BW / middle capacity, high BW / small capacity, high BW / middle capacity, high BW / high capacity.)
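A B/F (bytes-per-flop) requirement translates directly into aggregate memory bandwidth once a target peak is fixed. The sketch below multiplies the two for an assumed 1 EFLOPS machine; the sample B/F values are illustrative and not taken from the white paper.

```c
/* Illustrative conversion of a B/F requirement into aggregate memory
 * bandwidth at a fixed machine peak (1 EFLOPS assumed). */
#include <stdio.h>

int main(void) {
    const double peak_flops = 1.0e18;              /* assumed 1 EFLOPS target */
    const double bf_ratios[] = { 0.05, 0.1, 1.0 }; /* sample byte/flop ratios */

    for (int i = 0; i < 3; i++) {
        double bytes_per_sec = bf_ratios[i] * peak_flops;
        printf("B/F = %.2f -> %.1f PB/s aggregate memory bandwidth\n",
               bf_ratios[i], bytes_per_sec / 1.0e15);
    }
    return 0;
}
```

For example, sustaining 1.0 B/F at 1 EFLOPS would require 1,000 PB/s of aggregate memory bandwidth, which is why bandwidth-hungry applications drive the architecture choices discussed next.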

Alternatives for exascale architecture (from the SDHPC white paper)
Four types of architectures are identified for exascale:
- General Purpose (GP): ordinary CPU-based MPPs, e.g. the K computer, GPU, Blue Gene, x86-based PC clusters
- Capacity-Bandwidth oriented (CB): with an expensive memory interface rather than computing capability, e.g. vector machines
- Reduced Memory (RM): with embedded (main) memory, e.g. SoC, MD-GRAPE4, Anton
- Compute Oriented (CO): many processing units, e.g. ClearSpeed, GRAPE-DR, GPU?

Issues for exascale computing
Two important aspects of post-petascale computing:
- Power limitation (a hard cap measured in MW)
- Strong scaling: fewer than 10^6 nodes (for fault tolerance), which implies more than 10 TFLOPS per node, i.e. accelerators / many-core processors
Solution: accelerated computing, by GPGPU, by application-specific accelerators, or by future acceleration devices.
(Chart: simple projection of the number of nodes and peak flops, starting from the K computer.)
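To make the strong-scaling argument concrete, the sketch below projects node counts for an exaflop machine at different per-node peaks. The 1 EFLOPS target and the 128 GFLOPS K-computer node figure are assumptions used only for this projection.

```c
/* Simple projection of node counts for a 1 EFLOPS machine at different
 * per-node peaks: 128 GFLOPS ~ a K computer node, 10 TFLOPS ~ the
 * accelerated node class argued for on this slide. */
#include <stdio.h>

int main(void) {
    const double target_flops = 1.0e18;                 /* assumed 1 EFLOPS  */
    const double node_flops[] = { 128e9, 1e12, 10e12 }; /* per-node peaks     */

    for (int i = 0; i < 3; i++) {
        double nodes = target_flops / node_flops[i];
        printf("%6.3f TFLOPS/node -> %.2e nodes %s\n",
               node_flops[i] / 1e12, nodes,
               nodes < 1e6 ? "(within the 10^6-node limit)" : "(too many nodes)");
    }
    return 0;
}
```

With K-class nodes an exaflop machine would need roughly 8 million nodes, whereas 10 TFLOPS nodes bring the count down to about 100,000, within the stated fault-tolerance limit.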

Study on exascale heterogeneous systems with accelerators (U. Tsukuba project)
Two keys for exascale computing: power and strong scaling. We study "exascale" heterogeneous systems with many-core accelerators. We are interested in:
- Architecture of accelerators: core and memory architecture, special-purpose functions, direct connection between accelerators in a group
- Power estimation and evaluation
- Programming models and computational science applications
- Requirements for the general-purpose part of the system, etc.

PACS-G: a straw-man architecture
- SIMD architecture, for compute-oriented applications (N-body, MD) and stencil applications
- 4096 cores (64 x 64); 4 GFLOPS x 4096 = 16 TFLOPS/chip
- 2D mesh (+ broadcast/reduction) on-chip network for stencil applications
- We expect 14 nm technology in the target timeframe; chip die size: 20 mm x 20 mm
- Works mainly out of on-chip memory (512 MB/chip, 128 KB/core), plus module memory implemented as 3D-stacked / Wide I/O DRAM (via 2.5D TSV) with 1000 GB/s bandwidth and 16-32 GB/chip
- No external memory (DIMM/DDR)
- 250 W/chip expected
- 64K chips for 1 EFLOPS (at peak)
(Figure: PACS-G chip with 3D-stacked or Wide I/O DRAM.)
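The chip- and system-level peaks above follow from the core count and per-core rate; the sketch below recomputes them along with the per-core share of on-chip memory. All inputs are from the slide, with "64K chips" spelled out as 65,536.

```c
/* Recompute the PACS-G straw-man figures from the per-core numbers:
 * 4096 cores x 4 GFLOPS, 512 MB on-chip memory, 64K chips per system. */
#include <stdio.h>

int main(void) {
    const double cores_per_chip   = 4096.0;   /* 64 x 64 SIMD cores */
    const double gflops_per_core  = 4.0;
    const double onchip_mem_mb    = 512.0;
    const double chips_per_system = 65536.0;  /* "64K chips"        */

    double chip_tflops = cores_per_chip * gflops_per_core / 1e3;
    printf("chip peak      : %.1f TFLOPS\n", chip_tflops);
    printf("system peak    : %.2f EFLOPS\n",
           chip_tflops * 1e12 * chips_per_system / 1e18);
    printf("on-chip memory : %.0f KB/core\n",
           onchip_mem_mb * 1024.0 / cores_per_chip);
    return 0;
}
```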

PACS-G: a straw-man architecture (continued)
- A group of 1024-2048 chips is connected via an accelerator network (inter-chip network, 2D mesh)
- 25-50 Gbps/link between chips: if we extend the 2D mesh on-chip network to an external 2D-mesh network within a group, each chip needs 200-400 GB/s (= 32 channels x 25-50 Gbps x 2 directions)
- For 50 Gbps data transfer, we may need direct optical interconnects from the chip
- I/O interface to the host: PCI Express Gen 4 x16 (not enough!)
- Programming model: XcalableMP + OpenACC. OpenACC is used to specify the offloaded fragment of code and the data movement; to align data and computation to cores, we use the XcalableMP concept of a "template" (a virtual index space), from which code can be generated for each core (and data-parallel languages like C*). A sketch of this style is shown below.
(Figure: an example implementation for a 1U rack.)
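As a rough illustration of the XcalableMP + OpenACC style mentioned above, the sketch below distributes a 1-D array over nodes with an XMP template and offloads the local stencil loop with OpenACC. This is a minimal sketch only: the array size, the 1-D block distribution, and the exact way the two directive sets are combined here are assumptions, not part of the PACS-G design.

```c
/* Minimal XcalableMP + OpenACC sketch (assumed combination, in the style
 * of XcalableACC): XMP's template/align directives place data and loop
 * iterations on nodes, OpenACC offloads the local loop to the accelerator. */
#include <stdio.h>
#define N 4096

#pragma xmp nodes p(*)
#pragma xmp template t(0:N-1)
#pragma xmp distribute t(block) onto p

double u[N], unew[N];
#pragma xmp align u[i] with t(i)
#pragma xmp align unew[i] with t(i)
#pragma xmp shadow u[1]           /* one-element halo for the stencil */

int main(void)
{
#pragma xmp loop on t(i)
    for (int i = 0; i < N; i++) { u[i] = (double)i; unew[i] = 0.0; }

#pragma xmp reflect (u)           /* exchange halo regions between nodes */

    /* Offloaded fragment: data movement and the kernel are specified with
     * OpenACC, while XMP decides which iterations are local to this node. */
#pragma acc data copyin(u) copyout(unew)
    {
#pragma xmp loop on t(i)
#pragma acc parallel loop
        for (int i = 1; i < N - 1; i++)
            unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
    }
    return 0;
}
```

In the PACS-G setting the template would map index ranges onto the 2D core mesh rather than onto host processes, but the pattern of aligning data and loop iterations through a virtual index space is the same.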

Project organization
- Joint project with Titech (Makino), Aizu U. (Nakazato), RIKEN (Taiji), U. Tokyo, KEK, Hiroshima U., and Hitachi as the supercomputer vendor
- Target applications: QCD in particle physics, tree N-body, MHD in astrophysics, MD in life science, FDM earthquake simulation, FMO in chemistry, NICAM in climate science

Current status and schedule
- We are now working on performance estimation through a co-design process: 2012 (done): QCD, N-body, MD, MHD; 2013: earthquake simulation, NICAM (climate), FMO (chemistry)
- When all data fits in on-chip memory, the B/F ratio is 4 and the total memory size is 1 TB/group; when data fits in module memory, the B/F ratio is 0.05 and the total memory size is 32 TB/group
- Also developing simulators (clock-level / instruction-level) for more precise and quantitative performance evaluation
- Compiler development (XMP and OpenACC)
- (Re-)design and investigation of the network topology: is a 2D mesh sufficient, or are other alternatives needed?
- Code development for applications using the host and the accelerator, including I/O
- More precise and detailed estimation of power consumption
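The group-level memory sizes quoted above can be checked against the per-chip PACS-G figures; the sketch below does this for a 2048-chip group, the upper end of the stated 1024-2048 range (an assumption made here so that the totals match the slide).

```c
/* Check the per-group memory totals and module-memory B/F ratio from the
 * per-chip PACS-G figures: 512 MB on-chip and 16 GB module memory per chip,
 * 1000 GB/s module bandwidth, 16 TFLOPS/chip, 2048 chips per group. */
#include <stdio.h>

int main(void) {
    const double chips_per_group = 2048.0;
    const double onchip_gb       = 0.5;      /* 512 MB per chip */
    const double module_gb       = 16.0;     /* 16 GB per chip  */
    const double module_bw_gbs   = 1000.0;   /* GB/s per chip   */
    const double chip_tflops     = 16.0;

    printf("on-chip memory per group : %.1f TB\n",
           chips_per_group * onchip_gb / 1024.0);
    printf("module memory per group  : %.1f TB\n",
           chips_per_group * module_gb / 1024.0);
    /* comes out around 0.06, in the same range as the 0.05 B/F quoted above */
    printf("module-memory B/F        : %.3f\n",
           module_bw_gbs / (chip_tflops * 1000.0));
    return 0;
}
```

The 1 TB and 32 TB per-group totals reproduce the slide's numbers exactly; the computed module-memory B/F (~0.06) is slightly above the quoted 0.05, which presumably reflects rounding or a more exact per-chip peak in the original estimate.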

AICS development of international partnerships
- NCSA, US, under MoU (first meeting in April 2013)
- NCI (National Computational Infrastructure), Australia, under MoU
- JSC, Germany, under MoU
- SISSA (Scuola Internazionale Superiore di Studi Avanzati), Italy, under agreement
- University of Maryland, US, with an agreement for collaboration on modeling and data assimilation
- Maison de la Simulation (INRIA/CNRS), France, under discussion
- ANL, US, under discussion
Recently, Japan and the US agreed on collaboration on system software for supercomputing at the "U.S.-Japan Joint High-Level Committee Meeting on Science and Technology Cooperation".
A workshop on international collaboration for exascale computing (especially for exascale software development) will be organized at ISC2013 next week, working toward JLESC.