KAIST Brief presentation of Earth Simulation Center, Jang, Jae-Wan.


Presentation on theme: "KAIST Brief presentation of Earth Simulation Center, Jang, Jae-Wan." Presentation transcript:

1 KAIST Brief presentation of Earth Simulation Center, Jang, Jae-Wan

2 CS610 Hardware configuration. Highly parallel vector supercomputer of the distributed-memory type. 640 processor nodes (PNs). Each PN: 8 vector-type arithmetic processors (APs); 16 GB main memory; remote control and I/O parts.
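The per-node figures on this slide imply system-wide totals. The following is a minimal sketch that only multiplies the numbers given above; the variable names are illustrative and not from the presentation.

```python
# Figures taken from the hardware configuration slide.
PROCESSOR_NODES = 640      # processor nodes (PNs)
APS_PER_NODE = 8           # arithmetic processors (APs) per PN
MEMORY_PER_NODE_GB = 16    # main memory per PN, in GB

total_aps = PROCESSOR_NODES * APS_PER_NODE
total_memory_gb = PROCESSOR_NODES * MEMORY_PER_NODE_GB

print(f"Total APs: {total_aps}")                    # Total APs: 5120
print(f"Total main memory: {total_memory_gb} GB")   # Total main memory: 10240 GB
```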

3 CS610 Arithmetic processor

4 CS610 Processor node

5 CS610 Processor node

6 CS610 Interconnection network

7 CS610 Interconnection Network

8 CS610 Earth Simulator Research and Development Center (figure labels: 65 m, 50 m)

9 CS610 Software. OS: NEC's UNIX-based OS, SUPER-UX. Supported languages: Fortran90, C, C++ (modified for ES). Programming model (hybrid / flat): Inter-PN: HPF/MPI; Intra-PN: microtasking/OpenMP; AP: automatic vectorization.
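The hybrid model on this slide splits work at three levels: across PNs, across the APs inside a PN, and across vector elements inside an AP. The sketch below mimics only the index bookkeeping of such a split in plain Python. It does not use MPI, OpenMP, or HPF, and the function names and the problem size are hypothetical.

```python
def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, near-equal chunks."""
    size, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def hybrid_decomposition(n, pns, aps_per_pn):
    """Map each (pn, ap) pair to the index range it would own."""
    plan = {}
    # Inter-PN level: one chunk per processor node (message passing in the real system).
    for pn, (pn_lo, pn_hi) in enumerate(split_range(0, n, pns)):
        # Intra-PN level: one sub-chunk per arithmetic processor (shared memory).
        for ap, ap_range in enumerate(split_range(pn_lo, pn_hi, aps_per_pn)):
            # AP level: the loop over ap_range is what a compiler would vectorize.
            plan[(pn, ap)] = ap_range
    return plan


plan = hybrid_decomposition(n=1_000_000, pns=640, aps_per_pn=8)
print(len(plan))       # one entry per AP
print(plan[(0, 0)])    # index range owned by the first AP of the first PN
```

In the flat model the same index ranges would simply be assigned to independent processes at every level instead of to threads inside a node.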

10 KAIST Earth Simulator Center. First results from the Earth Simulator. Resolution ≈ 300 km

11 KAIST Earth Simulator Center. First results from the Earth Simulator. Resolution ≈ 120 km

12 KAIST Earth Simulator Center. First results from the Earth Simulator. Resolution ≈ 20 km

13 KAIST Earth Simulator Center. First results from the Earth Simulator. Resolution ≈ 10 km

14 CS610 First results from the Earth Simulator. Ocean Circulation Model (MOM3, developed by GFDL). Resolution: 0.1° × 0.1° (≈ 10 km). Initial condition: Levitus data (1982). Computer resources: number of nodes = 175, elapsed time ≈ 8,100 hours.

15 CS610 First results from the Earth Simulator. Ocean Circulation Model (MOM3, developed by GFDL). Panels: resolution 0.1° × 0.1° (≈ 10 km); resolution 1° × 1° (≈ 100 km).
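The slides pair each angular resolution with an approximate length. A small sketch of that conversion follows, assuming a spherical Earth with a mean radius of 6371 km (an assumption, not a figure from the slides) and measuring along the equator unless a latitude is given.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumed, not taken from the slides


def grid_spacing_km(degrees, latitude_deg=0.0):
    """East-west length of `degrees` of longitude at the given latitude."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360.0
    return degrees * km_per_degree * math.cos(math.radians(latitude_deg))


for res in (0.1, 1.0):
    print(f"{res} deg is about {grid_spacing_km(res):.1f} km at the equator")

# Refining from 1 degree to 0.1 degree multiplies the horizontal grid points by this factor.
points_ratio = round((1.0 / 0.1) ** 2)
print(points_ratio)
```

The result is roughly 11 km and 111 km, in line with the rounded 10 km and 100 km quoted on the slides.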

16 KAIST Terascale Cluster: System X. Virginia Tech, Apple, Mellanox, Cisco, and Liebert. Daewoo Lee

17 CS610 Terascale Cluster: System X. A groundbreaking supercomputer cluster built with industrial assistance from Apple, Mellanox, Cisco, and Liebert. $5.2 million for hardware. 10280/17600 GFlops (Rmax/Rpeak) of performance with 1100 nodes (ranked 3rd in the TOP500 Supercomputer Sites list).
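The two performance figures and the hardware budget on this slide can be turned into a few derived ratios. This is a rough sketch, assuming the first figure is the measured Linpack result and the second the theoretical peak; the variable names are illustrative.

```python
# Figures from the System X slide.
RMAX_GFLOPS = 10280          # measured performance (assumed to be the Linpack result)
RPEAK_GFLOPS = 17600         # theoretical peak
NODES = 1100
HARDWARE_COST_USD = 5.2e6

efficiency = RMAX_GFLOPS / RPEAK_GFLOPS
peak_per_node_gflops = RPEAK_GFLOPS / NODES
cost_per_sustained_gflops = HARDWARE_COST_USD / RMAX_GFLOPS

print(f"Efficiency: {efficiency:.1%}")
print(f"Peak per node: {peak_per_node_gflops:.1f} GFlops")
print(f"Hardware cost per sustained GFlops: ${cost_per_sustained_gflops:.0f}")
```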

18 CS610 Goals. Computational Science and Engineering Research: Nanoscale Electronics, Quantum Chemistry, Molecular Statistics, Fluid Dynamics, Large-Scale Network Emulation, Optimal Design, … Experimental System: Fault Tolerance and Migration, Queuing System and Scheduler, Distributed Operating System, Parallel Filesystem, Middleware for Grids, Authentication/Security System, … Dual Usage Mode (90% of computational cycles devoted to production use).

19 CS610 Hardware Architecture. Node: Apple G5 platform, dual IBM PowerPC 970 (64-bit CPU). Primary communication: InfiniBand by Mellanox (20 Gbps full duplex, fat-tree topology). Secondary communication: Gigabit Ethernet by Cisco. Cooling system: by Liebert.

20 CS610 Software. Mac OS X (FreeBSD based). MPI-2 (MPICH-2). Supports C/C++/Fortran compilation. Déjà vu: a transparent fault-tolerance system that maintains computer stability by transferring a failed application to another location without alerting the computer, thus keeping the application intact.

21 CS610 Reference Terascale Cluster Web Site

22 KAIST 4th fastest supercomputer: Tungsten. PAK, EUNJI

23 CS610 4th: NCSA Tungsten. Top500.org. National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign

24 CS610 Tungsten Architecture [1/3]. Tungsten: Xeon 3.0 GHz Dell cluster; 2,560 processors; 3 GB memory/node; Peak performance: TF; Top 500 list debut: #4 (9.819 TF, November 2003); currently the 4th fastest supercomputer in the world.

25 CS610 Tungsten Architecture [2/3]. Components. Interconnect: Myrinet. I/O nodes: 104 nodes; shared storage: 122 TB. Compute nodes: 1280 nodes (2560 processors); Dell PowerEdge 1750 with 3 GB DDR SDRAM; Intel Xeon 3.06 GHz (dual). Software stack: Linux (Red Hat 9.0); cluster file system; compilers: Intel (Fortran 77/90/95, C, C++) and GNU (Fortran 77, C, C++); LSF + Maui Scheduler; user applications.

26 CS610 Tungsten Architecture [3/3]. 1450 nodes: Dell PowerEdge 1750 servers; Intel Xeon 3.06 GHz, peak performance 6.12 GFLOPS; 1280 compute nodes, 104 I/O nodes. Parallel I/O: 11.1 gigabytes per second (GB/s) of I/O throughput, which complements the cluster's 9.8 TFLOPS of computational capability; 104-node I/O sub-cluster with more than 120 TB. Storage: node local 73 GB; shared 122 TB.
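The per-processor peak and the I/O figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes 2 floating-point operations per clock cycle, a value chosen only because it reproduces the 6.12 GFLOPS quoted above; it is not stated in the slides.

```python
# Figures from the Tungsten architecture slides.
CLOCK_GHZ = 3.06
FLOPS_PER_CYCLE = 2          # assumption: reproduces the slide's 6.12 GFLOPS per processor
IO_NODES = 104
IO_THROUGHPUT_GB_S = 11.1
SHARED_STORAGE_TB = 122

peak_per_cpu_gflops = CLOCK_GHZ * FLOPS_PER_CYCLE
io_per_node_mb_s = IO_THROUGHPUT_GB_S / IO_NODES * 1000
storage_per_io_node_tb = SHARED_STORAGE_TB / IO_NODES

print(f"Peak per processor: {peak_per_cpu_gflops:.2f} GFLOPS")
print(f"I/O throughput per I/O node: {io_per_node_mb_s:.0f} MB/s")
print(f"Shared storage per I/O node: {storage_per_io_node_tb:.2f} TB")
```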

27 CS610 Applications on Tungsten [1/3]. PAPI and PerfSuite. PAPI: portable interface to hardware performance counters. PerfSuite: set of tools for performance analysis on Linux platforms.

28 CS610 Applications on Tungsten [2/3] PAPI and PerfSuite

29 CS610 Applications on Tungsten [3/3]. CHARMM (Harvard version): Chemistry at Harvard Macromolecular Mechanics; a general-purpose molecular mechanics, molecular dynamics, and vibrational analysis package. Amber 7.0: a set of molecular mechanical force fields for the simulation of biomolecules; a package of molecular simulation programs.

30 KAIST MPP2 Supercomputer: the world's largest Itanium2 cluster. Molecular Science Computing Facility, Pacific Northwest National Laboratory. Presentation: Kim SangWon

31 CS610 Contents: MPP2 Supercomputer Overview; Configuration; HP rx2600 (Longs Peak) Node; QsNet ELAN Interconnect Network; System/Application Software; File System; Future Plan

32 CS610 MPP2 Overview. MPP2: the High Performance Computing System-2, at the Molecular Science Computing Facility in the William R. Wiley Environmental Molecular Sciences Laboratory at Pacific Northwest National Laboratory. The fifth-fastest supercomputer in the world as of November 2003.

33 CS610 MPP2 Overview. System name: MPP2. Linux supercomputer cluster. 11.8 (8.633) teraflops. 6.8 terabytes of memory. Purpose: production. Platform: HP Integrity rx2600, dual Itanium2 1.5 GHz. Nodes: 980 (processors: 1960). ¾ megawatt of power. 220 tons of air conditioning. 4,000 sq. ft. Cost: $24.5 million (estimated). Generator, UPS.
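The 11.8 teraflops figure can be reproduced from the processor count and clock rate. This sketch assumes 4 floating-point operations per cycle per Itanium 2 (two fused multiply-add units), and it assumes the parenthesized 8.633 is the measured Linpack result; neither assumption is stated on the slide.

```python
# Figures from the MPP2 overview slide.
PROCESSORS = 1960
CLOCK_GHZ = 1.5
FLOPS_PER_CYCLE = 4          # assumption: two fused multiply-add units per processor
MEASURED_TFLOPS = 8.633      # assumed to be the Linpack result

peak_tflops = PROCESSORS * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000
efficiency = MEASURED_TFLOPS / peak_tflops

print(f"Theoretical peak: {peak_tflops:.2f} TFLOPS")
print(f"Measured / peak: {efficiency:.1%}")
```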

34 CS610 Configuration (Phase 2b). 1,900 next-generation Itanium® processors; 11.4 TF; 6.8 TB memory. Figure labels: 1,856 Madison batch CPUs (… compute nodes); 4 login nodes with 4 Gb-Enet; 2 system management nodes; SAN / 53 TB; Lustre; Elan3; Elan4. Figure legend: Not operational; Operational: September 2003.

35 CS610 HP rx2600 Longs Peak Node Architecture. Each node has: 2 Intel Itanium 2 processors (1.5 GHz); 6.4 GB/s system bus; 8.5 GB/s memory bus; 12 GB of RAM; T connection; 1 100T connection; 1 serial connection; 2 Elan3 connections. Figure labels: PCI-X2 (1 GB/s); Elan3; 2 SCSI160.

36 CS610 QsNet ELAN Interconnect Network. High bandwidth, ultra-low latency, and scalability. 900 Mbytes/s user-space to user-space bandwidth. nodes for standard QsNet conf., rising to 4096 in QsNetII systems. Optimized libraries for common distributed-memory programming models exploit the full capabilities of the base hardware.

37 CS610 Software on MPP2 (1/2). System software. Operating system: Red Hat Linux 7.2 Advanced Server; NWLinux, tailored to IA64 clusters ( kernel with various patches). Cluster management: Resource Management System (RMS) by Quadrics, a single-point interface to the system for resource management: monitoring, fault diagnosis, data collection, allocating CPUs, parallel job execution, … Job management software: LSF (Load Sharing Facility) batch scheduler; QBank, which controls and manages CPU resources allocated to projects or users. Compiler software: C (ecc), F77/F90/F95 (efc), G++. Code development: Etnus TotalView, a parallel and multithreaded application debugger; Vampir, the GUI-driven front end used to visualize the profile data from running a program; gdb.

38 CS610 Software on MPP2 (2/2). Application software. Quantum chemistry codes: GAMESS (the General Atomic and Molecular Electronic Structure System), which performs a variety of ab initio molecular orbital (MO) calculations; MOLPRO, an advanced ab initio quantum chemistry software package; NWChem, computational chemistry software developed by EMSL; ADF (Amsterdam Density Functional) 2000, software for first-principles electronic structure calculations via density-functional theory (DFT). General molecular modeling software: Amber. Unstructured mesh modeling codes: NWGrid (grid generator), for hybrid mesh generation, mesh optimization, and dynamic mesh maintenance; NWPhys (unstructured mesh solvers), a 3D, full-physics, first-principles, time-domain, free-Lagrange code for parallel processing using hybrid grids.

39 CS610 File System on MPP2. Four file systems are available on the cluster. Local filesystem (/scratch): on each of the compute nodes; non-persistent storage area provided to a parallel job running on that node. NFS filesystem (/home): where user home directories and files are located; uses RAID-5 for reliability. Lustre global filesystem (/dtemp): designed for the world's largest high-performance compute clusters; aggregate write rate of 3.2 Gbyte/s; holds restart files and files needed for post-analysis; long-term global scratch space. AFS filesystem (/msrc): on the front-end (non-compute) nodes.
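To give the Lustre write rate some scale, the sketch below estimates how long it would take to write a restart file as large as the whole 6.8 TB of system memory quoted on the overview slide. It is a best-case estimate that assumes the full aggregate rate is sustained and uses decimal units (1 TB = 1000 GB).

```python
# Figures from the MPP2 slides.
SYSTEM_MEMORY_TB = 6.8        # total memory, from the overview slide
LUSTRE_WRITE_GB_S = 3.2       # aggregate write rate of /dtemp

# Best case: every byte of memory is dumped at the full aggregate rate.
dump_seconds = SYSTEM_MEMORY_TB * 1000 / LUSTRE_WRITE_GB_S
dump_minutes = dump_seconds / 60

print(f"Full-memory dump: about {dump_seconds:.0f} s ({dump_minutes:.1f} min)")
```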

40 CS610 Future Plan… MPP2 will be upgraded with the faster Quadrics QsNetII interconnect in early … Figure labels: 1,856 Madison batch CPUs (… compute nodes); Elan4; 4 login nodes with 4 Gb-Enet; 2 system management nodes; SAN / 53 TB; Lustre.

41 KAIST Bluesky Supercomputer. Top 500 Supercomputers. CS610 Parallel Processing. Donghyouk Lim (Dept. of Computer Science, KAIST)

42 CS610 Contents: Introduction; National Center for Atmospheric Research; Scientific Computing Division; Hardware; Software; Recommendations for usage; Related Links

43 CS610 Introduction. Bluesky: 13th-ranked supercomputer in the world; clustered symmetric multi-processing (SMP) system; 1600 IBM POWER4 processors; peak of 8.7 TFLOPS.

44 CS610 National Center for Atmospheric Research. Established in 1960. Located in Boulder, Colorado. Research areas: Earth system; climate change; changes in atmospheric composition.

45 CS610 Scientific Computing Division. Research on high-performance supercomputing. Computing resources: Bluesky (IBM Cluster 1600 running AIX), 13th place; blackforest (IBM SP RS/6000 running AIX), 80th place; Chinook complex: Chinook (SGI Origin3800 running IRIX) and Chinook (SGI Origin2100 running IRIX).

46 CS610 Hardware. Processor: 1600 POWER4 processors at 1.3 GHz; each can perform up to 4 floating-point operations per cycle; peak of 8.7 TFLOPS. Memory: 2 GB of memory per processor; memory on a node is shared between the processors on that node. Memory caches: L1 cache: 64 KB I-cache, 32 KB D-cache, direct mapped; L2 cache: shared by a pair of processors, 1.44 MB, 8-way set associative; L3 cache: 32 MB, 512-byte cache line, 8-way set associative.
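The L3 parameters on this slide are enough to work out the cache geometry. The sketch below uses the textbook set-associative mapping (set index = line number modulo number of sets) and treats 32 MB as 32 × 2^20 bytes; both are assumptions for illustration, and the real POWER4 indexing may differ.

```python
def num_sets(capacity_bytes, line_bytes, ways):
    """Number of sets in a set-associative cache."""
    lines = capacity_bytes // line_bytes
    return lines // ways


def set_index(address, line_bytes, sets):
    """Textbook mapping of a byte address to a set (assumed, not POWER4-specific)."""
    return (address // line_bytes) % sets


# L3 figures from the slide: 32 MB, 512-byte line, 8-way set associative.
L3_BYTES = 32 * 2**20
L3_LINE = 512
L3_WAYS = 8

l3_sets = num_sets(L3_BYTES, L3_LINE, L3_WAYS)
print(l3_sets)                                  # 8192

# Addresses that are (sets * line size) bytes apart fall into the same set.
stride = l3_sets * L3_LINE
print(set_index(0, L3_LINE, l3_sets), set_index(stride, L3_LINE, l3_sets))

# Total memory implied by 2 GB per processor across 1600 processors.
total_memory_gb = 2 * 1600
print(total_memory_gb)                          # 3200
```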

47 CS610 Hardware. Computing nodes: 8-way processor nodes: way processor nodes: processor nodes for running interactive jobs: 4 Separate nodes for user logins. System support nodes: 12 nodes dedicated to the General Parallel File System (GPFS); four nodes dedicated to HiPPI communications to the Mass Storage System; two master nodes dedicated to controlling LoadLeveler operations; one dedicated system monitoring node; one dedicated test node for system administration, upgrades, and testing.

48 CS610 Hardware. Storage: RAID disk storage capacity of 31.0 TB total; each user application can access 120 GB of temporary space. Interconnect fabric: SP Switch2 (“Colony” switch); two full-duplex network paths to increase throughput; bandwidth: 1.0 GB per second bidirectional; worst-case latency: 2.5 microseconds. HiPPI (High-Performance Parallel Interface) to the Mass Storage System. Gigabit Ethernet network.
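The latency and bandwidth figures for the switch can be combined in the usual linear cost model, time = latency + size / bandwidth. This is a simplified sketch: it applies the full 1.0 GB/s to a single one-way message and uses the worst-case latency, both of which are assumptions about how the slide's numbers should be read.

```python
LATENCY_S = 2.5e-6          # worst-case latency from the slide
BANDWIDTH_B_S = 1.0e9       # 1.0 GB/s, assumed here to apply to one message


def transfer_time(message_bytes, latency_s=LATENCY_S, bandwidth_b_s=BANDWIDTH_B_S):
    """Linear cost model: fixed startup latency plus size over bandwidth."""
    return latency_s + message_bytes / bandwidth_b_s


# Message size at which latency and transmission time are equal.
half_performance_bytes = LATENCY_S * BANDWIDTH_B_S

for size in (8, 2500, 1_000_000):
    print(f"{size:>9} bytes: {transfer_time(size) * 1e6:.2f} microseconds")
print(f"Half-performance message size: {half_performance_bytes:.0f} bytes")
```

Under this model, messages much smaller than about 2.5 KB are dominated by latency, and larger ones by bandwidth.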

49 CS610 Software. Operating system: AIX (IBM-proprietary UNIX). Compilers: Fortran (95/90/77), C, C++. Batch subsystem: LoadLeveler, which manages serial and parallel jobs over a cluster of servers. File system: General Parallel File System (GPFS). System information commands: spinfo for general information, lslpp for information about libraries.

50 CS610 Related Links. NCAR: ; SCD: ; Bluesky: ; IBM p690: http://www-903.ibm.com/kr/eserver/pseries/highend/p690.html

51 KAIST About Cray X1. Kim, SooYoung (Dept. of Computer Science, KAIST)

52 CS610 Features (1/2). Contributing areas: weather and climate prediction, aerospace engineering, automotive design, and a wide variety of other applications important in government and academic research. Example sites: Army High Performance Computing Research Center (AHPCRC), Boeing, Ford, Warsaw Univ., U.S. Government, Department of Energy's Oak Ridge National Laboratory (ORNL). Operating system: UNICOS/mp™, derived from UNICOS and UNICOS/mk™; true single system image (SSI); scheduling algorithms for parallel applications; accelerated application mode and migration. Variable processor utilization: each CPU has four internal processors, which work together as a closely coupled multistreaming processor (MSP) or individually as four single-streaming processors (SSPs). Flexible system partitioning.

53 CS610 Features (2/2). Scalable system architecture: distributed shared memory (DSM); scalable cache coherence protocol; scalable address translation. Parallel programming models: shared-memory parallel models; traditional distributed-memory parallel models: MPI and SHMEM; up-and-coming global distributed-memory parallel models: Unified Parallel C (UPC). Programming environments: Fortran compiler, C and C++ compiler; high-performance scientific library (LibSci), language support libraries, system libraries; Etnus TotalView debugger, CrayPat (Cray Performance Analysis Tool).

54 CS610 Node Architecture Figure 1. Node, Containing Four MSPs

55 CS610 System Conf. Examples
Cabinets    CPUs    Memory          Peak Performance
1 (AC)      16      64 – 256 GB     204.8 Gflops
…           …       … – 1024 GB     819.0 Gflops
…           …       … – 4096 GB     3.3 Tflops
…           …       … – 8192 GB     6.6 Tflops
…           …       … – … GB        13.1 Tflops
…           …       … – … GB        26.2 Tflops
…           …       … – … GB        52.4 Tflops

56 CS610 Technical Data (1/2). Technical specifications. Peak performance: 52.4 Tflops in a 64-cabinet configuration. Architecture: scalable vector MPP with SMP nodes. Processing element. Processor: Cray custom-design vector CPU; 16 vector floating-point operations/clock cycle; 32- and 64-bit IEEE arithmetic. Memory size: 16 to 64 GB per node. Data error protection: SECDED. Vector clock speed: 800 MHz. Peak performance: 12.8 Gflops per CPU. Peak memory bandwidth: 34.1 GB/sec per CPU. Peak cache bandwidth: 76.8 GB/sec per CPU. Packaging: 4 CPUs per node; up to 4 nodes per AC cabinet, up to 4 interconnected cabinets; up to 16 nodes per LC cabinet, up to 64 interconnected cabinets.
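The per-CPU and full-system peak figures on this slide follow from the clock speed and the packaging limits. A short sketch of that arithmetic, using only numbers quoted above (the variable names are illustrative):

```python
# Figures from the Cray X1 technical data slide.
VECTOR_FLOPS_PER_CYCLE = 16
CLOCK_GHZ = 0.8                # 800 MHz vector clock
CPUS_PER_NODE = 4
NODES_PER_LC_CABINET = 16
MAX_LC_CABINETS = 64

peak_per_cpu_gflops = VECTOR_FLOPS_PER_CYCLE * CLOCK_GHZ
max_cpus = CPUS_PER_NODE * NODES_PER_LC_CABINET * MAX_LC_CABINETS
max_peak_tflops = max_cpus * peak_per_cpu_gflops / 1000

print(f"Peak per CPU: {peak_per_cpu_gflops:.1f} Gflops")          # 12.8
print(f"CPUs in a 64-cabinet system: {max_cpus}")                 # 4096
print(f"Peak for 64 cabinets: {max_peak_tflops:.1f} Tflops")      # 52.4
```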

57 CS610 Technical Data (2/2). Memory. Technology: RDRAM with 204 GB/sec peak bandwidth per node. Architecture: cache coherent, physically distributed, globally addressable. Total system memory size: 32 GB to 64 TB. Interconnect network. Topology: modified 2D torus. Peak global bandwidth: 400 GB/sec for a 64-CPU liquid-cooled (LC) system. I/O. I/O system port channels: 4 per node. Peak I/O bandwidth: 1.2 GB/sec per channel.
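The upper memory limit and the per-node I/O bandwidth can be checked against the packaging limits from the previous slide (up to 16 nodes per LC cabinet, up to 64 cabinets, up to 64 GB per node). This sketch assumes 1 TB = 1024 GB, which is what makes the total come out at exactly 64 TB.

```python
# Packaging and memory limits quoted on the technical data slides.
NODES_PER_LC_CABINET = 16
MAX_LC_CABINETS = 64
MAX_MEMORY_PER_NODE_GB = 64
IO_CHANNELS_PER_NODE = 4
IO_BANDWIDTH_PER_CHANNEL_GB_S = 1.2

max_nodes = NODES_PER_LC_CABINET * MAX_LC_CABINETS
max_memory_tb = max_nodes * MAX_MEMORY_PER_NODE_GB / 1024    # assumes 1 TB = 1024 GB
io_per_node_gb_s = IO_CHANNELS_PER_NODE * IO_BANDWIDTH_PER_CHANNEL_GB_S

print(f"Maximum nodes: {max_nodes}")
print(f"Maximum system memory: {max_memory_tb:.0f} TB")
print(f"Peak I/O bandwidth per node: {io_per_node_gb_s:.1f} GB/sec")
```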

