1
KAIST Brief presentation of the Earth Simulator Center Jang, Jae-Wan
2
CS610 Hardware configuration
Highly parallel vector supercomputer of the distributed-memory type
640 processor nodes (PNs)
Each PN contains:
8 vector-type arithmetic processors (APs)
16 GB of main memory
Remote control and I/O parts
3
CS610 Arithmetic processor
4
CS610 Processor node
5
CS610 Processor node
6
CS610 Interconnection network
7
CS610 Interconnection Network
8
CS610 Earth Simulator Research and Development Center (building approximately 65 m × 50 m)
9
CS610 Software
OS: NEC's UNIX-based OS, SUPER-UX
Supported languages: Fortran90, C, C++ (modified for ES)
Programming models (hybrid or flat):
Inter-PN: HPF / MPI
Intra-PN: microtasking / OpenMP
Within an AP: automatic vectorization
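The hybrid model above pairs message passing between PNs with shared-memory threading inside a PN, plus automatic vectorization inside each AP. The following is a minimal sketch of that style in generic C with MPI and OpenMP; it illustrates the programming model only and is not Earth Simulator code; the loop, names, and sizes are invented for the example.

/* Hybrid MPI + OpenMP sketch: MPI across nodes (inter-PN),
 * OpenMP threads within a node (intra-PN). Generic illustration,
 * not Earth Simulator-specific code. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nranks;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;
    /* Intra-node parallelism: threads share the iterations of this loop;
     * on the ES the innermost work would additionally be auto-vectorized. */
    #pragma omp parallel for reduction(+:local)
    for (int i = rank; i < 1000000; i += nranks)
        local += 1.0 / (1.0 + (double)i);

    /* Inter-node parallelism: combine the per-rank partial sums. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f (ranks = %d, threads per rank = %d)\n",
               global, nranks, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}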
10
KAIST Earth Simulator Center First results from the Earth Simulator: resolution 300 km
11
KAIST Earth Simulator Center First results from the Earth Simulator: resolution 120 km
12
KAIST Earth Simulator Center First results from the Earth Simulator: resolution 20 km
13
KAIST Earth Simulator Center First results from the Earth Simulator: resolution 10 km
14
CS610 First results from the Earth Simulator
Ocean Circulation Model (MOM3, developed by GFDL)
Resolution: 0.1° × 0.1° (about 10 km)
Initial condition: Levitus data (1982)
Computer resources: 175 nodes, elapsed time 8,100 hours
15
CS610 First results from the Earth Simulator
Ocean Circulation Model (MOM3, developed by GFDL)
Comparison of resolutions: 0.1° × 0.1° (about 10 km) vs. 1° × 1° (about 100 km)
16
KAIST Terascale Cluster: System X Virginia Tech, Apple, Mellanox, Cisco, and Liebert 2003. 3. 16 Daewoo Lee
17
CS610 Terascale Cluster: System X
A groundbreaking supercomputer cluster built with industrial assistance from Apple, Mellanox, Cisco, and Liebert
$5.2 million for hardware
10,280 GFlops sustained out of 17,600 GFlops peak, with 1,100 nodes (ranked 3rd on the TOP500 Supercomputer Site)
18
CS610 Goals
Computational Science and Engineering Research: nanoscale electronics, quantum chemistry, molecular statistics, fluid dynamics, large-scale network emulation, optimal design, …
Experimental System: fault tolerance and migration, queuing system and scheduler, distributed operating system, parallel filesystem, middleware for grids, authentication/security system, …
Dual usage mode (90% of computational cycles devoted to production use)
19
CS610 Hardware Architecture
Node: Apple G5 platform, dual IBM PowerPC 970 (64-bit CPU)
Primary communication: InfiniBand by Mellanox (20 Gbps full duplex, fat-tree topology)
Secondary communication: Gigabit Ethernet by Cisco
Cooling system by Liebert
20
CS610 Software
Mac OS X (FreeBSD-based)
MPI-2 (MPICH-2), supporting C/C++/Fortran compilation
Déjà vu: transparent fault-tolerance system that maintains cluster stability by migrating a failed application to another node, keeping the application intact without a restart
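As a small, hedged illustration of the MPI-2 (MPICH-2) plus C toolchain listed above, the sketch below is a generic MPI program that reports each rank, its host, and the MPI standard level the library implements; it is not System X-specific code.

/* Minimal MPI "hello" of the kind an MPICH-2 / C toolchain compiles;
 * illustrative only, not System X-specific code. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len, major, minor;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);
    MPI_Get_version(&major, &minor);   /* reports the MPI standard level, e.g. 2.x */

    printf("rank %d of %d on %s (MPI %d.%d)\n", rank, size, name, major, minor);

    MPI_Finalize();
    return 0;
}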
21
CS610 Reference Terascale Cluster Web Site http://computing.vt.edu/research_computing/terascale
22
KAIST 4th-fastest supercomputer: Tungsten PAK, EUNJI
23
CS610 4th: NCSA Tungsten (source: Top500.org) National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign
24
CS610 Tungsten Architecture [1/3]
Tungsten: Xeon 3.0 GHz Dell cluster
2,560 processors, 3 GB memory per node
Peak performance: 15.36 TF
TOP500 list debut: #4 (9.819 TF, November 2003)
Currently the 4th-fastest supercomputer in the world
25
CS610 Tungsten Architecture [2/3]
Components (architecture diagram):
Compute nodes: 1,280 nodes (2,560 processors), Dell PowerEdge 1750 with 3 GB DDR SDRAM, dual Intel Xeon 3.06 GHz, Linux 2.4.20 (Red Hat 9.0)
I/O nodes: 104 nodes, 122 TB shared storage, cluster file system
Interconnect: Myrinet
Compilers: Intel Fortran 77/90/95, C, C++; GNU Fortran 77, C, C++
Job management: LSF + Maui Scheduler
User applications run on top of this software stack
26
CS610 Tungsten Architecture [3/3]
1,450 Dell PowerEdge 1750 server nodes
Intel Xeon 3.06 GHz: peak performance 6.12 GFLOPS per processor
1,280 compute nodes, 104 I/O nodes
Parallel I/O: 11.1 gigabytes per second (GB/s) of I/O throughput, complementing the cluster's 9.8 TFLOPS of computational capability
104-node I/O sub-cluster with more than 120 TB
Storage: 73 GB node-local, 122 TB shared
27
CS610 Applications on Tungsten [1/3]
PAPI and PerfSuite
PAPI: portable interface to hardware performance counters
PerfSuite: a set of tools for performance analysis on Linux platforms
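As a hedged illustration of how PAPI's C API is typically used, the sketch below counts total cycles and floating-point operations around a toy loop. The two preset events are standard PAPI names, but whether they are available depends on the platform, and error handling is abbreviated.

/* Sketch of reading hardware counters through PAPI; error handling abbreviated. */
#include <papi.h>
#include <stdio.h>

int main(void)
{
    int events = PAPI_NULL;
    long long counts[2];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return 1;
    PAPI_create_eventset(&events);
    PAPI_add_event(events, PAPI_TOT_CYC);   /* total cycles */
    PAPI_add_event(events, PAPI_FP_OPS);    /* floating-point operations */

    PAPI_start(events);
    double s = 0.0;
    for (int i = 1; i <= 1000000; i++)      /* the region being measured */
        s += 1.0 / i;
    PAPI_stop(events, counts);

    printf("cycles = %lld, fp ops = %lld (sum = %f)\n", counts[0], counts[1], s);
    return 0;
}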
28
CS610 Applications on Tungsten [2/3] PAPI and PerfSuite
29
CS610 Applications on Tungsten [3/3]
CHARMM (Harvard version), Chemistry at Harvard Macromolecular Mechanics: a general-purpose molecular mechanics, molecular dynamics, and vibrational analysis package
Amber 7.0: a set of molecular mechanical force fields for the simulation of biomolecules, together with a package of molecular simulation programs
30
KAIST MPP2 Supercomputer The world's largest Itanium2 cluster Molecular Science Computing Facility, Pacific Northwest National Laboratory 2004. 3. 16 Presentation: Kim SangWon
31
CS610 Contents
MPP2 Supercomputer Overview
Configuration
HP rx2600 (Longs Peak) Node
QsNet ELAN Interconnect Network
System/Application Software
File System
Future Plan
32
CS610 MPP2 Overview
MPP2: the High Performance Computing System-2
Located at the Molecular Science Computing Facility in the William R. Wiley Environmental Molecular Sciences Laboratory at Pacific Northwest National Laboratory
The fifth-fastest supercomputer in the world on the November 2003 TOP500 list
33
CS610 MPP2 Overview
System name: Mpp2, a Linux supercomputer cluster
11.8 teraflops peak (8.633 teraflops sustained)
6.8 terabytes of memory
Purpose: production
Platform: HP Integrity rx2600, dual Itanium2 at 1.5 GHz
Nodes: 980 (processors: 1,960)
¾ megawatt of power, 220 tons of air conditioning, 4,000 sq. ft.
Cost: $24.5 million (estimated)
Backed by generator and UPS
34
CS610 Configuration (Phase 2b)
Summary of the configuration diagram: 928 compute nodes with 1,856 Madison batch CPUs (1,900 next-generation Itanium processors in total), Elan3 and Elan4 interconnect rails, 4 login nodes with 4 Gb-Enet, 2 system management nodes, 11.4 TF, 6.8 TB of memory, Lustre SAN of 53 TB
Elan3 portion operational since September 2003; Elan4 portion not yet operational
35
CS610 HP rx2600 (Longs Peak) Node Architecture
Each node has:
2 Intel Itanium 2 processors (1.5 GHz)
6.4 GB/s system bus, 8.5 GB/s memory bus
12 GB of RAM
1 × 1000T connection, 1 × 100T connection, 1 serial connection
2 Elan3 connections on PCI-X (1 GB/s)
2 × SCSI-160
36
CS610 QsNet ELAN Interconnect Network
High bandwidth, ultra-low latency, and scalability
900 Mbytes/s user-space-to-user-space bandwidth
1,024 nodes in a standard QsNet configuration, rising to 4,096 in QsNetII systems
Optimized libraries for common distributed-memory programming models exploit the full capabilities of the base hardware
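Bandwidth and latency figures like the ones above are usually measured with a ping-pong microbenchmark between two processes. The sketch below is a generic MPI version of such a test, not the vendor's benchmark; run it with exactly two ranks, and note that the message size and repetition count are arbitrary illustrative choices.

/* Generic MPI ping-pong: rank 0 and rank 1 bounce a buffer back and forth,
 * then report round-trip time and effective bandwidth. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    const int bytes = 1 << 20;                /* 1 MB messages, arbitrary */
    char *buf = malloc(bytes);
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double dt = MPI_Wtime() - t0;

    if (rank == 0)
        printf("avg round trip %.2f us, bandwidth %.1f MB/s\n",
               dt / reps * 1e6, 2.0 * bytes * reps / dt / 1e6);

    MPI_Finalize();
    free(buf);
    return 0;
}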
37
CS610 Software on MPP2 (1/2)
System software
Operating system: Red Hat Linux 7.2 Advanced Server; NWLinux, tailored to IA64 clusters (2.4.18 kernel with various patches)
Cluster management: Resource Management System (RMS) by Quadrics, a single interface point to the system for resource management (monitoring, fault diagnosis, data collection, CPU allocation, parallel job execution, …)
Job management software
LSF (Load Sharing Facility) batch scheduler
QBank: controls and manages CPU resources allocated to projects or users
Compiler software: C (ecc), F77/F90/F95 (efc), G++
Code development
Etnus TotalView: a parallel and multithreaded application debugger
Vampir: a GUI-driven front end used to visualize the profile data from a program run
gdb
38
CS610 Software on MPP2 (2/2)
Application software
Quantum chemistry codes
GAMESS (The General Atomic and Molecular Electronic Structure System): performs a variety of ab initio molecular orbital (MO) calculations
MOLPRO: an advanced ab initio quantum chemistry software package
NWChem: computational chemistry software developed by EMSL
ADF (Amsterdam Density Functional) 2000: software for first-principles electronic structure calculations via Density Functional Theory (DFT)
General molecular modeling software: Amber
Unstructured mesh modeling codes
NWGrid (grid generator): hybrid mesh generation, mesh optimization, and dynamic mesh maintenance
NWPhys (unstructured mesh solvers): a 3D, full-physics, first-principles, time-domain, free-Lagrange code for parallel processing using hybrid grids
39
CS610 File System on MPP2
Four file systems are available on the cluster:
Local filesystem (/scratch): on each of the compute nodes; a non-persistent storage area provided to a parallel job running on that node
NFS filesystem (/home): where user home directories and files are located; uses RAID-5 for reliability
Lustre global filesystem (/dtemp): designed for the world's largest high-performance compute clusters; aggregate write rate of 3.2 GB/s; holds restart files and files needed for post-analysis; long-term global scratch space
AFS filesystem (/msrc): on the front-end (non-compute) nodes
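Restart files on a parallel filesystem such as /dtemp are commonly written collectively with MPI-IO so that every rank contributes its slice to one shared file. The sketch below is a generic, hedged example of that pattern; the /dtemp/restart.dat path, the array size, and the data are invented for illustration and are not MPP2-specific.

/* Generic MPI-IO sketch: each rank writes its slice of a restart array
 * at its own offset into one shared file on a parallel filesystem. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int n = 1 << 20;                     /* doubles per rank, arbitrary */
    int rank;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *chunk = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++)
        chunk[i] = rank + i * 1e-6;            /* stand-in for simulation state */

    MPI_File_open(MPI_COMM_WORLD, "/dtemp/restart.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * n * sizeof(double);
    MPI_File_write_at_all(fh, offset, chunk, n, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(chunk);
    MPI_Finalize();
    return 0;
}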
40
CS610 Future Plan
MPP2 will be upgraded with the faster Quadrics QsNetII interconnect in early 2004
Planned configuration (diagram): 928 compute nodes with 1,856 Madison batch CPUs on Elan4, 4 login nodes with 4 Gb-Enet, 2 system management nodes, Lustre SAN of 53 TB
41
KAIST Bluesky Supercomputer Top 500 Supercomputers CS610 Parallel Processing Donghyouk Lim (Dept. of Computer Science, KAIST)
42
CS610 Contents
Introduction
National Center for Atmospheric Research
Scientific Computing Division
Hardware
Software
Recommendations for usage
Related Links
43
CS610 Introduction
Bluesky: the 13th-ranked supercomputer in the world
Clustered Symmetric Multi-Processing (SMP) system
1,600 IBM POWER4 processors
Peak of 8.7 TFLOPS
44
CS610 National Center for Atmospheric Research
Established in 1960
Located in Boulder, Colorado
Research areas: the Earth system, climate change, changes in atmospheric composition
45
CS610 Scientific Computing Division
Research on high-performance supercomputing
Computing resources:
Bluesky (IBM Cluster 1600 running AIX): 13th place
blackforest (IBM SP RS/6000 running AIX): 80th place
Chinook complex: Chinook (SGI Origin3800 running IRIX) and Chinook (SGI Origin2100 running IRIX)
46
CS610 Hardware
Processors: 1,600 POWER4 processors at 1.3 GHz; each can perform up to 4 floating-point operations per cycle; peak of 8.7 TFLOPS
Memory: 2 GB of memory per processor; memory on a node is shared between the processors on that node
Memory caches:
L1 cache: 64 KB I-cache, 32 KB D-cache, direct mapped
L2 cache: 1.44 MB per pair of processors, 8-way set associative
L3 cache: 32 MB, 512-byte cache line, 8-way set associative
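A cache hierarchy like the one above is what loop tiling (cache blocking) exploits: working on small tiles keeps the active data resident in the L1/L2 caches instead of streaming repeatedly from L3 or main memory. The sketch below is a generic blocked matrix multiply in C, not POWER4-tuned code; the block size is an arbitrary illustrative choice, and the output matrix is assumed to be zero-initialized by the caller.

/* Generic cache-blocking (loop tiling) sketch. Each B x B tile of the operand
 * matrices is reused many times while it is still resident in cache. */
#define N 1024   /* matrix order, illustrative */
#define B 64     /* tile size, illustrative (not tuned for POWER4) */

void matmul_blocked(const double A[N][N], const double Bm[N][N], double C[N][N])
{
    /* C must be zero-initialized by the caller. */
    for (int ii = 0; ii < N; ii += B)
        for (int kk = 0; kk < N; kk += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; i++)
                    for (int k = kk; k < kk + B; k++)
                        for (int j = jj; j < jj + B; j++)
                            C[i][j] += A[i][k] * Bm[k][j];
}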
47
CS610 Hardware
Computing nodes:
8-way processor nodes: 76
32-way processor nodes: 25
32-processor nodes for running interactive jobs: 4
Separate nodes for user logins
System support nodes:
12 nodes dedicated to the General Parallel File System (GPFS)
Four nodes dedicated to HiPPI communications to the Mass Storage System
Two master nodes dedicated to controlling LoadLeveler operations
One dedicated system monitoring node
One dedicated test node for system administration, upgrades, and testing
48
CS610 Hardware
Storage: RAID disk storage capacity of 31.0 TB total; each user application can access 120 GB of temporary space
Interconnect fabric:
SP Switch2 ("Colony" switch): two full-duplex network paths to increase throughput; bandwidth of 1.0 GB per second bidirectional; worst-case latency of 2.5 microseconds
HiPPI (High-Performance Parallel Interface) to the Mass Storage System
Gigabit Ethernet network
49
CS610 Software
Operating system: AIX (IBM-proprietary UNIX)
Compilers: Fortran (95/90/77), C, C++
Batch subsystem: LoadLeveler, which manages serial and parallel jobs over a cluster of servers
File system: General Parallel File System (GPFS)
System information commands: spinfo for general information, lslpp for information about libraries
50
CS610 Related Links
NCAR: http://www.ncar.ucar.edu/ncar/
SCD: http://www.scd.ucar.edu/
Bluesky: http://www.scd.ucar.edu/computers/bluesky/
IBM p690: http://www-903.ibm.com/kr/eserver/pseries/highend/p690.html
51
KAIST About Cray X1 Kim, SooYoung (sykim@camars.kaist.ac.kr) (Dept. of Computer Science, KAIST)
52
CS610 Features (1/2)
Contributing areas: weather and climate prediction, aerospace engineering, automotive design, and a wide variety of other applications important in government and academic research
Users include the Army High Performance Computing Research Center (AHPCRC), Boeing, Ford, Warsaw Univ., the U.S. Government, and the Department of Energy's Oak Ridge National Laboratory (ORNL)
Operating system: UNICOS/mp™, derived from UNICOS and UNICOS/mk™
True single system image (SSI)
Scheduling algorithms for parallel applications
Accelerated application mode and migration
Variable processor utilization: each CPU has four internal processors, usable together as a closely coupled multistreaming processor (MSP) or individually as four single-streaming processors (SSPs)
Flexible system partitioning
53
CS610 Features (2/2)
Scalable system architecture: distributed shared memory (DSM), scalable cache coherence protocol, scalable address translation
Parallel programming models:
Shared-memory parallel models
Traditional distributed-memory parallel models: MPI and SHMEM
Up-and-coming global distributed-memory parallel models: Unified Parallel C (UPC)
Programming environments: Fortran compiler, C and C++ compilers, high-performance scientific library (LibSci), language support libraries, system libraries, Etnus TotalView debugger, CrayPat (Cray Performance Analysis Tool)
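As a hedged illustration of the shared-memory side of these programming models, the sketch below is a plain OpenMP loop of the kind the X1 compilers would also vectorize automatically and multistream across the four SSPs of an MSP. It is portable generic C, not Cray-specific code.

/* Generic shared-memory (OpenMP) sketch: the iterations are independent, so the
 * loop can be split across threads and is also a candidate for vectorization
 * and, on the X1, multistreaming. */
void daxpy(int n, double alpha, const double *x, double *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}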
54
CS610 Node Architecture Figure 1. Node, Containing Four MSPs
55
CS610 System Configuration Examples

Cabinets | CPUs | Memory           | Peak Performance
1 (AC)   | 16   | 64 – 256 GB      | 204.8 Gflops
1        | 64   | 256 – 1024 GB    | 819.0 Gflops
4        | 256  | 1024 – 4096 GB   | 3.3 Tflops
8        | 512  | 2048 – 8192 GB   | 6.6 Tflops
16       | 1024 | 4096 – 16384 GB  | 13.1 Tflops
32       | 2048 | 8192 – 32768 GB  | 26.2 Tflops
64       | 4096 | 16384 – 65536 GB | 52.4 Tflops
56
CS610 Technical Data (1/2)
Technical specifications
Peak performance: 52.4 Tflops in a 64-cabinet configuration
Architecture: scalable vector MPP with SMP nodes
Processing element
Processor: Cray custom-design vector CPU; 16 vector floating-point operations per clock cycle; 32- and 64-bit IEEE arithmetic
Memory size: 16 to 64 GB per node
Data error protection: SECDED
Vector clock speed: 800 MHz
Peak performance: 12.8 Gflops per CPU
Peak memory bandwidth: 34.1 GB/sec per CPU
Peak cache bandwidth: 76.8 GB/sec per CPU
Packaging: 4 CPUs per node; up to 4 nodes per AC cabinet, up to 4 interconnected cabinets; up to 16 nodes per LC cabinet, up to 64 interconnected cabinets
57
CS610 Technical Data (2/2)
Memory
Technology: RDRAM with 204 GB/sec peak bandwidth per node
Architecture: cache coherent, physically distributed, globally addressable
Total system memory size: 32 GB to 64 TB
Interconnect network
Topology: modified 2D torus
Peak global bandwidth: 400 GB/sec for a 64-CPU Liquid Cooled (LC) system
I/O
I/O system port channels: 4 per node
Peak I/O bandwidth: 1.2 GB/sec per channel