Commodity Computing Clusters - next generation supercomputers? Paweł Pisarczyk, ATM S. A.

1 Commodity Computing Clusters - next generation supercomputers? Paweł Pisarczyk, ATM S. A.

2 Agenda Introduction Supercomputer classification Architecture and implementations Commodity clusters Processors Operating systems Summary

3 Supercomputer "A supercomputer is a device for turning compute-bound problems into I/O-bound problems" - Seymour Cray A supercomputer is a computer system that leads the world in terms of processing capacity, particularly speed of calculations, at the time of its introduction. source:

4 Supercomputer History (1) 1945-50 - Manchester Mark I 1950-55 - MIT Whirlwind 1955-60 - IBM 7090 - 210 KFLOPS 1960-65 - CDC 6600 - 10.24 MFLOPS 1965-70 - CDC 7600 - 32.27 MFLOPS 1970-75 - CDC Cyber 76

5 Supercomputer History (2) 1975-80 - Cray-1 - 160 MFLOPS 1980-85 - Cray X-MP - 500 MFLOPS 1985-90 - Cray Y-MP - 1.3 GFLOPS 1990-95 - Fujitsu Numerical Wind Tunnel - 236 GFLOPS 1995-00 - Intel ASCI Red - 2.150 TFLOPS 2000-02 - IBM ASCI White, SP Power3 375 MHz - 7.226 TFLOPS 2002-03 - NEC Earth Simulator - 35 TFLOPS

6 Supercomputer Classes (1) General-purpose supercomputers: –vector processing machines - the same operation is carried out on a large amount of data simultaneously –tightly connected cluster computers (NUMA) - communication-oriented architectures engineered from the ground up, based on high-speed interconnects and a large number of processors –commodity clusters - a collection of a large number of commodity PCs (COTS) interconnected by a high-bandwidth, low-latency network

7 Supercomputer Classes (2) Special-purpose supercomputers - high-performance computing devices with a hardware architecture dedicated to solving a single problem (equipped with custom ASICs or FPGA chips) Examples –Deep Blue –GRAPE for astrophysics

8 Flynn taxonomy - 1972 (1) SISD - Single Instruction Single Data (DEC, Sun Microsystems, PC) SIMD - Single Instruction Multiple Data –computers with a large number of processing units (i.e. ALUs) - CPP DAP Gamma II, Quadrics Apemille –vector processing machines - NEC SX6, IA32 MMX MISD - Multiple Instruction Single Data –theoretical model, no practical implementation
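
The difference between SISD and SIMD on this slide can be illustrated with a toy model. The sketch below is only an illustration in plain Python, not real vector hardware: the function names and the `lanes` parameter are hypothetical, and the "instruction count" is simply a counter that shows how one SIMD instruction covers several data elements.

```python
def sisd_add(a, b):
    """Scalar (SISD) model: one add instruction is issued per element pair."""
    result, issued = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        issued += 1
    return result, issued


def simd_add(a, b, lanes=4):
    """SIMD model: one add instruction covers up to `lanes` element pairs."""
    result, issued = [], 0
    for start in range(0, len(a), lanes):
        # One instruction operates on a whole slice (the "vector register").
        left = a[start:start + lanes]
        right = b[start:start + lanes]
        result.extend(x + y for x, y in zip(left, right))
        issued += 1
    return result, issued


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
scalar_result, scalar_issued = sisd_add(a, b)
vector_result, vector_issued = simd_add(a, b, lanes=4)
print(scalar_issued, vector_issued)
```

Both functions produce the same sums; only the number of modeled instructions differs, which is the point of the SIMD class.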

9 Flynn taxonomy - 1972 (2) MIMD - Multiple Instruction Multiple Data –SM-MIMD - Shared Memory MIMD: global address space; SMP systems and ccNUMA systems –DM-MIMD - Distributed Memory MIMD: many nodes with local address spaces; high-bandwidth, low-latency communication; common NUMA (Non-Uniform Memory Access) architectures; the operating system has to be communication-oriented (Mach project)

10 SM-MIMD implementations S-COMA - Simple Cache-Only Memory Architecture –common SMP systems ccNUMA - Cache Coherent NUMA –SGI Origin 3000 –SGI Altix 3000 –HP SuperDome

11 S-COMA (SMP) [Diagram labels: CPU 0, CPU 1, ..., CPU N; L2 cache; RAM]

12 ccNUMA [Diagram labels: CPU 0, CPU 1, ..., CPU N-1, CPU N; L2 caches; L3 caches; RAM 0, ..., RAM K]

13 ccNUMA implementation SGI Altix 3000 (ccNUMA) 64 Itanium 2 (IA64) processors C-brick modules with 2 CPUs and ASIC SHUB NUMAflex, NUMAlink interconnects (6.4 GB/s, 2.4 GB/s) Modified Linux kernel (2.6 NUMA support)

14 DM-MIMD implementations Massively parallel systems (NUMA) –communication-oriented architecture –low-latency, high-bandwidth interconnects –topologies: hypercube, torus, tree –Butterfly networks, Omega networks –communication engineered from the ground up

15 DM-MIMD implementations Commodity clusters –a cluster is a collection of connected, independent computers working in unison to solve a problem –COTS technology –nodes are interconnected by Ethernet LAN, Myrinet, QsNet ELAN etc. –computation can be performed by using popular programming toolkits and frameworks: OpenMP, MPI –clusters require dedicated management software
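
The idea of independent nodes "working in unison to solve a problem" can be sketched as a scatter, compute, reduce pattern, which is the style of computation that MPI programs typically follow. The code below is a sequential simulation in plain Python and does not use MPI; the function names, the sum-of-squares workload, and the node count are all assumptions chosen for illustration.

```python
def scatter(data, num_nodes):
    """Split data into num_nodes nearly equal contiguous chunks, one per node."""
    base, extra = divmod(len(data), num_nodes)
    chunks, start = [], 0
    for rank in range(num_nodes):
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def node_work(chunk):
    """Local computation on one node: sum of squares of its own chunk."""
    return sum(x * x for x in chunk)


def reduce_sum(partials):
    """Combine the per-node partial results, as a root node would."""
    return sum(partials)


data = list(range(1, 11))
chunks = scatter(data, num_nodes=4)
# On a real cluster each chunk would be processed concurrently on its own node.
partials = [node_work(chunk) for chunk in chunks]
total = reduce_sum(partials)
print(chunks, partials, total)
```

Each simulated node touches only its own chunk, mirroring the local address spaces of a DM-MIMD system; only the small partial results cross the interconnect.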

16 NUMA implementations Cray T3E-1350 Processor: Alpha 21164 675 MHz Number of CPUs: 40 - 2176 3-D Torus topology Operating system: UNICOS/mk - microkernel based Peak performance: 3 TFLOPS

17 Commodity cluster implementation (1) Linux Networx/Quadrics Processor: Intel Xeon 2.4 GHz CPUs: 2304 Interconnections: QsNet ELAN3 Operating system: Linux + management tools + Lustre Cluster File System Peak performance: 7.6 TFLOPS 3rd computer on the TOP500 list Developed for Lawrence Livermore National Laboratory in 2002

18 Commodity cluster implementation (2) HP XC6000 Cluster (XC3000 Cluster) Processor: Intel Itanium 2 6M 1.5 GHz (Intel Xeon 3 GHz) Node: HP Integrity rx2600 (HP ProLiant DL380) Number of processors: 34-512 Interconnections: QsNet ELAN3 (Myricom Myrinet XP) Operating system: Linux + SSI Middleware + management tools + Lustre Cluster File System Peak performance: 34 CPUs - 204 GFLOPS, 512 CPUs - 3 TFLOPS
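
The peak figures quoted for the XC6000 configurations follow from a simple product: number of CPUs times clock frequency times floating-point operations per cycle. The sketch below reproduces that arithmetic. The value of 4 operations per cycle is an assumption (it is the value consistent with the 6 GFLOPS per-processor peak given for Itanium 2 at 1.5 GHz), and the function name is hypothetical.

```python
def peak_gflops(num_cpus, clock_ghz, flops_per_cycle):
    """Theoretical peak in GFLOPS: CPUs x clock (GHz) x FLOPs per cycle."""
    return num_cpus * clock_ghz * flops_per_cycle


ITANIUM2_CLOCK_GHZ = 1.5
# Assumption: 4 floating-point operations per cycle per processor,
# which matches 6 GFLOPS peak at 1.5 GHz.
ITANIUM2_FLOPS_PER_CYCLE = 4

per_cpu = peak_gflops(1, ITANIUM2_CLOCK_GHZ, ITANIUM2_FLOPS_PER_CYCLE)
small = peak_gflops(34, ITANIUM2_CLOCK_GHZ, ITANIUM2_FLOPS_PER_CYCLE)
large = peak_gflops(512, ITANIUM2_CLOCK_GHZ, ITANIUM2_FLOPS_PER_CYCLE)
print(per_cpu, small, large)
```

Under this assumption the 34-CPU system comes out at 204 GFLOPS and the 512-CPU system at 3072 GFLOPS, which the slide rounds to 3 TFLOPS. Peak values of this kind are theoretical upper bounds and say nothing about sustained performance.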

19 Commodity Clusters - software Operating system - Linux or SSI Linux (Single System Image) Platform for specialized applications for science, engineering and business (simulation, modeling, data mining) Distributed computation environments are used for software development (OpenMP, MPI) Common supercomputer applications require porting to clusters

20 Performance Scaling Scale-Out (Cluster) Scale-Up (SMP, ccNUMA) Scale Right

21 Processors (1) Many types of existing processors are used in supercomputers Microprocessor development directions: –Increasing clock frequency and the speed of instruction stream processing –Processing a large collection of data in a single processor instruction - SIMD –Control path multiplication – multithreading

22 Processors (2) Vector processors –NEC SX-6 –Cray (Cray X1) RISC processors –MIPS –IBM Power4 –Alpha CISC processors –IA32 –AMD x86-64 VLIW processors –IA64

23 Intel Itanium 2 features State-of-the-art unconventional 64-bit architecture New programming model implementing the VLIW paradigm EPIC technology – Explicitly Parallel Instruction Computing – the compiler determines instruction dependencies and tells the processor how to process the instruction stream in parallel Many registers (128 64-bit), register stack management 6 GFLOPS peak performance The full advantages of the processor can be exploited by a dedicated compiler
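
The EPIC idea, in which the compiler rather than the processor works out which instructions may run in parallel, can be sketched with a toy grouping pass. This is a simplified model and not the real IA-64 bundle format: the instruction representation, register names, and function name are hypothetical, and only dependencies on registers written within the current group are checked.

```python
def group_independent(instructions):
    """Split an in-order instruction list into groups with no internal dependencies.

    Each instruction is a (dest, sources) pair. A new group starts whenever an
    instruction reads or writes a register already written in the current group.
    """
    groups, current, written = [], [], set()
    for dest, sources in instructions:
        if dest in written or any(src in written for src in sources):
            groups.append(current)
            current, written = [], set()
        current.append((dest, sources))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ("r10", "r11")),  # r1 = r10 op r11
    ("r2", ("r12", "r13")),  # independent of the first instruction
    ("r3", ("r1", "r2")),    # needs r1 and r2, so it starts a new group
    ("r4", ("r14", "r15")),  # independent of r3
]
groups = group_independent(program)
group_dests = [[dest for dest, _ in group] for group in groups]
print(group_dests)
```

Instructions within one group could be issued together by the hardware without any run-time dependency checking, which is the work EPIC moves into the compiler.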

24 Operating systems Monolithic kernel based OSs - UNIX (modification of existing solutions) –BSD –Solaris –Irix –Linux Microkernel based OSs –Mach

25 Microkernel architecture [Diagram labels: Task A, Task B, Task C; Kernel; Hardware]

26 Summary Today there are many supercomputer architectures Both vector processors and common RISC, CISC, and VLIW chips are used for supercomputers Commodity clusters under the control of the Linux OS are an attractive method for supercomputer implementation

27 TOP 500 list (1) 1. Earth Simulator, NEC - 35.86 TFLOPS 2. HP Alphaserver SC, HP - 13.88 TFLOPS 3. Linux Networx / Quadrics IA32 - 7.634 TFLOPS

28 Top 500 list (2) Source:
