Commodity Computing Clusters - next generation supercomputers? Paweł Pisarczyk, ATM S. A.


Agenda Introduction Supercomputer classification Architecture and implementations Commodity clusters Processors Operating systems Summary

Supercomputer „A supercomputer is a device for turning compute-bound problems into I/O-bound problems” - Seymour Cray A supercomputer is a computer system that leads the world in processing capacity, particularly speed of calculation, at the time of its introduction.

Supercomputer History (1) –Manchester Mark I –MIT Whirlwind –IBM (KFLOPS) –CDC (MFLOPS) –CDC (MFLOPS) –CDC Cyber 76

Supercomputer History (2) –Cray (MFLOPS) –Cray X-MP (MFLOPS) –Cray Y-MP (GFLOPS) –Fujitsu Numerical Wind Tunnel (GFLOPS) –Intel ASCI Red (TFLOPS) –IBM ASCI White, SP Power3 375 MHz (TFLOPS) –NEC Earth Simulator - 35 TFLOPS

Supercomputer Classes (1) General-purpose supercomputers: –vector processing machines - the same operation carried out on a large amount of data simultaneously –tightly coupled cluster computers (NUMA) - communication-oriented architectures engineered from the ground up, based on high-speed interconnects and a large number of processors –commodity clusters - a collection of a large number of commodity PCs (COTS) interconnected by a high-bandwidth, low-latency network

Supercomputer Classes (2) Special-purpose supercomputers - high-performance computing devices with a hardware architecture dedicated to solving a single problem (equipped with custom ASICs or FPGA chips) Examples –Deep Blue –GRAPE for astrophysics

Flynn taxonomy (1) SISD - Single Instruction Single Data (DEC, Sun Microsystems, PC) SIMD - Single Instruction Multiple Data –computers with a large number of processing units (e.g. ALUs) - CPP DAP Gamma II, Quadrics Apemille –vector processing machines - NEC SX6, IA32 MMX MISD - Multiple Instruction Single Data –theoretical model, no practical implementation
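The SIMD idea above can be sketched in a few lines of C. This is an illustration, not code from the slides: the loop body applies the same operation to every element, which is exactly the pattern a vectorizing compiler turns into MMX/SSE-style vector instructions (the function name `saxpy` is conventional, not from the deck).

```c
#include <stddef.h>

/* SIMD sketch: a single operation applied to many data elements.
   A vectorizing compiler can map this loop onto MMX/SSE-style
   vector instructions that process several elements per step. */
void saxpy(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];   /* same operation on every element */
}
```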

Flynn taxonomy (2) MIMD - Multiple Instruction Multiple Data –SM-MIMD - Shared Memory MIMD global address space SMP systems and ccNUMA systems –DM-MIMD - Distributed Memory MIMD many nodes with local address spaces high-bandwidth, low-latency communication common NUMA architectures (Non Uniform Memory Access) the operating system has to be communication oriented (Mach project)
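A minimal sketch of the SM-MIMD model, assuming an OpenMP-capable compiler (this example is ours, not from the slides): every thread reads the same array through one global address space. Built with `-fopenmp` the loop is divided across CPUs; without OpenMP the pragma is simply ignored and the loop runs serially with the same result.

```c
/* SM-MIMD sketch: all threads share one global address space.
   With OpenMP enabled, iterations are split across processors and
   the partial sums are combined by the reduction clause. */
double shared_sum(const double *v, long n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += v[i];   /* v lives in memory visible to every thread */
    return sum;
}
```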

SM-MIMD implementations S-COMA - Simple Cache-Only Memory Architecture –common SMP systems ccNUMA - Cache Coherent NUMA –SGI Origin 3000 –SGI Altix 3000 –HP SuperDome

S-COMA (SMP) [diagram: CPU 0 … CPU N, each with its own L2 cache, all sharing a single RAM]

ccNUMA [diagram: CPU 0 … CPU N with per-CPU L2 caches, grouped into nodes behind shared L3 caches, each node with its own local memory RAM 0 … RAM K]

ccNUMA implementation SGI Altix 3000 (ccNUMA) 64 Itanium 2 (IA64) processors C-brick modules with 2 CPUs and ASIC SHUB NUMAflex, NUMAlink interconnects (6.4 GB/s, 2.4 GB/s) Modified Linux kernel (2.6 NUMA support)

DM-MIMD implementations Massively parallel systems (NUMA) –communication-oriented architecture –low-latency, high-bandwidth interconnects –topologies: hypercube, torus, tree –butterfly networks, Omega networks –communication engineered from the ground up

DM-MIMD implementations Commodity clusters –a cluster is a collection of connected, independent computers working in unison to solve a problem –COTS technology –nodes are interconnected by Ethernet LAN, Myrinet, QsNet ELAN etc. –computation can be performed by using popular programming toolkits and frameworks: OpenMP, MPI –clusters require dedicated management software
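The message-passing style these clusters rely on can be sketched without any cluster at all. In this hypothetical example (names and structure are ours), a child process with its own private memory stands in for a worker node, and a pipe stands in for an MPI send/receive pair over the interconnect (Ethernet, Myrinet, QsNet, ...):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* DM-MIMD sketch: each "node" has a private address space and
   cooperates only by exchanging messages.  fork() creates the
   worker; the pipe plays the role of the cluster interconnect. */
int offload_square(int input, int *result) {
    int fd[2];
    if (pipe(fd) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                          /* "worker node" */
        int local = input * input;           /* compute on local data */
        write(fd[1], &local, sizeof local);  /* send the result message */
        _exit(0);
    }
    read(fd[0], result, sizeof *result);     /* receive the message */
    waitpid(pid, NULL, 0);
    close(fd[0]);
    close(fd[1]);
    return 0;
}
```

In a real cluster the same pattern is written with MPI calls and the operating system's network stack rather than pipes.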

NUMA implementation Cray T3E-1350 Processor: Alpha Topology: 3D torus Operating system: UNICOS/mk - microkernel based Peak performance: 3 TFLOPS

Commodity cluster implementation (1) Linux Networx/Quadrics Processor: Intel Xeon 2.4 GHz CPUs: 2304 Interconnections: QsNet ELAN3 Operating system: Linux + management tools + Lustre Cluster File System Peak performance: 7.6 TFLOPS 3rd computer on the TOP500 list Developed for Lawrence Livermore National Laboratory in 2002

Commodity cluster implementation (2) HP XC6000 Cluster (XC3000 Cluster) Processor: Intel Itanium 2 6M 1.5 GHz (Intel Xeon 3 GHz) Node: HP Integrity rx2600 (HP ProLiant DL380) Number of processors: Interconnections: QsNet ELAN3 (Myricom Myrinet XP) Operating system: Linux + SSI Middleware + management tools + Lustre Cluster File System Peak performance: 34 CPUs GFLOPS, 512 CPUs - 3 TFLOPS

Commodity Clusters - software Operating system - Linux or SSI Linux (Single System Image) Platform for specialized applications for science, engineering and business (simulation, modeling, data mining) Distributed computation environments are used for software development (OpenMP, MPI) Common supercomputer applications require porting to clusters

Performance Scaling Scale-Out (Cluster) Scale-Up (SMP, ccNUMA) Scale Right

Processors (1) Many types of existing processors are used in supercomputers Microprocessor development directions: –increasing clock frequency and instruction-stream processing speed –processing a large collection of data in a single processor instruction - SIMD –control-path multiplication - multithreading
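The last direction, control-path multiplication, is the model that multithreaded processors exploit. A minimal software sketch of two independent instruction streams, using POSIX threads (this example and its names are ours, not from the slides):

```c
#include <pthread.h>

/* Multithreading sketch: two independent control paths execute
   concurrently on disjoint data - the pattern that hardware
   multithreading multiplies onto a single chip. */
static void *square_in_place(void *arg) {
    int *p = arg;
    *p = *p * *p;
    return NULL;
}

int run_two_streams(int *a, int *b) {
    pthread_t t1, t2;
    if (pthread_create(&t1, NULL, square_in_place, a) != 0)
        return -1;
    if (pthread_create(&t2, NULL, square_in_place, b) != 0)
        return -1;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```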

Processors (2) Vector processors –NEC SX-6 –Cray (Cray X1) RISC processors –MIPS –IBM Power4 –Alpha CISC processors –IA32 –AMD x86-64 VLIW processors –IA64

Intel Itanium 2 features State-of-the-art unconventional 64-bit architecture New programming model implementing the VLIW paradigm EPIC technology – Explicitly Parallel Instruction Computing – the compiler determines instruction dependencies, informing the processor how to process the instruction stream in parallel Many registers, register stack management 6 GFLOPS peak performance The full potential of the processor can only be exploited with a dedicated compiler

Operating systems Monolithic kernel based OSs - UNIX (modification of existing solutions) –BSD –Solaris –Irix –Linux Microkernel based OSs –Mach

Microkernel architecture [diagram: tasks A, B and C running above a small kernel layer, which in turn runs on the hardware]

Summary Today there are many supercomputer architectures Both vector processors and common RISC, CISC and VLIW chips are used in supercomputers Commodity clusters under the control of the Linux OS are an attractive way to implement a supercomputer

TOP 500 list (1) 1. Earth Simulator, NEC (TFLOPS) 2. HP AlphaServer SC, HP (TFLOPS) 3. Linux Networx / Quadrics IA (TFLOPS)

Top 500 list (2)