CENG 546 Dr. Esma Yıldırım

What is a computing cluster?
 A computing cluster consists of a collection of interconnected stand-alone/complete computers that cooperatively work together as a single, integrated computing resource. A cluster exploits parallelism at the job level and supports distributed computing with higher availability.
 A typical cluster:
  Merges multiple system images into an SSI (single-system image) at certain functional levels
  Applies low-latency communication protocols
  Is more loosely coupled than an SMP with an SSI

Commodity clusters:
 A distributed/parallel computing system constructed entirely from commodity subsystems
 All subcomponents can be acquired commercially and separately
 Computing elements (nodes) are employed as fully operational standalone mainstream systems
 Two major subsystems: compute nodes and the system area network (SAN)
 Employs industry-standard interfaces for integration
 Uses industry-standard software for the majority of services
 Incorporates additional middleware for interoperability among elements
 Uses software for coordinated programming of elements in parallel

Multicomputer clusters:
 Cluster: a network of computers supported by middleware and interacting by message passing (a minimal MPI sketch follows this list)
 PC cluster (most Linux clusters)
 Workstation cluster (NOW, COW)
 Server cluster or server farm
 Cluster of SMPs or ccNUMA systems
 Cluster-structured massively parallel processors (MPP): about 85% of the TOP500 systems
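To make "interacting by message passing" concrete, the sketch below passes one integer between two cluster nodes with MPI, the message-passing runtime named later in these slides. The two-rank layout and payload are illustrative assumptions; build with mpicc and launch with mpirun -np 2.

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal message passing between two nodes of a cluster. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int payload = 42;
            /* Node 0 sends one integer to node 1 over the system area network. */
            MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int payload;
            MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", payload);
        }

        MPI_Finalize();
        return 0;
    }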


Copyright © 2012, Elsevier Inc. All rights reserved  System availability (HA) : Cluster offers inherent high system availability due to the redundancy of hardware, operating systems, and applications.  Hardware Fault Tolerance: Cluster has some degree of redundancy in most system components including both hardware and software modules.  OS and application reliability : Run multiple copies of the OS and applications, and through this redundancy  Scalability : Adding servers to a cluster or adding more clusters to a network as the application need arises.  High Performance : Running cluster enabled programs to yield higher throughput.

Scaling:
 The ability to deliver proportionally greater sustained performance through increased system resources
 Strong scaling
  Fixed-size application problem
  Application size remains constant as system size increases
 Weak scaling
  Variable-size application problem
  Application size scales proportionally with system size
 Capability computing
  In its purest form: strong scaling
  Marketing claims tend toward this class
 Capacity computing
  Throughput computing, including job-stream workloads
  In its simplest form: weak scaling
 Cooperative computing
  Interacting and coordinating concurrent processes
  Not a widely used term; also called "coordinated computing"
(A numeric sketch of strong vs. weak scaling follows this list.)
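The distinction can be made numeric with the two classic speedup models: Amdahl's law for strong scaling (fixed problem size) and Gustafson's law for weak scaling (problem grows with the machine). This is a minimal sketch; the parallel fraction f = 0.95 is an assumed value, not one from the slides.

    #include <stdio.h>

    /* Strong scaling (Amdahl): fixed problem size.
       S(N) = 1 / ((1 - f) + f / N), where f is the parallel fraction. */
    static double amdahl(double f, int n) { return 1.0 / ((1.0 - f) + f / n); }

    /* Weak scaling (Gustafson): problem size grows with N.
       S(N) = (1 - f) + f * N */
    static double gustafson(double f, int n) { return (1.0 - f) + f * n; }

    int main(void) {
        const double f = 0.95;  /* assumed parallel fraction */
        for (int n = 1; n <= 1024; n *= 4)
            printf("N = %4d   strong: %8.2f   weak: %8.2f\n",
                   n, amdahl(f, n), gustafson(f, n));
        return 0;
    }

With f = 0.95, the strong-scaling speedup saturates near 20x no matter how many processors are added, while the weak-scaling speedup keeps growing with N; this is one way to see why capability claims lean on strong scaling and capacity claims on weak scaling.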

Performance metrics:
 Peak floating-point operations per second (flops)
 Peak instructions per second (ips)
 Sustained throughput: average performance over a period of time (a measurement sketch follows this list)
  flops; Mflops (megaflops), Gflops (gigaflops), Tflops (teraflops), Pflops (petaflops)
  ips, Mips; ops, Mops …
 Cycles per instruction (cpi); alternatively, instructions per cycle (ipc)
 Memory access latency: cycles, or seconds
 Memory access bandwidth: bytes per second (Bps) or bits per second (bps); e.g., gigabytes per second (GBps, GB/s)
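Sustained performance is measured, not read off a datasheet. Below is a minimal sketch that times a DAXPY loop (2 floating-point operations per element) and reports sustained Mflops; the vector length and the POSIX clock_gettime timer are arbitrary implementation choices.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Measure sustained flops with a DAXPY kernel: y[i] = a*x[i] + y[i]. */
    int main(void) {
        const size_t n = 10 * 1000 * 1000;   /* arbitrary vector length */
        double *x = malloc(n * sizeof *x), *y = malloc(n * sizeof *y);
        if (!x || !y) return 1;
        for (size_t i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        const double a = 3.0;
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];          /* 2 flops per element */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        /* Printing y[0] keeps the compiler from eliminating the loop. */
        printf("y[0] = %.1f, sustained: %.1f Mflops\n", y[0], 2.0 * n / secs / 1e6);
        free(x); free(y);
        return 0;
    }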

Elements of a modern microprocessor:
 I/O interface
 Memory interface
 Cache hierarchy
 Register sets
 Control
 Execution pipeline
 Arithmetic logic units

MIMD multiprocessors:
 A general class of system that integrates multiple processors into an interconnected ensemble
 MIMD: Multiple Instruction stream, Multiple Data stream
 Different memory models:
  Distributed memory: nodes support separate address spaces
  Shared memory:
   Symmetric multiprocessor: UMA (uniform memory access), cache coherent
   Distributed shared memory: NUMA (non-uniform memory access), cache coherent
   PGAS (partitioned global address space): NUMA, not cache coherent
  Hybrid: an ensemble of distributed shared memory nodes, e.g., the massively parallel processor (MPP)
(A shared-memory sketch follows this list.)
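To make the shared-memory model concrete, the sketch below uses OpenMP (an assumed tool; the slides do not prescribe it): all threads read and write a single shared address space, in contrast to the separate address spaces of the distributed-memory model.

    #include <stdio.h>
    #include <omp.h>

    /* Shared-memory MIMD: every thread sees the same address space,
       so a single array can be updated in place by all of them. */
    int main(void) {
        enum { N = 1000000 };
        static double a[N];
        double sum = 0.0;

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = 2.0 * i;     /* all threads write the shared array */
            sum += a[i];        /* the reduction keeps the update race-free */
        }

        printf("threads saw one address space; sum = %.0f\n", sum);
        return 0;
    }

Build with a compiler flag such as -fopenmp; without it the pragmas are ignored and the program runs serially with the same result.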

MPP:
 General class of large-scale multiprocessor; represents the largest systems (e.g., IBM BG/L, Cray XT3)
 Distinguished by memory strategy:
  Distributed memory
  Distributed shared memory, cache coherent
  Partitioned global address space
 Custom interconnect network
 Potentially heterogeneous: may incorporate accelerators to boost peak performance


IBM BlueGene/L Supercomputer: the world's fastest message-passing MPP when built in 2005. Built jointly by IBM and LLNL teams and funded by the US DoE ASCI research program.

Symmetric multiprocessor (SMP):
 Building block for large MPPs
 Multiple processors: 2 to 32 processors, now multicore
 Uniform memory access (UMA) shared memory: every processor has equal access, in equal time, to all banks of the main memory
 Cache coherent: multiple copies of a variable are maintained consistent by hardware (see the sketch below)
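Hardware coherence keeps the cached copies of a variable consistent, but it does not make a read-modify-write atomic; the OpenMP sketch below (again an assumed tool, not one the slides prescribe) shows the usual division of labor between the coherence hardware and the program.

    #include <stdio.h>
    #include <omp.h>

    /* On a cache-coherent SMP, each core may hold `counter` in its own cache;
       the coherence protocol keeps those copies consistent. Atomicity of the
       read-modify-write is a separate concern, handled here with `omp atomic`. */
    int main(void) {
        long counter = 0;

        #pragma omp parallel for
        for (int i = 0; i < 1000000; i++) {
            #pragma omp atomic
            counter++;
        }

        printf("counter = %ld (expected 1000000)\n", counter);
        return 0;
    }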


[Figure: block diagram of an SMP node. Legend: MP = microprocessor; L1, L2, L3 = caches; M1..Mn-1 = memory banks; S = storage; NIC = network interface card; plus memory controller, PCI-e, Ethernet, USB peripherals, and JTAG.]

[Figure: distributed shared memory with non-uniform memory access (NUMA).]
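Under NUMA, access time depends on which memory bank a page lives in. A common mitigation, sketched below under the assumption of a first-touch page placement policy (as on Linux) and OpenMP threading, neither of which the slides cover, is to initialize data with the same threads that will later compute on it.

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    /* On a NUMA node, a page is typically placed in the memory of the socket
       whose thread touches it first. Parallel initialization therefore spreads
       pages across the sockets that later use them. */
    int main(void) {
        enum { N = 10000000 };
        double *a = malloc(N * sizeof *a);
        if (!a) return 1;

        /* First touch: each thread initializes (and thus places) its own chunk. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++)
            a[i] = 0.0;

        /* The compute phase reuses the same static schedule, so each thread
           mostly accesses memory local to its own socket. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++)
            a[i] += i;

        printf("a[N-1] = %.0f\n", a[N - 1]);
        free(a);
        return 0;
    }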

Commodity clusters vs. constellations:
[Figure: a 64-processor constellation on a 16x system area network contrasted with a 64-processor commodity cluster on a 4x system area network.]
 An ensemble of N nodes, each comprising p computing elements
 The p elements are tightly bound with shared memory (e.g., SMP, DSM)
 The N nodes are loosely coupled, i.e., distributed memory
 In a constellation, p is greater than N
 The distinction is which layer gives us the most power through parallelism (a hybrid sketch follows this list)
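Here is a hedged sketch of exploiting both layers at once: MPI ranks across the N loosely coupled nodes, OpenMP threads across the p tightly coupled elements inside each node. Both tools are assumptions of this sketch rather than something the slides prescribe; build with mpicc -fopenmp.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    /* Hybrid layering: MPI ranks map to the N distributed-memory nodes,
       OpenMP threads map to the p shared-memory elements within a node. */
    int main(int argc, char **argv) {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        long local = 0;
        #pragma omp parallel for reduction(+:local)
        for (int i = 0; i < 1000000; i++)
            local += 1;                  /* intra-node parallelism (p elements) */

        long total = 0;                  /* inter-node parallelism (N nodes) */
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("total = %ld\n", total);
        MPI_Finalize();
        return 0;
    }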

The layers of a high-performance computing system, from science problem down to device technology:
 Science problems: environmental modeling, physics, computational chemistry, etc.
 Applications: coastal modeling, black hole simulations, etc.
 Algorithms: PDEs, Gaussian elimination, the 12 dwarves, etc.
 Program source code
 Programming languages: Fortran, C, C++, UPC, Fortress, X10, etc.
 Compilers: Intel C/C++/Fortran, PGI C/C++/Fortran, IBM XLC/XLC++/XLF, etc.
 Runtime systems: Java runtime, MPI, etc.
 Operating systems: Linux, Unix, AIX, etc.
 Systems architecture: vector, SIMD array, MPP, commodity cluster
 Firmware: motherboard chipset, BIOS, NIC drivers
 Microarchitectures: Intel/AMD x86, Sun SPARC, IBM POWER5/6
 Logic design: RTL
 Circuit design: ASIC, FPGA, custom VLSI
 Device technology: NMOS, CMOS, TTL, optical
 Model of computation (spans the layers)
