Department of Computer Science, University of the West Indies

Computing Components

How did we learn to fly? By constructing a machine that flaps its wings like a bird? Answer: by applying the aerodynamic principles that nature demonstrates. Likewise, we model parallel processing on the principles demonstrated by biological systems.

Motivating Factors

1. The aggregate speed with which complex calculations are carried out by neurons is very high, even though
2. each individual neuron's response is slow (measured in milliseconds).

This demonstrates the feasibility of parallel processing.

Computing Components

[Diagram: a multi-processor computing system in layers — applications at the top, then programming paradigms, a threads interface, and a microkernel operating system mapping processes and threads onto hardware with multiple processors (P).]

Processing Elements

Simple classification by Flynn, based on the number of instruction and data streams:
- SISD: conventional uniprocessor
- SIMD: data parallel, vector computing
- MISD: of little practical use (see below)
- MIMD: very general; multiple approaches

The current focus is on the MIMD model, using general-purpose processors.

SISD: A Conventional Computer

[Diagram: a single processor with one instruction stream, one data input stream, and one data output stream.]

Speed is limited by the rate at which the computer can transfer information internally.

Examples: PC, Macintosh, workstations.

The MISD Architecture

[Diagram: processors A, B, and C, each driven by its own instruction stream, operate on a single data input stream and produce a single data output stream.]

More of an intellectual exercise than a practical configuration: a few have been built, but none are commercially available.

SIMD Architecture

[Diagram: a single instruction stream drives processors A, B, and C, each with its own data input stream and its own data output stream.]

Examples: CRAY vector machines, the Thinking Machines CM, Intel MMX (multimedia support).
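
To make the SIMD idea concrete, here is a minimal sketch in C using Intel SSE intrinsics (successors to the MMX extensions mentioned above); the function name vec_add and the loop bounds are illustrative. Each _mm_add_ps is one instruction that adds four floats at once — a single instruction stream applied to multiple data streams.

    #include <xmmintrin.h>  /* SSE intrinsics */

    /* c[i] = a[i] + b[i], four elements per instruction. */
    void vec_add(const float *a, const float *b, float *c, int n) {
        int i;
        for (i = 0; i + 4 <= n; i += 4) {
            __m128 va = _mm_loadu_ps(a + i);          /* load 4 floats */
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(c + i, _mm_add_ps(va, vb)); /* 4 adds, 1 instruction */
        }
        for (; i < n; i++)   /* scalar cleanup for any remainder */
            c[i] = a[i] + b[i];
    }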

MIMD Architecture

[Diagram: processors A, B, and C, each with its own instruction stream, data input stream, and data output stream.]

Unlike SIMD, a MIMD computer works asynchronously. There are two broad classes:
- Shared memory (tightly coupled) MIMD
- Distributed memory (loosely coupled) MIMD

Shared Memory MIMD Machine

[Diagram: processors A, B, and C, each attached by a memory bus to a single global memory system.]

Communication: the source processor writes data to global memory and the destination processor retrieves it.
- Easy to build, and conventional operating systems for SISD machines can easily be ported.
- Limitations: reliability and expandability. A failure of a memory component or of any processor affects the whole system.
- Increasing the number of processors leads to scalability problems.

Examples: Silicon Graphics supercomputers.

SMM Examples
- Dual and quad Pentiums
- Power Mac G5s: dual processor (2 GHz each)

Quad Pentium Shared Memory Multiprocessor

[Diagram: four processors, each with its own L1 cache, L2 cache, and bus interface, connected by a processor/memory bus to a memory controller, the shared memory, and an I/O interface on an I/O bus.]

- Any memory location is accessible by any of the processors.
- A single address space exists: each memory location has a unique address within a single range of addresses.
- Shared memory programming is generally more convenient, although the programmer must control access to shared data.
- Inter-process communication is done in the memory interface through reads and writes.
- Virtual memory addresses map to real addresses.
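
As a minimal sketch of programmer-controlled access to shared data, the hypothetical C program below uses POSIX threads: every thread reads and writes one shared counter through ordinary loads and stores in the single address space, and a mutex serializes the updates (the thread and iteration counts are illustrative).

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static long counter = 0;                 /* shared data, single address space */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);       /* programmer controls access... */
            counter++;                       /* ...to the shared location */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        printf("counter = %ld\n", counter);  /* 400000 with the lock held */
        return 0;
    }

Without the mutex the increments race and the final value is unpredictable, which is exactly the access-control burden noted above.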

Shared Memory Address Space
- Different processors may have memory locally attached to them.
- Different instances of memory access can therefore take different amounts of time, and collisions are possible.
- UMA (i.e., shared memory) vs. NUMA (i.e., distributed shared memory).

Building Shared Memory Systems

Building SMM machines with more than 4 processors is very difficult and very expensive, e.g., the Sun Microsystems E10000 “Starfire” server:
- 64 processors
- Price: several million US dollars

Distributed Memory MIMD

[Diagram: processors A, B, and C, each with its own local memory system, connected to one another by an IPC channel.]

- Communication: inter-process communication (IPC) over a high-speed network.
- The network can be configured as a tree, mesh, cube, etc.
- Unlike shared memory MIMD machines, these systems are easily and readily expandable.
- Highly reliable: the failure of any one CPU does not affect the whole system.

Distributed Memory

Decentralized memory (a memory module paired with each CPU) gives lower memory latency.

Drawbacks:
- Longer communication latency
- A more complex software model

Decentralized Memory Versions

Message passing: a “multi-computer” with a separate address space per processor.
- Can invoke software with a Remote Procedure Call (RPC)
- Often via a library, such as MPI (Message Passing Interface)
- Also called “synchronous communication,” since the communication causes synchronization between the two processes
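
A minimal sketch of library-based message passing with MPI (the value sent and the tag are illustrative): process 0 sends an integer to process 1, which receives it; each process touches only its own local memory, and data moves between address spaces only via the message. Note that this is a single program whose behaviour branches on the process rank — the SPMD style described below.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {                 /* sender writes only local memory... */
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {          /* ...receiver gets a copy via a message */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Run with, e.g., mpirun -np 2 ./a.out.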

Message Passing System
- Inter-process communication is done at the program level using sends and receives.
- Reads and writes refer only to a processor's local memory.
- Data can be packed into long messages before being sent, to compensate for latency.
- Global scheduling of messages can help avoid message collisions.

MIMD Program Structure

Multiple Program Multiple Data (MPMD): each processor has its own program to execute.

Single Program Multiple Data (SPMD): a single source program is written, and each processor executes its own personal copy of it; a skeleton is sketched below.
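
As a hypothetical SPMD skeleton in C with MPI (the problem size and per-item work are placeholders), every process runs the same source but uses its rank to select its own slice of the data:

    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int n = 1000;                       /* total work items (illustrative) */
        int chunk = n / nprocs;
        int lo = rank * chunk;              /* this copy's slice... */
        int hi = (rank == nprocs - 1) ? n : lo + chunk;
        for (int i = lo; i < hi; i++) {
            /* process item i of the shared problem */
        }

        MPI_Finalize();
        return 0;
    }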

Speedup Factor

S(n) = (execution time on a single processor) / (execution time on a multiprocessor with n processors)

S(n) gives the increase in speed obtained by using a multiprocessor. The speedup factor can also be cast in terms of computational steps:

S(n) = (number of steps using one processor) / (number of parallel steps using n processors)

The maximum speedup is n with n processors (linear speedup); this theoretical limit is not always achieved.
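
As a worked example with hypothetical numbers: if a program takes 80 s on one processor and 10 s on a 16-processor machine, then S(16) = 80 / 10 = 8, half of the linear limit of 16.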

Maximum Speedup: Amdahl's Law

Let t_s be the execution time on a single processor and f the fraction of the computation that must be executed serially. On n processors the serial section still takes f * t_s, while the parallelizable sections take (1 - f) * t_s / n, so the parallel execution time is

t_p = f * t_s + (1 - f) * t_s / n

and the maximum speedup is

S(n) = t_s / t_p = n / (1 + (n - 1) f)
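
For example, with a hypothetical serial fraction f = 0.05 on n = 16 processors, S(16) = 16 / (1 + 15 * 0.05) = 16 / 1.75 ≈ 9.1; and no matter how large n grows, S(n) can never exceed 1/f = 20.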

Parallel Architectures

- Data-parallel architectures
- Function-parallel architectures
  - Instruction-level PAs
  - Thread-level PAs
  - Process-level PAs (MIMDs)
    - Shared memory MIMD
    - Distributed memory MIMD