Chapter 6 Multiprocessor System

Introduction
 Each processor in a multiprocessor system can be executing a different instruction at any time.
 The major advantages of MIMD systems:
  – Reliability
  – High performance
 The overhead involved with MIMD:
  – Communication between processors
  – Synchronization of the work
  – Processor time wasted when a processor runs out of work to do
  – Processor scheduling

Introduction (continued)
 Task
  – An entity to which a processor is assigned: a program, a function, or a procedure in execution
 Process
  – Another word for a task
 Processor (or processing element)
  – The hardware resource on which tasks are executed

Introduction (continued)
 Thread
  – The sequence of tasks performed in succession by a given processor; the path of execution of a processor through a number of tasks
  – Multiprocessors provide for the simultaneous presence of a number of threads of execution in an application
  – Refer to Example 6.1 (degree of parallelism = 3)

R-to-C ratio
 A measure of how much overhead is produced per unit of computation
  – R: the run time of the task (computation time)
  – C: the communication overhead
 This ratio signifies task granularity.
 A high R-to-C ratio implies that communication overhead is insignificant compared to computation time.

Task granularity
 Task granularity
  – Coarse-grain parallelism: high R-to-C ratio
  – Fine-grain parallelism: low R-to-C ratio
 The general tendency when seeking maximum performance is to resort to the finest possible granularity, providing the highest degree of parallelism.
 Maximum parallelism, however, also maximizes communication overhead, so it does not lead to maximum performance; a trade-off is required to reach an optimum granularity (see the sketch after this slide).
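
As a rough illustration, the snippet below classifies task granularity from a measured run time R and communication overhead C. The numbers and the cut-off value are illustrative assumptions, not taken from the text.

#include <stdio.h>

/* Classify task granularity from the R-to-C ratio.
 * R: computation time per task, C: communication overhead per task.
 * The threshold of 10 is an illustrative assumption, not from the text. */
const char *granularity(double R, double C)
{
    double ratio = R / C;
    if (ratio >= 10.0)
        return "coarse grain (communication overhead is insignificant)";
    return "fine grain (communication overhead is significant)";
}

int main(void)
{
    printf("R=100, C=2 -> %s\n", granularity(100.0, 2.0));
    printf("R=5,   C=4 -> %s\n", granularity(5.0, 4.0));
    return 0;
}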

6.1 MIMD Organization (Figure 6.2)
 Two popular MIMD organizations
  – Shared-memory (or tightly coupled) architecture
  – Message-passing (or loosely coupled) architecture
 Shared-memory architecture
  – UMA (uniform memory access)
  – Rapid memory access
  – Memory contention

6.1 MIMD Organization (continued)
 Message-passing architecture
  – Distributed-memory MIMD system
  – NUMA (nonuniform memory access)
  – Heavy communication overhead for remote memory access
  – No memory contention problem
 Other models
  – Hybrids of the two organizations

6.2 Memory Organization
 Two parameters of interest in MIMD memory system design
  – Bandwidth
  – Latency
 Memory latency is reduced by increasing the memory bandwidth
  – By building the memory system with multiple independent memory modules (banked and interleaved memory architecture; see the sketch after this slide)
  – By reducing the memory access and cycle times
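
A minimal sketch of low-order interleaving across independent memory modules; the module count and the addresses printed are illustrative, and the scheme assumes a power-of-two number of modules.

#include <stdio.h>

#define NUM_MODULES 4   /* illustrative; a power of two is assumed here */

/* With low-order interleaving, consecutive addresses map to consecutive
 * modules, so sequential accesses can proceed in parallel across modules. */
static unsigned module_of(unsigned addr)     { return addr % NUM_MODULES; }
static unsigned offset_within(unsigned addr) { return addr / NUM_MODULES; }

int main(void)
{
    for (unsigned addr = 0; addr < 8; addr++)
        printf("address %u -> module %u, offset %u\n",
               addr, module_of(addr), offset_within(addr));
    return 0;
}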

Multi-port memories
 Figure 6.3 (b)
  – Each memory module is a three-port memory device.
  – All three ports can be active simultaneously.
  – The only restriction is that only one port at a time can write to a given memory location.

Cache incoherence
 The problem wherein the value of a data item is not consistent throughout the memory system.
  – Write-through: a processor updates the cache and also the corresponding entry in main memory.
     Updating protocol
     Invalidating protocol
  – Write-back: an updated cache block is written back to main memory just before that block is replaced in the cache.
 (A sketch contrasting the two write policies follows this slide.)
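
The sketch below contrasts the two write policies on a single cache line. It is a deliberately simplified model (one line, one word of backing store), not the text's implementation.

#include <stdbool.h>

/* Simplified single-line cache model illustrating the two write policies. */
typedef struct {
    int  data;
    bool dirty;     /* used only by the write-back policy */
} cache_line_t;

static int main_memory;   /* stands in for the backing store */

/* Write-through: every store updates both the cache and main memory. */
void store_write_through(cache_line_t *line, int value)
{
    line->data  = value;
    main_memory = value;
}

/* Write-back: stores update only the cache and mark the line dirty;
 * main memory is updated when the line is evicted (replaced). */
void store_write_back(cache_line_t *line, int value)
{
    line->data  = value;
    line->dirty = true;
}

void evict_write_back(cache_line_t *line)
{
    if (line->dirty) {
        main_memory = line->data;
        line->dirty = false;
    }
}

int main(void)
{
    cache_line_t line = {0, false};
    store_write_through(&line, 7);   /* main memory sees 7 immediately   */
    store_write_back(&line, 9);      /* main memory still 7, line dirty  */
    evict_write_back(&line);         /* main memory now 9                */
    return 0;
}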

6.2 Memory Organization (continued)
 Cache coherence schemes
  – Do not use private caches (Figure 6.4)
  – Use a private-cache architecture, but cache only non-sharable data items
  – Cache flushing: shared data are allowed to be cached only when it is known that only one processor will be accessing the data

6.2 Memory Organization (continued)
 Cache coherence schemes (continued)
  – Bus watching (or bus snooping) (Figure 6.5)
     Bus-watching schemes incorporate, into each processor's cache controller, hardware that monitors the shared bus for data LOADs and STOREs (see the sketch after this slide).
  – Write-once
     The first STORE causes a write-through to main memory.
     Ownership protocol
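
A minimal sketch of an invalidating snoop handler; the data structures are assumptions for illustration, and the write-once and ownership protocols mentioned above add further states beyond this two-state version.

#include <stdbool.h>

/* Per-line state in each processor's cache controller. */
typedef enum { INVALID, VALID } line_state_t;

typedef struct {
    unsigned     tag;
    line_state_t state;
    int          data;
} snooped_line_t;

/* Called by the bus-watching hardware of every *other* cache when a STORE
 * to 'addr' is observed on the shared bus: any local copy of that block is
 * invalidated so stale data is never read (invalidating protocol). */
void snoop_store(snooped_line_t *line, unsigned addr)
{
    if (line->state == VALID && line->tag == addr)
        line->state = INVALID;
}

int main(void)
{
    snooped_line_t line = { 0x40, VALID, 123 };
    snoop_store(&line, 0x40);        /* another processor stored to 0x40 */
    return line.state == INVALID ? 0 : 1;
}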

6.3 Interconnection Network
 Bus (Figure 6.6)
  – Bus window (Figure 6.7(a))
  – Fat tree (Figure 6.7(b))
 Loop or ring
  – Token ring standard
 Mesh

6.3 Interconnection Network (continued)
 Hypercube
  – Routing is straightforward (see the sketch after this slide).
  – The number of nodes must be increased in powers of two.
 Crossbar
  – Offers multiple simultaneous communications, but at high hardware complexity.
 Multistage switching networks
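
One standard way to see why hypercube routing is straightforward: XOR the current and destination node numbers and correct one differing bit (dimension) per hop. The 3-cube example below is illustrative; the book's figures may label dimensions differently.

#include <stdio.h>

/* E-cube-style hypercube routing: current XOR dst has a 1 bit for every
 * dimension in which the nodes differ; flipping the lowest set bit moves
 * one hop closer, so the hop count equals the number of differing bits. */
unsigned next_hop(unsigned current, unsigned dst)
{
    unsigned diff = current ^ dst;
    if (diff == 0)
        return current;              /* already at the destination */
    unsigned lowest_bit = diff & (0u - diff);
    return current ^ lowest_bit;     /* flip one differing dimension */
}

int main(void)
{
    unsigned node = 0u, dst = 6u;    /* e.g. 000 -> 110 in a 3-cube */
    while (node != dst) {
        unsigned next = next_hop(node, dst);
        printf("%u -> %u\n", node, next);
        node = next;
    }
    return 0;
}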

6.4 Operating System Considerations
 The major functions of the multiprocessor operating system
  – Keeping track of the status of all the resources at all times
  – Assigning tasks to processors in a justifiable manner
  – Spawning and creating new processes such that they can be executed in parallel or independently of each other
  – Collecting their individual results when all the spawned processes are completed, and passing them to other processes as required

6.4 Operating System Considerations (continued)
 Synchronization mechanisms
  – Processes in an MIMD system operate in a cooperative manner, so a sequence control mechanism is needed to ensure the ordering of operations.
  – Processes also compete with each other to gain access to shared data items, so an access control mechanism is needed to maintain orderly access.

6.4 Operating System Considerations (continued)
 Synchronization mechanisms
  – The most primitive synchronization techniques (a spinlock sketch follows this slide)
     Test & set
     Semaphores
     Barrier synchronization
     Fetch & add
 Heavy-weight processes and light-weight processes
 Scheduling
  – Static
  – Dynamic: load balancing
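
As an illustration of test & set, a spinlock can be built on the C11 atomic_flag primitive; this is a generic software sketch, not the specific hardware instruction the book describes.

#include <stdatomic.h>

/* Test & set spinlock built on C11 atomics.
 * atomic_flag_test_and_set atomically sets the flag and returns its
 * previous value, so exactly one processor acquires the lock at a time. */
static atomic_flag lock = ATOMIC_FLAG_INIT;

void acquire(void)
{
    while (atomic_flag_test_and_set(&lock))
        ;   /* spin (busy-wait) until the current owner releases */
}

void release(void)
{
    atomic_flag_clear(&lock);
}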

6.5 Programming (continued)
 Four main structures of parallel programming (a fork/join sketch follows this slide)
  – Parbegin / parend
  – Fork / join
  – Doall
  – Declaration of processes, tasks, procedures, and so on for parallel execution
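
A small fork/join sketch using POSIX threads; pthreads is just one concrete realization of these constructs, chosen here for illustration. The "doall" over four iterations is expressed by forking one worker per iteration and joining them all.

/* compile with: cc -pthread forkjoin.c */
#include <pthread.h>
#include <stdio.h>

/* pthread_create corresponds to FORK, pthread_join to JOIN. */
static void *worker(void *arg)
{
    long i = (long)arg;
    printf("iteration %ld running in its own thread\n", i);
    return NULL;
}

int main(void)
{
    pthread_t tid[4];
    for (long i = 0; i < 4; i++)                 /* FORK / parbegin / doall */
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < 4; i++)                  /* JOIN / parend */
        pthread_join(tid[i], NULL);
    return 0;
}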

6.6 Performance Evaluation and Scalability
 Performance evaluation (a numeric sketch follows this slide)
  – Speed-up: S = Ts / Tp, where Ts is the serial (single-processor) execution time and Tp is the execution time on P processors.
    With the total overhead To = P*Tp - Ts, we have Tp = (To + Ts) / P, so S = Ts*P / (To + Ts).
  – Efficiency: E = S / P = Ts / (Ts + To) = 1 / (1 + To/Ts)
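
The formulas above translate directly into code; the sample values of Ts, To, and P below are purely illustrative.

#include <stdio.h>

/* Speed-up and efficiency from serial time Ts, total overhead To, and
 * processor count P, following S = Ts*P/(To+Ts) and E = S/P. */
int main(void)
{
    double Ts = 100.0;   /* illustrative serial execution time */
    double To = 25.0;    /* illustrative total overhead        */
    double P  = 8.0;     /* number of processors               */

    double Tp = (To + Ts) / P;       /* parallel execution time */
    double S  = Ts / Tp;             /* = Ts*P/(To+Ts)          */
    double E  = S / P;               /* = 1/(1 + To/Ts)         */

    printf("Tp = %.2f, S = %.2f, E = %.2f\n", Tp, S, E);
    return 0;
}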

Scalability
 Scalability: the ability to increase speedup as the number of processors increases.
 A parallel system is scalable if its efficiency can be maintained at a fixed value by increasing the number of processors as the problem size increases.
  – Time-constrained scaling
  – Memory-constrained scaling

Isoefficiency function
 Since E = 1 / (1 + To/Ts), we have To/Ts = (1 - E)/E and hence Ts = To * E/(1 - E).
 For a given value of E, E/(1 - E) is a constant K, so Ts = K*To (the isoefficiency function).
 A small isoefficiency function indicates that small increments in problem size are sufficient to maintain efficiency when P is increased.

6.6 Performance Evaluation and Scalability (continued)
 Performance models
  – The basic model (a rough estimate sketch follows this slide)
     Every task is identical and takes R time units to execute on a processor.
     If two tasks on different processors wish to communicate with each other, they do so at a cost of C time units.
  – Model with linear communication overhead
  – Model with overlapped communication
  – Stochastic model
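
A rough sketch of a basic-model execution-time estimate, assuming M equal tasks of length R are divided evenly over P processors and each processor performs k communications of cost C. The parameters and the additive form are illustrative assumptions; the text's exact formulation may differ.

#include <math.h>

/* Basic-model estimate: computation time for ceil(M/P) tasks of length R,
 * plus k communications of cost C per processor (illustrative assumption). */
double estimated_time(int M, int P, double R, double C, int k)
{
    double tasks_per_proc = ceil((double)M / P);
    return tasks_per_proc * R + k * C;
}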

Examples
 Alliant FX series (Figure 6.17)
  – Parallelism
     Instruction level
     Loop level
     Task level