
1 Chapter 6 Multiprocessor System

2 Introduction
• Each processor in a multiprocessor system can be executing a different instruction at any time.
• The major advantages of an MIMD system
  – Reliability
  – High performance
• The overhead involved with MIMD
  – Communication between processors
  – Synchronization of the work
  – Waste of processor time if any processor runs out of work to do
  – Processor scheduling

3 Introduction (continued)
• Task
  – An entity to which a processor is assigned
  – A program, a function, or a procedure in execution
• Process
  – Another word for a task
• Processor (or processing element)
  – Hardware resource on which tasks are executed

4 Introduction (continued)
• Thread
  – The sequence of tasks performed in succession by a given processor
  – The path of execution of a processor through a number of tasks
  – Multiprocessors provide for the simultaneous presence of a number of threads of execution in an application.
  – Refer to Example 6.1 (degree of parallelism = 3)

5 R-to-C ratio
• A measure of how much computation is performed per unit of communication overhead.
  – R: the run time of the task (the computation time)
  – C: the communication overhead
• This ratio signifies task granularity.
• A high R-to-C ratio implies that communication overhead is insignificant compared to computation time.

6 Task granularity
• Coarse-grain parallelism
  – High R-to-C ratio
• Fine-grain parallelism
  – Low R-to-C ratio
• The general tendency for obtaining maximum performance is to resort to the finest possible granularity, which provides the highest degree of parallelism.
• Maximum parallelism also brings maximum overhead, so a trade-off is required to reach an optimum level of granularity.

7 6.1 MIMD Organization (Figure 6.2)
• Two popular MIMD organizations
  – Shared-memory (or tightly coupled) architecture
  – Message-passing (or loosely coupled) architecture
• Shared-memory architecture
  – UMA (uniform memory access)
  – Rapid memory access
  – Memory contention

8 6.1 MIMD Organization (continued)
• Message-passing architecture
  – Distributed-memory MIMD system
  – NUMA (nonuniform memory access)
  – Heavy communication overhead for remote memory access
  – No memory contention problem
• Other models
  – A mix of the two

9 6.2 Memory Organization
• Two parameters of interest in MIMD memory system design
  – Bandwidth
  – Latency
• Memory latency is reduced by increasing the memory bandwidth.
  – By building the memory system with multiple independent memory modules (banked and interleaved memory architectures)
  – By reducing the memory access and cycle times
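
The banked, interleaved organization mentioned on this slide can be sketched as follows. This is a minimal illustration that assumes low-order interleaving (consecutive addresses fall in consecutive banks); the slide does not say which interleaving scheme is meant, and the function names `bank_and_offset` and `conflict_free` are hypothetical.

```python
def bank_and_offset(address, num_banks):
    """Low-order interleaving (assumed): the low part of the address picks the bank."""
    return address % num_banks, address // num_banks


def conflict_free(addresses, num_banks):
    """True if no two addresses in the group fall in the same bank,
    so all of them could be serviced by independent modules at once."""
    banks = [bank_and_offset(a, num_banks)[0] for a in addresses]
    return len(set(banks)) == len(banks)


if __name__ == "__main__":
    # Four consecutive words spread over four banks: no conflict.
    print(conflict_free(range(4), 4))
    # A stride equal to the bank count hits the same bank every time.
    print(conflict_free([0, 4, 8, 12], 4))
```

Under this assumption, sequential accesses use all banks in parallel, while a stride equal to the number of banks serializes on a single module.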

10 Multi-port memories
• Figure 6.3 (b)
  – Each memory module is a three-port memory device.
  – All three ports can be active simultaneously.
  – The only restriction is that only one port can write data into a given memory location at a time.

11 Cache incoherence
• The problem wherein the value of a data item is not consistent throughout the memory system.
  – Write-through
    • A processor updates the cache and also the corresponding entry in the main memory.
  – Updating protocol
  – Invalidating protocol
  – Write-back
    • An updated cache block is written back to the main memory just before that block is replaced in the cache.
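
A toy model can make the incoherence problem concrete. The sketch below is not taken from the slides: the `Cache` class, its one-value-per-address layout, and the `demo` function are hypothetical simplifications. It only shows that, with private caches, a second processor can keep reading a stale copy under either write policy.

```python
class Cache:
    """Toy private cache in front of a shared memory dict (illustrative only)."""

    def __init__(self, memory, write_through):
        self.memory = memory              # shared main memory: address -> value
        self.write_through = write_through
        self.lines = {}                   # address -> value held in this cache
        self.dirty = set()                # addresses modified but not written back

    def load(self, addr):
        if addr not in self.lines:        # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        if self.write_through:
            self.memory[addr] = value     # memory is updated on every store
        else:
            self.dirty.add(addr)          # write-back: defer the memory update

    def evict(self, addr):
        if addr in self.dirty:            # write back just before replacement
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


def demo(write_through):
    memory = {0x10: 1}
    p0 = Cache(memory, write_through)
    p1 = Cache(memory, write_through)
    p1.load(0x10)                         # P1 caches the old value
    p0.store(0x10, 2)                     # P0 updates the data item
    return memory[0x10], p1.load(0x10)    # (main memory, P1's cached copy)
```

In this model, `demo(True)` leaves main memory current but P1's copy stale, which is why write-through is paired with an updating or invalidating protocol; `demo(False)` leaves both main memory and P1 stale until P0's block is evicted.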

12 6.2 Memory Organization (continued)
• Cache coherence schemes
  – Do not use private caches (Figure 6.4).
  – Use a private cache architecture, but cache only non-sharable data items.
  – Cache flushing
    • Shared data are allowed to be cached only when it is known that only one processor will be accessing the data.

13 6.2 Memory Organization (continued)
• Cache coherence schemes (continued)
  – Bus watching (or bus snooping) (Figure 6.5)
    • Bus watching schemes incorporate, in each processor's cache controller, hardware that monitors the shared bus for data LOADs and STOREs.
  – Write-once
    • The first STORE causes a write-through to the main memory.
    • Ownership protocol
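
Bus watching can be sketched with a small simulation. The version below is a deliberately simplified assumption: every STORE is written through and broadcast, and other caches invalidate their copy when they observe it. It is not the write-once or ownership protocol named on the slide, and the `Bus` and `SnoopingCache` classes are hypothetical.

```python
class Bus:
    """Shared bus that broadcasts every STORE to all attached cache controllers."""

    def __init__(self, memory):
        self.memory = memory              # shared main memory: address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_store(self, source, addr, value):
        self.memory[addr] = value         # write-through to main memory
        for cache in self.caches:
            if cache is not source:
                cache.snoop_store(addr)   # every other controller sees the STORE


class SnoopingCache:
    """Private cache whose controller watches the bus (invalidate on remote STORE)."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                   # address -> locally cached value
        bus.attach(self)

    def load(self, addr):
        if addr not in self.lines:        # miss: fetch the current value
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.bus.broadcast_store(self, addr, value)

    def snoop_store(self, addr):
        self.lines.pop(addr, None)        # invalidate the local copy, if any
```

After one processor stores to an address, any other cache that held it takes a miss on its next load and fetches the new value from main memory.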

14 6.3 Interconnection Network
• Bus (Figure 6.6)
  – Bus window (Figure 6.7 (a))
  – Fat tree (Figure 6.7 (b))
• Loop or ring
  – Token ring standard
• Mesh

15 6.3 Interconnection Network (continued)
• Hypercube
  – Routing is straightforward.
  – The number of nodes can grow only in powers of two.
• Crossbar
  – It offers multiple simultaneous communications, but at a high hardware complexity.
• Multistage switching networks
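
One way to see why hypercube routing is straightforward: node labels are binary numbers, neighbors differ in exactly one bit, and a message can be forwarded by correcting the differing bits one at a time. The sketch below assumes dimension-order routing from the lowest bit upward; the slide does not name a specific routing algorithm, and `hypercube_route` is a hypothetical helper.

```python
def hypercube_route(src, dst, dim):
    """Return the list of nodes visited from src to dst in a dim-dimensional
    hypercube, fixing differing address bits from lowest to highest."""
    size = 1 << dim                       # a dim-cube has 2**dim nodes
    if not (0 <= src < size and 0 <= dst < size):
        raise ValueError("node labels must fit in dim bits")
    path = [src]
    node = src
    for bit in range(dim):
        if (node ^ dst) & (1 << bit):     # this address bit still differs
            node ^= 1 << bit              # hop to the neighbor across that dimension
            path.append(node)
    return path


if __name__ == "__main__":
    # Route from node 000 to node 101 in a 3-cube.
    print(hypercube_route(0b000, 0b101, 3))
```

The number of hops equals the number of bit positions in which the source and destination labels differ, so it never exceeds the dimension of the cube.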

16 6.4 Operating System Considerations
• The major functions of the multiprocessor operating system
  – Keeping track of the status of all the resources at all times
  – Assigning tasks to processors in a justifiable manner
  – Spawning or creating new processes such that they can be executed in parallel or independently of each other
  – Collecting their individual results when all the spawned processes are completed, and passing them to other processes as required

17 6.4 Operating System Considerations (continued)
• Synchronization mechanisms
  – Processes in an MIMD system operate in a cooperative manner, and a sequence control mechanism is needed to ensure the ordering of operations.
  – Processes compete with each other to gain access to shared data items.
  – An access control mechanism is needed to maintain orderly access.

18 6.4 Operating System Considerations (continued)
• Synchronization mechanisms
  – The most primitive synchronization techniques
    • Test & set
    • Semaphores
    • Barrier synchronization
    • Fetch & add
• Heavy-weight process and light-weight process
• Scheduling
  – Static
  – Dynamic: load balancing
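
Two of the primitives listed above, fetch & add and barrier synchronization, can be sketched with Python threads. Note the assumptions: Python exposes no hardware fetch & add instruction, so the hypothetical `FetchAndAdd` class emulates its atomicity with a lock; `threading.Barrier` is the standard-library barrier. The names `FetchAndAdd` and `run_workers` are illustrative only.

```python
import threading


class FetchAndAdd:
    """Emulated fetch & add: atomically return the old value and add to it.
    A lock stands in for the hardware primitive (assumption for illustration)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, increment):
        with self._lock:
            old = self._value
            self._value += increment
            return old

    @property
    def value(self):
        with self._lock:
            return self._value


def run_workers(num_workers, iterations):
    counter = FetchAndAdd()
    barrier = threading.Barrier(num_workers)
    seen_after_barrier = []

    def worker():
        for _ in range(iterations):
            counter.fetch_and_add(1)
        barrier.wait()                    # no thread passes until all have finished adding
        seen_after_barrier.append(counter.value)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value, seen_after_barrier
```

Because every thread waits at the barrier until all increments are done, each one observes the same final count afterwards.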

19 6.5 Programming (continued)
• Four main structures of parallel programming
  – Parbegin / parend
  – Fork / join
  – Doall
  – Processes, tasks, procedures, and so on can be declared for parallel execution.
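
The fork/join and doall structures can be approximated with Python's `concurrent.futures`. This is only an analogy under stated assumptions: the slides refer to language-level constructs, and the helpers `fork_join`, `doall`, and `square` below are hypothetical. In CPython, threads illustrate the structure rather than guarantee a speed-up for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


def fork_join(a, b):
    """Fork two independent tasks, then join on both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(sum, a)                # fork
        right = pool.submit(sum, b)               # fork
        return left.result() + right.result()     # join: wait for both


def doall(func, items):
    """Doall: every iteration is independent, so all may run in parallel.
    Results are returned in the original iteration order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(func, items))


if __name__ == "__main__":
    print(fork_join([1, 2, 3], [4, 5]))
    print(doall(square, range(5)))
```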

20 6.6 Performance Evaluation and Scalability
• Performance evaluation
  – Speed-up: S = Ts / Tp
    • To = p·Tp - Ts, so Tp = (To + Ts) / p
    • S = p·Ts / (To + Ts)
  – Efficiency: E = S / p = Ts / (Ts + To) = 1 / (1 + To / Ts)
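
The speed-up and efficiency relations can be checked numerically. The sketch assumes the usual reading of the symbols, which the slide does not spell out: Ts is the serial run time, Tp the parallel run time on p processors, and To the total overhead. The function names are hypothetical.

```python
def overhead(t_serial, t_parallel, p):
    """To = p*Tp - Ts: total processor time spent beyond the serial work."""
    return p * t_parallel - t_serial


def speedup(t_serial, t_parallel):
    """S = Ts / Tp."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """E = S / p, which also equals Ts / (Ts + To)."""
    return speedup(t_serial, t_parallel) / p


if __name__ == "__main__":
    ts, tp, p = 100.0, 30.0, 4
    to = overhead(ts, tp, p)
    print(to, speedup(ts, tp), efficiency(ts, tp, p))
    # The two forms of E agree: S/p and 1 / (1 + To/Ts).
    print(abs(efficiency(ts, tp, p) - 1 / (1 + to / ts)) < 1e-12)
```

With Ts = 100, Tp = 30, and p = 4, the overhead is 20 time units and both expressions for E give 100/120.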

21 Scalability
• Scalability: the ability to increase speedup as the number of processors increases.
• A parallel system is scalable if its efficiency can be maintained at a fixed value by increasing the number of processors as the problem size increases.
  – Time-constrained scaling
  – Memory-constrained scaling

22 Isoefficiency function
• E = 1 / (1 + To / Ts), so To / Ts = (1 - E) / E.
• Hence, Ts = E·To / (1 - E).
• For a given value of E, E / (1 - E) is a constant, K. Then Ts = K·To (the isoefficiency function).
• A small isoefficiency function indicates that small increments in problem size are sufficient to maintain efficiency when p is increased.
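
As a worked check of Ts = K·To, the sketch below computes the constant K for a target efficiency and the serial work needed to hold that efficiency at a given overhead. The function names and the sample numbers are hypothetical; only the formulas come from the slide.

```python
def iso_constant(target_efficiency):
    """K = E / (1 - E) for a fixed target efficiency 0 < E < 1."""
    if not 0 < target_efficiency < 1:
        raise ValueError("efficiency must be strictly between 0 and 1")
    return target_efficiency / (1 - target_efficiency)


def required_serial_work(target_efficiency, total_overhead):
    """Ts = K * To: serial work needed to keep E fixed at this overhead."""
    return iso_constant(target_efficiency) * total_overhead


if __name__ == "__main__":
    e, to = 0.8, 20.0
    ts = required_serial_work(e, to)
    # Plugging Ts back into E = 1 / (1 + To/Ts) recovers the target.
    print(ts, 1 / (1 + to / ts))
```

For E = 0.8 the constant K is 4, so an overhead of 20 time units calls for about 80 units of serial work; if the overhead grows as processors are added, the problem size must grow in proportion to keep E fixed.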

23 6.6 Performance Evaluation and Scalability (continued)
• Performance models
  – The basic model
    • Each task is equal and takes R time units to be executed on a processor.
    • If two tasks on different processors wish to communicate with each other, they do so at a cost of C time units.
  – Model with linear communication overhead
  – Model with overlapped communication
  – Stochastic model
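
The basic model can be explored with a small calculation. The slide gives no formula, so the sketch assumes a common two-processor formulation: with M tasks, k on one processor and M - k on the other, every pair of tasks on different processors communicates once, giving a total time of R·max(M - k, k) + C·(M - k)·k. That formula and the names `total_time` and `best_split` are assumptions for illustration.

```python
def total_time(m, k, r, c):
    """Assumed two-processor cost: k tasks on one processor, m - k on the other.
    Run time is set by the busier processor; each cross-processor task pair costs c."""
    return r * max(m - k, k) + c * (m - k) * k


def best_split(m, r, c):
    """Number of tasks to place on the second processor to minimize total time.
    By symmetry only k = 0 .. m // 2 needs to be examined."""
    return min(range(m // 2 + 1), key=lambda k: total_time(m, k, r, c))


if __name__ == "__main__":
    # High R-to-C ratio: splitting the work evenly pays off.
    print(best_split(10, r=10, c=1))
    # Low R-to-C ratio: communication dominates, keep everything on one processor.
    print(best_split(10, r=1, c=1))
```

Under this assumed formula, the even split wins when the R-to-C ratio is high, and a single processor wins when it is low, which matches the granularity trade-off described earlier in the deck.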

24 Examples
• Alliant FX series
  – Figure 6.17
  – Parallelism
    • Instruction level
    • Loop level
    • Task level

