1 CSL718 : Multiprocessors 13th April, 2006 Introduction
Anshul Kumar, CSE IITD

2 Parallel Architectures
Flynn’s Classification [1966]
Architecture Categories
    SISD
    SIMD
    MISD
    MIMD

3 MIMD
[Figure: MIMD organization, showing control units (C), processors (P) and memory (M) linked by instruction streams (IS) and data streams (DS)]

4 Parallel Architectures
Sima’s Classification
Parallel architectures (PAs)
    Data-parallel architectures
    Function-parallel architectures

5 Function Parallel Architectures
Instruction level PAs, Thread level PAs, Process level PAs
ILPs: Pipelined processors, VLIWs, Superscalar processors
MIMDs: Shared Memory MIMD, Distributed Memory MIMD
Built using general purpose processors

6 Issues from user’s perspective
Specification / Program design
    explicit parallelism, or implicit parallelism + parallelizing compiler
Partitioning / mapping to processors
Scheduling / mapping to time instants
    static or dynamic
Communication and Synchronization

7 Parallelizing example
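for (i=0; i<n; i++) {
    m = m+3;
    a[i] = (a[m]+a[m+1]+a[m+2])/3;
}
Can all iterations be done in parallel?
Dependence 1: each iteration needs the value of m left by the previous one
    m = m + 3
Dependence 2: one iteration reads an element that a later iteration overwrites (taking m = 3i in iteration i)
    a[1] = (a[3]+a[4]+a[5])/3
    a[4] = (a[12]+a[13]+a[14])/3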
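The second dependence can be made concrete by listing, for a small n, which later iteration overwrites an element that an earlier iteration reads. This is only a sketch: it assumes m = 3i in iteration i (the form the example values a[1] and a[4] suggest), and the function name `anti_dependences` is made up for illustration.

```python
def anti_dependences(n):
    """Return pairs (reader, writer): iteration `reader` reads a[writer],
    and the later iteration `writer` overwrites a[writer]."""
    pairs = []
    for i in range(n):
        # Assumption: iteration i reads a[3i], a[3i+1], a[3i+2] and writes a[i].
        for j in (3 * i, 3 * i + 1, 3 * i + 2):
            if i < j < n:  # a[j] is written by a later iteration j
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    print(anti_dependences(6))
```

The pair (1, 4) is the case shown on the slide: iteration 1 must read a[4] before iteration 4 writes it.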

8 Parallelizing example - contd.
Eliminate dependence based on induction variable
for (i=0; i<n; i++) {
    m = i*3;
    a[i] = (a[m]+a[m+1]+a[m+2])/3;
}

9 Parallelizing example - contd.
Eliminate forward dependency using double buffer
for (i=0; i<n; i++) {      /* iterations may run in parallel */
    m = i*3;
    aa[i] = (a[m]+a[m+1]+a[m+2])/3;
}
barrier();                 /* every aa[i] is computed before any a[i] is overwritten */
for (i=0; i<n; i++)
    a[i] = aa[i];
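A minimal sketch of the double-buffer idea, written sequentially so it can be run anywhere. The "parallel" phase is imitated by visiting iterations in an arbitrary order, and the gap between the two loops stands in for barrier(). The function names and the `order` parameter are illustrative assumptions, not part of the slides.

```python
def sequential(a, n):
    """Reference: the loop run in program order, updating a copy of `a` in place."""
    a = list(a)
    for i in range(n):
        m = 3 * i
        a[i] = (a[m] + a[m + 1] + a[m + 2]) / 3
    return a


def double_buffered(a, n, order=None):
    """Phase 1 fills a separate buffer `aa` in any order; phase 2 copies back."""
    a = list(a)
    order = list(range(n)) if order is None else list(order)
    aa = [0.0] * n
    for i in order:  # iterations are independent: they only read `a`
        m = 3 * i
        aa[i] = (a[m] + a[m + 1] + a[m + 2]) / 3
    # "barrier": all of aa is complete before any element of a is overwritten
    for i in order:
        a[i] = aa[i]
    return a


if __name__ == "__main__":
    n = 5
    data = [float(x * x) for x in range(3 * n)]  # the loop reads up to a[3n-1]
    print(double_buffered(data, n, order=reversed(range(n))) == sequential(data, n))
```

The results agree with the sequential loop because, in program order, every element an iteration reads has not yet been overwritten by an earlier iteration.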

10 Parallelizing example - contd.
Parallelization using dynamic thread creation and scheduling
schedule(0);
for (i=0; i<n; i++) {
    wait_till_scheduled(i);
    m = i*3;
    a[i] = (a[m]+a[m+1]+a[m+2])/3;
    if (i>0) schedule(3*i);
    schedule(3*i+1);
    schedule(3*i+2);
}
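The scheduling scheme can be imitated with a pool of ready iterations from which any member may run next. This sketch assumes the guard on schedule(3*i) is i > 0 (iteration 0 must not reschedule itself) and adds a bound j < n that the slide leaves implicit; `run_dynamic` and `sequential` are hypothetical names.

```python
import random


def sequential(a, n):
    """Reference: the loop run in program order on a copy of `a`."""
    a = list(a)
    for i in range(n):
        m = 3 * i
        a[i] = (a[m] + a[m + 1] + a[m + 2]) / 3
    return a


def run_dynamic(a, n, seed=0):
    """Run iterations in a random order that respects the schedule() calls."""
    a = list(a)
    rng = random.Random(seed)
    ready = [0]  # schedule(0)
    done = []
    while ready:
        i = ready.pop(rng.randrange(len(ready)))  # any ready iteration may run
        m = 3 * i
        a[i] = (a[m] + a[m + 1] + a[m + 2]) / 3
        done.append(i)
        # Iteration i has now read a[3i..3i+2], so the iterations that
        # overwrite those elements are free to run.
        for j in (3 * i, 3 * i + 1, 3 * i + 2):
            if 0 < j < n:  # j == 0 is iteration 0 itself; j >= n does not exist
                ready.append(j)
    return a, done


if __name__ == "__main__":
    n = 10
    data = [float(x) for x in range(3 * n)]
    result, order = run_dynamic(data, n, seed=3)
    print(order, result == sequential(data, n))
```

Each iteration j > 0 is scheduled exactly once, by iteration j // 3, so every iteration runs and none runs before its reader has finished.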

11 Grain size and performance
[Figure: speed up against grain size, from fine grain to coarse grain, with a peak at the optimal grain size; one side is marked overhead limited, the other load imbalance and parallelism limited]

12 Speed up and efficiency

13 Amdahl’s Law
[Figure: plot of speedup Sp against s, with axis marks at .5 and 1 and the labels Sp=p and Sp=1]
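The formulas on these slides did not survive extraction. As a hedged reconstruction, the usual statement of Amdahl's law is Sp = 1 / (s + (1 - s)/p), where s is taken here to be the serial fraction of the work (an assumption about what s denotes in the figure) and p the number of processors; efficiency is then Sp / p. The function names below are illustrative.

```python
def amdahl_speedup(s, p):
    """Speedup on p processors when a fraction s of the work is serial."""
    return 1.0 / (s + (1.0 - s) / p)


def efficiency(s, p):
    """Speedup per processor."""
    return amdahl_speedup(s, p) / p


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0):
        print(s, amdahl_speedup(s, 8), efficiency(s, 8))
```

With s = 0 the formula gives Sp = p and with s = 1 it gives Sp = 1, which matches the two labels that remain from the figure.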

14 Generalization
[Figure: speedup Sp against number of processors p, with a curve labelled "actual"]

15 Shared Memory Architecture

16 Design Space of Shared Memory Architectures
Extent of address space sharing
Location of memory modules
Uniformity of memory access

17 Address Space
[Figure: processors P1 to P4 under three address space arrangements]
Each processor sees an exclusive address space
Each processor sees a partly exclusive and partly shared address space
Each processor sees the same shared address space

18 Location of Memory
[Figure: three arrangements of processors (P) and memory modules (M) around an interconnection network: centralized, mixed and distributed]

19 Clustered Architecture
[Figure: clusters of processors (P) and memory modules (M), each with its own interconnection network, joined by a global interconnection network that also connects memory modules]

20 Uniformity of Access
UMA (Uniform Memory Access)
    Uniformity across memory address space
    Uniformity across processors
NUMA (Non-Uniform Memory Access)
CC-NUMA (Cache Coherent NUMA)
COMA (Cache Only Memory Architecture)
UMA : Symmetrical Shared Memory Multiprocessor (SMP)
NUMA : Distributed Shared Memory Multiprocessor

21 Location and Sharing
[Table: location of memory (centralized, mixed, distributed) against sharing (full, partial, none), with UMA and NUMA marked in the grid]

22 Shared Memory with Caches
Multiple copies of data may exist, hence the problem of cache coherence
Cache coherence protocols
    What action is taken?
    Which processors/caches communicate?
    Status of each block?

23 What action is taken?
Invalidate other caches and/or memory
    send a signal/message immediately, copy information only when unavoidable
    similar to write back policy
Update other caches and/or memory
    write simultaneously at all places (send modifications immediately)
    similar to write through policy
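The difference between the two actions can be shown on a toy model in which each cache is a dictionary from address to value. This is a sketch under simplifying assumptions (one value per address, no bus timing); the function `write` and the policy names "invalidate" and "update" are invented for illustration.

```python
def write(caches, memory, writer, addr, value, policy):
    """Processor `writer` writes `value` to `addr`; other copies are either
    invalidated or updated, depending on `policy`."""
    if policy not in ("invalidate", "update"):
        raise ValueError(policy)
    caches[writer][addr] = value
    for pid, cache in enumerate(caches):
        if pid == writer or addr not in cache:
            continue
        if policy == "invalidate":
            del cache[addr]      # drop the stale copy; refetch only if needed
        else:
            cache[addr] = value  # push the modification immediately
    if policy == "update":
        memory[addr] = value     # like write through: memory stays current
    # Under "invalidate", memory is left stale until a later write back.


if __name__ == "__main__":
    caches = [{0x10: 1}, {0x10: 1}, {}]
    memory = {0x10: 1}
    write(caches, memory, writer=0, addr=0x10, value=2, policy="invalidate")
    print(caches, memory)
```

Under invalidation only the writer keeps a valid copy, while under update every existing copy and memory hold the new value at the cost of a message per write.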

24 Which procs/caches communicate?
Snoopy protocol
    broadcast invalidate or update messages
    all processors snoop on the bus
Directory based protocol
    maintain directory - list of copies
    communicate selectively
    directory - centralized (memory) or distributed (caches)
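The contrast between broadcasting and selective communication can be sketched by listing who receives an invalidate message in each scheme. The class `Directory` below models a centralized directory as a map from block to the set of caches holding a copy; all names are hypothetical and the model ignores message latency and block state.

```python
def snoopy_invalidate(num_procs, writer):
    """Broadcast: every other processor sees the message on the bus."""
    return [p for p in range(num_procs) if p != writer]


class Directory:
    """Centralized directory: for each block, the caches that hold a copy."""

    def __init__(self):
        self.sharers = {}

    def read(self, block, proc):
        """Record that `proc` now holds a copy of `block`."""
        self.sharers.setdefault(block, set()).add(proc)

    def write(self, block, proc):
        """Return the processors that must be sent an invalidate."""
        targets = sorted(self.sharers.get(block, set()) - {proc})
        self.sharers[block] = {proc}  # the writer now holds the only copy
        return targets


if __name__ == "__main__":
    d = Directory()
    d.read("B", 0)
    d.read("B", 3)
    print(snoopy_invalidate(8, 0))  # seven recipients
    print(d.write("B", 0))          # only the one other sharer
```

The directory costs storage and a lookup per write, but the number of messages grows with the number of sharers rather than with the number of processors.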

25 Status of each cache block?
valid/invalid, private/shared, clean/dirty
Simplest protocol (3 states)
    Invalid, (shared) clean, private dirty
Berkeley protocol (4 states)
    Invalid, (shared) clean, private dirty, shared dirty
Illinois, Firefly protocols (4 states)
    Invalid, shared clean, private clean, private dirty
Dragon protocols (5 states)
    Invalid, shared clean/dirty, private clean/dirty

26 Simplest invalidation protocol
Use 3 states : Invalid, shared clean, private dirty
[Figure: state diagram with states labelled "invalid", "clean shared?" and "dirty"; the legend distinguishes CPU events from BUS events]

27 Simplest invalidation protocol
Use 3 states : Invalid, shared clean, private dirty
[Figure: the same state diagram with transitions labelled RD miss, WR, RD miss and WR miss; the legend distinguishes CPU events from BUS events]

28 Simplest invalidation protocol
Use 3 states : Invalid, shared clean, private dirty
[Figure: the same state diagram with transitions labelled "WR miss, INV", "RD miss" and "WR miss, INV"; the legend distinguishes CPU events from BUS events]

29 Simplest invalidation protocol
Use 3 states : Invalid, shared clean, private dirty
[Figure: the complete state diagram, combining the transitions labelled RD miss, WR, RD miss, WR miss with those labelled "WR miss, INV", "RD miss" and "WR miss, INV"; the legend distinguishes CPU events from BUS events]
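Since the diagram itself is lost, the following sketch shows one conventional reading of a 3-state write-invalidate protocol for a single block. The exact arcs are an assumption based on the surviving labels (RD miss, WR, WR miss, INV), not a transcription of the slide; `snoop` and `access` are illustrative names.

```python
INVALID, CLEAN, DIRTY = "invalid", "clean", "dirty"


def snoop(state, bus):
    """Next state of a cache that observes bus event `bus` from another cache."""
    if state == INVALID:
        return INVALID
    if bus == "RD miss":
        return CLEAN   # a dirty copy is supplied or written back, then shared
    return INVALID     # "WR miss" or "INV": another cache is about to write


def access(states, proc, op):
    """Apply CPU operation `op` ("RD" or "WR") by `proc` to one block.
    `states` holds one state per cache and is updated in place.
    Returns the bus event generated, or None on a hit with no bus traffic."""
    state = states[proc]
    if op == "RD":
        bus = "RD miss" if state == INVALID else None
        new = CLEAN if state == INVALID else state
    elif op == "WR":
        bus = {INVALID: "WR miss", CLEAN: "INV", DIRTY: None}[state]
        new = DIRTY
    else:
        raise ValueError(op)
    if bus is not None:
        for other in range(len(states)):
            if other != proc:
                states[other] = snoop(states[other], bus)
    states[proc] = new
    return bus


if __name__ == "__main__":
    states = [INVALID] * 3
    for proc, op in [(0, "RD"), (1, "RD"), (1, "WR"), (2, "RD"), (0, "WR")]:
        print(proc, op, access(states, proc, op), states)
```

In this reading a dirty copy is always the only valid copy, which is the invariant that makes the "private dirty" state safe to write without further bus traffic.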

