Mattan Erez The University of Texas at Austin


1 Mattan Erez The University of Texas at Austin
EE382N (20): Computer Architecture - Parallelism and Locality, Fall 2011
Lectures 8 & 9 – Parallelism in Hardware
Mattan Erez, The University of Texas at Austin
EE382N: Parallelism and Locality, Fall Lectures 8 & 9 (c) Mattan Erez

2 Simplified view of a pipelined processor
Pipeline stages: fetch, decode, dispatch (issue), reg. access, execute, write-back, commit
Other blocks: sequencer, memory hierarchy

3 Simplified view of a pipelined processor
Pipeline stages: fetch, decode, dispatch (issue), reg. access, execute, write-back, commit
Other blocks: sequencer
Instructions:
1: add r4, r1, r2
2: add r5, r1, r3
3: add r6, r2, r3
Timing diagram: one row per instruction (1 to 3), stages labeled F D I R E W C

4 Simplified view of a pipelined processor
Pipeline stages: fetch, decode, dispatch (issue), reg. access, execute, write-back, commit
Other blocks: sequencer
Instructions:
1: add r4, r1, r2
2: add r5, r1, r3
3: add r6, r2, r3
Timing diagram: one row per instruction (1 to 3), stages labeled F D I R E W C

5 Simplified view of a pipelined processor
Pipeline stages: fetch, decode, dispatch (issue), reg. access, execute, write-back, commit
Other blocks: sequencer
Instructions:
1: add r4, r1, r2
2: add r5, r1, r3
3: add r6, r2, r3
Timing diagram: one row per instruction (1 to 3), stages labeled F D I R E W C

6 Simplified view of a pipelined processor
Pipeline stages: fetch, decode, dispatch (issue), reg. access, execute, write-back, commit
Other blocks: sequencer
Instructions:
1: add r4, r1, r2
2: add r5, r1, r3
3: add r6, r2, r3
Timing diagram: one row per instruction (1 to 3), stages labeled F D I R E W C
What are the parallel resources?
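
The three adds on this slide are independent, so each can enter the pipeline one cycle behind the previous one. A minimal sketch of that timing, assuming one instruction enters per cycle and nothing stalls (the function names `ideal_timing` and `busy_stages` are illustrative, not from the lecture):

```python
STAGES = ["F", "D", "I", "R", "E", "W", "C"]  # stage letters used on the slide

def ideal_timing(num_instructions):
    """Return {instruction number: {stage: cycle}} for a stall-free pipeline.

    Assumption: instruction i (0-based) is in stage s during cycle i + s.
    """
    return {
        i + 1: {stage: i + s for s, stage in enumerate(STAGES)}
        for i in range(num_instructions)
    }

def busy_stages(timing, cycle):
    """List the (instruction, stage) pairs that are active in one cycle."""
    return sorted(
        (inst, stage)
        for inst, stages in timing.items()
        for stage, c in stages.items()
        if c == cycle
    )

timing = ideal_timing(3)

# Print one row per instruction, one column per cycle.
for inst, stages in timing.items():
    row = ["."] * (len(STAGES) + len(timing) - 1)
    for stage, c in stages.items():
        row[c] = stage
    print(inst, " ".join(row))

# In cycle 2 three different stages are busy at once.
print(busy_stages(timing, 2))
```

In this model the parallel resources are the stages themselves: in any one cycle, each stage holds a different instruction.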

7 Simplified view of a pipelined processor
Pipeline stages: fetch, decode, dispatch (issue), reg. access, execute, write-back, commit
Other blocks: sequencer
Instructions:
1: add r4, r1, r2
2: add r5, r1, r4
3: add r6, r5, r3
Timing diagram: one row per instruction (1 to 3), stages labeled F D I R E W C

8 Simplified view of a pipelined processor
Pipeline stages: fetch, decode, dispatch (issue), reg. access, execute, write-back, commit
Other blocks: sequencer
Instructions:
1: add r4, r1, r2
2: add r5, r1, r4
3: add r6, r5, r3
Timing diagram: one row per instruction (1 to 3), stages labeled F D I R E W C
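
Here each add reads the register written by the one before it. The sketch below estimates when each instruction can reach the execute stage, with and without a bypass network. It is a simplified model and not the lecture's: it assumes a one-cycle ALU, one instruction in E per cycle, and (without bypass) that a consumer's R stage must come after the producer's W stage. The name `schedule` is hypothetical.

```python
STAGES = ["F", "D", "I", "R", "E", "W", "C"]
E = STAGES.index("E")

def schedule(program, bypass):
    """Return the cycle in which each instruction occupies the E stage.

    program: list of (dest, src1, src2) register names, in program order.
    """
    ready = {}        # register -> first cycle a consumer may be in E
    exec_cycles = []
    for i, (dest, *srcs) in enumerate(program):
        earliest = i + E                                  # stall-free E cycle
        if exec_cycles:
            earliest = max(earliest, exec_cycles[-1] + 1)  # in-order pipeline
        for src in srcs:
            earliest = max(earliest, ready.get(src, 0))    # wait for producers
        exec_cycles.append(earliest)
        # With bypass the result is forwarded to the next cycle's E stage.
        # Without it (assumed): producer W at earliest+1, consumer R at
        # earliest+2, so the consumer's E is at earliest+3.
        ready[dest] = earliest + 1 if bypass else earliest + 3
    return exec_cycles

dependent = [("r4", "r1", "r2"), ("r5", "r1", "r4"), ("r6", "r5", "r3")]
independent = [("r4", "r1", "r2"), ("r5", "r1", "r3"), ("r6", "r2", "r3")]

print(schedule(dependent, bypass=True))     # [4, 5, 6]
print(schedule(dependent, bypass=False))    # [4, 7, 10]
print(schedule(independent, bypass=False))  # [4, 5, 6]
```

Under these assumptions the bypass network lets the dependent chain run as fast as the independent one, while going through the register file costs two stall cycles per dependence.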

9 Simplified view of a pipelined processor
Pipeline stages: fetch, decode, dispatch (issue), reg. access, execute, write-back, commit
Other blocks: sequencer
Instructions:
1: add r4, r1, r2
2: add r5, r1, r3
3: add r6, r2, r3
Timing diagram: one row per instruction (1 to 3), stages labeled F D I R E W C
Communication and synchronization mechanisms?

10 Simplified view of a pipelined processor
Pipeline stages: fetch, decode, dispatch (issue), reg. access, execute, write-back, commit
Other blocks: sequencer
Instructions:
1: add r4, r1, r2
2: ld r5, r4
3: add r6, r5, r3
4: add r7, r1, r3
Timing diagram: one row per instruction (1 to 4), stages labeled F D I R E W C, with an L stage for instruction 2

11 Simplified view of an OOO pipelined processor
Pipeline stages: fetch, decode, rename, dispatch (issue), issue (dispatch), reg. access, execute, write-back, commit
Other blocks: sequencer, scheduler
Instructions:
1: add r4, r1, r2
2: ld r5, r4
3: add r6, r5, r3
4: add r7, r1, r3
Timing diagram: one row per instruction (1 to 4), stages labeled F D I d R E W C, with an L stage for instruction 2
Communication and synchronization mechanisms?
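
The four-instruction example shows why a scheduler helps: instruction 3 waits for the load, but instruction 4 depends on neither. The following is a rough sketch, not the lecture's model. It assumes one instruction starts executing per cycle, instruction i enters the window in cycle i, a load latency of 3 cycles, an add latency of 1, and that each register is written only once (so renaming is not modeled). The name `run` and the `LATENCY` table are hypothetical.

```python
import math

LATENCY = {"add": 1, "ld": 3}   # assumed latencies, in cycles

def run(program, in_order):
    """Return the cycle in which each instruction starts executing.

    program: list of (op, dest, srcs) in program order.
    """
    n = len(program)
    # For each instruction, find the earlier instructions producing its sources.
    producers, last_writer = [], {}
    for i, (op, dest, srcs) in enumerate(program):
        producers.append([last_writer[r] for r in srcs if r in last_writer])
        last_writer[dest] = i

    start = [None] * n
    finish = [math.inf] * n
    cycle = 0
    while None in start:
        waiting = [i for i in range(n) if start[i] is None]
        if in_order:
            waiting = waiting[:1]          # only the oldest may start
        for i in waiting:                  # oldest-first selection
            in_window = i <= cycle
            if in_window and all(finish[p] <= cycle for p in producers[i]):
                start[i] = cycle
                finish[i] = cycle + LATENCY[program[i][0]]
                break                      # one instruction starts per cycle
        cycle += 1
    return start

program = [
    ("add", "r4", ("r1", "r2")),
    ("ld",  "r5", ("r4",)),
    ("add", "r6", ("r5", "r3")),
    ("add", "r7", ("r1", "r3")),
]

print(run(program, in_order=True))    # [0, 1, 4, 5]
print(run(program, in_order=False))   # [0, 1, 4, 3]
```

With out-of-order selection, instruction 4 starts in cycle 3, ahead of instruction 3, and so fills a cycle that the in-order model spends waiting on the load.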

12 Pipelining Summary
Pipelining is using parallelism to hide latency
Do useful work while waiting for other work to finish
Multiple parallel components, not multiple instances of same component
Examples:
  Execution pipeline
  Memory pipelines: issue multiple requests to memory without waiting for previous requests to complete
  Software pipelines: overlap different software blocks to hide latency (computation/communication)
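
The memory-pipeline example can be put in numbers. A back-of-the-envelope sketch, assuming every request has the same fixed latency and a new request can be issued every `issue_interval` cycles (both function names and the 100-cycle latency are illustrative assumptions):

```python
def serialized_time(num_requests, latency):
    """Each request waits for the previous one to complete."""
    return num_requests * latency

def pipelined_time(num_requests, latency, issue_interval=1):
    """Requests are issued back to back; only the last one's latency is exposed."""
    if num_requests == 0:
        return 0
    return latency + (num_requests - 1) * issue_interval

n, latency = 8, 100
print(serialized_time(n, latency))   # 800
print(pipelined_time(n, latency))    # 107
```

The latency of an individual request is unchanged; the gain comes from overlapping the waiting time of one request with the issue of the next.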

13 Parallel Execution
Concurrency: what are the multiple resources?
Communication: and storage
Synchronization: implicit? explicit?
What is being shared? What is being partitioned?

14 Resources in a parallel processor/system
Execution: ALUs, cores/processors
Control: sequencers, instructions, OOO schedulers
State: registers, memories
Networks

15 Communication and synchronization
Synchronization:
  Clock – explicit compiler order
  Explicit signals (e.g., dependences)
  Implicit signals (e.g., flush/stall): more for pipelining than multiple ALUs
Communication:
  Bypass networks
  Registers
  Memory
  Explicit (over some network)

16 Organizations for ILP (for multiple ALUs)

17 Superscalar (ILP for multiple ALUs)
Diagram blocks: sequencer, scheduler, memory hierarchy
Synchronization: explicit signals (dependences)
Communication: bypass, registers, mem
Shared: sequencer, OOO, registers, memories, net, ALUs
Partitioned: instructions
How many ALUs?

18 SMT/TLS (ILP for multiple ALUs)

19 SMT/TLS (ILP for multiple ALUs)
Diagram blocks: sequencer (x2), scheduler, memory hierarchy
Synchronization: explicit signals (dependences)
Communication: bypass, registers, mem
Shared: OOO, registers, memories, net, ALUs
Partitioned: sequencer, instructions, arch. registers
Why is this ILP? How many threads?

20 VLIW (ILP for multiple ALUs)

21 VLIW (ILP for multiple ALUs)
Diagram blocks: sequencer, memory hierarchy
Synchronization: clock+compiler
Communication: registers, mem, bypass
Shared: sequencer, OOO, registers, memories, net
Partitioned: instructions, ALUs
How many ALUs?
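
"Clock+compiler" synchronization means the compiler decides ahead of time which operations share a cycle. A toy sketch of that packing step, assuming single-cycle operations, registers that are each written once, and a greedy in-order packer (the name `pack_bundles` is hypothetical, and real VLIW compilers use more elaborate scheduling):

```python
def pack_bundles(program, width):
    """Greedily pack (dest, src1, src2) operations into bundles of at most `width`.

    An operation is placed no earlier than one bundle after the bundle that
    produces any of its sources. Returns a list of bundles, each a list of
    destination register names; an empty bundle stands for a cycle of no-ops.
    """
    bundles = []
    bundle_of = {}          # dest register -> index of the bundle producing it
    for dest, *srcs in program:
        earliest = 0
        for r in srcs:
            if r in bundle_of:
                earliest = max(earliest, bundle_of[r] + 1)
        b = earliest
        while b < len(bundles) and len(bundles[b]) >= width:
            b += 1          # skip bundles whose ALU slots are all taken
        while len(bundles) <= b:
            bundles.append([])
        bundles[b].append(dest)
        bundle_of[dest] = b
    return bundles

independent = [("r4", "r1", "r2"), ("r5", "r1", "r3"), ("r6", "r2", "r3")]
dependent = [("r4", "r1", "r2"), ("r5", "r1", "r4"), ("r6", "r5", "r3")]

print(pack_bundles(independent, width=3))   # [['r4', 'r5', 'r6']]
print(pack_bundles(independent, width=2))   # [['r4', 'r5'], ['r6']]
print(pack_bundles(dependent, width=3))     # [['r4'], ['r5'], ['r6']]
```

The dependent chain fills only one slot per bundle regardless of width, which is one way to approach the "How many ALUs?" question on the slide.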

22 Explicit Dataflow (ILP for multiple ALUs)

23 Explicit Dataflow (ILP for multiple ALUs)
Diagram blocks: sequencer, scheduler (x4), memory hierarchy
Synchronization: explicit signals
Communication: registers+explicit
Shared: sequencer, memories, net
Partitioned: instructions, OOO, ALUs

24 DLP for multiple ALUs
From HW – this is SIMD

25 SIMD (DLP for multiple ALUs)
Diagram blocks: sequencer, Local Mem (x4), memory hierarchy
Synchronization: clock+compiler
Communication: explicit
Shared: sequencer, instructions
Partitioned: registers, memories, ALUs
Sometimes: memories, net
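
The shared/partitioned split on this slide can be mimicked in a few lines: one instruction stream is broadcast to several lanes, and each lane applies it to its own registers and local memory. This is an illustrative sketch only; the `Lane` class, the tuple instruction format, and the two-word local memories are assumptions made for the example.

```python
import operator

ALU_OPS = {"add": operator.add, "mul": operator.mul}

class Lane:
    """One ALU with its own (partitioned) registers and local memory."""

    def __init__(self, local_mem):
        self.local_mem = list(local_mem)
        self.regs = {}

    def step(self, instr):
        op, dest, a, b = instr
        if op == "ld":
            # Every lane uses the same local address, but its own memory.
            self.regs[dest] = self.local_mem[a]
        else:
            self.regs[dest] = ALU_OPS[op](self.regs[a], self.regs[b])

def run_simd(program, lanes):
    """One shared sequencer: each instruction is broadcast to all lanes."""
    for instr in program:
        for lane in lanes:      # conceptually simultaneous, in lockstep
            lane.step(instr)

lanes = [Lane([1, 10]), Lane([2, 20]), Lane([3, 30]), Lane([4, 40])]
program = [
    ("ld", "r1", 0, None),
    ("ld", "r2", 1, None),
    ("add", "r3", "r1", "r2"),
]
run_simd(program, lanes)
results = [lane.regs["r3"] for lane in lanes]
print(results)   # [11, 22, 33, 44]
```

No lane ever reads another lane's registers or memory here; moving data between lanes would need the explicit communication the slide lists.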

26 Vectors (DLP for multiple ALUs)
Diagram blocks: sequencer, Local Mem (x4), memory hierarchy
Vectors: memory addresses are part of single-instruction and not part of multiple-data
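
One way to read the statement above: a vector memory instruction carries a single address description, and the per-element addresses are derived from it rather than supplied as data by each lane. A small sketch under that reading, assuming a base-plus-stride addressing form (the function names and the sample memory contents are made up for illustration):

```python
def vector_load(memory, base, stride, length):
    """One instruction names base, stride, and length; element addresses follow."""
    return [memory[base + i * stride] for i in range(length)]

def vector_add(a, b):
    """Element-wise add of two vector registers of equal length."""
    return [x + y for x, y in zip(a, b)]

memory = list(range(100, 116))          # shared memory: address k holds 100 + k

v1 = vector_load(memory, base=2, stride=3, length=4)   # addresses 2, 5, 8, 11
v2 = vector_load(memory, base=0, stride=1, length=4)   # addresses 0, 1, 2, 3
print(v1)                  # [102, 105, 108, 111]
print(vector_add(v1, v2))  # [202, 206, 210, 214]
```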

27 TLP for multiple ALUs

28 MIMD – shared memory (TLP for multiple ALUs)
Diagram blocks: sequencer (x4), scheduler (x4), memory hierarchy
Synchronization: explicit, memory
Communication: memory
Shared: memories, net
Partitioned: sequencer, instructions, OOO, ALUs, registers, some nets
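
In software, this organization corresponds to threads that communicate through a shared location and synchronize explicitly. A minimal sketch using Python threads as stand-ins for the independent sequencers (the function name `parallel_sum` and the four-way split are assumptions for the example):

```python
import threading

def parallel_sum(data, num_threads=4):
    total = [0]                   # shared memory: the communication channel
    lock = threading.Lock()       # explicit synchronization on that location

    def worker(chunk):
        partial = sum(chunk)      # private work, no sharing needed
        with lock:                # only one thread updates the total at a time
            total[0] += partial

    chunks = [data[i::num_threads] for i in range(num_threads)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                  # wait until every thread has contributed
    return total[0]

print(parallel_sum(list(range(1000))))   # 499500
```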

29 MIMD – distributed memory
Diagram blocks: sequencer (x4), scheduler (x4), memory hierarchy (x4)
Synchronization: explicit
Communication:
Shared: net
Partitioned: sequencer, instructions, OOO, ALUs, registers, some nets, memories
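
With memories partitioned, the only shared resource is the network, so data moves by explicit messages. The sketch below emulates this with threads that never touch each other's data and a `queue.Queue` standing in for the network. It is an analogy built on Python's standard library, not a model of any particular machine, and the names `node` and `distributed_sum` are hypothetical.

```python
import queue
import threading

def node(rank, private_data, network):
    """Each node works on its own memory, then sends one message."""
    network.put((rank, sum(private_data)))

def distributed_sum(data, num_nodes=4):
    network = queue.Queue()       # the only shared resource
    nodes = [
        threading.Thread(target=node, args=(r, data[r::num_nodes], network))
        for r in range(num_nodes)
    ]
    for n in nodes:
        n.start()
    # Receiving doubles as synchronization: get() blocks until a message arrives.
    total = sum(network.get()[1] for _ in range(num_nodes))
    for n in nodes:
        n.join()
    return total

print(distributed_sum(list(range(1000))))   # 499500
```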

30 Summary of communication and synchronization
Style: synchronization; communication
Superscalar: explicit signals (RS); registers+bypass
VLIW: clock+compiler; registers (bypass?)
Dataflow: explicit signals; registers+explicit
SIMD: explicit
MIMD: memory+explicit

31 Summary of sharing in ILP HW
Table columns: Style, Seq, Inst, OOO, Regs, Mem, ALUs, Net
Table rows: Superscalar, SMT/TLS, VLIW, Dataflow
Cell values in reading order: Superscalar: S, P; SMT/TLS; VLIW: N/A; Dataflow: B

32 Summary of sharing in DLP and TLP
Table columns: Style, Seq, Inst, OOO, Regs, Mem, ALUs, Net
Table rows: Vector, SIMD, MIMD
Cell values in reading order: Vector: S, N/A, P, s, B; SIMD: p; MIMD: S/P

