Computer Organization: Pipelined Processor Design 3
Prof. John Nestor, ECE Department, Lafayette College, Easton, Pennsylvania 18042

1 Computer Organization: Pipelined Processor Design 3
Prof. John Nestor, ECE Department, Lafayette College, Easton, Pennsylvania 18042
nestorj@lafayette.edu
Feb 2005
Reading: 6.7, 6.9-6.12, 6.13*
Homework Due 6/8: 6.1, 6.2, 6.3, 6.4, 6.7, 6.8, 6.9, 6.15
Portions of these slides are derived from:
- Textbook figures © 1998 Morgan Kaufmann Publishers, all rights reserved
- Tod Amon's COD2e Slides © 1998 Morgan Kaufmann Publishers, all rights reserved
- Dave Patterson's CS 152 Slides, Fall 1997 © UCB
- Rob Rutenbar's 18-347 Slides, Fall 1999, CMU
- Other sources as noted

2 Project 4 - Pipelined MIPS

3 Project 4 - What to Do
- Download & simulate the basic model
- Extend the processor to do either:
  - Data forwarding + load/use stall (see Fig. 6.36/6.33), OR
  - Branch implementation in ID including IF.Flush (see Fig. 6.38)
- Simulate the extended processor to show it works
- You may work in groups of two

4 Review - Pipelined Processor with Hazard Detection (Fig. 6.36)

5 Pipelined Processor - Branch Hardware in ID (Old Fig. 6.51)

6 Pipelining Outline
- Introduction
- Pipelined Processor Design
- Advanced Pipelining <-- (this lecture)
  - Overview - Instruction Level Parallelism
  - Superpipelining
  - Static Multiple Issue
  - Dynamic Multiple Issue
  - Speculation
  - Simultaneous Multithreading (HyperThreading)

7 Instruction Level Parallelism (ILP)
- Parallel execution of instructions is known as Instruction Level Parallelism (ILP)
- Pipelining exploits ILP by overlapping execution
- ILP is limited by (see the example below):
  - Data hazards
  - Control hazards
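A small illustration (made-up code, not from the slides) of both kinds of hazard:

      lw   $t0, 0($s1)
      addu $t2, $t0, $t3     # data hazard: needs $t0 before the lw has written it back
      beq  $t2, $zero, Done  # control hazard: the next fetch depends on the branch outcome
      subu $t4, $t4, $t5     # may be fetched and then squashed if the branch is taken
Done: sw   $t4, 0($s2)

Forwarding and branch prediction (next slide) are the first-line answers to these two limits.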

8 Techniques to Increase ILP
- Forwarding
- Branch Prediction
- Superpipelining <--
- Static Multiple Issue
- Dynamic Multiple Issue
- Speculation
- Simultaneous Multithreading (SMT)

9 Superpipelining
- Key idea: increase the number of pipeline stages
  - MIPS R2000 - 5 stages
  - MIPS R4000 - 8 stages
  - Pentium 3 - 10 stages
  - Pentium 4 - 20 stages
- Tradeoffs:
  + Less logic in each stage -> faster clock
  - Longer pipeline -> higher penalty for stalls and flushes (see the rough calculation below)
- Used in conjunction with other techniques (e.g. branch prediction) to overcome the disadvantages
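A rough, back-of-the-envelope illustration of that tradeoff (the branch frequency, misprediction rate, and penalties below are assumed numbers, not figures from the slides): effective CPI = 1 + (branch frequency) x (misprediction rate) x (misprediction penalty in cycles). With 20% branches and a 10% misprediction rate, a 5-stage pipeline with a 3-cycle penalty gives CPI = 1 + 0.2 x 0.1 x 3 = 1.06, while a 20-stage pipeline with an 18-cycle penalty gives CPI = 1 + 0.2 x 0.1 x 18 = 1.36. The deeper pipeline clocks faster, but it needs much better branch prediction to keep the extra stall cycles from eating that gain.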

10 Techniques to Increase ILP
- Forwarding
- Branch Prediction
- Superpipelining
- Static Multiple Issue <--
- Dynamic Multiple Issue
- Speculation
- Simultaneous Multithreading (SMT)

11 Static Multiple Issue
- Key idea: issue (decode & execute) multiple instructions in each clock cycle
- Example: issue one load/store and one ALU/branch instruction per cycle in MIPS (Fig. 6.44); a sketch of the issue packets follows the table:

Instruction type   Pipe stages
ALU or branch      IF ID EX MEM WB
Load/Store         IF ID EX MEM WB
ALU or branch         IF ID EX MEM WB
Load/Store            IF ID EX MEM WB
ALU or branch            IF ID EX MEM WB
Load/Store               IF ID EX MEM WB
ALU or branch               IF ID EX MEM WB
Load/Store                  IF ID EX MEM WB
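A concrete sketch of what the pairing looks like (made-up code, not from the slides); each line below is one issue packet, with the ALU/branch slot on the left and the load/store slot on the right:

addu $t2, $t3, $t4      lw $t0, 0($s0)    # cycle 1: the two instructions are independent, so both issue
addi $t5, $t5, 1        sw $t1, 4($s0)    # cycle 2: again independent, so both issue
addu $t6, $t0, $t2      lw $t7, 8($s0)    # cycle 3: $t0 from the cycle-1 load arrives in time via forwarding

If an ALU-slot instruction needed the value loaded by the load/store instruction in the same packet, the two could not usefully issue together; the compiler has to schedule around such dependences, or the hardware stalls.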

12 Example - A Static Multiple Issue MIPS (Fig. 6.45)
(Figure: a two-pipeline datapath; one pipeline executes ALU/branch instructions, the other executes load/store instructions)

13 Static Multiple Issue Tradeoffs
- Advantage: increased performance
  - Real processors issue up to 6 instructions / cycle
- Several challenges:
  - Building a register file with lots of ports
  - Dealing with data dependencies
  - Stalls due to control dependencies (branch prediction helps!)
  - Building a memory system that can "keep up" (caches help!)
  - Finding opportunities to fully utilize the functional units

14 VLIW / EPIC Processors
- VLIW - Very Long Instruction Word
  - Functional units exposed in the instruction word
  - Static scheduling by the compiler
  - Pipeline is exposed; the compiler must schedule around delays to get the right result
  - Examples: Philips Trimedia, Transmeta Crusoe
- EPIC - Explicitly Parallel Instruction Computing
  - 3 41-bit instructions in each instruction packet
  - Compiler determines the parallelism
  - Hardware checks dependencies and forwards/stalls
  - Examples: Intel Itanium, Itanium 2

15 Itanium Block Diagram
Source: ExtremeTech, www.extremetech.com

16 Software Manipulation to Increase ILP
- Software transformations can increase ILP:
  - Code reordering to reduce stalls
  - Loop unrolling
- Example (p. 438):

Loop: lw   $t0, 0($s1)       # $t0 = array element
      addu $t0, $t0, $s2     # add scalar in $s2
      sw   $t0, 0($s1)       # store result
      addi $s1, $s1, -4      # decrement ptr
      bne  $s1, $zero, Loop

- Goal: reorder to speed up superscalar execution

17 Software Manipulation - Reordering Code

       ALU or branch instruction    Data transfer instruction    Clock
Loop:                               lw $t0, 0($s1)               1
       addi $s1, $s1, -4                                         2
       addu $t0, $t0, $s2                                        3
       bne $s1, $zero, Loop         sw $t0, 4($s1)               4

- Note the sparse utilization of the superscalar pipeline!
- End result: 5 instructions in 4 clocks, so CPI = 0.8
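The same reordered iteration written out in order (a restatement of the table above, not additional material from the slides): because the addi now runs before the store, $s1 has already been decremented by 4 when the sw executes, so its offset changes from 0($s1) to 4($s1) to address the same word.

Loop: lw   $t0, 0($s1)       # load array element
      addi $s1, $s1, -4      # decrement ptr early to fill an empty ALU slot
      addu $t0, $t0, $s2     # add scalar in $s2
      sw   $t0, 4($s1)       # offset 4 compensates for the early decrement; pairs with the bne
      bne  $s1, $zero, Loop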

18 Software Manipulation - Loop Unrolling
- Assume the loop count is a multiple of 4, then unroll (one possible unrolled form, before scheduling, is sketched below)
- End result: 4 loop iterations in 8 clocks - 2 clocks / iteration!

       ALU or branch instruction    Data transfer instruction    Clock
Loop:  addi $s1, $s1, -16           lw $t0, 0($s1)               1
                                    lw $t1, 12($s1)              2
       addu $t0, $t0, $s2           lw $t2, 8($s1)               3
       addu $t1, $t1, $s2           lw $t3, 4($s1)               4
       addu $t2, $t2, $s2           sw $t0, 16($s1)              5
       addu $t3, $t3, $s2           sw $t1, 12($s1)              6
                                    sw $t2, 8($s1)               7
       bne $s1, $zero, Loop         sw $t3, 4($s1)               8
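For reference, here is one way the unrolled loop can look before scheduling (a sketch consistent with the original loop on slide 16, not code taken from the slides); the scheduler then moves the addi to the top of the loop and interleaves the loads, adds, and stores as in the table above:

Loop: lw   $t0, 0($s1)        # element 0
      addu $t0, $t0, $s2
      sw   $t0, 0($s1)
      lw   $t1, -4($s1)       # element 1
      addu $t1, $t1, $s2
      sw   $t1, -4($s1)
      lw   $t2, -8($s1)       # element 2
      addu $t2, $t2, $s2
      sw   $t2, -8($s1)
      lw   $t3, -12($s1)      # element 3
      addu $t3, $t3, $s2
      sw   $t3, -12($s1)
      addi $s1, $s1, -16      # step the pointer past all four elements
      bne  $s1, $zero, Loop

Using a different temporary register for each element ($t0-$t3) removes false dependences and lets the scheduler overlap the four copies freely.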

19 Techniques to Increase ILP
- Forwarding
- Branch Prediction
- Superpipelining
- Static Multiple Issue
- Dynamic Multiple Issue <--
- Speculation
- Simultaneous Multithreading (SMT)

20 Dynamic Multiple Issue
- Key ideas:
  - "Look past" stalls for instructions that can execute (one possible dynamic schedule is sketched below)

lw   $t0, 20($t2)
addu $t1, $t0, $t2    # addu stalls until $t0 is available
sub  $s4, $s4, $s3    # sub is ready to execute but blocked by the stall!
slti $t5, $s4, 20

  - Execute instructions out of order
  - Use multiple functional units for parallel execution
  - Forward results between functional units when necessary
  - Update registers (in the original order of execution)
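One possible dynamic schedule for those four instructions (a sketch of what out-of-order hardware could do; results are still committed in program order):

lw   $t0, 20($t2)      # long-latency load issues first
sub  $s4, $s4, $s3     # independent of the load, so it executes while the load is outstanding
slti $t5, $s4, 20      # depends only on sub, so it can also run under the load
addu $t1, $t0, $t2     # waits for $t0 and executes once the load data returns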

21 Speculation
- Guess about the outcome of an instruction (e.g., a branch or a load)
- Based on the guess, start executing instructions (see the small example below)
- Cancel the started instructions if the guess is incorrect
- Complicating factors:
  - Must buffer instruction results until the outcome is known
  - Exceptions in speculated instructions - how can you have an exception in an instruction that didn't execute?
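A small illustration (made-up code, not from the slides): if the hardware predicts the beq below as not taken, it can start the two following instructions speculatively, buffering their results; they commit only once the branch resolves that way, and are cancelled on a misprediction.

      beq  $t0, $zero, Skip    # predicted not taken
      lw   $t1, 0($s0)         # executed speculatively; result buffered, not yet committed
      addu $t2, $t1, $t3       # also speculative; squashed if the prediction was wrong
Skip: sw   $t4, 4($s0)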

22 Dynamic Pipelining (Fig. 6.49)
(Figure: an instruction fetch and decode unit feeds reservation stations in front of the functional units (integer, floating point, load/store), and a commit unit writes results back. In-order issue, out-of-order execute, in-order commit.)

23 Dynamic Pipelining in the Pentium 4 (Fig. 6.50)

24 Dynamic Pipelining in the Pentium 4
Source: "The Microarchitecture of the Pentium® 4 Processor", Intel Technology Journal, First Quarter 2001, http://developer.intel.com/technology/itj/q12001/articles/art_2.htm

25 Pentium 3 & 4 Pipeline Stages
Source: "The Microarchitecture of the Pentium® 4 Processor", Intel Technology Journal, First Quarter 2001, http://developer.intel.com/technology/itj/q12001/articles/art_2.htm
(Drive stages: waiting for signal propagation)

26 Techniques to Increase ILP
- Forwarding
- Branch Prediction
- Superpipelining
- Multiple Issue - Superscalar, VLIW/EPIC
- Software manipulation
- Dynamic Pipeline Scheduling
- Speculative Execution
- Simultaneous Multithreading (SMT) <--

27 Simultaneous Multithreading (SMT)
- Key idea: extend the processor to multiple threads of execution that execute concurrently
  - Each thread has its own PC and register state
  - Scheduling hardware shares the functional units
  - Appears to software as two "separate" processors
- Advantage: when one thread stalls, another may be ready
- Proposed for servers, where multiple threads are common
(Figure: issue slots over time, with the functional units shared between the architectural state of Thread A and Thread B)

28 Roadmap for the term: major topics
- Overview / Abstractions and Technology
- Performance
- Instruction sets
- Logic & arithmetic
- Processor Implementation
- Memory systems <--
- Input/Output

