
Chapter 6 Pipelined CPU Design

Spring 2005, ELEC 5200/6200. Adapted from Patterson/Hennessy slides.

Pipelined operation – the laundry analogy (Text Fig. 6.1)

Pipelined operation of the MIPS CPU (Text Fig. 6.3)
Improve performance by increasing instruction throughput.
Ideal speedup is the number of stages in the pipeline. Do we achieve this in practice?
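To make the throughput claim concrete, here is a back-of-the-envelope calculation. The 200 ps stage time and the 5-stage count are illustrative assumptions, not numbers from these slides; the point is only that, with perfectly balanced stages and a full pipeline, the ideal speedup equals the number of stages.

```latex
% Ideal pipeline speedup with 5 perfectly balanced stages of 200 ps each
% (a single-cycle implementation would need about 1000 ps per instruction).
\[
\text{Time between completed instructions}_{\text{pipelined}}
  \approx \frac{\text{Time per instruction}_{\text{non-pipelined}}}{\text{Number of stages}}
  = \frac{1000~\text{ps}}{5} = 200~\text{ps}
\]
\[
\text{Ideal speedup} = \frac{1000~\text{ps}}{200~\text{ps}} = 5 = \text{number of pipe stages}
\]
```

In practice, fill/drain cycles, unbalanced stages, and hazards keep the achieved speedup below this ideal.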

Pipelining
What makes it easy:
- all instructions are the same length
- just a few instruction formats
- memory operands appear only in loads and stores
What makes it hard:
- structural hazards: suppose we had only one memory
- control hazards: need to worry about branch instructions
- data hazards: an instruction depends on the result of a previous instruction
We'll build a simple pipeline and look at these issues.
We'll also talk about modern processors and what really makes it hard: exception handling, and trying to improve performance with out-of-order execution, etc.

"Stages" in the single-cycle CPU
What do we need to add to actually split the datapath into stages?

Pipelined Datapath (Text Fig. 6.11)

Corrected Datapath (Text Fig.)
Note: the destination register number is carried forward through the pipeline registers and back to the register file for the write.

Graphically Representing Pipelines
Can help with answering questions like:
- How many cycles does it take to execute this code?
- What is the ALU doing during cycle 4?
Use this representation to help understand datapaths.

Pipeline Control – identify the control signals

Pipeline Control
We have 5 stages. What needs to be controlled in each stage?
- Instruction Fetch and PC Increment
- Instruction Decode / Register Fetch
- Execution
- Memory Stage
- Write Back
How would control be handled in an automobile plant? A fancy control center telling everyone what to do? Should we use a finite state machine?

Pipeline Control
Pass the control signals along in the pipeline registers, just like the data.
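As a concrete picture of "control travels with the data," here is a minimal Python sketch of pipeline registers that carry the MIPS control signals forward. The class and field names are my own, but the grouping of signals by the stage that consumes them (EX, MEM, WB) follows the text.

```python
# Minimal sketch: control signals ride along in the pipeline registers.
from dataclasses import dataclass, field

@dataclass
class ExCtrl:       # consumed in the EX stage
    reg_dst: bool = False
    alu_op: int = 0
    alu_src: bool = False

@dataclass
class MemCtrl:      # consumed in the MEM stage
    branch: bool = False
    mem_read: bool = False
    mem_write: bool = False

@dataclass
class WbCtrl:       # consumed in the WB stage
    reg_write: bool = False
    mem_to_reg: bool = False

@dataclass
class IdExRegister:           # pipeline register between ID and EX
    ex: ExCtrl = field(default_factory=ExCtrl)
    mem: MemCtrl = field(default_factory=MemCtrl)
    wb: WbCtrl = field(default_factory=WbCtrl)
    # ... plus the data fields (register values, immediate, register numbers)

@dataclass
class ExMemRegister:          # EX controls are used up; MEM and WB move on
    mem: MemCtrl = field(default_factory=MemCtrl)
    wb: WbCtrl = field(default_factory=WbCtrl)

@dataclass
class MemWbRegister:          # only the WB controls remain
    wb: WbCtrl = field(default_factory=WbCtrl)
```

Each stage consumes its own group of signals, so the EX/MEM register no longer needs the EX controls, and MEM/WB keeps only the WB controls.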

Datapath with Control

Dependencies
Problem with starting the next instruction before the first has finished: dependencies that "go backward in time" are data hazards.

Software Solution
Have the compiler guarantee no hazards, e.g., by inserting nops. Where do we insert the "nops"?
sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)
Problem: this really slows us down!

Forwarding
Use temporary results; don't wait for them to be written back:
- register file forwarding to handle a read and a write to the same register in the same cycle
- ALU forwarding
What if this $2 were $13?

Forwarding – the main idea (some details not shown)
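The forwarding conditions themselves are easy to state. Below is a behavioral sketch (Python, illustrative only) of the ForwardA selection as described in the text; the ForwardB condition is the same test using the rt field of the instruction in EX.

```python
# Forwarding-unit conditions for the first ALU operand (ForwardA).
# Encodings: 0b00 = register-file value, 0b10 = EX/MEM result, 0b01 = MEM/WB result.
def forward_a(ex_mem_reg_write: bool, ex_mem_rd: int,
              mem_wb_reg_write: bool, mem_wb_rd: int,
              id_ex_rs: int) -> int:
    # EX hazard: the needed result is still in the EX/MEM pipeline register.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return 0b10
    # MEM hazard: the result is in MEM/WB (only if there was no EX hazard).
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return 0b01
    return 0b00   # no hazard: use the value read from the register file
```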

Can't Always Forward
A load word can still cause a hazard: an instruction tries to read a register immediately following a load instruction that writes that same register. Thus, we need a hazard detection unit to "stall" the pipeline for this load-use case.
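A sketch of the load-use check the hazard detection unit performs; the signal names follow the text, while the Python wrapper is illustrative.

```python
# Stall one cycle if the instruction in EX is a load (MemRead) whose
# destination register matches a source register of the instruction in ID.
def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)
```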

Stalling
We can stall the pipeline by keeping an instruction in the same stage and inserting a bubble.
Program execution order (in instructions):
lw  $2, 20($1)
and $4, $2, $5    ("and" becomes a nop for one cycle)
or  $8, $2, $6
add $9, $4, $2
[Pipeline diagram (clock cycles CC 1–CC 10): IM, Reg, DM, Reg boxes for each instruction, with the bubble shown where the and instruction stalls.]

Hazard Detection Unit
Stall by letting an instruction that won't write anything go forward.
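One way to picture "an instruction that won't write anything": a minimal sketch, with hypothetical signal names, of what the hazard detection unit does to create the bubble.

```python
# Hold the PC and the IF/ID register so the same instructions are re-fetched
# and re-decoded next cycle, and clear the write-enable controls entering
# ID/EX so the slot moving down the pipeline behaves like a nop (a bubble).
def stall_one_cycle(ctrl: dict) -> None:
    ctrl["pc_write"] = False
    ctrl["if_id_write"] = False
    ctrl["id_ex_reg_write"] = False
    ctrl["id_ex_mem_write"] = False
```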

Improving Performance
Try to avoid stalls! E.g., reorder these instructions:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)
Dynamic Pipeline Scheduling
- Hardware chooses which instructions to execute next
- Will execute instructions out of order (e.g., doesn't wait for a dependency to be resolved, but keeps going!)
- Speculates on branches and keeps the pipeline full (may need to roll back if a prediction is incorrect)
- Trying to exploit instruction-level parallelism

Cypress CY7C601 Sparc CPU

Fujitsu MB86900 Sparc CPU

Branch Hazards
When we decide to branch, other instructions are already in the pipeline! We are predicting "branch not taken," so we need to add hardware for flushing instructions if we are wrong.

Flushing Instructions
Note: we've also moved the branch decision to the ID stage.

Minimizing Branch Penalties
Make the branch decision earlier in the pipeline
- Branch penalty = number of instructions fetched from the wrong path
- An earlier branch decision reduces the number of "in progress" instructions to be flushed
Delayed branch
- Delay the branch for one or more instructions
- These instructions do not get flushed from the pipeline
Branch prediction
- Early in the pipeline, predict the branch outcome (taken vs. not taken)
- Fetch the next instructions from the predicted path
- No time penalty unless the prediction is incorrect

Delayed Branching
Allow the instruction immediately following the branch to complete (effectively delaying the branch).
Example: move the add into the "delay slot" of the beq.
Original order:
add $3, $2, $4
beq $1, $2, label
sub $5, $1, $2
xor $4, $6, $7
Flush the sub and xor instructions if the branch is taken.
With the delay slot filled:
beq $1, $2, label
add $3, $2, $4    (delay slot)
sub $5, $1, $2
xor $4, $6, $7
The add instruction in the delay slot executes whether or not the branch is taken; only the sub instruction is flushed.
The compiler must move an instruction into the delay slot that will not affect the branch decision.

Branch Prediction
If the branch is taken, we have a penalty of one cycle.
For our simple design, this is reasonable.
With deeper pipelines, the penalty increases and static branch prediction drastically hurts performance.
Solution: dynamic branch prediction, e.g., a 2-bit prediction scheme.
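A minimal sketch of the 2-bit scheme (Python, illustrative; the state encoding and initial state are one common choice, not dictated by the slides): the prediction only flips after two consecutive mispredictions in the same direction.

```python
class TwoBitPredictor:
    """Saturating counter: states 0,1 predict not taken; 2,3 predict taken."""
    def __init__(self, counter: int = 2):
        self.counter = counter          # start in "weakly taken" (arbitrary)

    def predict(self) -> bool:
        return self.counter >= 2        # True means "predict taken"

    def update(self, taken: bool) -> None:
        if taken:
            self.counter = min(self.counter + 1, 3)
        else:
            self.counter = max(self.counter - 1, 0)

# Example: a loop-closing branch taken repeatedly mispredicts only at loop exit.
p = TwoBitPredictor()
for outcome in [True, True, True, True, False]:
    print(p.predict(), outcome)
    p.update(outcome)
```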

Branch Prediction
Sophisticated techniques:
- A "branch target buffer" to help us look up the destination
- Correlating predictors that base the prediction on global behavior and recently executed branches (e.g., the prediction for a specific branch instruction depends on what happened in previous branches)
- Tournament predictors that use different types of prediction strategies and keep track of which one is performing best
- A "branch delay slot" which the compiler tries to fill with a useful instruction (makes the one-cycle delay part of the ISA)
Branch prediction is especially important because it enables other, more advanced pipelining techniques to be effective. Modern processors predict correctly about 95% of the time.

Branch Target Buffer (Prediction Table)
A table indexed by the instruction fetch address; each entry holds the branch target address and recent branch history (taken / not taken).
- Enter branch information when the branch is first executed (computed target address, taken/not taken)
- The next time the instruction is fetched, use the previously computed target address if the branch was previously taken, or PC+4 otherwise (flush the pipeline if the branch is later found to not be taken)
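The lookup/update behavior described above can be sketched as follows (Python; a dict stands in for what is really a small hardware cache indexed by low-order fetch-address bits).

```python
class BranchTargetBuffer:
    def __init__(self):
        self.entries = {}                    # fetch PC -> (target address, last outcome)

    def next_fetch(self, pc: int) -> int:
        entry = self.entries.get(pc)
        if entry is not None and entry[1]:   # known branch, previously taken
            return entry[0]                  # fetch from the stored target address
        return pc + 4                        # otherwise predict fall-through

    def update(self, pc: int, target: int, taken: bool) -> None:
        # Record the computed target and outcome when the branch resolves;
        # if the prediction was wrong, the pipeline is flushed (not shown here).
        self.entries[pc] = (target, taken)
```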

Handling Exceptions in a Pipeline
Example:
40  sub $11, $2, $4
44  and $12, $2, $5
48  or  $13, $2, $6
4C  add $1, $2, $1    – produces an overflow
50  slt $15, $6, $7
54  lw  $16, 50($7)
Actions in response to the exception (overflow):
1. Finish the instructions in the MEM and WB stages (and, or)
2. Flush the instructions in the IF and ID stages (slt, lw)
3. Flush the offending instruction (add) from the EX stage, saving the PC value (extra hardware)
4. Fetch the next instruction from the exception handler routine

Advanced Pipelining
- Increase the depth of the pipeline (speedup proportional to the number of pipeline stages, in the ideal case)
- Start more than one instruction each cycle (multiple issue)
- Loop unrolling to expose more ILP (better scheduling)
"Superscalar" processors
- DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
- All modern processors are superscalar and issue multiple instructions, usually with some limitations (e.g., different "pipes")
- VLIW: very long instruction word, static multiple issue (relies more on compiler technology)

Pentium 4 Microarchitecture

Pentium 4 Pipeline
- IA-32 instructions are decoded to micro-operations
- Multiple functional units (7) and multiple issue to achieve high performance
- Up to 126 "outstanding instructions" at a time
- Trace cache – saves decoded instructions to eliminate subsequent decode
- Branch prediction – tracks 4K branch instructions
- Deep pipeline: 20 or more stages per instruction

Chapter 6 Summary
Pipelining does not improve the latency of a single instruction, but it does improve overall instruction throughput.