Chapter 4 The Processor Part 3

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

Pipeline Computer Organization II 1 Hazards Situations that prevent starting the next instruction in the next cycle Structural hazards – A required resource.
ECE 445 – Computer Organization
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Pipelined Processor.
Part 2 - Data Hazards and Forwarding 3/24/04++
Review: MIPS Pipeline Data and Control Paths
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
Prof. John Nestor ECE Department Lafayette College Easton, Pennsylvania ECE Computer Organization Lecture 18 - Pipelined.
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan
Chapter Six Enhancing Performance with Pipelining
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
 The actual result $1 - $3 is computed in clock cycle 3, before it’s needed in cycles 4 and 5  We forward that value to later instructions, to prevent.
1 CSE SUNY New Paltz Chapter Six Enhancing Performance with Pipelining.
Computer Organization Lecture Set – 06 Chapter 6 Huei-Yung Lin.
Pipelined Datapath and Control (Lecture #15) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.
Lecture 28: Chapter 4 Today’s topic –Data Hazards –Forwarding 1.
1 Stalls and flushes  So far, we have discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
Lecture 15: Pipelining and Hazards CS 2011 Fall 2014, Dr. Rozier.
Chapter 4B: The Processor, Part B. Review: Why Pipeline? For Performance! I n s t r. O r d e r Time (clock cycles) Inst 0 Inst 1 Inst 2 Inst 4 Inst 3.
1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.
Pipeline Data Hazards: Detection and Circumvention Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from slides kindly.
Pipelined Datapath and Control
CPE432 Chapter 4B.1Dr. W. Abu-Sufah, UJ Chapter 4B: The Processor, Part B-2 Read Section 4.7 Adapted from Slides by Prof. Mary Jane Irwin, Penn State University.
Chapter 4 CSF 2009 The processor: Pipelining. Performance Issues Longest delay determines clock period – Critical path: load instruction – Instruction.
Basic Pipelining & MIPS Pipelining Chapter 6 [Computer Organization and Design, © 2007 Patterson (UCB) & Hennessy (Stanford), & Slides Adapted from: Mary.
CMPE 421 Parallel Computer Architecture
CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.
CMPE 421 Parallel Computer Architecture Part 2: Hardware Solution: Forwarding.
1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,
CSE431 L07 Overcoming Data Hazards.1Irwin, PSU, 2005 CSE 431 Computer Architecture Fall 2005 Lecture 07: Overcoming Data Hazards Mary Jane Irwin (
Computing Systems Pipelining: enhancing performance.
CSIE30300 Computer Architecture Unit 05: Overcoming Data Hazards Hsin-Chou Chi [Adapted from material by and
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
CPE432 Chapter 4B.1Dr. W. Abu-Sufah, UJ Chapter 4B: The Processor, Part B-1 Read Sections 4.7 Adapted from Slides by Prof. Mary Jane Irwin, Penn State.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
CSE 340 Computer Architecture Spring 2016 Overcoming Data Hazards.
Computer Organization
Stalling delays the entire pipeline
Note how everything goes left to right, except …
Morgan Kaufmann Publishers
Morgan Kaufmann Publishers The Processor
Single Clock Datapath With Control
Pipeline Implementation (4.6)
Appendix C Pipeline implementation
Chapter 4 The Processor Part 4
ECS 154B Computer Architecture II Spring 2009
ECS 154B Computer Architecture II Spring 2009
ECE232: Hardware Organization and Design
Morgan Kaufmann Publishers The Processor
Data Hazards and Stalls
Forwarding Now, we’ll introduce some problems that data hazards can cause for our pipelined processor, and show how to handle them with forwarding.
Review: MIPS Pipeline Data and Control Paths
Morgan Kaufmann Publishers The Processor
Csci 136 Computer Architecture II – Data Hazard, Forwarding, Stall
Morgan Kaufmann Publishers The Processor
Morgan Kaufmann Publishers The Processor
Pipelining review.
Morgan Kaufmann Publishers Enhancing Performance with Pipelining
Computer Organization CS224
Pipelining in more detail
The Processor Lecture 3.6: Control Hazards
The Processor Lecture 3.5: Data Hazards
Instruction Execution Cycle
CSC3050 – Computer Architecture
Pipelining (II).
Morgan Kaufmann Publishers The Processor
Systems Architecture II
Stalls and flushes Last time, we discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
©2003 Craig Zilles (derived from slides by Howard Huang)
Presentation transcript:

Chapter 4 The Processor Part 3

Can pipelining get us into trouble? Yes: Pipeline Hazards structural hazards: attempt to use the same resource two different ways at the same time E.g., two instructions try to read the same memory at the same time data hazards: attempt to use item before it is ready instruction depends on result of prior instruction still in the pipeline add r1, r2, r3 sub r4, r2, r1 control hazards: attempt to make a decision before condition is evaluated branch instructions beq r1, loop Can always resolve hazards by waiting pipeline control must detect the hazard take action (or delay action) to resolve hazards

Morgan Kaufmann Publishers 10 November, 2018 Structure Hazards Conflict for use of a resource In MIPS pipeline with a single memory Load/store requires data access Instruction fetch would have to stall for that cycle Would cause a pipeline “bubble” Hence, pipelined datapaths require separate instruction/data memories Or separate instruction/data caches Chapter 4 — The Processor

Structural Hazards limit performance Example: if 1.3 memory accesses per instruction and only one memory access per cycle then average CPI = 1.3 otherwise resource is more than 100% utilized Solution 1: Use separate instruction and data memories Solution 2: Allow memory to read and write more than one word per cycle Solution 3: Stall

Single Memory is a Structural Hazard Time (clock cycles) Reading data from memory ALU I n s t r. O r d e Mem Reg Mem Reg Load ALU Mem Reg Instr 1 ALU Mem Reg Instr 2 ALU Instr 3 Mem Reg Mem Reg Reading instruction from memory ALU Mem Reg Instr 4 Detection is easy in this case! (right half highlight means read, left half write)

How About Register File Access? Time (clock cycles) Internal bypassing path Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half ALU IM Reg DM add $1, I n s t r. O r d e ALU IM Reg DM Inst 1 ALU IM Reg DM Inst 2 ALU IM Reg DM add $2,$1, clock edge that controls loading of pipeline state registers clock edge that controls register writing

Morgan Kaufmann Publishers 10 November, 2018 Data Hazards An instruction depends on completion of data access by a previous instruction add $s0, $t0, $t1 sub $t2, $s0, $t3 Chapter 4 — The Processor

Forwarding (aka Bypassing) Morgan Kaufmann Publishers 10 November, 2018 Forwarding (aka Bypassing) Use result when it is computed Don’t wait for it to be stored in a register Requires extra connections in the datapath Chapter 4 — The Processor — 8 Chapter 4 — The Processor

Loads Can Cause Data Hazards Dependencies backward in time cause hazards ALU IM Reg DM lw $1,4($2) I n s t r. O r d e ALU IM Reg DM sub $4,$1,$5 ALU IM Reg DM and $6,$1,$7 ALU IM Reg DM or $8,$1,$9 ALU IM Reg DM xor $4,$1,$5 Load-use data hazard

Morgan Kaufmann Publishers 10 November, 2018 Load-Use Data Hazard Can’t always avoid stalls by forwarding If value not computed when needed Can’t forward backward in time! Chapter 4 — The Processor — 10 Chapter 4 — The Processor

Code Scheduling to Avoid Stalls Morgan Kaufmann Publishers 10 November, 2018 Code Scheduling to Avoid Stalls Reorder code to avoid use of load result in the next instruction C code for A = B + E; C = B + F; lw $t1, 0($t0) lw $t2, 4($t0) add $t3, $t1, $t2 sw $t3, 12($t0) lw $t4, 8($t0) add $t5, $t1, $t4 sw $t5, 16($t0) lw $t1, 0($t0) lw $t2, 4($t0) lw $t4, 8($t0) add $t3, $t1, $t2 sw $t3, 12($t0) add $t5, $t1, $t4 sw $t5, 16($t0) stall stall 13 cycles 11 cycles Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 Control Hazards Branch determines flow of control Fetching next instruction depends on branch outcome Pipeline can’t always fetch correct instruction Still working on ID stage of branch In MIPS pipeline Need to compare registers and compute target early in the pipeline Add hardware to do it in ID stage Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 Stall on Branch Wait until branch outcome determined before fetching next instruction Chapter 4 — The Processor — 13 Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 Branch Prediction Longer pipelines can’t readily determine branch outcome early Stall penalty becomes unacceptable Predict outcome of branch Only stall if prediction is wrong In MIPS pipeline Can predict branches not taken Fetch instruction after branch, with no delay Chapter 4 — The Processor

MIPS with Predict Not Taken Morgan Kaufmann Publishers 10 November, 2018 MIPS with Predict Not Taken Prediction correct Prediction incorrect Chapter 4 — The Processor

More-Realistic Branch Prediction Morgan Kaufmann Publishers 10 November, 2018 More-Realistic Branch Prediction Static branch prediction Based on typical branch behavior Example: loop and if-statement branches Predict backward branches taken Predict forward branches not taken Dynamic branch prediction Hardware measures actual branch behavior e.g., record recent history of each branch Assume future behavior will continue the trend When wrong, stall while re-fetching, and update history Chapter 4 — The Processor

Data Hazard sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 Is there any problems ? sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2)

Data Hazard on $2 sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 Problem: $2 cannot be read by other instructions before it is written by the add. sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2)

Dependencies Problem with starting next instruction before first is finished dependencies that go backward in time are data hazards

Software Solution Have compiler guarantee no hazards Where do we insert the stalls(nops)?? sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) Problem: this really slows us down!

Stall: One Way to “Fix” a Data Hazard ALU IM Reg DM sub $2, $1, $3 I n s t r. O r d e stall stall and $12, $2, $5 or $13, $6, $2 ALU IM Reg DM Can fix data hazard by waiting – stall – but affects throughput

Another Way to “Fix” a Data Hazard Use temporary results, don’t wait for them to be written register file forwarding to handle read/write to same register ALU forwarding what if this $2 was $13?

Forwarding add r1,r2,r3 sub r4,r1,r5 and r6,r7,r1 or r8,r1,r1 Can fix data hazard by forwarding results as soon as they are available to where they are needed. ALU IM Reg DM add r1,r2,r3 I n s t r. O r d e ALU IM Reg DM sub r4,r1,r5 ALU IM Reg DM and r6,r7,r1 ALU IM Reg DM or r8,r1,r1 ALU IM Reg DM sw r4,100(r1) Note: Forwarding from ALU supplied by EX/MEM register

Data Forwarding (aka Bypassing) Any data dependence line that goes backwards in time EX stage generating R-type ALU results or effective address calculation MEM stage generating lw results Forward by taking the inputs to the ALU from any pipeline register rather than just ID/EX by adding multiplexors to the inputs of the ALU so can pass Rd data to either (or both) of the EX’s stage Rs and Rt ALU inputs 00: normal input (ID/EX pipeline registers) 10: forward from previous instr (EX/MEM pipeline registers) 01: forward from instr 2 back (MEM/WB pipeline registers) adding the proper control hardware With forwarding, can run at full speed even in the presence of data dependencies

Data Forwarding Control Conditions (1/4) EX/MEM hazard: if (EX/MEM.RegisterRd == ID/EX.RegisterRs)) ForwardA = 10 if (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 “RegisterRd” is number of register to be written (RD or RT) “RegisterRs” is number of RS register “RegisterRt” is number of RT register “ForwardA, ForwardB” controls forwarding muxes MEM/WB hazard: if (MEM/WB.RegisterRd == ID/EX.RegisterRs)) ForwardA = 01 if (MEM/WB.RegisterRd == ID/EX.RegisterRt)) ForwardB = 01 Forwards the result from the previous instr. to either input of the ALU. Forwards the result from the second previous instr. to either input of the ALU.

Data Forwarding Control Conditions (2/4) EX/MEM hazard: if (EX/MEM.RegWrite and (EX/MEM.RegisterRd == ID/EX.RegisterRs)) ForwardA = 10 and (EX/MEM.RegisterRd == ID/EX.RegisterRt)) ForwardB = 10 MEM/WB hazard: if (MEM/WB.RegWrite and (MEM/WB.RegisterRd == ID/EX.RegisterRs)) ForwardA = 01 and (MEM/WB.RegisterRd == ID/EX.RegisterRt)) ForwardB = 01 Forwards the result from the previous instr. to either input of the ALU provided it writes. Forwards the result from the second previous instr. to either input of the ALU provided it writes.

Data Forwarding Control Conditions (3/4) EX/MEM hazard: if (EX/MEM.RegWrite and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd == ID/EX.RegisterRs)) ForwardA = 10 and (EX/MEM.RegisterRd == ID/EX.RegisterRt)) ForwardB = 10 MEM/WB hazard: if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd == ID/EX.RegisterRs)) ForwardA = 01 and (MEM/WB.RegisterRd == ID/EX.RegisterRt)) ForwardB = 01 Forwards the result from the previous instr. to either input of the ALU provided it writes and != R0. Forwards the result from the second previous instr. to either input of the ALU provided it writes and != R0. What’s wrong with this hazard control?

Yet Another Complication! Another potential data hazard can occur when there is a conflict between the result of the WB stage instruction and the MEM stage instruction which should be forwarded? More recent result! I n s t r. O r d e ALU IM Reg DM add $1,$1,$2 add $1,$1,$3 ALU IM Reg DM ALU IM Reg DM add $1,$1,$4

Corrected Data Forwarding Control Conditions MEM/WB hazard: if (MEM/WB.RegWrite and (MEM/WB.RegisterRd != 0) and (MEM/WB.RegisterRd == ID/EX.RegisterRs) and (EX/MEM.RegisterRd != ID/EX.RegisterRs || ~ EX/MEM.RegWrite)) ForwardA = 01 and (MEM/WB.RegisterRd == ID/EX.RegisterRt) and (EX/MEM.RegisterRd != ID/EX.RegisterRt || ~ EX/MEM.RegWrite))) ForwardB = 01

Datapath with Forwarding Hardware 1 PCSrc ID/EX EX/MEM Control IF/ID Add MEM/WB Branch Add 4 Shift left 2 Instruction Memory Read Addr 1 Data Memory Register File Read Data 1 Read Addr 2 Read Address PC Read Data Address 1 Write Addr ALU Read Data 2 1 Write Data Write Data ALU cntrl 16 32 Sign Extend How many bits wide is each pipeline register now? ID/EX = 9 + 32x4 + 10 = 147 + 10 = 157 EX/MEM.RegisterRd MEM/WB.RegisterRd IF/ID.RegisterRs IF/ID.RegisterRt 1 Forward Unit Control line inputs to Forward Unit EX/MEM.RegWrite and MEM/WB.RegWrite not shown on diagram

Memory-to-Memory Copies For loads immediately followed by stores (memory-to-memory copies) can avoid a stall by adding forwarding hardware from the MEM/WB register to the data memory input. Would need to add a Forward Unit to the memory access stage Should avoid stalling on such a load I n s t r. O r d e ALU IM Reg DM lw $1,10($2) ALU IM Reg DM sw $1,10($3)

Forwarding (or Bypassing): What about Loads Dependencies backwards in time are hazards Can’t solve with forwarding Must delay/stall instruction dependent on loads Time (clock cycles) IF ID/RF EX MEM WB ALU Im Reg Dm lw $1, 0($2) sub $4, $1, $3

Can't always forward Load word can still cause a hazard: an instruction tries to read a register following a load instruction that writes to the same register. Thus, we need a hazard detection unit to stall the load instruction

Stalling We can stall the pipeline by keeping an instruction in the same stage

Stall/Bubble in the Pipeline Morgan Kaufmann Publishers 10 November, 2018 Stall/Bubble in the Pipeline Stall inserted here Chapter 4 — The Processor — 36 Chapter 4 — The Processor

Stall/Bubble in the Pipeline Morgan Kaufmann Publishers 10 November, 2018 Stall/Bubble in the Pipeline Or, more accurately… Chapter 4 — The Processor — 37 Chapter 4 — The Processor

Load-use Hazard Detection Unit Need a hazard detection unit in the ID stage that inserts a stall between the load and its use The first line tests to see if the instruction is a load; the next two lines check to see if the destination register of the load in the EX stage matches either source registers of the instruction in the ID stage After this 1-cycle stall, the forwarding logic can handle the remaining data hazards ID Hazard Detection if (ID/EX.MemRead and ((ID/EX.RegisterRt = IF/ID.RegisterRs) or (ID/EX.RegisterRt = IF/ID.RegisterRt))) stall the pipeline

Stall Hardware In addition to the hazard detection unit, we have to implement the stall Prevent the IF and ID stage instructions from making progress down the pipeline, done by preventing the PC register and the IF/ID pipeline register from changing Hazard detection unit controls the writing of the PC and IF/ID registers The instructions in the back half of the pipeline starting with the EX stage must be flushed (execute noop) Must deassert the control signals (setting them to 0) in the EX, MEM, and WB control fields of the ID/EX pipeline register. Hazard detection unit controls the multiplexer that chooses between the real control values and 0’s. Assume that 0’s are benign values in datapath: nothing changes

Adding the Hazard Hardware Read Address Instruction Memory Add PC 4 1 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Data 1 Data 2 16 32 ALU Shift left 2 Data IF/ID Sign Extend ID/EX EX/MEM MEM/WB Control cntrl Branch PCSrc Forward Unit Hazard Unit 1 For class handout

Adding the Hazard Detection Unit Hardware Read Address Instruction Memory Add PC 4 1 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Data 1 Data 2 16 32 ALU Shift left 2 Data IF/ID Sign Extend ID/EX EX/MEM MEM/WB Control cntrl Branch PCSrc Forward Unit ID/EX.MemRead Hazard Unit ID/EX.RegisterRt 1 In reality, only the signals RegWrite and MemWrite need to be 0, the other control signals can be don’t cares.