1
Pipelining - Hazards
2
Can Pipelining Get Us Into Trouble?
Yes: Pipeline Hazards
Structural hazards: attempt to use the same resource in two different ways at the same time
  E.g., a combined washer/dryer would be a structural hazard, or the folder is busy doing something else (watching TV)
Control hazards: attempt to make a decision before the condition is evaluated
  E.g., washing football uniforms and needing to get the proper detergent level; need to see the result after the dryer before putting the next load in
  Branch instructions
Data hazards: attempt to use an item before it is ready
  E.g., one sock of a pair in the dryer and one in the washer; can’t fold until the sock in the washer gets through the dryer
  Instruction depends on the result of a prior instruction still in the pipeline
3
Structural Hazard
A relation between two instructions indicating that they may want to use the same hardware resource (functional unit, register file port, shared bus, cache port, etc.) at the same time
The MIPS pipeline as designed so far does not have a structural hazard
  But we had to avoid it
Usually occurs when a functional unit is not fully pipelined (e.g., in a floating-point pipeline)
4
Single Memory Port / Structural Hazard
[Figure: pipeline timing diagram over clock cycles 1 to 7 for Load, Instr 1, Instr 2, Instr 3, and Instr 4 in program order. Each instruction passes through Ifetch, Reg, ALU, DMem, Reg, starting one cycle after its predecessor. In cycle 4 the Load's DMem access and Instr 3's Ifetch both need the single memory port.]
5
Single Memory Port / Structural Hazard
[Figure: the same timing diagram over clock cycles 1 to 7 with a stall. Load, Instr 1, and Instr 2 proceed normally; a bubble follows Instr 2, and Instr 3 is fetched one cycle later.]
How do you “bubble” the pipe?
6
Single Memory Port / Structural Hazard
Instead of stalling the pipeline, other solutions:
  Make the memory dual-ported
  Physically separate the memory into instruction and data memories (Harvard Architecture, from the Harvard Mark I project of IBM led by Dr. Howard Aiken)
Another typical structural hazard
  Functional unit is not fully pipelined due to cost/complexity
  Pipeline interval > 1 pipe stage
7
Example: Cost of Structural Hazard
Suppose that 40% of instruction mix are loads or stores, and that the ideal CPI of the pipelined machine is 1. Assume that the machine with the structural hazard has a clock rate that is 5% higher than the clock rate of the machine without the hazard. Which pipeline is faster, and by how much?
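One way to work the question is sketched below. It assumes, which the slide does not state, that every load or store stalls the single-memory-port machine for exactly one cycle; the function name and parameters are illustrative only.

```python
# Sketch of the comparison, under the ASSUMPTION that each load/store
# costs exactly one stall cycle on the machine with the structural hazard.

def avg_instruction_time(ideal_cpi, mem_fraction, stall_cycles, clock_speedup):
    """Average time per instruction, in units of the baseline clock period."""
    cpi = ideal_cpi + mem_fraction * stall_cycles
    cycle_time = 1.0 / clock_speedup  # a faster clock means a shorter cycle
    return cpi * cycle_time

# Machine without the hazard: CPI = 1, baseline clock.
no_hazard = avg_instruction_time(1.0, 0.40, 0, 1.00)

# Machine with the hazard: 40% of instructions stall one cycle, clock 5% faster.
with_hazard = avg_instruction_time(1.0, 0.40, 1, 1.05)

print(f"without hazard: {no_hazard:.3f}")
print(f"with hazard:    {with_hazard:.3f}")
print(f"ratio:          {with_hazard / no_hazard:.2f}")
```

Under that assumption the machine without the structural hazard comes out about 1.33 times faster, because a 5% faster clock does not make up for a CPI of 1.4.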
8
Data Hazards
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or r8,r1,r9
  xor r10,r1,r11
[Figure: pipeline timing diagram (Ifetch, Reg, ALU, DMem, Reg) for the instructions above in program order; the later instructions all read r1, which add writes.]
9
Three Generic Data Hazards
True (or Flow) Dependency (Read After Write, or RAW)
A later instruction tries to read an operand before an earlier instruction writes it
  I: add r1,r2,r3
  J: sub r4,r1,r3
10
RAW Hazards
True (value, flow) dependence between instructions i and j means i produces a result value that j uses
  This is a producer-consumer relationship
  This is a dependence based on values, not on the names of the containers of the values
Every true dependence is a RAW hazard
Not every RAW hazard is a true dependence
  Any RAW hazard that cannot be removed by renaming is a true dependence

Original program            Renamed program
1: A = B+C                  1: X = B+C
2: A = D+E                  2: A = D+E
3: G = A+H                  3: G = A+H
True dependence: (2,3)      True dependence: (2,3)
RAW hazard: (1,3), (2,3)    RAW hazard: (2,3)
11
Three Generic Data Hazards
Anti-Dependency (Write After Read, or WAR)
A later instruction tries to write an operand before an earlier instruction reads it
This hazard results from reuse of the same register
Can’t happen in our simple 5-stage pipeline because:
  All instructions take 5 stages, and
  Reads are always in stage 2, and
  Writes are always in stage 5
  I: add r2,r1,r3
  J: sub r1,r4,r3
12
Three Generic Data Hazards
Output Dependency (Write After Write, or WAW)
A later instruction tries to write an operand before an earlier instruction writes it
This hazard results from reuse of the same register
Can’t happen in our simple 5-stage pipeline because:
  All instructions take 5 stages, and
  Reads are always in stage 2, and
  Writes are always in stage 5
  I: add r1,r2,r3
  J: sub r1,r4,r3
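The three definitions can be checked mechanically on the I/J pairs from the slides. The sketch below is illustrative only: the `(dest, src1, src2)` tuple encoding and the function name `classify` are assumptions, not part of the original material.

```python
# Sketch: classify the dependences between an earlier instruction i and a
# later instruction j. Each instruction is encoded (hypothetically) as a
# tuple (dest, src1, src2) of register names.

def classify(i, j):
    """Return the set of hazard kinds that j has with respect to i."""
    i_dest, *i_srcs = i
    j_dest, *j_srcs = j
    kinds = set()
    if i_dest in j_srcs:    # j reads what i writes
        kinds.add("RAW")
    if j_dest in i_srcs:    # j writes what i reads
        kinds.add("WAR")
    if j_dest == i_dest:    # j writes what i writes
        kinds.add("WAW")
    return kinds

# I: add r1,r2,r3   J: sub r4,r1,r3
print(classify(("r1", "r2", "r3"), ("r4", "r1", "r3")))
# I: add r2,r1,r3   J: sub r1,r4,r3
print(classify(("r2", "r1", "r3"), ("r1", "r4", "r3")))
# I: add r1,r2,r3   J: sub r1,r4,r3
print(classify(("r1", "r2", "r3"), ("r1", "r4", "r3")))
```

The check works on register names only, so it reports potential hazards; whether one actually occurs depends on the pipeline, as the slides note for WAR and WAW.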
13
More on WAR and WAW
WAR and WAW hazards are name dependences
  Two instructions happen to use the same register (name), although they don’t have to
Can often be eliminated by renaming, either in software or hardware
  Implies the use of additional resources, hence additional cost
Renaming is not always possible: implicit operands such as the accumulator, PC, or condition codes cannot be renamed
14
How to Break the Dependency
Dependency reduces concurrency
Can we break it?
  True dependency (RAW)
  Name dependency or false dependency (WAR, WAW)
15
Software Solution
Have the compiler guarantee no hazards
Where do we insert the “nops”?
  sub $2, $1, $3
  and $12, $2, $5
  or $13, $6, $2
  add $14, $2, $2
  sw $15, 100($2)
Problem: this really slows us down!
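A rough sketch of how a compiler pass might place the nops follows. It assumes a 5-stage pipeline with no forwarding, in which the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three instruction slots after its producer. That timing, the tuple encoding, and the name `insert_nops` are assumptions for illustration.

```python
# Sketch: pad a straight-line program with nops so that no instruction reads
# a register until its producer has written it back.
# ASSUMPTION: no forwarding; the register file is written in the first half
# of a cycle and read in the second half, so consumer slot >= producer slot + 3.

MIN_DISTANCE = 3

def insert_nops(program):
    """program: list of (text, dest, srcs); returns the padded list of texts."""
    out = []          # scheduled instruction texts, nops included
    last_write = {}   # register -> slot in `out` of its most recent writer
    for text, dest, srcs in program:
        needed = 0
        for reg in srcs:
            if reg in last_write:
                gap = last_write[reg] + MIN_DISTANCE - len(out)
                needed = max(needed, gap)
        out.extend(["nop"] * needed)
        if dest is not None:
            last_write[dest] = len(out)
        out.append(text)
    return out

program = [
    ("sub $2, $1, $3",  "$2",  ["$1", "$3"]),
    ("and $12, $2, $5", "$12", ["$2", "$5"]),
    ("or $13, $6, $2",  "$13", ["$6", "$2"]),
    ("add $14, $2, $2", "$14", ["$2", "$2"]),
    ("sw $15, 100($2)", None,  ["$15", "$2"]),
]

padded = insert_nops(program)
for line in padded:
    print(line)
```

Under these assumptions two nops are needed, both directly after the sub; the remaining readers of $2 are then far enough behind it.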
16
Hardware Solution: Forwarding
[Figure: pipeline timing diagram (Ifetch, Reg, ALU, DMem, Reg) over clock cycles, showing results forwarded between instructions, for:]
  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or r8,r1,r9
  xor r10,r1,r11
17
Forwarding (simplified)
[Figure: simplified datapath showing the ID/EX, EX/MEM, and MEM/WB pipeline registers, the register file, a MUX, the ALU, and the data memory.]
18
Forwarding Unit
1. Forwarding between ALUOut and ALUMuxA
  sub $2, $1, $3
  and $12, $2, $5
EX/MEM.RegisterRd = ID/EX.RegisterRs = $2 => Use EX/MEM.ALUOut instead of ID/EX.A
  a. Some instructions do not write registers
  b. Every use of $0 as an operand must yield an operand value of zero
If ( EX/MEM.RegWrite & (EX/MEM.RegisterRd ≠ 0) & (EX/MEM.RegisterRd = ID/EX.RegisterRs) ) ForwardA = 01
19
Forwarding Unit
2. Forwarding between ALUOut and ALUMuxB
  sub $2, $1, $3
  and $12, $5, $2
EX/MEM.RegisterRd = ID/EX.RegisterRt = $2 => Use EX/MEM.ALUOut instead of ID/EX.B
If ( EX/MEM.RegWrite & (EX/MEM.RegisterRd ≠ 0) & (EX/MEM.RegisterRd = ID/EX.RegisterRt) ) ForwardB = 01
20
Forwarding (from EX/MEM)
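[Figure: datapath with forwarding from EX/MEM, showing the ID/EX, EX/MEM, and MEM/WB pipeline registers, the register file, the ALU with its input MUXes, and the data memory.]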
21
Forwarding Unit
3. Forwarding between ALUOut and ALUMuxA
  sub $2, $1, $3
  and $12, $2, $5
  or $13, $2, $6
MEM/WB.RegisterRd = ID/EX.RegisterRs = $2 => Use MEM/WB.ALUOut instead of ID/EX.A
If ( MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (MEM/WB.RegisterRd = ID/EX.RegisterRs) ) ForwardA = 10
22
Forwarding Unit
4. Forwarding between ALUOut and ALUMuxB
  sub $2, $1, $3
  and $12, $2, $5
  or $13, $6, $2
MEM/WB.RegisterRd = ID/EX.RegisterRt = $2 => Use MEM/WB.ALUOut instead of ID/EX.B
If ( MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (MEM/WB.RegisterRd = ID/EX.RegisterRt) ) ForwardB = 10
23
Forwarding (from MEM/WB)
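[Figure: datapath with forwarding from MEM/WB, showing the ID/EX, EX/MEM, and MEM/WB pipeline registers, the register file, the ALU with its input MUXes, and the data memory.]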
24
Forwarding (operand selection)
[Figure: the same datapath with a Forwarding Unit added to drive the operand-selection MUXes.]
25
Forwarding (operand propagation)
[Figure: the same datapath showing the register numbers carried down the pipeline: Rs, Rt, and Rd from ID/EX, EX/MEM Rd, and MEM/WB Rd feed the Forwarding Unit.]
26
Forwarding
27
Datapath with Forwarding Unit
[Figure: complete pipelined datapath with control and the forwarding unit.]
28
Forwarding Unit
  add $1, $1, $2
  add $1, $1, $3
  add $1, $1, $4
If ( MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs) & (MEM/WB.RegisterRd = ID/EX.RegisterRs) ) ForwardA = 10
If ( MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt) & (MEM/WB.RegisterRd = ID/EX.RegisterRt) ) ForwardB = 10
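The four forwarding conditions, plus the priority rule for the add $1 sequence, can be sketched as one small function. The 01/10 encodings follow the slides; the `PipeReg` class and the function names are hypothetical. Checking EX/MEM before MEM/WB is one way to make the most recent result win, and it is not exactly the slide's extra (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs) term, since it also looks at EX/MEM.RegWrite.

```python
# Sketch of the forwarding unit's selection logic (names are hypothetical).
# Encodings follow the slides: 01 = take EX/MEM.ALUOut, 10 = take the
# MEM/WB value, 00 = use the operand read from the register file.
from dataclasses import dataclass

@dataclass
class PipeReg:
    reg_write: bool = False  # does the instruction in this stage write a register?
    rd: int = 0              # its destination register number

def forward_select(ex_mem, mem_wb, src):
    """Mux select for one ALU operand whose source register number is src."""
    # Most recent producer first, so the newest value wins on a double hazard.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return 0b01
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return 0b10
    return 0b00

def forwarding_unit(ex_mem, mem_wb, rs, rt):
    """Return (ForwardA, ForwardB) for the instruction in ID/EX."""
    return forward_select(ex_mem, mem_wb, rs), forward_select(ex_mem, mem_wb, rt)

# sub $2,$1,$3 is in EX/MEM while and $12,$2,$5 is in ID/EX.
print(forwarding_unit(PipeReg(True, 2), PipeReg(), rs=2, rt=5))
# Third add of the add $1 sequence: both older adds write $1.
print(forwarding_unit(PipeReg(True, 1), PipeReg(True, 1), rs=1, rt=4))
```

The `rd != 0` test covers point b from the first forwarding slide: $0 must always read as zero, so it is never forwarded.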
29
Some Other Data Dependencies
  add $1, $1, $     F | D | X | M | W
  sw $7, 0($1)      F | D | X | M | W
  sw $8, 0($1)      F | D | X | M | W
  sw $9, 0($1)      F | D | X | M | W

  add $1, $1, $     F | D | X | M | W
  sw $1, 0($7)      F | D | X | M | W
  sw $1, 0($8)      F | D | X | M | W
  sw $1, 0($9)      F | D | X | M | W

  lw $1, 0($2)      F | D | X | M | W
  sw $1, 0($7)      F | D | X | M | W
  sw $1, 0($8)      F | D | X | M | W
  sw $1, 0($9)      F | D | X | M | W
30
Can't always forward
Load word can still cause a hazard
  lw r1, 0(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or r8,r1,r9
[Figure: pipeline timing diagram (Ifetch, Reg, ALU, DMem, Reg) over clock cycles for the instructions above in program order.]
31
Data Hazard Even with Forwarding
[Figure: pipeline timing diagram for lw r1, 0(r2); sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9. A one-cycle bubble follows the lw, delaying sub, and, and or.]
Thus, we need a hazard detection unit to “stall” the instruction that uses the result of the load
32
Stalling
Hazard detection unit:
  If ( ID/EX.MemRead & ((ID/EX.RegisterRt = IF/ID.RegisterRs) | (ID/EX.RegisterRt = IF/ID.RegisterRt)) ) stall the pipeline
When the pipeline is stalled:
  Do not fetch a new instruction: prevent the PC and IF/ID registers from changing
  Create a “bubble” in the pipeline: set all control signals to 0 to create a “do nothing” instruction
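The load-use check and the two stall actions can be sketched as a single function. The function name and the output signal names (`pc_write`, `if_id_write`, `zero_controls`) are assumptions for illustration, not signal names taken from the slides.

```python
# Sketch of the hazard detection unit (function and signal names are hypothetical).

def hazard_detection(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Decide whether the instruction in IF/ID must wait one cycle for a load."""
    # Stall if the instruction in EX is a load whose destination (Rt)
    # is a source of the instruction currently being decoded.
    stall = bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))
    return {
        "pc_write": not stall,      # freeze the PC while stalling
        "if_id_write": not stall,   # freeze IF/ID so the instruction is decoded again
        "zero_controls": stall,     # turn the ID/EX slot into a do-nothing bubble
    }

# lw r1, 0(r2) is in EX (Rt = 1); sub r4,r1,r6 is in ID (Rs = 1, Rt = 6).
print(hazard_detection(True, 1, 1, 6))
# Same registers, but the instruction in EX is not a load: forwarding suffices.
print(hazard_detection(False, 1, 1, 6))
```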
33
Hazard Detection Unit
[Figure: pipelined datapath with control, the hazard detection unit, and the forwarding unit.]
34
Code rescheduling to Avoid Load Hazards
Try producing fast code for
  a = b + c;
  d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:            Fast code:
  LW  Rb,b              LW  Rb,b
  LW  Rc,c              LW  Rc,c
  ADD Ra,Rb,Rc          LW  Re,e
  SW  a,Ra              ADD Ra,Rb,Rc
  LW  Re,e              LW  Rf,f
  LW  Rf,f              SW  a,Ra
  SUB Rd,Re,Rf          SUB Rd,Re,Rf
  SW  d,Rd              SW  d,Rd

Compiler optimizes for performance. Hardware checks for safety.
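To see why the second ordering is faster, the load-use stalls in each sequence can be counted. The sketch assumes full forwarding, so the only stall left is one cycle when an instruction uses a loaded register immediately after the LW; the tuple encoding and function name are illustrative.

```python
# Sketch: count load-use stalls in a straight-line sequence.
# ASSUMPTION: full forwarding, so the only remaining stall is one cycle when
# an instruction reads a register loaded by the instruction directly before it.

def load_use_stalls(program):
    """program: list of (opcode, dest, srcs) in program order."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "LW" and prev_dest in cur_srcs:
            stalls += 1
    return stalls

slow = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("ADD", "Ra", ["Rb", "Rc"]),
    ("SW", None, ["Ra"]),
    ("LW", "Re", []), ("LW", "Rf", []), ("SUB", "Rd", ["Re", "Rf"]),
    ("SW", None, ["Rd"]),
]
fast = [
    ("LW", "Rb", []), ("LW", "Rc", []), ("LW", "Re", []),
    ("ADD", "Ra", ["Rb", "Rc"]), ("LW", "Rf", []), ("SW", None, ["Ra"]),
    ("SUB", "Rd", ["Re", "Rf"]), ("SW", None, ["Rd"]),
]

print("slow code stalls:", load_use_stalls(slow))
print("fast code stalls:", load_use_stalls(fast))
```

Under that assumption the slow code stalls twice (ADD after LW Rc, SUB after LW Rf) and the rescheduled code not at all.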
35
Branch in the Pipelined Datapath
[Figure: pipelined datapath for branches. Labels mark where the branch target address is computed, where the branch outcome is computed, and where the PC is changed.]
36
Branch (Control) Hazards
When we decide to branch, other instructions are in the pipeline!
  10: beq r1,r3,36
  14: and r2,r3,r5
  18: or r6,r1,r7
  22: add r8,r1,r9
  36: xor r10,r1,r11
[Figure: pipeline timing diagram (Ifetch, Reg, ALU, DMem, Reg) for the instructions above.]
37
Solving Branch Hazards
Stall the pipeline until the branch is complete
  Branch is detected in ID stage
  Pipeline is stalled
  Pipeline is restarted in IF stage with either
    the next instruction, or
    the branch target
Three clock cycles will be lost for each branch!
38
Reducing Taken Branch Penalty
Compute branch target address earlier
Compute branch outcome earlier
39
Reducing Taken Branch Penalty
Branch is completed in ID stage
If branch is taken, flush the pipeline
  1 cycle loss for a taken branch
[Figure: pipeline timing (F D X M W) for a taken branch, followed by Branch + 1 (flushed), the branch target, and BT + 1.]
40
Flushing the Instruction After Branch
[Figure: pipelined datapath with the hazard detection and forwarding units, extended to flush the instruction after a branch.]
41
Predict–not-Taken (Predict-Untaken)
Continue execution after the branch
If branch is not taken, no penalty
If branch is taken, flush the pipeline and lose 1 clock cycle
What about Predict-Taken?
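The effect of the two schemes on CPI can be compared with a small calculation. The penalties (3 cycles when every branch stalls, 1 cycle for a taken branch under predict-not-taken) come from the slides; the branch frequency of 20% and taken rate of 60% are made-up values for illustration only.

```python
# Sketch: effective CPI under two branch-handling schemes.
# ASSUMPTIONS (illustrative, not from the slides): base CPI of 1,
# 20% of instructions are branches, 60% of branches are taken.

def effective_cpi(base_cpi, branch_fraction, penalty_cycles, penalized_fraction=1.0):
    """Base CPI plus the average branch penalty per instruction."""
    return base_cpi + branch_fraction * penalized_fraction * penalty_cycles

BRANCH_FRACTION = 0.20  # hypothetical
TAKEN_FRACTION = 0.60   # hypothetical

# Stall until the branch completes: every branch loses 3 cycles.
cpi_stall = effective_cpi(1.0, BRANCH_FRACTION, 3)

# Predict-not-taken with the branch resolved in ID: only taken branches lose 1 cycle.
cpi_predict_not_taken = effective_cpi(1.0, BRANCH_FRACTION, 1, TAKEN_FRACTION)

print(f"stall on every branch: CPI = {cpi_stall:.2f}")
print(f"predict-not-taken:     CPI = {cpi_predict_not_taken:.2f}")
```

With these assumed numbers the CPI falls from 1.60 to 1.12; the exact figures depend entirely on the assumed branch statistics.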
42
Delayed Branches
Execution cycle with a branch delay of length n:
  branch instruction
  sequential successor 1
  sequential successor 2
  ...
  sequential successor n
  branch target if taken
The n sequential successors form the branch delay of length n
Instructions in the branch delay slot are executed irrespective of the branch outcome
43
Delayed Branches on MIPS
One branch delay slot on MIPS
Taken and untaken branch behaviour are similar
Compiler must fill the branch delay slot with useful instructions
44
Delayed Branches
Question: What instruction do we put in the branch delay slot?
  Fill with NOP (always possible)
  Fill from before (not always possible)
  Fill from target (not always possible)
  Fill from fall-through (not always possible)
45
Filling Branch Delay Slot
Make sure R7 will not be used in taken path before redefined
46
Filling Branch Delay Slot
47
Cancelling Branches
Improves the ability of the compiler to fill delay slots
Instruction includes a bit showing its predicted direction
  When the branch behaves as predicted, the instruction in the delay slot is executed
  When the branch is incorrectly predicted, the instruction in the delay slot is turned into a NOP
48
Predict-Taken Cancelling Branch
49
Summary: Pipelining
Reduce CPI by overlapping many instructions
  Average throughput of approximately 1 CPI with a fast clock
Utilize capabilities of the datapath
  Start next instruction while working on the current one
  Limited by length of longest stage (plus fill/flush)
  Detect and resolve hazards
What makes it easy?
  All instructions are the same length
  Just a few instruction formats
  Memory operands appear only in loads and stores
What makes it hard?
  Structural hazards: suppose we had only one memory
  Control hazards: need to worry about branch instructions
  Data hazards: an instruction depends on a previous instruction