
Pipelining - Hazards


1 Pipelining - Hazards

2 Can Pipelining Get Us Into Trouble?
Yes: pipeline hazards.
- Structural hazards: an attempt to use the same resource in two different ways at the same time. E.g., a combined washer/dryer would be a structural hazard, or the folder being busy doing something else (watching TV).
- Control hazards: an attempt to make a decision before the condition is evaluated. E.g., when washing football uniforms we need to get the proper detergent level, so we must see the result after the dryer before putting the next load in. Arises with branch instructions.
- Data hazards: an attempt to use an item before it is ready. E.g., one sock of a pair is in the dryer and one is in the washer; we can’t fold until the sock in the washer gets through the dryer. Arises when an instruction depends on the result of a prior instruction still in the pipeline.

3 Structural Hazard
A relation between two instructions indicating that they may want to use the same hardware resource (functional unit, register file port, shared bus, cache port, etc.) at the same time.
The MIPS pipeline as designed so far does not have a structural hazard.
- But we had to design it to avoid one.
- Structural hazards usually occur when a functional unit is not fully pipelined (e.g., in a floating-point pipeline).

4 Single Memory Port / Structural Hazard
[Figure: pipeline diagram of instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) against time (clock cycles 1 to 7). Each instruction passes through Ifetch, Reg, ALU, DMem, Reg. With a single memory port, the DMem access of Load and the Ifetch of Instr 3 fall in the same cycle (cycle 4).]

5 Single Memory Port / Structural Hazard
[Figure: the same pipeline diagram (Load, Instr 1, Instr 2, Stall, Instr 3) over clock cycles 1 to 7. A bubble is inserted so that Instr 3 is fetched one cycle later.]
How do you “bubble” the pipe?

6 Single Memory Port / Structural Hazard
Solutions other than stalling the pipeline:
- Make the memory dual ported
- Physically separate the memory architecture into instruction and data memories (Harvard Architecture, from the Harvard Mark I project of IBM led by Dr. Howard Aiken)
Another typical structural hazard:
- A functional unit is not fully pipelined due to cost/complexity
- Pipeline interval > 1 pipe stage

7 Example: Cost of Structural Hazard
Suppose that 40% of the instruction mix are loads or stores, and that the ideal CPI of the pipelined machine is 1. Assume that the machine with the structural hazard has a clock rate that is 5% higher than the clock rate of the machine without the hazard. Which pipeline is faster, and by how much?
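
The question above can be worked through numerically. The sketch below assumes, since the slide does not say so explicitly, that every load or store costs exactly one stall cycle on the machine with the structural hazard; the variable names are illustrative only. Time is measured in units of the no-hazard machine's clock cycle.

```python
# Worked answer to the structural-hazard cost example.
# Assumption (not stated on the slide): each load/store incurs exactly
# one stall cycle on the machine that has the structural hazard.
mem_ref_fraction = 0.40        # loads and stores, from the slide
ideal_cpi = 1.0                # from the slide
stall_cycles_per_mem_ref = 1   # assumption
clock_rate_ratio = 1.05        # hazard machine's clock is 5% faster


def avg_instruction_time(cpi, cycle_time):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time


# The no-hazard machine's cycle time is the unit of time (1.0).
time_no_hazard = avg_instruction_time(ideal_cpi, 1.0)
cpi_hazard = ideal_cpi + mem_ref_fraction * stall_cycles_per_mem_ref
time_hazard = avg_instruction_time(cpi_hazard, 1.0 / clock_rate_ratio)

# Ratio > 1 means the machine WITHOUT the hazard is faster.
speedup_no_hazard = time_hazard / time_no_hazard
print(f"CPI with hazard: {cpi_hazard:.2f}")
print(f"No-hazard machine is {speedup_no_hazard:.2f}x faster")
```

Under that assumption the hazard raises the CPI to 1.4, and the 5% faster clock does not make up for it: the machine without the structural hazard comes out roughly 1.3 times faster.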

8 Data Hazards
[Figure: pipeline diagram in instruction order; each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11

9 Three Generic Data Hazards
True (or Flow) Dependency (Read After Write, or RAW)
- A later instruction tries to read an operand before an earlier instruction writes it.
I: add r1,r2,r3
J: sub r4,r1,r3

10 RAW Hazards
True (value, flow) dependence between instructions i and j means i produces a result value that j uses.
- This is a producer-consumer relationship.
- This is a dependence based on values, not on the names of the containers of the values.
Every true dependence is a RAW hazard. Not every RAW hazard is a true dependence.
- Any RAW hazard that cannot be removed by renaming is a true dependence.
Original program:
1: A = B+C
2: A = D+E
3: G = A+H
True dependence: (2,3). RAW hazard: (1,3), (2,3).
Renamed program:
1: X = B+C
2: A = D+E
3: G = A+H
True dependence: (2,3). RAW hazard: (2,3).
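
The distinction between RAW hazards and true dependences in the example above can be checked mechanically. This is a minimal sketch, not part of the original slides: it assumes a program is a list of hypothetical `(number, destination, sources)` tuples, treats a RAW hazard as any earlier writer of a name that a later instruction reads, and treats a true dependence as the most recent writer only.

```python
def raw_hazards(program):
    """All pairs (i, j), i before j, where j reads a name that i writes."""
    return [
        (num_i, num_j)
        for pos, (num_i, dest_i, _) in enumerate(program)
        for (num_j, _, srcs_j) in program[pos + 1:]
        if dest_i in srcs_j
    ]


def true_dependences(program):
    """Pairs (i, j) where j actually consumes the value produced by i,
    i.e. i is the most recent writer of a name that j reads."""
    pairs = []
    last_writer = {}
    for num, dest, srcs in program:
        for src in dict.fromkeys(srcs):      # de-duplicate, keep order
            if src in last_writer:
                pairs.append((last_writer[src], num))
        last_writer[dest] = num
    return pairs


original = [(1, "A", ("B", "C")), (2, "A", ("D", "E")), (3, "G", ("A", "H"))]
renamed = [(1, "X", ("B", "C")), (2, "A", ("D", "E")), (3, "G", ("A", "H"))]

for label, prog in (("original", original), ("renamed", renamed)):
    print(label, "RAW:", raw_hazards(prog), "true:", true_dependences(prog))
```

Renaming the destination of instruction 1 removes the (1,3) hazard while leaving the (2,3) true dependence in place, which matches the pairs listed on the slide.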

11 Three Generic Data Hazards
Anti-Dependency (Write After Read, or WAR)
- A later instruction tries to write an operand before an earlier instruction reads it.
I: add r2,r1,r3
J: sub r1,r4,r3
This hazard results from reuse of the same register.
Can’t happen in our simple 5-stage pipeline because:
- All instructions take 5 stages, and
- Reads are always in stage 2, and
- Writes are always in stage 5

12 Three Generic Data Hazards
Output Dependency (Write After Write, or WAW)
- A later instruction tries to write an operand before an earlier instruction writes it.
I: add r1,r2,r3
J: sub r1,r4,r3
This hazard results from reuse of the same register.
Can’t happen in our simple 5-stage pipeline because:
- All instructions take 5 stages, and
- Reads are always in stage 2, and
- Writes are always in stage 5
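
The three generic hazards can be told apart purely from the register names of an instruction pair. The function below is an illustrative sketch (its name and the `(dest, src1, src2)` tuple layout are assumptions, not from the slides); it reports which hazard classes are possible between an earlier instruction I and a later instruction J, without saying whether a given pipeline actually exposes them.

```python
def classify_hazards(instr_i, instr_j):
    """instr_i precedes instr_j; each is (dest, src1, src2)."""
    dest_i, *srcs_i = instr_i
    dest_j, *srcs_j = instr_j
    found = []
    if dest_i in srcs_j:
        found.append("RAW")   # J reads what I writes
    if dest_j in srcs_i:
        found.append("WAR")   # J writes what I reads
    if dest_i == dest_j:
        found.append("WAW")   # both write the same register
    return found


# The I/J pairs used on the three "Generic Data Hazards" slides.
print(classify_hazards(("r1", "r2", "r3"), ("r4", "r1", "r3")))
print(classify_hazards(("r2", "r1", "r3"), ("r1", "r4", "r3")))
print(classify_hazards(("r1", "r2", "r3"), ("r1", "r4", "r3")))
```

Applied to the slide examples, the pairs classify as RAW, WAR, and WAW respectively.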

13 More on WAR and WAW
WAR and WAW hazards are name dependences.
- Two instructions happen to use the same register (name), although they don’t have to.
- They can often be eliminated by renaming, either in software or hardware. This implies the use of additional resources, hence additional cost.
Renaming is not always possible: implicit operands such as the accumulator, PC, or condition codes cannot be renamed.

14 How to Break the Dependency
Dependency reduces concurrency. Can we break it?
- True dependency (RAW)
- Name dependency or false dependency (WAR, WAW)

15 Software Solution
Have the compiler guarantee no hazards. Where do we insert the “nops”?
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Problem: this really slows us down!
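
One way to see where the nops go is a small scheduling sketch. It makes two assumptions that the slide does not state: there is no forwarding, and the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three instruction slots after its producer. The tuple layout and function name are hypothetical.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad with nops so every consumer is at least `min_distance`
    slots after the instruction that produces one of its sources.
    Each instruction is (opcode, dest_or_None, source_registers)."""
    out = []
    for op, dest, srcs in program:
        needed = 0
        # Look back over the instructions already emitted, nearest first.
        for back, (_, prev_dest, _) in enumerate(reversed(out), start=1):
            if back >= min_distance:
                break
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out


program = [
    ("sub", "$2", ("$1", "$3")),
    ("and", "$12", ("$2", "$5")),
    ("or", "$13", ("$6", "$2")),
    ("add", "$14", ("$2", "$2")),
    ("sw", None, ("$15", "$2")),   # sw writes no register
]

for op, dest, srcs in insert_nops(program):
    print(op, dest or "", ", ".join(srcs))
```

Under these assumptions two nops land between `sub` and `and`; the later readers of `$2` are then far enough away to need none. That is the slowdown the slide complains about: two of seven slots do no useful work.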

16 Hardware Solution: Forwarding
[Figure: pipeline diagram of instruction order against time (clock cycles); each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or r8,r1,r9
xor r10,r1,r11

17 Forwarding (simplified)
[Figure: simplified datapath showing the Register File, ID/EX register, MUX, ALU, EX/MEM register, Data Memory, and MEM/WB register.]

18 Forwarding Unit
1. Forwarding between ALUOut and ALUMuxA
sub $2, $1, $3
and $12, $2, $5
EX/MEM.RegisterRd = ID/EX.RegisterRs = $2 => Use EX/MEM.ALUOut instead of ID/EX.A
a. Some instructions do not write registers
b. Every use of $0 as an operand must yield an operand value of zero
If ( EX/MEM.RegWrite & (EX/MEM.RegisterRd ≠ 0) & (EX/MEM.RegisterRd = ID/EX.RegisterRs) ) ForwardA = 01

19 Forwarding Unit
2. Forwarding between ALUOut and ALUMuxB
sub $2, $1, $3
and $12, $5, $2
EX/MEM.RegisterRd = ID/EX.RegisterRt = $2 => Use EX/MEM.ALUOut instead of ID/EX.B
If ( EX/MEM.RegWrite & (EX/MEM.RegisterRd ≠ 0) & (EX/MEM.RegisterRd = ID/EX.RegisterRt) ) ForwardB = 01

20 Forwarding (from EX/MEM)
[Figure: datapath with Register File, ID/EX, two MUXes on the ALU inputs, ALU, EX/MEM, Data Memory, and MEM/WB, showing the forwarding path from EX/MEM.]

21 Forwarding Unit
3. Forwarding between ALUOut and ALUMuxA
sub $2, $1, $3
and $12, $2, $5
or $13, $2, $6
MEM/WB.RegisterRd = ID/EX.RegisterRs = $2 => Use MEM/WB.ALUOut instead of ID/EX.A
If ( MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (MEM/WB.RegisterRd = ID/EX.RegisterRs) ) ForwardA = 10

22 Forwarding Unit
4. Forwarding between ALUOut and ALUMuxB
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
MEM/WB.RegisterRd = ID/EX.RegisterRt = $2 => Use MEM/WB.ALUOut instead of ID/EX.B
If ( MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (MEM/WB.RegisterRd = ID/EX.RegisterRt) ) ForwardB = 10

23 Forwarding (from MEM/WB)
[Figure: datapath with Register File, ID/EX, two MUXes on the ALU inputs, ALU, EX/MEM, Data Memory, and MEM/WB, showing the forwarding path from MEM/WB.]

24 Forwarding (operand selection)
[Figure: the same datapath with a Forwarding Unit driving the select inputs of the two ALU-input MUXes.]

25 Forwarding (operand propagation)
[Figure: the same datapath with the Forwarding Unit; the Rs, Rt, and Rd fields are carried through ID/EX, and the destination registers EX/MEM Rd and MEM/WB Rd are fed to the Forwarding Unit.]

26 Forwarding

27 Datapath with Forwarding Unit

28 Forwarding Unit
add $1, $1, $2
add $1, $1, $3
add $1, $1, $4
When both EX/MEM and MEM/WB hold a result for the same register, the more recent EX/MEM value must win, so the MEM/WB conditions are tightened:
If ( MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs) & (MEM/WB.RegisterRd = ID/EX.RegisterRs) ) ForwardA = 10
If ( MEM/WB.RegWrite & (MEM/WB.RegisterRd ≠ 0) & (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt) & (MEM/WB.RegisterRd = ID/EX.RegisterRt) ) ForwardB = 10
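
The four forwarding conditions and the priority rule can be collected into one small model. This is a sketch rather than the slides' hardware: the class and function names are made up for illustration, the 01/10 encodings follow the slides, and priority is expressed by testing EX/MEM first, which is slightly stricter than the `EX/MEM.RegisterRd ≠ ID/EX.RegisterRs` term because it also requires the EX/MEM instruction to actually write a register.

```python
from dataclasses import dataclass

NO_FORWARD = 0b00    # use the value read from the register file (ID/EX)
FROM_EX_MEM = 0b01   # slide encoding: result of the previous instruction
FROM_MEM_WB = 0b10   # slide encoding: result from two instructions back


@dataclass
class PipelineState:
    ex_mem_reg_write: bool
    ex_mem_rd: int
    mem_wb_reg_write: bool
    mem_wb_rd: int


def forward_select(src_reg, s):
    """Mux select for one ALU operand whose source register is src_reg."""
    # EX/MEM is tested first so the most recent result takes priority.
    if s.ex_mem_reg_write and s.ex_mem_rd != 0 and s.ex_mem_rd == src_reg:
        return FROM_EX_MEM
    if s.mem_wb_reg_write and s.mem_wb_rd != 0 and s.mem_wb_rd == src_reg:
        return FROM_MEM_WB
    return NO_FORWARD


def forwarding_unit(id_ex_rs, id_ex_rt, s):
    """Return (ForwardA, ForwardB) for the instruction in EX."""
    return forward_select(id_ex_rs, s), forward_select(id_ex_rt, s)


# Third add of: add $1,$1,$2 / add $1,$1,$3 / add $1,$1,$4.
# Both older adds wrote $1; the newer one is in EX/MEM.
state = PipelineState(True, 1, True, 1)
forward_a, forward_b = forwarding_unit(id_ex_rs=1, id_ex_rt=4, s=state)
print(f"ForwardA={forward_a:02b} ForwardB={forward_b:02b}")
```

For the third `add`, operand A is taken from EX/MEM rather than MEM/WB, which is the behaviour the extra condition on this slide is meant to guarantee.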

29 Some Other Data Dependencies
- Result used as a store address:
add $1, $1, $2   F | D | X | M | W
sw $7, 0($1)         F | D | X | M | W
sw $8, 0($1)             F | D | X | M | W
sw $9, 0($1)                 F | D | X | M | W
- Result used as store data:
add $1, $1, $2   F | D | X | M | W
sw $1, 0($7)         F | D | X | M | W
sw $1, 0($8)             F | D | X | M | W
sw $1, 0($9)                 F | D | X | M | W
- Loaded value used as store data:
lw $1, 0($2)     F | D | X | M | W
sw $1, 0($7)         F | D | X | M | W
sw $1, 0($8)             F | D | X | M | W
sw $1, 0($9)                 F | D | X | M | W

30 Load word can still cause a hazard
Can't always forward.
[Figure: pipeline diagram of instruction order against time (clock cycles); each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]
lw r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or r8,r1,r9

31 Data Hazard Even with Forwarding
[Figure: pipeline diagram of lw r1, 0(r2); sub r4,r1,r6; and r6,r1,r7; or r8,r1,r9 against time (clock cycles). A bubble is inserted after the load, so sub and each following instruction is delayed by one cycle.]
Thus, we need a hazard detection unit to “stall” the instruction that uses the loaded value.

32 Stalling
Hazard detection unit:
If ( ID/EX.MemRead & ((ID/EX.RegisterRt = IF/ID.RegisterRs) | (ID/EX.RegisterRt = IF/ID.RegisterRt)) ) stall the pipeline
When the pipeline is stalled:
- Do not fetch a new instruction: prevent the PC and IF/ID registers from changing
- Create a “bubble” in the pipeline: set all control signals to 0 to create a “do nothing” instruction
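
The stall condition and its two effects can be written as one function. This is a minimal sketch: `PCWrite` and `IF/IDWrite` are the signal names shown on the hazard detection unit diagram, while the `bubble` key and the function name are assumptions made for illustration.

```python
def hazard_detection(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use check: the instruction in EX is a load whose destination
    (rt) is a source of the instruction currently in ID."""
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "PCWrite": not stall,      # hold the PC while stalled
        "IF/IDWrite": not stall,   # hold the IF/ID register while stalled
        "bubble": stall,           # zero the control signals entering ID/EX
    }


# lw r1, 0(r2) is in EX while sub r4, r1, r6 is in ID.
print(hazard_detection(id_ex_mem_read=True, id_ex_rt=1, if_id_rs=1, if_id_rt=6))
```

Holding the PC and IF/ID for one cycle makes the dependent instruction decode again, by which time the loaded value can be forwarded from MEM/WB.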

33 Hazard Detection Unit
[Figure: pipelined datapath (PC, instruction memory, IF/ID, registers, control, ID/EX, ALU, EX/MEM, data memory, MEM/WB) with a hazard detection unit and a forwarding unit. The hazard detection unit takes ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs, and IF/ID.RegisterRt as inputs and drives PCWrite, IF/IDWrite, and a mux that zeroes the control signals. The forwarding unit takes Rs, Rt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd.]

34 Code Rescheduling to Avoid Load Hazards
Try producing fast code for
a = b + c;
d = e - f;
assuming a, b, c, d, e, and f are in memory.
Slow code:
LW Rb,b
LW Rc,c
ADD Ra,Rb,Rc
SW a,Ra
LW Re,e
LW Rf,f
SUB Rd,Re,Rf
SW d,Rd
Fast code:
LW Rb,b
LW Rc,c
LW Re,e
ADD Ra,Rb,Rc
LW Rf,f
SW a,Ra
SUB Rd,Re,Rf
SW d,Rd
Compiler optimizes for performance. Hardware checks for safety.
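
The benefit of the rescheduled sequence can be counted directly. The sketch below assumes the 5-stage pipeline with forwarding described earlier, where the only remaining stall is one cycle when an instruction uses the result of the load immediately before it; the `(opcode, dest, sources)` encoding and the function name are illustrative.

```python
def count_load_use_stalls(program):
    """Count one-cycle stalls: an instruction that reads the register
    loaded by the LW directly in front of it."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_srcs = curr
        if prev_op == "LW" and prev_dest in curr_srcs:
            stalls += 1
    return stalls


slow = [
    ("LW", "Rb", ()), ("LW", "Rc", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()),
    ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

print("slow code stalls:", count_load_use_stalls(slow))
print("fast code stalls:", count_load_use_stalls(fast))
```

The slow ordering stalls twice (before `ADD` and before `SUB`); the fast ordering moves an independent instruction into each of those slots and stalls not at all, with the same eight instructions.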

35 Branch in the Pipelined Datapath
[Figure: pipelined datapath (PC, instruction memory, IF/ID, registers, sign extend, shift left 2, ID/EX, ALU, EX/MEM, data memory, MEM/WB) annotated with where a branch is handled: the adder that computes the branch target address, the ALU Zero output that computes the branch outcome, and the mux that changes the PC.]

36 Branch (Control) Hazards
When we decide to branch, other instructions are in the pipeline!
[Figure: pipeline diagram; each instruction passes through Ifetch, Reg, ALU, DMem, Reg.]
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11

37 Solving Branch Hazards
Stall the pipeline until the branch is complete:
- Branch is detected in the ID stage
- Pipeline is stalled
- Pipeline is restarted in the IF stage with either the next instruction or the branch target
- Three clock cycles will be lost for each branch!

38 Reducing Taken Branch Penalty
- Compute the branch target address earlier
- Compute the branch outcome earlier

39 Reducing Taken Branch Penalty
Branch is completed in the ID stage. If the branch is taken, flush the pipeline.
- 1 cycle loss for a taken branch
Cycle            1   2   3   4   5   6   7
Taken branch     F   D   X   M   W
Branch + 1           F   FL (flushed)
Branch target            F   D   X   M   W
BT + 1                       F   D   X   M   W

40 Flushing the Instruction After Branch

41 Predict-Not-Taken (Predict-Untaken)
Continue execution after the branch.
- If the branch is not taken, there is no penalty
- If the branch is taken, flush the pipeline and lose 1 clock cycle
What about Predict-Taken?
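
The effect of the two schemes on CPI can be compared with a short calculation. The penalties (3 cycles when stalling on every branch, 1 cycle for a taken branch under predict-not-taken) come from the slides; the branch frequency and taken fraction below are made-up illustrative numbers, not measurements.

```python
# Hypothetical workload parameters (assumptions, not from the slides).
branch_fraction = 0.20   # fraction of instructions that are branches
taken_fraction = 0.60    # fraction of branches that are taken

# Penalties taken from the slides.
stall_penalty = 3        # cycles lost per branch when always stalling
taken_penalty = 1        # cycles lost per taken branch, predict-not-taken

base_cpi = 1.0


def cpi_always_stall():
    """Every branch pays the full stall penalty."""
    return base_cpi + branch_fraction * stall_penalty


def cpi_predict_not_taken():
    """Only taken branches pay, and only one cycle each."""
    return base_cpi + branch_fraction * taken_fraction * taken_penalty


print(f"always stall:      CPI = {cpi_always_stall():.2f}")
print(f"predict-not-taken: CPI = {cpi_predict_not_taken():.2f}")
```

With these assumed numbers the CPI drops from about 1.6 to about 1.12, because the penalty is both smaller and paid only on taken branches.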

42 Delayed Branches
Execution cycle with a branch delay of length n:
branch instruction
sequential successor 1
sequential successor 2
...
sequential successor n
branch target if taken
The n sequential successors occupy the branch delay slots. Instructions in the branch delay slots are executed irrespective of the branch outcome.

43 Delayed Branches on MIPS
One branch delay slot on MIPS
- Taken and untaken branch behaviour are similar
- Compiler must fill the branch delay slot with a useful instruction

44 Delayed Branches
Question: What instruction do we put in the branch delay slot?
- Fill with NOP (always possible)
- Fill from before (not always possible)
- Fill from target (not always possible)
- Fill from fall-through (not always possible)

45 Filling Branch Delay Slot
Make sure R7 is not used in the taken path before it is redefined.

46 Filling Branch Delay Slot

47 Cancelling Branches
Improves the ability of the compiler to fill delay slots.
- The instruction includes a bit showing its predicted direction
- When the branch behaves as predicted, the instruction in the delay slot is executed
- When the branch is incorrectly predicted, the instruction in the delay slot is turned into a NOP
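
The rule for a cancelling branch reduces to a comparison between the predicted and actual direction. The sketch below contrasts it with an ordinary delayed branch; both function names are hypothetical and the model covers only the single delay-slot instruction.

```python
def delay_slot_executes_plain(actually_taken):
    """Ordinary delayed branch: the slot always executes."""
    return True


def delay_slot_executes_cancelling(predicted_taken, actually_taken):
    """Cancelling branch: the slot executes only when the prediction bit
    in the instruction matches the real outcome; otherwise it is
    turned into a NOP."""
    return predicted_taken == actually_taken


for predicted in (True, False):
    for actual in (True, False):
        runs = delay_slot_executes_cancelling(predicted, actual)
        print(f"predicted_taken={predicted} actually_taken={actual} "
              f"-> slot {'executes' if runs else 'becomes NOP'}")
```

Because a mispredicted slot is squashed, the compiler may fill a predict-taken branch's slot from the target even when that instruction would be incorrect on the fall-through path.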

48 Predict-Taken Cancelling Branch

49 Summary: Pipelining
Reduce CPI by overlapping many instructions
- Average throughput of approximately 1 CPI with a fast clock
Utilize capabilities of the datapath
- Start the next instruction while working on the current one
- Limited by the length of the longest stage (plus fill/flush)
- Detect and resolve hazards
What makes it easy?
- All instructions are the same length
- Just a few instruction formats
- Memory operands appear only in loads and stores
What makes it hard?
- Structural hazards: suppose we had only one memory
- Control hazards: need to worry about branch instructions
- Data hazards: an instruction depends on a previous instruction

