COMP381 by M. Hamdi: (Recap) Control Hazards

1 COMP381 by M. Hamdi: (Recap) Control Hazards

2 Control (Branch) Hazard
    A:     beqz r2, label
    B:     ---------
    label: P
Problem: The outcome of the branch (taken/not taken) is determined deep in the pipeline, so it is not known whether to execute B or P.
Solutions:
– Stall the pipeline for the required number of cycles.
Methods to reduce the branch penalty:
1. Speculatively predict the branch outcome and recover if mispredicted.
2. Delayed branch: always execute the instruction(s) following the branch.

3 Branch Hazards: Stall pipeline
    A: beqz r2, label
       stall
       stall
       stall
    B: -----
    label: P
Pipeline timing (B or P is fetched only after the three stall cycles):
    Cycle                1    2      3      4      5    6    7    8    9
    A: beqz r2, label    IF   ID     EX     MEM    WB
                              stall  stall  stall
    B or P                                         IF   ID   EX   MEM  WB
Execution sequence: A, NOP, NOP, NOP, B, ... or A, NOP, NOP, NOP, P

4 Reducing Branch Penalty
1. The stall scheme has a branch penalty of 3 cycles. Assuming a branch frequency of 30% and no data stalls, CPI = 1 + 0.30 × 3 = 1.9.
2. Predicting branch not taken:
– Speculatively fetch and execute the in-line instructions following the branch.
– If the prediction is incorrect, flush the speculated instructions from the pipeline: convert them to NOPs by clearing the pipeline registers, then fetch instructions from the branch target.
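
The CPI figure on this slide can be reproduced with a one-line model. This is a minimal sketch that assumes an ideal base CPI of 1 and a fixed penalty for every branch; the function name `cpi_with_branch_stalls` is made up for illustration and does not come from the slides.

```python
def cpi_with_branch_stalls(branch_freq, penalty_cycles, base_cpi=1.0):
    """Average CPI when every branch costs a fixed number of stall cycles.

    Assumes no other stalls (no data or structural hazards).
    """
    return base_cpi + branch_freq * penalty_cycles


# Stall scheme from the slide: 30% branches, 3-cycle penalty.
stall_cpi = cpi_with_branch_stalls(0.30, 3)
print(round(stall_cpi, 2))  # 1.9
```

The same function with `penalty_cycles=1` models a pipeline that resolves branches one cycle after fetch.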

5 Predict branch not taken
             IF   ID   EX   MEM  WB
    T = 1    B    A
    T = 2    C    B    A
    T = 3    D    C    B    A
    T = 4    E    D    C    B    A
Branch actually not taken: no stalls.

6 Predict branch not taken
             IF   ID   EX   MEM  WB
    T = 1    B    A
    T = 2    C    B    A
    T = 3    D    C    B    A
    T = 4    P    NOP  NOP  NOP  A
Branch actually taken: flush the pipeline and make B, C, D NOPs.
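
The cost of predict-not-taken can be estimated from a branch trace. The sketch below counts issue slots under the behaviour shown on the two slides above: a not-taken branch costs nothing extra, and a taken branch squashes the three instructions fetched behind it. The trace format, the function name `issue_slots`, and the example trace are assumptions for illustration. Pipeline fill and drain cycles are ignored.

```python
def issue_slots(trace, flush_penalty=3):
    """Count issue slots for a trace under predict-not-taken.

    trace is a list of (is_branch, taken) pairs, one per instruction.
    A taken branch wastes flush_penalty slots (B, C, D turned into NOPs).
    """
    slots = 0
    for is_branch, taken in trace:
        slots += 1                   # the instruction itself
        if is_branch and taken:
            slots += flush_penalty   # squashed in-line instructions
    return slots


# Hypothetical 10-instruction trace: 3 branches, 2 of them taken.
trace = [(False, False)] * 7 + [(True, True), (True, True), (True, False)]
slots = issue_slots(trace)
print(slots, slots / len(trace))
```

For this trace the model gives 16 slots for 10 instructions, a CPI of 1.6, compared with 1.9 if all three branches had stalled for 3 cycles.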

7 Reducing Branch Stall Cycles
1. Find out whether a branch is taken earlier in the pipeline.
2. Compute the taken PC earlier in the pipeline.
In MIPS:
– The branch instructions BEQZ and BNEZ test a register for equality to zero.
– This test can be completed in the ID cycle by moving the zero test into that cycle.
– Both PCs (taken and not taken) must be computed early.
– This requires an additional adder, because the main ALU is not usable until the EX cycle.
– The result is just a single-cycle stall on branches.

8 Reducing Branch Penalty (contd.)
3. Predicting branch taken:
– Speculatively fetch and execute instructions at the branch target.
– But in MIPS the branch target address has not yet been calculated at that point, so MIPS still incurs a 1-cycle branch penalty.
– Other machines: the branch target is known before the outcome.

9 Reducing Branch Penalty (contd.)
4. Delayed branch:
Branch delay slots: the in-line instructions following a branch instruction.
    conditional branch instruction
    sequential successor 1
    sequential successor 2
    ........
    sequential successor n
    branch target if taken
The sequential successors in the delay slots are always executed.
– Usually 1 delay slot (short pipelines).
– The compiler tries to put useful instructions in the delay slot to mask the delay.

10 Delayed Branch
The instruction in the branch delay slot is always executed. The compiler (tries to) move a useful instruction into the delay slot.
(a) From before the branch: always helpful when possible.
    Before scheduling        After scheduling
        ADD  R1, R2, R3
        BEQZ R2, L1              BEQZ R2, L1
        DELAY SLOT               ADD  R1, R2, R3
        -                        -
    L1:                      L1:
If the ADD instruction were ADD R2, R1, R3, the move would not be possible, because the branch tests R2 and must see the value that ADD writes.
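
The legality condition in case (a) can be written as a small dependence check. This is only a sketch: the `Instr` tuple and the function `can_fill_from_before` are hypothetical names. It assumes the candidate is the instruction immediately before the branch and checks only the dependence on the branch condition.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]   # registers read


def can_fill_from_before(candidate: Instr, branch: Instr) -> bool:
    """True if the instruction just before the branch may move into the delay slot.

    The move is illegal when the branch reads the register the candidate writes.
    """
    return candidate.dest not in branch.srcs


beqz = Instr("BEQZ", None, ("R2",))
print(can_fill_from_before(Instr("ADD", "R1", ("R2", "R3")), beqz))  # True
print(can_fill_from_before(Instr("ADD", "R2", ("R1", "R3")), beqz))  # False
```

A real scheduler would also have to check dependences on any instructions the candidate is moved across. That is not needed here because the candidate sits directly in front of the branch.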

11 Delayed Branch
(b) From the target: helps when the branch is taken. May duplicate instructions.
    Before scheduling        After scheduling
        ADD  R2, R1, R3          ADD  R2, R1, R3
        BEQZ R2, L1              BEQZ R2, L2
        DELAY SLOT               SUB  R4, R5, R6
        -                        -
    L1: SUB  R4, R5, R6      L1: SUB  R4, R5, R6
    L2:                      L2:
Instructions between BEQZ and SUB (in the fall-through path) must not use R4.

12 Delayed Branch
(c) From fall through: helps when the branch is not taken.
    Before scheduling        After scheduling
        ADD  R2, R1, R3          ADD  R2, R1, R3
        BEQZ R2, L1              BEQZ R2, L1
        DELAY SLOT               SUB  R4, R5, R6
        SUB  R4, R5, R6          -
        -
    L1:                      L1:
Instructions at the target (L1 and after) must not use R4 until it is set again.
Cancelling (nullifying) branch:
– The branch instruction indicates the direction of the prediction.
– If mispredicted, the instruction in the delay slot is cancelled.
– Greater flexibility for the compiler to schedule instructions.

13 Cancelling branch
Cancelling (nullifying) branch:
– The branch instruction indicates the direction of the prediction.
– A mispredicted branch "annuls" the instruction in the delay slot, so that instruction behaves as a no-op.
– Gives more leverage to the compiler to select instructions to fill delay slots.

14 Delayed Branch
Limitations of delayed branch:
– The compiler may not find appropriate instructions to fill the delay slots. It then fills them with no-ops.
– It is a visible architectural feature that is likely to change with new implementations. The pipeline structure is exposed to the compiler, which needs to know how many delay slots there are.

15 Delayed Branch
Compiler effectiveness for a single branch delay slot:
– Fills about 60% of branch delay slots.
– About 80% of the instructions executed in branch delay slots are useful in computation.
– About 50% (60% × 80% = 48%) of slots are usefully filled.
Delayed branch downside: as processors go to deeper pipelines and multiple issue, the branch delay grows and more than one delay slot is needed.
– Delayed branching has lost popularity compared to more expensive but more flexible dynamic approaches.
– Growth in available transistors has made dynamic approaches relatively cheaper.
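
The fill-rate figures can be turned into a rough CPI estimate. The sketch below assumes a single delay slot, an ideal base CPI of 1, and that every slot not usefully filled wastes exactly one cycle. The 30% branch frequency is borrowed from the earlier stall example and is not stated on this slide. The function names are illustrative only.

```python
def usefully_filled_fraction(fill_rate=0.60, useful_rate=0.80):
    """Fraction of delay slots that hold an instruction doing useful work."""
    return fill_rate * useful_rate


def cpi_with_delay_slot(branch_freq, useful_fraction, base_cpi=1.0):
    """CPI when each delay slot that is not usefully filled costs one cycle."""
    wasted_per_branch = 1.0 - useful_fraction
    return base_cpi + branch_freq * wasted_per_branch


useful = usefully_filled_fraction()
print(round(useful, 2))                             # 0.48
print(round(cpi_with_delay_slot(0.30, useful), 3))  # 1.156
```

Under these assumptions the delayed branch recovers roughly half of the one-cycle branch penalty, which is the point of the "about 50%" figure.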

16 Dynamic Branch Prediction
Builds on the premise that history matters:
– Observe the behavior of branches in previous instances and try to predict future branch behavior.
– Try to predict the outcome of a branch early on in order to avoid stalls.
– Branch prediction is critical for multiple-issue processors: in an n-issue processor, branches arrive n times faster than in a single-issue processor.

17 Basic Branch Predictor
Use a 1-bit branch prediction buffer or branch history table:
– 1 bit of memory stating whether the branch was recently taken or not.
– The bit entry is updated each time the branch instruction is executed.
State diagram (T = branch taken, NT = branch not taken):
    State 0 (Predict Not Taken): NT -> stay in State 0; T -> go to State 1
    State 1 (Predict Taken):     T -> stay in State 1; NT -> go to State 0
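
A branch history table of 1-bit entries might be sketched as follows. The slide does not say how the table is indexed or sized. Indexing by the low-order bits of a word-aligned branch address, the 16-entry size, and the initial "not taken" state are all assumptions, and `OneBitBHT` is a made-up name.

```python
class OneBitBHT:
    """Branch history table with one prediction bit per entry."""

    def __init__(self, entries=16):
        self.entries = entries
        self.bits = [0] * entries   # 0 = predict not taken, 1 = predict taken

    def _index(self, pc):
        # Assumed indexing: drop the 2 byte-offset bits, keep the low-order bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Return True if the branch at pc is predicted taken."""
        return self.bits[self._index(pc)] == 1

    def update(self, pc, taken):
        """Record the actual outcome; the bit simply remembers the last outcome."""
        self.bits[self._index(pc)] = 1 if taken else 0


bht = OneBitBHT()
print(bht.predict(0x40))   # False: initial state 0
bht.update(0x40, True)
print(bht.predict(0x40))   # True: last outcome was taken
```

With this indexing, two branches whose addresses share the same low-order bits use the same entry and can disturb each other's predictions.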

18 1-bit Branch Prediction Buffer
Problem: even the simplest branches are mispredicted twice.
          LD    R1, #5
    Loop: LD    R2, 0(R5)
          ADD   R2, R2, R4
          STORE R2, 0(R5)
          ADD   R5, R5, #4
          SUB   R1, R1, #1
          BNEZ  R1, Loop
First time: prediction = 0 but the branch is taken -> change prediction to 1 (miss).
Time 2, 3, 4: prediction = 1 and the branch is taken.
Time 5: prediction = 1 but the branch is not taken -> change prediction to 0 (miss).
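
The two mispredictions can be checked by replaying the branch outcomes of the loop through a single prediction bit. This is a minimal sketch: the helper `count_mispredictions` and the outcome list are illustrative. The list encodes the BNEZ being taken four times and then falling through on the fifth execution.

```python
def count_mispredictions(outcomes, state=0):
    """Replay branch outcomes through a 1-bit predictor.

    state is the prediction bit (0 = not taken, 1 = taken).
    Returns (number of mispredictions, final state).
    """
    misses = 0
    for taken in outcomes:
        if (state == 1) != taken:
            misses += 1
        state = 1 if taken else 0   # remember the last outcome
    return misses, state


loop_outcomes = [True, True, True, True, False]   # BNEZ R1, Loop with R1 starting at 5

first_misses, state = count_mispredictions(loop_outcomes)
second_misses, _ = count_mispredictions(loop_outcomes, state)
print(first_misses, second_misses)   # 2 2
```

Because the final not-taken outcome resets the bit to 0, every later execution of the whole loop again mispredicts its first and last iteration.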

19 Dynamic Branch Prediction Accuracy

