Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)


1 Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6) http://www.ecs.umass.edu/ece/ece232/

2 Branch Instructions Cause Control Hazards
ECE232: BrPredict 2 Adapted from Computer Organization and Design, Patterson & Hennessy, UCB, Kundu, UMass Koren
[Figure: pipeline diagram plotting instruction order (lw, Inst 4, Inst 3, beq, jr) against the stages IM, Reg, ALU, DM, Reg, labeled F, D, EX, M, W]

3 BEQ resolved during the MEM stage
[Figure: pipelined datapath showing the PC, Instruction Memory, Register File, Sign Extend, Shift left 2, ALU, adders, and Data Memory, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, with the Control, Branch, and PCSrc signals labeled]

4 One Way to “Fix” a Control Hazard
- Fix branch hazard by waiting: introduce stalls
[Figure: pipeline diagram plotting instruction order (beq, stall, lw, Inst 3) against the stages IM, Reg, ALU, DM, Reg]

5 Reducing branch penalty through HW design

6 Reducing Control Hazards’ Penalties
- Stalls hurt performance
- Deeper pipelines have higher penalties
1. Move the decision point as early in the pipeline as possible: reduces the number of stalls at the cost of additional hardware
2. Delay the decision (requires compiler support), the “Delayed Branch”: not effective for deeper pipelines, which require more than one delay slot to be filled
3. Predict the outcome of the branch
Example:
    beq $1,$2,NEXT
    add $4,$3,$5
    sub $7,$2,$8
    NEXT:

7 Branch Prediction
- Easiest: static prediction
  - Always taken, always not taken
  - Opcode based
  - Displacement based (forward not taken, backward taken)
  - Compiler directed (branch likely, branch not likely)
- Dynamic prediction: prediction per branch in the program
  - 1-bit predictor: remember last taken/not taken per branch
  - Use a branch-history table (BHT) with 1-bit entries
  - Use part of the PC (low-order bits) to index the table. Why?
  - Multiple branches may share the same bit
  - Invert the bit if the prediction is wrong
[Figure: branch PC indexing a BHT with entries Predictor 0, Predictor 1, ..., Predictor 127]

8 Branch Prediction
- 1-bit predictor
  - Backward branches for loops will be mispredicted twice
  - Example: if a loop branch is taken 9 times in a row and then not taken once, what is the prediction accuracy?
  - Misprediction at the first and last loop iterations => 80% prediction accuracy, although the branch is taken 90% of the time
  [Figure: branch outcome sequence T ... T T T T N T T ... N]
- Modern processors: multiple instructions issued per cycle, so more branch hazards will occur per cycle
  - Cost of a mispredicted branch goes up
  - Pentium II: 3 instructions issued per cycle, 12+ cycle misprediction penalty
  - Huge penalty for a misfetched path following a branch
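The 80% figure in the loop example can be checked with a small simulation. What follows is a minimal sketch in Python, not code from the course: the function name `one_bit_accuracy`, the initial prediction of not taken, and the choice of 100 loop passes are all illustrative assumptions.

```python
def one_bit_accuracy(outcomes, initial_prediction=False):
    """Return the fraction of branch outcomes a 1-bit predictor gets right.

    outcomes: sequence of booleans, True = taken, False = not taken.
    initial_prediction: assumed starting value of the predictor bit.
    """
    prediction = initial_prediction
    correct = 0
    total = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        # A 1-bit predictor simply remembers the most recent outcome.
        prediction = taken
        total += 1
    return correct / total


# Loop branch from the slide: taken 9 times in a row, then not taken once.
# Running the loop 100 times is an assumption made for illustration.
loop_pattern = ([True] * 9 + [False]) * 100
print(f"1-bit accuracy: {one_bit_accuracy(loop_pattern):.0%}")
```

Under these assumptions every pass through the loop costs two mispredictions (the first iteration and the loop exit), so the sketch reports 80%.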

9 2-bit Branch Prediction
- 4 states instead of 2, allowing for more information about tendencies
- A prediction must miss twice before it is changed
- Good for backward branches of loops
- 2-bit saturating counter
[Figure: state diagram with T and N transitions between states labeled Predict taken and Predict not taken; branch outcome sequence T ... T T T T T T T ... N]
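To see why a 2-bit saturating counter suits loop branches, the same 9-taken, 1-not-taken pattern can be run through a small model. This is a sketch under stated assumptions rather than the slide's exact state machine: the state encoding (0 and 1 predict not taken, 2 and 3 predict taken), the function name `two_bit_accuracy`, and the default starting state are all hypothetical choices.

```python
def two_bit_accuracy(outcomes, initial_state=3):
    """Fraction of outcomes predicted correctly by a 2-bit saturating counter.

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    initial_state=3 (strongly taken) is an assumed starting point.
    """
    state = initial_state
    correct = 0
    total = 0
    for taken in outcomes:
        predict_taken = state >= 2
        if predict_taken == taken:
            correct += 1
        # Saturating update: count up on taken, down on not taken.
        if taken:
            state = min(state + 1, 3)
        else:
            state = max(state - 1, 0)
        total += 1
    return correct / total


# Same loop branch as before: taken 9 times, then not taken once,
# repeated for an assumed 100 passes.
loop_pattern = ([True] * 9 + [False]) * 100
print(f"2-bit accuracy: {two_bit_accuracy(loop_pattern):.0%}")
```

With the counter starting in the strongly-taken state, only the loop exit is mispredicted on each pass, giving 90%. The single not-taken outcome moves the counter one step, not far enough to flip the prediction for the next pass.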

10 Branch History Table - BHT
- 2 bits by N entries (e.g. 4K entries)
- Uses low-order bits of the branch PC to choose an entry
- Plot misprediction instead of prediction
[Figure: branch PC indexing a BHT with entries Predictor 0, Predictor 1, ..., Predictor 4095, each holding a 2-bit value such as 01]
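Indexing the table with low-order PC bits can be sketched as follows. This is an illustration, not the course's implementation: the helper name `bht_index`, the example addresses, and the decision to drop the two lowest PC bits (assuming 4-byte, word-aligned instructions) are assumptions.

```python
BHT_ENTRIES = 4096  # 4K entries, as in the slide's example


def bht_index(branch_pc, entries=BHT_ENTRIES):
    """Map a branch PC to a BHT entry using its low-order bits.

    Assumes 4-byte, word-aligned instructions, so the two lowest PC bits
    are always zero and are dropped before indexing. Also assumes that
    `entries` is a power of two, so the mask keeps log2(entries) bits.
    """
    return (branch_pc >> 2) & (entries - 1)


# Two hypothetical branch addresses exactly 4096 instructions apart.
pc_a = 0x00400010
pc_b = pc_a + BHT_ENTRIES * 4
print(bht_index(pc_a), bht_index(pc_b))
```

Both addresses select the same entry, which is how multiple branches can end up sharing one predictor: the table stores no tag, so it cannot tell them apart.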

11 Is Branch Predictor Enough?
- When is using branch prediction beneficial?
  - Clearly when the outcome is known later than the target
  - Otherwise: if we predict the branch is taken (and suppose it is correct), what is the target address?
  - Need a mechanism to provide the target address as well
  - Use a Branch Target Buffer (BTB) that includes the target address
- Can we eliminate the one-cycle delay for the 5-stage pipeline?
  - Need to fetch from the branch target immediately after the branch was fetched

12 Branch Target Buffer (BTB)
- The BTB is a cache that contains the predicted PC value instead of whether the branch will take place or not (e.g. loop address)
- Is the current instruction a branch? The BTB provides the answer before the current instruction is decoded and therefore enables fetching to begin after the IF stage (for a branch)
- What is the branch target? The BTB provides the branch target if the prediction is a taken branch (for not-taken branches the target is simply PC+4)

13 BTB

14 BTB operations
- BTB hit, prediction taken → 0 cycle delay
- BTB hit, misprediction → ≥ 2 cycle penalty; correct the BTB
- BTB miss, branch → ≥ 1 cycle penalty (detected at the ID stage and entered in the BTB)
[Figure: flowchart across the IF, ID, and EX stages]
- IF: Send PC to memory and branch-target buffer. Entry found in branch-target buffer?
  - No → ID: Is instruction a taken branch?
    - No → Normal instruction execution
    - Yes → EX: Enter branch instruction address and next PC into branch-target buffer
  - Yes → ID: Send out predicted PC. EX: Taken branch?
    - No → Mispredicted branch, kill fetched instruction; restart fetch at other target; update target buffer
    - Yes → Branch correctly predicted; continue execution with no stalls
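The three BTB cases above can be sketched as a toy model. This is a hedged illustration, not the slides' hardware: the class name `BranchTargetBuffer`, the unbounded dictionary standing in for a fixed-size cache, the policy of deleting an entry when a predicted-taken branch falls through, and the use of the slide's lower bounds (1 and 2 cycles) as the returned penalties are all assumptions.

```python
class BranchTargetBuffer:
    """Toy BTB: maps the PC of a taken branch to its predicted target."""

    def __init__(self):
        # Assumption: an unbounded dict replaces a fixed-size hardware cache.
        self.entries = {}  # branch PC -> predicted target PC

    def fetch(self, pc, is_taken_branch, actual_target=None):
        """Process one instruction; return the minimum penalty in cycles."""
        if pc not in self.entries:
            if not is_taken_branch:
                return 0  # BTB miss, not a taken branch: normal execution
            self.entries[pc] = actual_target  # enter the branch into the BTB
            return 1  # BTB miss on a taken branch: at least 1 cycle
        if is_taken_branch and self.entries[pc] == actual_target:
            return 0  # BTB hit, correctly predicted: no stall
        # BTB hit but mispredicted: correct the BTB (assumed policy).
        if is_taken_branch:
            self.entries[pc] = actual_target
        else:
            del self.entries[pc]
        return 2  # misprediction: at least 2 cycles


btb = BranchTargetBuffer()
# Hypothetical branch at 0x40 targeting 0x10: taken twice, then not taken.
penalties = [
    btb.fetch(0x40, True, 0x10),
    btb.fetch(0x40, True, 0x10),
    btb.fetch(0x40, False),
]
print(penalties)
```

The first fetch misses in the BTB, the second hits with a correct prediction, and the third hits but mispredicts, so the sketch walks through all three cases in order.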

15 Branch Prediction Summary
- The better we predict, the lower the penalty we might incur
- 2-bit predictors capture tendencies well
- Correlating predictors improve accuracy, particularly when combined with 2-bit predictors
- Accurate branch prediction does no good if we don’t know there was a branch to predict
- The BTB identifies branches in the IF stage
- The BTB combined with a branch prediction table identifies branches to predict, and predicts them well

