EECE476: Computer Architecture
Lecture 20: Branch Prediction
Chapter 6.6 + extra
The University of British Columbia
EECE 476 © 2005 Guy Lemieux

2 Control Hazards Summary
We reduced branch/jump penalty to 1 cycle
Still have 2 remaining problems
Utilization problem
–We may fetch the wrong instruction(s) after branch/jump
    Option 1: stall after every branch/jump
    Option 2: nullify-if-branch-taken (small performance improvement)
    Option 3: declare as a “delay slot”, always-execute (avoid)
    Option 4: new strategies?
Forwarding problem
–We may depend on result of instruction(s) just before branch
    Option 1: stall when dependence detected (HDU)
    Option 2: forward when dependence detected (FDU)

3 New Strategy: Nullify-if-Not-Taken
Previously: Nullify-if-taken
–Instruction after branch (PC+4) “sneaks” into pipeline
–Nullify if branch is taken (T)
Observation
–Branch has 2 outcomes: taken, not-taken (T or NT)
What about nullify-if-NT?
–This is another valid strategy
–We “sneak” instruction from PC+4+OFFSET
Differences?
–Can we predict which outcome is more likely? T or NT?
–If so, we can “sneak” the right instruction into the pipeline
–Reduces frequency of nullify operations

4 Nullify-if-T vs. Nullify-if-NT
These are 2 forms of static branch prediction:
–Nullify-if-T: always predict NT is likely
–Nullify-if-NT: always predict T is likely
Main Idea
–Predict target where branch is going (T or NT)
–Put useful (target) instructions in pipeline after branch
–Nullify only if we predict wrong
Performance impact?
–More accurate predictions → better performance

5 Implementing Static Branch Prediction
Simplest static branch prediction
–Predict backward branches T, forward branches NT
    Requires 0 instruction bits
More sophisticated static branch prediction
–Define two instruction types: BEQ-likely, BEQ-unlikely
–For each individual branch, compiler decides if branch is likely or unlikely
    Requires 1 instruction bit in ISA to encode “likely-vs-unlikely” into each branch
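The simplest static rule above (backward branches T, forward branches NT) can be sketched in a few lines. This is a minimal illustration, not hardware: the function name `predict_static` and the example addresses are hypothetical, and the direction test simply compares the target address with the branch address.

```python
def predict_static(branch_pc: int, target_pc: int) -> bool:
    """Return True (predict T) for a backward branch, False (predict NT) otherwise."""
    # A backward branch jumps to a lower address, as a loop-closing branch does.
    return target_pc < branch_pc


# Loop-closing branch: the target is earlier in the program, so predict taken.
print(predict_static(branch_pc=0x1008, target_pc=0x1000))  # True
# Forward branch that skips ahead: predict not taken.
print(predict_static(branch_pc=0x1008, target_pc=0x1020))  # False
```

No instruction bits are needed because the direction can be read from the sign of the branch offset.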

6 Reducing Branch Pipeline Penalty: Static Methods Summary
1. Always stall
–Works, but wastes CPU cycles
2. Always execute (delayed branch)
–Requires useful instruction to be scheduled by compiler
3. Nullify-if-taken (always predicts branch is NT)
–Fetch from PC+4, PC+8, etc.
–Half of branch-forward instructions are NT
–Some performance benefit
4. Nullify-if-not-taken (always predicts branch is T)
–Fetch from PC+4+OFFSET, PC+8+OFFSET, etc.
–Almost all branch-backward instructions are T
–Big performance benefit

7 Reducing Branch Pipeline Penalty: Dynamic Method
5. Nullify-if-mispredicted
    Dynamically predict T or NT
To do this…
–Need branch prediction
–Predict direction based upon recent history
–Must fetch from predicted direction (target address)
    Note: no correctness problems arise if we mispredict (only performance)
Performance impact?
–Depends on “prediction accuracy”
–Want >= 80% to be useful
Somehow, must implement in ISA
–ISA may adopt one or more of the above policies for branch instructions
–ISA may also adopt multiple policies (e.g., multiple versions of the same branch instruction)
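To see how performance depends on prediction accuracy, the sketch below uses the common first-order model CPI = base CPI + (branch fraction) × (misprediction rate) × (penalty). The 1-cycle penalty comes from the control hazards summary; the base CPI of 1.0 and the 20% branch fraction are assumptions chosen only for illustration, and `effective_cpi` is a hypothetical helper name.

```python
def effective_cpi(branch_fraction, accuracy, penalty_cycles=1, base_cpi=1.0):
    """First-order CPI estimate: every mispredicted branch costs penalty_cycles."""
    mispredict_rate = 1.0 - accuracy
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Assumed workload: 20% of instructions are branches (illustrative value only).
for accuracy in (0.5, 0.8, 0.9):
    cpi = effective_cpi(branch_fraction=0.2, accuracy=accuracy)
    print(f"accuracy {accuracy:.0%}: CPI {cpi:.3f}")
```

Under these assumptions, each improvement in accuracy removes a proportional share of the wasted cycles.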

8 Dynamic Branch Prediction
Dynamic: predicted branch direction depends upon recent history
–No history? Must guess
–Execute same branch many times → History → Need state information to retain history

9 Overview of Dynamic Branch Prediction Schemes
Many Types of Dynamic Branch Predictors
–Basic
    1-bit predictor
    2-bit predictor (very good)
–Generalization
    N-bit saturating counter (not very good)
–Hybrid/advanced (excellent)
    Correlating predictors
    Multilevel predictors
–Perfect (prescient) predictor
    Non-causal, only works in simulation
    Used to measure effectiveness of other prediction schemes

10 Dynamic 1-bit Branch Prediction
Basic scheme: 1-bit predictor
–Remembers most recent execution of branch
    Was it taken or not taken?
–Assume same outcome next time
–Where to store 1 bit?
    In the instruction encoding?
    1 global bit (DFF) in the CPU?
    Visit this again later…

11 Dynamic 1-bit Branch Prediction
1-bit Predictor Example
        A = 0                          * initialize registers
Loop:   A = A + 1               Loop:  ADD $1,$1,$2
        If A != 10 goto Loop           BNE $1,$3,Loop
Prediction Accuracy?
Last iteration NT, so next time, first iteration assumes NT
Result: 80% accuracy (20% mispredictions)
                    Prediction   Outcome   Prediction Correct?
Middle iterations   T            T         8 correct
Last iteration      T            NT        1 wrong
First iteration     NT           T         1 wrong
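The 80% figure can be reproduced with a short simulation. This is a sketch under the slide's assumptions: the loop branch executes 10 times per run (taken 9 times, then not taken), and the loop has already run once so the stored bit reflects the previous exit. The helper names `loop_outcomes` and `run_one_bit` are hypothetical.

```python
def loop_outcomes(iterations=10):
    """Outcomes of the loop-closing BNE: taken on every iteration except the last."""
    return [True] * (iterations - 1) + [False]


def run_one_bit(outcomes, last_outcome):
    """Predict that the branch repeats its previous outcome.

    Returns (number of correct predictions, stored bit after the run).
    """
    correct = 0
    for taken in outcomes:
        if last_outcome == taken:
            correct += 1
        last_outcome = taken  # remember only the most recent outcome
    return correct, last_outcome


# Warm-up run so the stored bit holds the previous loop exit (NT).
_, state = run_one_bit(loop_outcomes(), last_outcome=False)
# Measured run: the first and last iterations mispredict.
correct, state = run_one_bit(loop_outcomes(), state)
print(f"{correct}/10 correct")  # 8/10 correct
```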

12 Dynamic 2-bit Branch Prediction
Two basic schemes
Simple: 2-bit “saturating counter” predictor
–Remember two most recent outcomes? History (prev,curr)
–(T,T) → Predicts T
–(NT,NT) → Predicts NT
–(T,NT) → Predicts ?
–(NT,T) → Predicts ?
–Although a possibility, this scheme is not usually used
Better: 2-bit “sequence” predictor
–Mispredict twice before changing prediction

13 Dynamic 2-bit Sequence Prediction
Saturating
–Repeating T stays in ‘11’ state
–Repeating NT stays in ‘00’ state
Two-in-a-row to change prediction
–(T,NT) won’t change prediction
–(NT,T) won’t change prediction
[State diagram: four states labeled with their predictions (11: T, 10: T, 01: NT, 00: NT), connected by arcs labeled T and NT.]

14 Dynamic 2-bit Prediction Example
2-bit Predictor Example
        A = 0                          * initialize registers
Loop:   A = A + 1               Loop:  ADD $1,$1,$2
        If A != 10 goto Loop           BNE $1,$3,Loop
Prediction Accuracy?
Last iteration is 1st mispredict, so next time, 1st iteration still predicts T
Result: 90% accuracy (10% mispredictions)
                    Prediction   Outcome   Prediction Correct?
Middle iterations   T            T         8 correct
Last iteration      T            NT        1 wrong, but next prediction still T
First iteration     T            T         1 correct
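The 90% result follows from the "mispredict twice before changing prediction" rule. The sketch below is one possible encoding of that rule, not the exact state machine from the lecture's diagram: it keeps the current prediction plus a flag recording one outstanding misprediction (two bits of state in total). The class name, the `missed_once` flag and `run_loop_once` are hypothetical.

```python
class TwoBitSequencePredictor:
    """Changes its prediction only after two mispredictions in a row."""

    def __init__(self, predict_taken=True):
        self.predict_taken = predict_taken
        self.missed_once = False  # True after a single misprediction

    def predict(self):
        return self.predict_taken

    def update(self, taken):
        if taken == self.predict_taken:
            self.missed_once = False       # correct: forget any earlier miss
        elif self.missed_once:
            self.predict_taken = taken     # second miss in a row: switch
            self.missed_once = False
        else:
            self.missed_once = True        # first miss: keep the prediction


def run_loop_once(predictor, iterations=10):
    """Run the 10-iteration loop branch once; return the number of correct predictions."""
    correct = 0
    for i in range(iterations):
        taken = i < iterations - 1         # taken except on the last iteration
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


predictor = TwoBitSequencePredictor(predict_taken=True)
first_run = run_loop_once(predictor)
second_run = run_loop_once(predictor)
print(first_run, second_run)  # 9 9
```

Only the loop exit mispredicts, because a single NT outcome is not enough to flip the prediction before the loop is entered again.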

15 Dynamic 2-bit Prediction Results
Effectiveness?
Mispredictions in SPEC89 with 4096-entry branch prediction table:
–Nasa7: 1%
–Matrix300: 0%
–Tomcatv: 1%
–Doduc: 5%
–Spice: 9%
–FPPPP: 9%
–Gcc: 12%
–Espresso: 5%
–Eqntott: 18%
–Li: 10%
About 90% effective!

16 Dynamic 2-bit Prediction Results
Mispredictions in SPEC89 with N-entry branch prediction table:
             N=4096   N=Infinity
Nasa7          1%        0%
Matrix300      0%        0%
Tomcatv        1%        0%
Doduc          5%        5%
Spice          9%        9%
FPPPP          9%        9%
Gcc           12%       11%
Espresso       5%        5%
Eqntott       18%       18%
Li            10%       10%
Still about 90% effective!
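As a quick check on the "about 90% effective" summary, the unweighted averages of the two columns can be computed directly from the table. This is simple arithmetic on the figures above; an unweighted mean across benchmarks is an assumption, since the slide does not say how the summary was derived.

```python
# Misprediction percentages from the table: (N=4096, N=Infinity).
mispredict_pct = {
    "Nasa7": (1, 0),
    "Matrix300": (0, 0),
    "Tomcatv": (1, 0),
    "Doduc": (5, 5),
    "Spice": (9, 9),
    "FPPPP": (9, 9),
    "Gcc": (12, 11),
    "Espresso": (5, 5),
    "Eqntott": (18, 18),
    "Li": (10, 10),
}

total_4096 = sum(finite for finite, _ in mispredict_pct.values())
total_infinite = sum(infinite for _, infinite in mispredict_pct.values())
count = len(mispredict_pct)

print(f"N=4096:     {total_4096 / count:.1f}% mispredicted on average")
print(f"N=Infinity: {total_infinite / count:.1f}% mispredicted on average")
```

The two averages differ by well under one percentage point, which is consistent with the observation that an unlimited table helps very little.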

17 Dynamic N-bit Prediction Scheme
We can try to generalize the 2-bit approach
N-bit “saturating counter” predictor
–Increment on taken branch
–Decrement on untaken branch
–Predictions
    Counter value >= (2^N)/2: predict T
    Counter value < (2^N)/2: predict NT
N-bit “sequence” predictor
–X-mispredicts-in-a-row to change
–How big is X (relative to N)? Possible?
Effectiveness?
–Not very… 2-bit predictors good enough!

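18 N-bit “Saturating Counter” Predictor
[State diagram: 3-bit saturating counter with eight states; 000, 001, 010 and 011 predict NT, while 100, 101, 110 and 111 predict T; arcs labeled T and NT connect the states.]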
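The N-bit saturating counter rules (increment on taken, decrement on untaken, predict T when the counter is at least (2^N)/2) translate directly into code. This is a minimal sketch; the class name `SaturatingCounter` and its starting value of 0 are assumptions for illustration.

```python
class SaturatingCounter:
    """N-bit saturating counter predictor."""

    def __init__(self, bits=2, value=0):
        self.max_value = (1 << bits) - 1   # e.g. 7 for a 3-bit counter
        self.threshold = 1 << (bits - 1)   # (2^N)/2
        self.value = value

    def predict(self):
        """Predict taken when the counter is in the upper half of its range."""
        return self.value >= self.threshold

    def update(self, taken):
        """Count up on taken and down on not taken, clamping at both ends."""
        if taken:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)


counter = SaturatingCounter(bits=3)        # starts in state 000
for _ in range(3):
    counter.update(True)
print(counter.value, counter.predict())    # 3 False
counter.update(True)
print(counter.value, counter.predict())    # 4 True
```

Starting from 000, a 3-bit counter needs four taken branches before it predicts T, which hints at why wider counters adapt slowly.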

19 Storing Branch History
Where? In instruction memory?
–Must write 1 or 2 bits into instruction, not good!
Use special branch prediction table memory
–E.g., 4096 entries of 2 bits each
    Not enough for one entry per branch instruction in your program
–Or is it?
–Which entry goes with which branch?
    Use lower bits of program counter (hash function)
    Some branches will use the *same* table entry
    Is this incorrect? No!
–Some branches will be predicted with less accuracy… i.e., slower program execution
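Indexing the table with the lower bits of the program counter, and the sharing of entries that results, can be sketched as follows. Several details here are assumptions made for illustration: instructions are 4 bytes and aligned (so the two lowest PC bits are dropped), each entry is a 2-bit saturating counter starting at 0, and the addresses are made up.

```python
TABLE_ENTRIES = 4096                       # 4096 entries of 2 bits each
table = [0] * TABLE_ENTRIES                # each entry is a counter in 0..3


def table_index(pc):
    """Hash a branch address to a table entry using its lower bits."""
    return (pc >> 2) % TABLE_ENTRIES       # drop byte offset, keep 12 bits


def predict(pc):
    return table[table_index(pc)] >= 2


def update(pc, taken):
    i = table_index(pc)
    table[i] = min(table[i] + 1, 3) if taken else max(table[i] - 1, 0)


branch_a = 0x00400010
branch_b = branch_a + TABLE_ENTRIES * 4    # differs only in the upper PC bits

print(table_index(branch_a) == table_index(branch_b))  # True: same entry

update(branch_a, True)
update(branch_a, True)
# branch_b has never executed, yet it inherits branch_a's history.
print(predict(branch_b))                   # True
```

The shared entry can only make a prediction less accurate; the pipeline still nullifies on a misprediction, so the program remains correct.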

20 Advanced Branch Prediction 1
Correlating Predictors
–Create 8 branch prediction tables
    Each table may contain ~1024 entries, 2 bits of history per entry
    Each table is “local history”
–3 global bits in CPU form “global history”
    Simple, small shift register
    Stores outcome of 3 most recently executed branches (of all branches)
–Key idea
    “global history” determines which branch prediction table to use
    “local history” works like “2-bit predictor”
–Called a (3,2) branch-prediction buffer
    Regular 2-bit predictor is a (0,2) predictor
–Works better than (0,2) predictor
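A (3,2) predictor can be sketched as 8 tables selected by a 3-bit global shift register. The code below is an illustrative model only: the class name, the use of 2-bit saturating counters for each entry, and the PC hashing (4-byte instructions, lower bits as index) are assumptions rather than details given in the lecture.

```python
class CorrelatingPredictor:
    """(history_bits, 2) predictor: global history selects one of 2^m tables."""

    def __init__(self, history_bits=3, entries=1024):
        self.history_bits = history_bits
        self.entries = entries
        self.history = 0                   # global shift register
        self.tables = [[0] * entries for _ in range(1 << history_bits)]

    def _index(self, pc):
        return (pc >> 2) % self.entries    # lower PC bits pick the entry

    def predict(self, pc):
        return self.tables[self.history][self._index(pc)] >= 2

    def update(self, pc, taken):
        row = self.tables[self.history]
        i = self._index(pc)
        row[i] = min(row[i] + 1, 3) if taken else max(row[i] - 1, 0)
        # Shift the newest outcome into the global history.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


# A branch that alternates T, NT, T, NT defeats a plain 2-bit predictor,
# but the global history tells the correlating predictor what comes next.
predictor = CorrelatingPredictor()
correct_late = 0
for step in range(200):
    taken = step % 2 == 0
    if step >= 100 and predictor.predict(0x1000) == taken:
        correct_late += 1
    predictor.update(0x1000, taken)
print(correct_late)  # 100
```

Setting `history_bits=0` collapses the structure to a single table, which corresponds to the (0,2) predictor.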

21 Advanced Branch Prediction 2
Multilevel Branch Prediction
–E.g., Tournament Predictors:
    Use 2 different branch predictors per entry
    Choose the best between them
–How to decide which is best?
    Use a third 2-bit predictor
    –Like any 2-bit predictor, e.g., “sequence”
    –This one says “use predictor 1” or “use predictor 2”
    Change if current predictor is wrong (but other one was right) twice in a row
–Works better than Correlating Predictor
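The selection rule (switch when the current predictor is wrong but the other one was right, twice in a row) can be sketched with a single chooser. To keep the example short, the two component predictors are trivial stand-ins (`AlwaysTaken`, `AlwaysNotTaken`) and there is one chooser rather than one per table entry; all class names are hypothetical.

```python
class AlwaysTaken:
    """Stand-in component predictor: always predicts T."""
    def predict(self, pc):
        return True
    def update(self, pc, taken):
        pass


class AlwaysNotTaken:
    """Stand-in component predictor: always predicts NT."""
    def predict(self, pc):
        return False
    def update(self, pc, taken):
        pass


class TournamentPredictor:
    """Chooses between two predictors using a 2-bit 'sequence' style chooser."""

    def __init__(self, predictor_1, predictor_2):
        self.predictors = [predictor_1, predictor_2]
        self.selected = 0                  # index of the predictor in use
        self.missed_once = False           # one strike against the selection

    def predict(self, pc):
        return self.predictors[self.selected].predict(pc)

    def update(self, pc, taken):
        right = [p.predict(pc) == taken for p in self.predictors]
        other = 1 - self.selected
        if not right[self.selected] and right[other]:
            if self.missed_once:
                self.selected = other      # second strike in a row: switch
                self.missed_once = False
            else:
                self.missed_once = True
        else:
            self.missed_once = False
        for p in self.predictors:
            p.update(pc, taken)


tournament = TournamentPredictor(AlwaysTaken(), AlwaysNotTaken())
predictions = []
for taken in [False] * 5:                  # a branch that is never taken
    predictions.append(tournament.predict(0x1000))
    tournament.update(0x1000, taken)
print(predictions)  # [True, True, False, False, False]
```

In a fuller model the stand-ins would be replaced by, for example, a plain 2-bit predictor and a correlating predictor, each trained on every branch outcome.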

22 Predictors Summary
Static
–Stall
–Always execute (delay slots)
–Nullify-if-T (Execute-if-NT)
–Nullify-if-NT (Execute-if-T)
Dynamic
–Nullify-if-mispredicted
    2-bit, N-bit “saturating counter” predictor
    2-bit “sequence” predictor (N-bit possible?)
    Correlating predictor
    –Concept of global / local history
    Multilevel predictor
    –E.g., Tournament predictor
