Computer Structure Advanced Branch Prediction

1 Computer Structure Advanced Branch Prediction
Lihu Rappoport and Adi Yoaz

2 Introduction
Need to predict:
- Conditional branch direction (taken or not taken)
  - Actual direction is known only after execution
  - A wrong direction prediction causes a full flush
- Targets of all taken branches (conditional taken or unconditional)
  - Target of direct branches is known at decode
  - Target of indirect branches is known at execution
- Branch type
  - Conditional, unconditional direct, unconditional indirect, call, return
Goal: minimize the branch misprediction rate for a given predictor size

3 What/Who/When We Predict/Fix
Target Array
- Fetch: predict branch type (conditional, unconditional direct, unconditional indirect, call, return) and branch target
- Decode: fix TA miss; fix wrong direct target
Cond. Branch Predictor
- Fetch: predict conditional T/NT
- Decode: on TA miss, try to fix
- Execute: fix wrong prediction
Return Stack Buffer
- Fetch: predict return target
- Decode: fix TA miss
- Execute: fix wrong prediction
Indirect Target Array
- Fetch: predict indirect target, override TA target
- Execute: fix wrong prediction
Fixes at decode: Dec Flush. Fixes at execute: Exe Flush.

4 Branches and Performance
MPI: mispredictions per instruction

    MPI = (# of incorrectly predicted branches) / (total # of instructions)

MPI correlates well with performance. For example:
- MPI = 1% (1 misprediction per 100 instructions, i.e. 1 out of 20 branches)
- IPC = 2 (IPC is the average number of instructions per cycle)
- Flush penalty of 10 cycles
We get:
- MPI = 1% → a flush every 100 instructions
- → a flush every 50 cycles (since IPC = 2)
- → a 10-cycle flush penalty every 50 cycles
- → 20% in performance
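The arithmetic on this slide can be checked with a short sketch. The function name `flush_overhead` and its parameter names are hypothetical, chosen only for this illustration; the model is the slide's simple one (every misprediction costs a fixed flush penalty).

```python
def flush_overhead(mpi, ipc, flush_penalty_cycles):
    """Flush cycles lost per cycle of useful work, under the slide's simple model."""
    instructions_per_flush = 1.0 / mpi                 # MPI = 1% -> 100 instructions
    cycles_per_flush = instructions_per_flush / ipc    # IPC = 2  -> 50 cycles
    return flush_penalty_cycles / cycles_per_flush     # 10 cycles lost per 50 cycles

overhead = flush_overhead(mpi=0.01, ipc=2, flush_penalty_cycles=10)
print(f"{overhead:.0%}")  # 20%
```

Halving the MPI halves the overhead in this model, which is why MPI tracks performance so closely.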

5 Target Array
- TA is accessed using the branch address (branch IP)
- Implemented as an n-way set-associative cache
- Tags are usually partial
  - Saves space
  - Can get false hits
  - Few branches are aliased to the same entry
  - Not a correctness issue, only performance
- TA predicts the following
  - Indication that the instruction is a branch
  - Predicted target
  - Branch type
    - Unconditional: take target
    - Conditional: predict direction
- TA is allocated / updated at execution
[Figure: the branch IP indexes the TA; each entry holds tag, target, and type; outputs are hit / miss (a hit indicates a branch), predicted target, and type]

6 Conditional Branch Direction Prediction

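7 One-Bit Predictor
- Prediction (at fetch): the previous branch outcome
- Update (at execution): update the bit with the branch outcome
- Problem: a 1-bit predictor makes a double mistake in loops (it mispredicts the loop exit, and then the first iteration the next time the loop runs)
[Figure: the branch IP indexes a counter array / cache that supplies the prediction; table of branch outcome vs. prediction]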
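The double mistake can be seen in a minimal simulation. This is a sketch under assumptions: a single branch with its own bit, an initial prediction of taken, and a hypothetical loop branch that is taken 4 times and then falls through, executed 3 times.

```python
def one_bit_mispredictions(outcomes, initial=True):
    """Count mispredictions when each prediction is simply the previous outcome."""
    last = initial
    misses = 0
    for taken in outcomes:
        if taken != last:
            misses += 1
        last = taken  # update the bit with the branch outcome
    return misses

loop_run = [True] * 4 + [False]   # taken 4 times, then the loop exits
outcomes = loop_run * 3           # the loop is executed 3 times
one_bit_misses = one_bit_mispredictions(outcomes)
print(one_bit_misses)  # 5: one miss at the first exit, then two per later run
```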

8 Bimodal (2-bit) Predictor
- A 2-bit counter avoids the double mistake in glitches
  - Needs “more evidence” to change the prediction
- 2 bits encode one of 4 states
  - 00 – strongly not-taken (SNT), 01 – weakly not-taken (WNT), 10 – weakly taken (WT), 11 – strongly taken (ST)
- Initial state: weakly taken (most branches are taken)
- Update
  - Branch was actually taken: increment counter (saturate at 11)
  - Branch was actually not-taken: decrement counter (saturate at 00)
- Predict according to the m.s. bit of the counter (0 – NT, 1 – taken)
- Does not predict well branches with patterns like …
[Figure: state diagram of the four states SNT, WNT, WT, ST; a taken outcome moves toward 11 and a not-taken outcome toward 00; states 10 and 11 predict taken, states 00 and 01 predict not-taken]

9 Bimodal Predictor (cont.)
[Figure: the l.s. bits of the branch IP index an array of 2-bit saturating counters; prediction = msb of counter; the counter is updated with the branch outcome]
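A minimal sketch of the bimodal predictor described on these two slides. The class name, the `index_bits` parameter, and the example branch address are assumptions made for illustration; the counter rules (start weakly taken, saturate at 0 and 3, predict from the m.s. bit) follow the slides.

```python
class BimodalPredictor:
    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.counters = [2] * (1 << index_bits)   # 2 = binary 10, weakly taken

    def predict(self, ip):
        # m.s. bit of the 2-bit counter: values 2 and 3 predict taken
        return self.counters[ip & self.mask] >= 2

    def update(self, ip, taken):
        i = ip & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)   # saturate at 11
        else:
            self.counters[i] = max(0, self.counters[i] - 1)   # saturate at 00

predictor = BimodalPredictor()
branch_ip = 0x400                       # hypothetical address of a loop branch
bimodal_misses = 0
for taken in ([True] * 4 + [False]) * 3:    # loop of 4 taken + 1 exit, run 3 times
    if predictor.predict(branch_ip) != taken:
        bimodal_misses += 1
    predictor.update(branch_ip, taken)
print(bimodal_misses)  # 3: only the loop exit is mispredicted each time
```

On the same outcome sequence a 1-bit predictor would also miss the first iteration of each later run; the 2-bit counter only drops from strongly to weakly taken at the exit.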

10 Bimodal Predictor - example
Code:
    int n = 6
    Loop: ….
    br1:  if (n/2) { … }
    br2:  if ((n+1)/2) { … }
          n--;
    br3:  JNZ n, Loop
[Table: for each of br1, br2, and br3, rows for the outcome pattern, the counter value, and the prediction]

11 2-Level Prediction: Local Predictor
- Save the history of each branch in a Branch History Register (BHR):
  - A shift register updated by the branch outcome
  - Saves the last n outcomes of the branch
  - Used as a pointer to an array of bits specifying the direction per history
- Example: assume n=6
  - Assume the pattern 0001 repeats (three not-taken outcomes, then one taken)
  - At steady state, the following patterns are repeated in the BHR: 000100 001000 010001 100010
  - Following 000100, 010001, 100010 the jump is not taken
  - Following 001000 the jump is taken
[Figure: the n-bit BHR indexes an array of 2^n direction bits, entries 0 to 2^n - 1]

12 Local Predictor (cont.)
- There could be glitches from the pattern
  - Use 2-bit saturating counters instead of 1 bit to record the outcome
- Too-long BHRs are not good:
  - Past history may no longer be relevant
  - Warm-up is longer
  - Counter array becomes too big
[Figure: the BHR history indexes a 2-bit saturating counter array; prediction = msb of counter; the counter and the history are both updated with the branch outcome]
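A minimal sketch of a two-level local predictor for a single branch: one BHR that indexes an array of 2-bit saturating counters. The class name and the test pattern (three not-taken outcomes followed by one taken, repeated) are assumptions made for illustration.

```python
class LocalPredictor:
    def __init__(self, history_bits=6):
        self.mask = (1 << history_bits) - 1
        self.bhr = 0                                 # last n outcomes of the branch
        self.counters = [2] * (1 << history_bits)    # one 2-bit counter per history

    def predict(self):
        return self.counters[self.bhr] >= 2          # msb of the selected counter

    def update(self, taken):
        c = self.counters[self.bhr]
        self.counters[self.bhr] = min(3, c + 1) if taken else max(0, c - 1)
        # shift the new outcome into the history register
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask

local = LocalPredictor(history_bits=6)
outcomes = [False, False, False, True] * 25          # 100 outcomes
misses_after_warmup = 0
for i, taken in enumerate(outcomes):
    if i >= 60 and local.predict() != taken:
        misses_after_warmup += 1
    local.update(taken)
print(misses_after_warmup)  # 0: every steady-state history has learned its outcome
```

A bimodal counter on the same pattern would settle at not-taken and miss every fourth outcome; the per-history counters remove that miss after warm-up.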

13 Local Predictor: private counter arrays
Holding BHRs and counter arrays for many branches:
[Figure: the branch IP accesses a history cache; each entry holds a tag, a history, and a private 2-bit saturating counter array indexed by that history; prediction = msb of counter; the counter and the history are both updated with the branch outcome]
Predictor size: #BHRs × (tag_size + history_size + 2 × 2^history_size)
Example: #BHRs = 1024; tag_size = 8; history_size = 6 → size = 1024 × (8 + 6 + 2×2^6) = 142 Kbit

14 Local Predictor: shared counter arrays
- Using a single counter array shared by all BHRs
  - All BHRs index the same array
  - Branches with similar patterns interfere with each other
[Figure: the branch IP accesses a history cache of tag + history entries; the selected history indexes one shared 2-bit saturating counter array; prediction = msb of counter]
Predictor size: #BHRs × (tag_size + history_size) + 2 × 2^history_size
Example: #BHRs = 1024; tag_size = 8; history_size = 6 → size = 1024 × (8 + 6) + 2×2^6 = 14.1 Kbit
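The two size formulas (private counter arrays on the previous slide, a shared array here) can be compared directly. The function names are hypothetical; the formulas and the example parameters are the ones given on the slides.

```python
def private_arrays_size(num_bhrs, tag_size, history_size):
    """Bits for a history cache where every entry owns its own counter array."""
    return num_bhrs * (tag_size + history_size + 2 * 2 ** history_size)

def shared_array_size(num_bhrs, tag_size, history_size):
    """Bits for a history cache plus one counter array shared by all BHRs."""
    return num_bhrs * (tag_size + history_size) + 2 * 2 ** history_size

private_bits = private_arrays_size(1024, 8, 6)
shared_bits = shared_array_size(1024, 8, 6)
print(private_bits, private_bits / 1024)   # 145408 bits = 142.0 Kbit
print(shared_bits, shared_bits / 1024)     # 14464 bits = 14.125 Kbit
```

Sharing the counters shrinks the predictor by roughly a factor of ten for these parameters, at the cost of interference between branches.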

15 Local Predictor: lselect
- lselect reduces inter-branch interference in the counter array
[Figure: the branch IP accesses a history cache of tag + history entries; the h-bit history is concatenated with the m l.s. bits of the IP to form an (h+m)-bit index into the 2-bit saturating counter array; prediction = msb of counter]
Predictor size: #BHRs × (tag_size + history_size) + 2 × 2^(history_size + m)

16 Local Predictor: lshare
- lshare reduces inter-branch interference in the counter array:
  - Maps common patterns in different branches to different counters
[Figure: the branch IP accesses a history cache of tag + history entries; the h-bit history and the h l.s. bits of the IP are combined into an h-bit index into the 2-bit saturating counter array; prediction = msb of counter]
Predictor size: #BHRs × (tag_size + history_size) + 2 × 2^history_size
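The difference between the lselect and lshare index functions can be sketched as follows. Two points are assumptions, not taken from the slides: the order of concatenation in lselect (history in the upper bits), and the use of XOR to combine history and IP bits in lshare (by analogy with the usual gshare scheme). The function names are hypothetical.

```python
def lselect_index(history, ip, h, m):
    """Concatenate h history bits with the m l.s. bits of the IP: an (h+m)-bit index."""
    return ((history & ((1 << h) - 1)) << m) | (ip & ((1 << m) - 1))

def lshare_index(history, ip, h):
    """Combine h history bits with the h l.s. bits of the IP (XOR assumed): an h-bit index."""
    return (history ^ ip) & ((1 << h) - 1)

# Two different branches (hypothetical IPs) that currently share the same history
history = 0b000100
idx_a = lshare_index(history, 0b1011, 6)
idx_b = lshare_index(history, 0b0110, 6)
print(idx_a, idx_b)                              # 15 2: different counters
print(lselect_index(history, 0b1011, 6, 2))      # 19: index is 0b00010011
```

lselect pays for the separation with a counter array that is 2^m times larger; lshare keeps the array at 2^h entries.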

17 Global Predictor
- The behavior of some branches is highly correlated with the behavior of other branches:
      if (x < 1) . . .
      if (x > 1) . . .
- Using a Global History Register (GHR), the prediction of the second if may be based on the direction of the first if
- For other branches the history interference might be destructive

18 Global Predictor (cont.)
[Figure: the GHR history indexes a 2-bit saturating counter array; prediction = msb of counter; the counter and the history are both updated with the branch outcome]
Predictor size: history_size + 2 × 2^history_size
Example: history_size = 12 → size = 8 Kbit

19 Global Predictor: Gshare
- gshare combines the global history information with the branch IP
[Figure: the branch IP is combined with the GHR history to index a 2-bit saturating counter array; prediction = msb of counter; the counter and the history are both updated with the branch outcome]
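A minimal gshare sketch. The slide only says that history and IP are combined; XOR is the usual gshare combination and is assumed here. The class name, the small `history_bits` value, and the branch address are illustrative.

```python
class GsharePredictor:
    def __init__(self, history_bits=12):
        self.mask = (1 << history_bits) - 1
        self.ghr = 0                                 # outcomes of the last branches
        self.counters = [2] * (1 << history_bits)    # 2-bit counters, weakly taken

    def _index(self, ip):
        return (ip ^ self.ghr) & self.mask           # XOR of IP and global history

    def predict(self, ip):
        return self.counters[self._index(ip)] >= 2

    def update(self, ip, taken):
        i = self._index(ip)
        c = self.counters[i]
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)
        self.ghr = ((self.ghr << 1) | int(taken)) & self.mask

g = GsharePredictor(history_bits=4)
first = g.predict(0x40)      # counter 0 starts weakly taken
g.update(0x40, False)        # counter 0 drops to weakly not-taken, GHR stays 0000
second = g.predict(0x40)     # same history, same counter: now not-taken
g.update(0x40, True)         # counter 0 back to weakly taken, GHR becomes 0001
third = g.predict(0x40)      # new history selects counter 1, still at its initial state
print(first, second, third)  # True False True
```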

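20 Chooser
- A chooser selects between two predictors that predict the same branch:
  - Use the predictor that was more correct in the past
- The chooser array is an array of 2-bit saturating counters, indexed by the branch IP
  - +1 if Bimodal / Local is correct and Global is wrong
  - -1 if Bimodal / Local is wrong and Global is correct
- The chooser may also be indexed by the GHR
[Figure: the branch IP accesses the Bimodal / Local predictor and the chooser array, the GHR accesses the Global predictor; the chooser selects which prediction is used]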
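The chooser update rule can be sketched with two small functions. The names are hypothetical, and the convention that a counter value of 2 or 3 selects the Bimodal / Local predictor follows from the +1 / -1 rule above.

```python
def update_chooser(counter, local_correct, global_correct):
    """Move the 2-bit counter toward the predictor that was right when they disagree."""
    if local_correct and not global_correct:
        return min(3, counter + 1)
    if global_correct and not local_correct:
        return max(0, counter - 1)
    return counter    # both right or both wrong: no evidence either way

def choose(counter, local_prediction, global_prediction):
    """Counter msb set: trust Bimodal / Local. Otherwise trust Global."""
    return local_prediction if counter >= 2 else global_prediction

counter = 2
counter = update_chooser(counter, local_correct=False, global_correct=True)   # -> 1
counter = update_chooser(counter, local_correct=False, global_correct=True)   # -> 0
chosen = choose(counter, local_prediction=True, global_prediction=False)
print(counter, chosen)  # 0 False: the global prediction is used
```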

21 Speculative History Updates
- Deep pipeline → many cycles between fetch and branch resolution
- If history is updated only at resolution
  - Local: future occurrences of the same branch may see stale history
  - Global: future occurrences of all branches may see stale history
- History is speculatively updated according to the prediction
  - History must be corrected if the branch is mispredicted
  - Speculative updates are done in a special field to enable recovery
- Speculative history update
  - Speculative history is updated assuming previous predictions are correct
  - A speculation bit is set to indicate that speculative history is used
  - The counter array is not updated speculatively
    - Prediction can change only on a misprediction (state 01→10 or 10→01)
- On branch resolution
  - Update the real history, and reset the speculative histories if mispredicted
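A minimal sketch of a speculatively updated global history with recovery. It assumes branches resolve in program order and that a misprediction flushes all younger in-flight branches, so the speculative copy can simply be reset to the real history. The class and method names are hypothetical.

```python
class SpeculativeGHR:
    def __init__(self, bits=12):
        self.mask = (1 << bits) - 1
        self.real = 0    # history of resolved branch outcomes
        self.spec = 0    # history used for prediction, updated at fetch

    def on_predict(self, predicted_taken):
        # update the speculative copy as if the prediction were correct
        self.spec = ((self.spec << 1) | int(predicted_taken)) & self.mask

    def on_resolve(self, taken, mispredicted):
        self.real = ((self.real << 1) | int(taken)) & self.mask
        if mispredicted:
            self.spec = self.real    # discard the wrong speculative history

ghr = SpeculativeGHR(bits=4)
ghr.on_predict(True)                          # two branches fetched, both predicted taken
ghr.on_predict(True)
ghr.on_resolve(True, mispredicted=False)      # first resolves as predicted
after_first = (ghr.real, ghr.spec)            # (0b0001, 0b0011)
ghr.on_resolve(False, mispredicted=True)      # second was actually not taken
after_second = (ghr.real, ghr.spec)           # (0b0010, 0b0010)
print(after_first, after_second)
```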

22 Return Stack Buffer
- A return instruction is a special case of an indirect branch:
  - Each time it may jump to a different target
  - The target is determined by the location of the corresponding call instruction
- The idea:
  - Hold a small stack of targets
  - When the target array predicts a call
    - Push the address of the instruction that follows the call onto the stack
  - When the target array predicts a return
    - Pop a target from the stack and use it as the return address
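The push / pop behavior can be sketched as a small bounded stack. The class name, the `depth` parameter, the overflow policy (drop the oldest entry), and the example addresses are assumptions made for illustration.

```python
from collections import deque

class ReturnStackBuffer:
    def __init__(self, depth=16):
        # bounded stack: when full, pushing silently drops the oldest entry
        self.stack = deque(maxlen=depth)

    def on_call(self, call_ip, call_length):
        # push the address of the instruction that follows the call
        self.stack.append(call_ip + call_length)

    def on_return(self):
        # pop the predicted return target; None means no prediction is available
        return self.stack.pop() if self.stack else None

rsb = ReturnStackBuffer(depth=2)
rsb.on_call(0x1000, 5)       # outer call, returns to 0x1005
rsb.on_call(0x2000, 5)       # nested call, returns to 0x2005
inner = rsb.on_return()
outer = rsb.on_return()
empty = rsb.on_return()
print(hex(inner), hex(outer), empty)  # 0x2005 0x1005 None
```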

23 Intel 486, Pentium®, Pentium® II
- 486
  - Statically predicts not-taken
- Pentium®
  - 2-bit saturating counters
- Pentium® II
  - 2-level, local histories, per-set counters
  - 4-way set associative: 512 entries in 128 sets
[Figure: Pentium® II target array with 128 sets × 4 ways (Way 0 to Way 3); each entry holds a tag, a history (e.g. 1001), a target, and fields labeled P, T, and V; each set has its own counters and LRR bits; Pred = msb of counter]
Branch type encoding: 00 - cond, 01 - ret, 10 - call, 11 - uncond

24 Alpha 264 – LG Chooser
- A new entry in the Local stage is allocated on a Global stage misprediction
- Chooser state machines: 2 bits each
  - One bit saves whether global was correct or wrong last time, and the other bit saves the same for local
  - Chooses Local only if local was correct and global was wrong
[Figure: Local: the IP accesses a 4-way history cache (256; each entry holds a 6-bit tag + 10-bit history), and the history indexes 1024 3-bit counters. Global: the 12-bit GHR indexes 4096 2-bit counters. Chooser: 4096 2-bit counters]

25 Pentium® M
- Combines 3 predictors: Bimodal, Global, and a Loop predictor
- The Loop predictor detects branches that have loop behavior
  - Moving in one direction (taken or NT) a fixed number of times
  - Ending with a single movement in the opposite direction
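A minimal sketch of the loop-behavior idea, not the Pentium® M design itself: it learns how many consecutive taken outcomes precede the single not-taken outcome and predicts the exit at that count. The class name is hypothetical, only the taken-then-exit direction is handled, and the confidence tracking a real predictor would need is omitted.

```python
class LoopPredictor:
    def __init__(self):
        self.limit = None    # learned number of consecutive taken outcomes
        self.count = 0       # taken outcomes seen in the current run of the loop

    def predict(self):
        if self.limit is None:
            return True                   # nothing learned yet: predict taken
        return self.count < self.limit    # predict the exit once the limit is reached

    def update(self, taken):
        if taken:
            self.count += 1
        else:
            self.limit = self.count       # remember the trip count
            self.count = 0

loop_predictor = LoopPredictor()
loop_misses = 0
for taken in ([True] * 4 + [False]) * 3:    # 4 taken + 1 exit, loop run 3 times
    if loop_predictor.predict() != taken:
        loop_misses += 1
    loop_predictor.update(taken)
print(loop_misses)  # 1: only the very first exit is mispredicted
```

A 2-bit bimodal counter on the same sequence would miss the exit on every run, which is the case the loop predictor is meant to cover.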

26 Pentium® M – Indirect Branch Predictor
- Indirect branch targets: the target is given in a register
  - Can have many targets: e.g., a case statement
- Resolved at execution → high misprediction penalty
- Used in object-oriented code (C++, Java)
- Added a dedicated indirect branch target predictor, based on global history
[Figure: the branch IP is combined with the GHR history to index the Indirect Target Array]

27 Indirect Branch Target Prediction
- Initially allocate an indirect branch only in the Target Array (TA)
  - Many indirect branches have only one target
- If the TA mispredicts an indirect jump
  - Allocate an iTA entry with the current target
  - Indexed by IP ⊕ Global History
- The prediction from the iTA is used if the TA indicates an indirect branch and the iTA hits
[Figure: the branch IP accesses the Target Array; the branch IP and the global history access the Indirect Target Predictor; a MUX selects the iTA target as the predicted target when the TA reports a hit on an indirect branch and the iTA hits]
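The allocation and selection policy above can be sketched with two small tables. This is a simplification under assumptions: both arrays are modeled as Python dictionaries (no tags, ways, or replacement), an iTA hit is modeled as the index being present, and the index combines IP and global history with XOR. The class and parameter names are hypothetical.

```python
class IndirectPredictor:
    def __init__(self, index_bits=8):
        self.mask = (1 << index_bits) - 1
        self.ta = {}     # branch IP -> first target seen
        self.ita = {}    # (IP xor global history) -> target

    def _ita_index(self, ip, ghr):
        return (ip ^ ghr) & self.mask

    def predict(self, ip, ghr):
        if ip not in self.ta:
            return None                                   # TA miss: no prediction
        # the iTA overrides the TA target when it hits
        return self.ita.get(self._ita_index(ip, ghr), self.ta[ip])

    def update(self, ip, ghr, target):
        if ip not in self.ta:
            self.ta[ip] = target                          # first allocate only in the TA
        elif self.predict(ip, ghr) != target:
            self.ita[self._ita_index(ip, ghr)] = target   # allocate on a misprediction

ind = IndirectPredictor()
ind.update(0x100, 0b01, 0xA000)          # first target goes to the TA
ind.update(0x100, 0b10, 0xB000)          # TA mispredicts: allocate an iTA entry
with_new_history = ind.predict(0x100, 0b10)
with_old_history = ind.predict(0x100, 0b01)
print(hex(with_new_history), hex(with_old_history))  # 0xb000 0xa000
```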

