Combining Branch Predictors

1 Combining Branch Predictors
CS Lecture 7: Combining Branch Predictors
Scott McFarling, WRL Tech. Report TN-36, 1993

2 Bimodal Branch Prediction
Identifies the most popular prediction in the recent past.
Updates happen during commit.
(Diagram: a 10-bit index taken from the PC selects one of 1024 entries, each a 2-bit saturating counter.)
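The bimodal scheme on this slide can be sketched in a few lines. This is a minimal illustration, not McFarling's simulator: the class name, the initial counter value (weakly not-taken), and the assumption of 4-byte instructions (two low PC bits dropped) are all choices made here for the example.

```python
class BimodalPredictor:
    """Table of 2-bit saturating counters indexed by low-order PC bits."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        # Assumption: every counter starts at 1 (weakly not-taken).
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc):
        # Assumption: 4-byte instructions, so the two low PC bits carry no information.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Saturating increment on taken, saturating decrement on not taken.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Fraction of correct predictions over a list of (pc, taken) pairs."""
    hits = 0
    for pc, taken in trace:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(trace)


# Hypothetical trace: a loop branch taken nine times, then not taken once, repeated.
trace = [(0x400, i % 10 != 9) for i in range(1000)]
print(accuracy(BimodalPredictor(), trace))
```

On a loop branch like this one, the counter stays in a taken state and mispredicts only the loop exit, which is the "most popular prediction" behavior the slide describes.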

3 Results
SPEC’89 programs simulated for 10M instructions (modern studies use hard-to-predict programs).
A larger predictor reduces contention for counters.
Prediction rates saturate at 93.5% (at 2K bytes) (Fig. 3).

4 Local Predictors
Two-level predictor: the first level holds history, the second level holds saturating counters.
History gets updated immediately.
(Diagram: a 10-bit index taken from the PC selects one of 1024 entries in the history table, each a 4-bit history; that history selects one of 16 entries, each a 2-bit saturating counter.)
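A rough sketch of the two-level local scheme, using the sizes from the diagram (1024 four-bit histories, 16 counters). The class and helper names are invented for this example, and for simplicity the history and the counter are both updated in one `update` call, whereas the slide notes that hardware updates history immediately.

```python
class LocalPredictor:
    """Two-level predictor: per-branch history selects a shared 2-bit counter."""

    def __init__(self, pc_bits=10, history_bits=4):
        self.pc_mask = (1 << pc_bits) - 1
        self.hist_mask = (1 << history_bits) - 1
        self.histories = [0] * (1 << pc_bits)       # first level: 1024 histories
        self.counters = [1] * (1 << history_bits)   # second level: 16 counters

    def _slot(self, pc):
        # Assumption: 4-byte instructions, so the two low PC bits are dropped.
        return (pc >> 2) & self.pc_mask

    def predict(self, pc):
        history = self.histories[self._slot(pc)]
        return self.counters[history] >= 2

    def update(self, pc, taken):
        slot = self._slot(pc)
        history = self.histories[slot]
        count = self.counters[history]
        self.counters[history] = min(3, count + 1) if taken else max(0, count - 1)
        # Shift the newest outcome into this branch's history.
        self.histories[slot] = ((history << 1) | int(taken)) & self.hist_mask


def run(predictor, trace):
    """Return a list of booleans, one per branch, marking correct predictions."""
    hits = []
    for pc, taken in trace:
        hits.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return hits


# Hypothetical trace: a short loop whose branch goes taken, taken, taken, not taken.
trace = [(0x400, i % 4 != 3) for i in range(400)]
hits = run(LocalPredictor(), trace)
print(sum(hits[-100:]), "of the last 100 predictions were correct")
```

Because four history bits are enough to tell where the branch is within its four-iteration pattern, the predictor learns the loop exit as well, something a single bimodal counter cannot do.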

5 Results
For small predictors, there could be contention at both levels, resulting in inaccurate predictions.
Will also take longer to warm up after every context switch.
Does very well for large predictors: accuracy saturates at 97.1%.

6 Global Predictors
A single history register is used, because neighboring branches have correlated results.
However, the PC is not used.
(Diagram: a 10-bit global history indexes a table of 1024 entries, each a 2-bit saturating counter.)
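The global scheme can be sketched as follows. This is an illustrative model only: names and the initial counter value are assumptions, and the `pc` argument is accepted but deliberately ignored, as the slide states.

```python
class GlobalPredictor:
    """One shared history register indexes a table of 2-bit saturating counters."""

    def __init__(self, history_bits=10):
        self.mask = (1 << history_bits) - 1
        self.history = 0
        self.counters = [1] * (1 << history_bits)  # assumption: start weakly not-taken

    def predict(self, pc):
        # The PC plays no part in choosing the counter.
        return self.counters[self.history] >= 2

    def update(self, pc, taken):
        count = self.counters[self.history]
        self.counters[self.history] = min(3, count + 1) if taken else max(0, count - 1)
        # Every branch, whatever its PC, shifts its outcome into the same register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def run(predictor, trace):
    """Return a list of booleans, one per branch, marking correct predictions."""
    hits = []
    for pc, taken in trace:
        hits.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return hits


# Hypothetical trace: branch B (0x200) always repeats the outcome of branch A (0x100),
# and A follows a taken, taken, not-taken pattern.
trace = []
for i in range(300):
    outcome = i % 3 != 2
    trace.append((0x100, outcome))
    trace.append((0x200, outcome))

hits = run(GlobalPredictor(), trace)
print(sum(hits[-100:]), "of the last 100 predictions were correct")
```

In this trace the most recent history bit at the time B is predicted is A's outcome, so the correlation between the two branches is directly visible to the predictor.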

7 Do We Need PC?
Note that the global history itself reveals which branch is being examined.
Hence, it outdoes bimodal predictors when the transistor budget is large (Fig. 7).
The local predictor does better: it is more important to identify the PC and local history than the behavior of neighboring branches.

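8 Gselect
Use a combination of PC and global history.
Bimodal and global prediction are special cases (Fig. 9).
(Diagram: n PC bits and m global-history bits are concatenated into an (n+m)-bit index into a table of 1024 entries, each a 2-bit saturating counter; the example shown uses a 5-bit global history.)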
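Index formation for gselect can be sketched as a single function. The function name, parameter names, and the choice to place the PC bits in the upper half of the index are assumptions for illustration; only the idea of concatenating n PC bits with m history bits comes from the slide.

```python
def gselect_index(pc, history, pc_bits=5, history_bits=5):
    """Concatenate low PC bits with low global-history bits into one table index."""
    # Assumption: 4-byte instructions, so the two low PC bits are dropped first.
    pc_part = (pc >> 2) & ((1 << pc_bits) - 1)
    hist_part = history & ((1 << history_bits) - 1)
    return (pc_part << history_bits) | hist_part


pc, history = 0x7C, 0b10101

# 5 PC bits + 5 history bits: a 10-bit index into 1024 counters.
print(gselect_index(pc, history))

# Special cases: all bits from the PC gives the bimodal index,
# all bits from the history gives the global index.
print(gselect_index(pc, history, pc_bits=10, history_bits=0))
print(gselect_index(pc, history, pc_bits=0, history_bits=10))
```

Setting `history_bits=0` or `pc_bits=0` shows why bimodal and global prediction are the two end points of the same design space.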

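9 GShare
XOR-ing 10 history bits with 10 PC bits carries more information than concatenating 5 bits of each, and more information than either component alone.
(Table: index comparison with columns Branch Address, Global History, Gselect 4/4, and Gshare 8/8; the row values are not preserved in this transcript.)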
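The difference between the two index functions can be seen on a small example. The address and history values below are chosen here for illustration and are not taken from the slide's table; addresses are treated as already word-aligned 8-bit values.

```python
def gselect_4_4(address, history):
    """Concatenate the low 4 address bits with the low 4 history bits."""
    return ((address & 0xF) << 4) | (history & 0xF)


def gshare_8_8(address, history):
    """XOR 8 address bits with 8 history bits."""
    return (address ^ history) & 0xFF


# Illustrative inputs: same branch address, histories that differ only in the oldest bit.
cases = [(0b11111111, 0b00000000), (0b11111111, 0b10000000)]
for address, history in cases:
    print(f"{address:08b} {history:08b} "
          f"gselect={gselect_4_4(address, history):08b} "
          f"gshare={gshare_8_8(address, history):08b}")
```

Gselect discards the upper four history bits, so both cases land on the same counter, while gshare keeps them apart. XOR is still a hash, so other pairs of inputs can collide under gshare as well.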

10 Terminology
GAG: Global history indexes into a global array of saturating counters.
PAG: Per-address history indexes into a global array.
GAP: Global history indexes into each PC’s private array of counters (gselect).
PAP: Per-address history indexes into each PC’s private array of counters.

11 Trade-Offs
Some predictors warm up faster than others.
Some programs benefit from global history, some from local history.
Some programs have branches that interfere with each other.
Note that a 64KB local predictor has fewer saturating counters than a 64KB bimodal predictor, so the former won’t be better for every program.

12 Combining Predictors
Use an array of saturating counters to pick the best available predictor for each PC.
(Diagram: the PC indexes a table of 1024 entries, each a 2-bit saturating counter, whose value selects between Predictor A and Predictor B.)
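A combined predictor along these lines can be sketched as below, here pairing a bimodal and a gshare component. All class names are invented for the example, and the chooser update rule (move toward whichever component was right, only when the two disagree) is one common formulation, stated here as an assumption.

```python
def bump(counter, up):
    """Saturating 2-bit counter step."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class Bimodal:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [1] * (1 << bits)

    def predict(self, pc):
        return self.table[(pc >> 2) & self.mask] >= 2

    def update(self, pc, taken):
        i = (pc >> 2) & self.mask
        self.table[i] = bump(self.table[i], taken)


class Gshare:
    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        self.table = [1] * (1 << bits)
        self.history = 0

    def _index(self, pc):
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.table[i] = bump(self.table[i], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask


class Combined:
    """Per-PC chooser counters select between predictor a and predictor b."""

    def __init__(self, a, b, bits=10):
        self.a, self.b = a, b
        self.mask = (1 << bits) - 1
        self.chooser = [1] * (1 << bits)  # values 2 and 3 mean "trust b"

    def predict(self, pc):
        use_b = self.chooser[(pc >> 2) & self.mask] >= 2
        return self.b.predict(pc) if use_b else self.a.predict(pc)

    def update(self, pc, taken):
        a_ok = self.a.predict(pc) == taken
        b_ok = self.b.predict(pc) == taken
        if a_ok != b_ok:  # train the chooser only when the components disagree
            i = (pc >> 2) & self.mask
            self.chooser[i] = bump(self.chooser[i], b_ok)
        self.a.update(pc, taken)
        self.b.update(pc, taken)


def run(predictor, trace):
    hits = []
    for pc, taken in trace:
        hits.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return hits


# Hypothetical trace: one branch that strictly alternates taken / not taken.
trace = [(0x400, i % 2 == 0) for i in range(400)]
hits = run(Combined(Bimodal(), Gshare()), trace)
print(sum(hits[-100:]), "of the last 100 predictions were correct")
```

On this alternating branch the bimodal counter keeps flipping and mispredicts, the gshare component learns the pattern from history, and the chooser for that PC drifts toward gshare.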

13 Results
The combination of local and gshare increases the prediction accuracy to 98.1% (Fig. 16).
For smaller transistor budgets, the combination of bimodal and gshare is better (gshare is twice the size, to make sure the total is a power of two).
A 1KB combined predictor does as well as a 16KB gselect predictor.

14 Future Work
Detect conflicts, correlations, and common predictions through profiling/compiler analysis.
Functions that compress information in the history or PC.
Pipelined predictions: predict two branches ahead.
Hierarchical predictors: get a quick prediction in a cycle and a more accurate one two cycles later.

15 Next Week’s Paper
“Design Trade-Offs for the Alpha EV8 Conditional Branch Predictor”, Seznec et al., ISCA’02
