TAGE-SC-L Branch Predictors

1 TAGE-SC-L Branch Predictors
André Seznec INRIA/IRISA

2 The TAGE-SC-L branch predictor
Sorry, nothing really new...
TAGE, JILP 2006
  Considered the state-of-the-art global history predictor
  Can be augmented with small adjunct predictors
Loop predictor: CBP-2 (2006)
Statistical Corrector + Loop Predictor, global history: CBP-3 (2011)
Local history: MICRO 2011

3 Optimized all parameters
Number, size, width of the tables
Types of the histories for the statistical components
All that to reduce the number of mispredictions by 3% !!

4 Global, local, skeleton histories
[Diagram: (Main) TAGE Predictor, indexed with PC + global history, sends Prediction + Confidence to the Stat. Cor., which uses global, local, skeleton histories; Loop Predictor alongside]

5 TAGE: multiple tables, global history predictor
The set of history lengths forms a geometric series: {0, 2, 4, 8, 16, 32, 64, 128}
Captures correlation on very long histories
Most of the storage goes to short histories !!
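The geometric series above can be generated from a minimum length, a maximum length, and a table count. The following is a minimal sketch, not the championship code: the function name and the round-to-nearest step are assumptions chosen for illustration, and the leading 0 stands for the tagless base predictor, which uses no history.

```python
def geometric_history_lengths(l_min, l_max, n_tables):
    """Return n_tables history lengths spanning l_min..l_max geometrically.

    Assumed formula: L(i) = round(l_min * alpha**i), with
    alpha = (l_max / l_min) ** (1 / (n_tables - 1)).
    """
    if n_tables < 2:
        return [l_min]
    alpha = (l_max / l_min) ** (1.0 / (n_tables - 1))
    # Adding 0.5 before truncation rounds to the nearest integer.
    return [int(l_min * alpha ** i + 0.5) for i in range(n_tables)]


# The base predictor (history length 0) plus seven tagged tables.
lengths = [0] + geometric_history_lengths(2, 128, 7)
print(lengths)
```

With these arguments the ratio is 2, which reproduces the set shown on the slide. Larger predictors would keep the same shape but use a longer maximum length and more tables.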

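6 TAGE: Tagged and prediction by the longest history matching entry
[Diagram: a tagless base predictor indexed by pc, plus tagged tables indexed with pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, u, tag; tag comparisons (=?) select the prediction]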

7 [Diagram: tag comparison (=?) gives Hit or Miss; Pred and Altpred are selected accordingly]

8 Prediction computation
General case: the longest matching component provides the prediction
Special case: many mispredictions on newly allocated entries (weak Ctr)
  On many applications, Altpred is more accurate than Pred
  Property dynamically monitored through 4-bit counters
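The selection rule above can be sketched as follows. This is a simplified illustration under stated assumptions: a single signed 4-bit counter (here called `use_alt_on_na`, a hypothetical name) stands in for the monitoring counters, "weak" is taken to mean a counter value of -1 or 0, and `Entry`, `tage_predict` and `update_use_alt` are names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    ctr: int  # signed 3-bit counter in [-4, 3]; ctr >= 0 predicts taken


def is_weak(ctr: int) -> bool:
    # Assumption: the two values closest to the taken/not-taken boundary.
    return ctr in (-1, 0)


def tage_predict(base_taken: bool, hits: List[Optional[Entry]],
                 use_alt_on_na: int) -> Tuple[bool, bool, bool]:
    """hits[i] is the tag-matching entry of tagged table i (ordered by
    increasing history length), or None on a miss.
    Returns (final prediction, Pred, Altpred)."""
    matching = [e for e in hits if e is not None]
    if not matching:
        return base_taken, base_taken, base_taken
    provider = matching[-1]                     # longest matching history
    pred = provider.ctr >= 0
    altpred = matching[-2].ctr >= 0 if len(matching) > 1 else base_taken
    # Special case: trust Altpred when the provider looks newly allocated
    # and the monitor says Altpred has been the better choice.
    use_alt = is_weak(provider.ctr) and use_alt_on_na >= 0
    return (altpred if use_alt else pred), pred, altpred


def update_use_alt(use_alt_on_na: int, provider_ctr: int, pred: bool,
                   altpred: bool, taken: bool) -> int:
    """Train the 4-bit monitor only when the two predictions disagree
    on a weak provider entry."""
    if is_weak(provider_ctr) and pred != altpred:
        if altpred == taken:
            return min(use_alt_on_na + 1, 7)
        return max(use_alt_on_na - 1, -8)
    return use_alt_on_na
```

For example, with a weak provider that says taken and an alternate entry that says not taken, a non-negative monitor value makes the final prediction follow Altpred, while a negative value keeps Pred.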

9 A tagged table entry
Ctr: 3-bit prediction counter
U: 2-bit counter. Was the entry recently useful ?
Tag: partial tag
Entry layout: Tag | Ctr | U

10 Allocate entries on mispredictions
Allocate entries in longer history length tables, on tables with U unset
Set Ctr to Weak and U to 0
Limited storage budget:
  Allocate 2 entries for 256Kbits
  Allocate 1 or 2 for 32Kbits
Unlimited storage budget: multiple entries allocated in different tables
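A possible shape for the allocation step is sketched below. It is an illustration only: entries are plain dictionaries, `allocate` and its parameters are hypothetical names, and the scan simply walks the longer-history tables in order, which may differ from the victim selection used in the actual predictor.

```python
def allocate(tables, indices, tags, provider, taken, max_alloc=1):
    """On a misprediction, claim up to max_alloc entries in tables with a
    longer history than the provider (provider = -1 if the base predictor
    provided the prediction).

    tables[t][indices[t]] is the candidate entry in tagged table t;
    only entries whose useful counter U is 0 may be replaced.
    Returns the number of entries allocated.
    """
    allocated = 0
    for t in range(provider + 1, len(tables)):
        entry = tables[t][indices[t]]
        if entry["u"] == 0:
            entry["tag"] = tags[t]
            entry["ctr"] = 0 if taken else -1   # weak taken / weak not taken
            entry["u"] = 0
            allocated += 1
            if allocated == max_alloc:
                break
    return allocated


# Three tagged tables of two entries each; table 1 holds a useful entry.
tables = [[{"tag": 0, "ctr": 0, "u": 0} for _ in range(2)] for _ in range(3)]
tables[1][0]["u"] = 2
n = allocate(tables, indices=[0, 0, 0], tags=[7, 8, 9],
             provider=0, taken=False, max_alloc=2)
print(n, tables[2][0])
```

Here the provider is table 0, table 1 is protected by its non-zero U, and so only the entry in table 2 is claimed even though two allocations were allowed.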

11 Managing the (U)seful counter
Increment when the entry avoids a misprediction: (Pred = taken) & (Alt ≠ taken), i.e. Pred matched the outcome and Altpred did not
256K: global decrement if « difficult » to allocate
32K: probabilistic decrement on conflict
Unlimited: don't care
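The increment rule and the 256K-style global decrement can be sketched as below. The "difficult to allocate" condition is not specified on the slide, so this example assumes a simple failure counter with a hypothetical threshold `TICK_LIMIT`; `update_useful` and `UsefulAger` are names invented for the illustration.

```python
U_MAX = 3        # 2-bit useful counter saturates at 3
TICK_LIMIT = 8   # hypothetical threshold for "difficult to allocate"


def update_useful(u, pred, altpred, taken):
    """Increment U when the provider avoided a misprediction."""
    if pred == taken and altpred != taken:
        return min(u + 1, U_MAX)
    return u


class UsefulAger:
    """Counts allocation failures and ages every U counter once
    failures outnumber successes by TICK_LIMIT."""

    def __init__(self):
        self.tick = 0

    def record_allocation(self, success, tables):
        if success:
            self.tick = max(self.tick - 1, 0)
        else:
            self.tick += 1
        if self.tick >= TICK_LIMIT:
            for table in tables:
                for entry in table:
                    entry["u"] = max(entry["u"] - 1, 0)
            self.tick = 0


tables = [[{"u": 2}, {"u": 0}]]
ager = UsefulAger()
for _ in range(TICK_LIMIT):
    ager.record_allocation(False, tables)   # eight failed allocations
print(tables)
```

The design intent is that entries stay protected while allocation is easy, and lose protection collectively once new entries can no longer find room.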

12 Adjunct predictors
TAGE tracks strong correlation with the global branch history
Small adjunct predictors capture some missed correlation:
  Loop predictor
  Statistical Corrector

13 The loop predictor
Predicts loops with a constant number of iterations
16/32 entries, less than 5 bytes per entry
Captures loops with long bodies and/or irregular internal branches
S: 1.2 %, M: 1 %, U: 0.4 %
Good tradeoff for the Championship
Implementation: not that great
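The behaviour of one loop predictor entry can be illustrated as follows. This is a minimal sketch assuming the loop branch is taken on every iteration and not taken on exit; the field names, the confidence threshold `CONF_MAX`, and the lack of tags or bit-width limits are simplifications made for the example.

```python
CONF_MAX = 3  # hypothetical: identical trip counts needed before predicting


class LoopEntry:
    def __init__(self):
        self.trip = 0   # learned number of taken outcomes before the exit
        self.cur = 0    # taken outcomes seen in the current execution
        self.conf = 0   # how many times in a row the trip count repeated

    def predict(self):
        """Return True/False when confident, None to defer to TAGE."""
        if self.conf < CONF_MAX:
            return None
        return self.cur < self.trip   # taken until the learned exit point

    def update(self, taken):
        if taken:
            self.cur += 1
            return
        # Loop exit: check whether the iteration count repeated.
        if self.trip > 0 and self.cur == self.trip:
            self.conf = min(self.conf + 1, CONF_MAX)
        else:
            self.trip = self.cur
            self.conf = 0
        self.cur = 0


entry = LoopEntry()
one_execution = [True] * 5 + [False]   # 5 taken iterations, then the exit
for _ in range(4):                     # training executions
    for outcome in one_execution:
        entry.update(outcome)

predictions = []
for outcome in one_execution:
    predictions.append(entry.predict())
    entry.update(outcome)
print(predictions)
```

After one execution to learn the trip count and three more to confirm it, the entry predicts every iteration of the next execution, including the exit.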

14 The Statistical Corrector predictor
Branches with poor correlation with global history:
  Sometimes better predicted by a single wide PC-indexed counter than by TAGE
More generally, track cases such that:
  « In this case (PC, history, prediction), TAGE is likely (>50 %) to mispredict »

15 Small predictor: very limited budget for the SC predictor
Just track the statistically PC-biased branches
  « TAGE predicts this direction on this branch, but in most cases this was wrong »
The corrector filter: a small partially tagged associative table
1.5 % misp. reduction: much simpler than a loop predictor

16 Medium predictor
« Statistically » correlated branches:
  Not strongly correlated with the global history, but exhibit a bias
  Better predicted by averaging than by tags (neural vs. tags)
Branches correlated with local history, but with an irregular global history pattern (on other branches)
  TAGE does not learn the pattern

17 MultiGehl Statistical Corrector Predictor
[Diagram: TAGE (inputs: PC, global history H) passes Prediction + ctr value to the Stat. Corr., a GEHL-like adder (+) fed by PC, Pred, H and local history LH]

18 Why does it work
The bias table is indexed with PC + TAGE output
TAGE correct (most of the time): high counter value dominates, not many updates
TAGE wrong: other counters can be trained; correlation (if it exists) can be captured
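The mechanism can be sketched as a small GEHL-like adder: a bias table indexed with the PC and the TAGE prediction, plus a few tables indexed with the PC hashed with global history. Everything below is an assumption made for illustration (table sizes, history lengths, the index hash, 6-bit counters, the update threshold), and the sketch leaves out local histories and the TAGE counter value that the slides also feed into the corrector.

```python
CTR_MIN, CTR_MAX = -32, 31   # assumed 6-bit signed counters


class StatCorrector:
    def __init__(self, log_size=10, hist_lengths=(4, 8, 16), threshold=12):
        self.log_size = log_size
        self.size = 1 << log_size
        self.hist_lengths = hist_lengths
        self.threshold = threshold
        self.bias = [0] * self.size
        self.tables = [[0] * self.size for _ in hist_lengths]

    def _counters(self):
        return [self.bias] + self.tables

    def _indices(self, pc, ghist, tage_pred):
        # Bias table: PC combined with the TAGE prediction.
        idx = [((pc << 1) | int(tage_pred)) % self.size]
        # Other tables: PC hashed with the most recent history bits.
        for length in self.hist_lengths:
            h = ghist & ((1 << length) - 1)
            idx.append((pc ^ h ^ (h >> self.log_size)) % self.size)
        return idx

    def predict(self, pc, ghist, tage_pred):
        idx = self._indices(pc, ghist, tage_pred)
        # Centered counter values (2c + 1) are never zero.
        total = sum(2 * t[i] + 1 for t, i in zip(self._counters(), idx))
        return total >= 0, total

    def update(self, pc, ghist, tage_pred, taken):
        pred, total = self.predict(pc, ghist, tage_pred)
        # Train on a misprediction or when the sum is not yet confident.
        if pred != taken or abs(total) < self.threshold:
            idx = self._indices(pc, ghist, tage_pred)
            for t, i in zip(self._counters(), idx):
                t[i] = min(t[i] + 1, CTR_MAX) if taken else max(t[i] - 1, CTR_MIN)


sc = StatCorrector()
pc, ghist = 0x40, 0b1011
for _ in range(5):
    # TAGE keeps predicting taken, but the branch is not taken.
    sc.update(pc, ghist, tage_pred=True, taken=False)
print(sc.predict(pc, ghist, True))
```

Once the sum is confident and correct, updates stop, which matches the "not many updates" point above; other branches mapping to different entries are left untouched.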

19 MultiGehl Statistical Corrector Predictor for the Championship
+ RAS associated history
+ 2 different local histories
+ simple chooser
6.8 % misp. reduction
[Diagram: TAGE (inputs: H, PC) passes Prediction + ctr value to the Stat. Corr., which also uses local hist.]

20 « Realistic » 256 Kbits TAGE-SC-L
« Only » 12 equal size TAGE tables
+ (local hist., global hist.) 4-tables SC
+ loop predictor
No history tuning
Only 2.8 % extra mispredictions

21 SC for Unlimited predictor
GEHL based SC predictor: use any form of history information
  Very long global
  Multiple local
  « Skeleton » global history: ignore some branches
Recycle old ideas from the MAC-RHSP predictor (2004)

22 SC for unlimited predictor
460 predictor tables + 10 chooser tables
Globally about 20 % less misp. than TAGE alone
If one removes only:
  The bias: 1.6 % for a single table
  All global history components: 3.7 %
  All local history components: 3.9 %
  The chooser: 3.2 %

23 Conclusion
TAGE-SC-L fits (nearly) all storage sizes
32Kbits ≈ 64Kbits CBP1 champion on CBP1 traces
256Kbits ≈ 512Kbits CBP3 champion on CBP4 traces
Unlimited predictor: poTAGE-SC does better

