A 64 Kbytes ITTAGE indirect branch predictor. André Seznec, INRIA/IRISA.


1 A 64 Kbytes ITTAGE indirect branch predictor. André Seznec, INRIA/IRISA.

2 Build on ITTAGE. ITTAGE was introduced at the same time as TAGE (JILP 2006). It is derived directly from the TAGE predictor, with target prediction instead of direction prediction.

3 ITTAGE: multiple tables, global history predictor. The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128}. What is important: L(i) - L(i-1) increases drastically with i, so most of the storage goes to short histories while correlation on very long histories is still captured.
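The geometric series on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming the lengths are obtained by rounding L(1) * alpha^(i-1); the function name and its parameters are illustrative and do not come from the slides.

```python
def geometric_history_lengths(l_min, l_max, n_tagged):
    """Return [0, L(1), ..., L(n_tagged)] with L(1) = l_min and L(n_tagged) = l_max.

    The leading 0 stands for the tagless base predictor (no history).
    Assumption: lengths follow L(i) = round(l_min * alpha**(i - 1)).
    """
    if n_tagged < 2:
        raise ValueError("need at least two tagged tables")
    alpha = (l_max / l_min) ** (1.0 / (n_tagged - 1))
    lengths = [0]
    for i in range(n_tagged):
        lengths.append(int(l_min * alpha ** i + 0.5))  # round to nearest integer
    return lengths


lengths = geometric_history_lengths(2, 128, 7)
# Gap between consecutive lengths, i.e. L(i) - L(i-1)
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
print(lengths)
print(gaps)
```

With these parameters the gaps double from one table to the next, which is the "drastically increasing" L(i) - L(i-1) the slide points at.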

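4 The ITTAGE predictor. [Figure: a tagless base predictor plus tagged tables indexed with pc and the global history h[0:L1], h[0:L2], h[0:L3]; each tagged table performs a tag comparison (=?), and the outcome selects the 32-bit target prediction.]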

5 Prediction computation. General case: the longest matching component provides the prediction (Pred). Special case: newly allocated entries, whose Ctr is still weak, cause many mispredictions, and the alternate prediction (Altpred) is sometimes (slightly) more accurate than Pred. This property is dynamically monitored through a single 4-bit counter. Gain: -2 % MPPKI.
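The selection rule on this slide can be sketched as follows. This is only an illustration under stated assumptions: Ctr value 0 is taken to mean "weak", the 4-bit counter starts at its midpoint, and it is trained only when the provider is weak and Pred differs from Altpred. The names Hit and AltSelector are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    target: int
    ctr: int  # 2-bit hysteresis counter; 0 is treated as "weak" (assumption)


class AltSelector:
    def __init__(self):
        # Single 4-bit counter (0..15); the midpoint start value is an assumption.
        self.use_alt = 8

    def predict(self, base_target: int, hits: List[Optional[Hit]]) -> int:
        """hits[i] is the lookup result of tagged table i (shortest history
        first), or None on a tag miss."""
        matching = [h for h in hits if h is not None]
        if not matching:
            return base_target                  # only the tagless base predictor
        pred = matching[-1]                     # longest matching component
        alt = matching[-2].target if len(matching) > 1 else base_target
        if pred.ctr == 0 and self.use_alt >= 8:
            return alt                          # weak entry: trust Altpred
        return pred.target

    def update(self, pred_target, alt_target, provider_weak, actual):
        """Train the counter when Pred and Altpred disagree on a weak entry."""
        if provider_weak and pred_target != alt_target:
            if alt_target == actual:
                self.use_alt = min(15, self.use_alt + 1)
            elif pred_target == actual:
                self.use_alt = max(0, self.use_alt - 1)
```

A caller would compute `hits` from the tag comparisons of each table, call `predict` at fetch, and call `update` once the actual target is known.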

6 A tagged table entry. Fields: Target | Tag | Ctr | U. Target: the target, 32 bits or some way to reconstruct it. Tag: partial tag. Ctr: 2-bit hysteresis counter. U: 1-bit useful counter (was the entry recently useful?).

7 Allocate entries on mispredictions. Entries are allocated in tables with longer history lengths, in entries whose U bit is unset; Ctr is set to Weak and U to 0. Because the storage budget is huge, up to 3 entries are allocated in different tables, which gives fast warming.
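The allocation policy above can be sketched like this. It is a simplified illustration, not the predictor's actual code: the Entry class, the allocate function, and the choice to scan the longer-history tables in order are assumptions, and index and tag computation is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0
    target: int = 0
    ctr: int = 0   # 0 = Weak (assumption)
    u: int = 0     # 1-bit useful counter


def allocate(tables, indices, tags, provider, target, max_alloc=3):
    """On a misprediction, allocate up to max_alloc entries in tables that use
    a longer history than the provider.

    tables[t] is a list of Entry; indices[t] and tags[t] are the index and
    partial tag computed for this branch in table t. provider is the number of
    the providing table, or -1 when the base predictor gave the prediction.
    Returns the number of entries actually allocated.
    """
    allocated = 0
    for t in range(provider + 1, len(tables)):
        entry = tables[t][indices[t]]
        if entry.u == 0:                      # only steal entries with U unset
            entry.tag = tags[t]
            entry.target = target
            entry.ctr = 0                     # Weak
            entry.u = 0
            allocated += 1
            if allocated == max_alloc:
                break
    return allocated
```

The return value lets the caller count successful and failed allocations, which is what the U-bit management on the next slide relies on.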

8 Managing the (U)seful bit. U is set when the entry avoids a misprediction: (Pred = target) & (Alt ≠ target). All U bits are globally reset when allocation runs into « difficulties »: the predictor dynamically monitors whether allocations fail more often than they succeed.
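One possible way to express this policy is shown below. The slide only states the set condition and that failures are compared against successes; the saturating counter, its limit of 8, and all names here are hypothetical choices made for the sketch.

```python
def should_set_useful(pred, alt, actual):
    """U is set when the entry avoided a misprediction:
    (Pred = target) & (Alt != target)."""
    return pred == actual and alt != actual


class AllocationMonitor:
    """Tracks whether allocations fail more often than they succeed."""

    def __init__(self, limit=8):   # limit is a hypothetical tuning value
        self.tick = 0
        self.limit = limit

    def record(self, successes, failures):
        """Account for one allocation attempt; return True when a global
        reset of the U bits is due."""
        self.tick = max(0, self.tick + failures - successes)
        if self.tick >= self.limit:
            self.tick = 0
            return True
        return False


def reset_useful(u_bits):
    """Clear every U bit; u_bits[t][i] is the U bit of entry i in table t."""
    for table in u_bits:
        for i in range(len(table)):
            table[i] = 0
```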

9 Most of the storage space is for targets: 32 bits per entry! There are more than 12K (PC, target) pairs on CLIENT05, but at most 4038 different targets. Possible trick: use 12-bit pointers + a 4K-entry table.

10 Let us be realistic: leverage target locality. All targets lie in at most 90 regions of 256 KB. Use a 128-entry region table: fully associative, 240 bytes. This saves 7 bits per ITTAGE entry, and would have saved 39 bits on a 64-bit architecture!
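The bit counts on this slide follow from simple arithmetic: a 256 KB region needs an 18-bit offset, a 128-entry table needs a 7-bit pointer, so a target costs 25 bits instead of 32 (or 64). The sketch below checks these numbers and shows one way to encode and decode a target. The RegionTable class is hypothetical, and the replacement policy for a full table is not given on the slides, so the sketch simply raises an error in that case.

```python
REGION_OFFSET_BITS = 18      # 256 KB regions -> 18-bit offset
REGION_ENTRIES = 128         # 128-entry table -> 7-bit pointer


def bits_saved(address_bits):
    """Bits saved per ITTAGE entry by storing (region pointer, offset)."""
    pointer_bits = (REGION_ENTRIES - 1).bit_length()
    return address_bits - (pointer_bits + REGION_OFFSET_BITS)


class RegionTable:
    """Fully associative table of region numbers (hypothetical sketch)."""

    def __init__(self):
        self.regions = []

    def encode(self, target):
        region = target >> REGION_OFFSET_BITS
        offset = target & ((1 << REGION_OFFSET_BITS) - 1)
        if region not in self.regions:
            if len(self.regions) == REGION_ENTRIES:
                # Replacement policy is not specified on the slides.
                raise RuntimeError("region table full")
            self.regions.append(region)
        return self.regions.index(region), offset

    def decode(self, pointer, offset):
        return (self.regions[pointer] << REGION_OFFSET_BITS) | offset


print(bits_saved(32), bits_saved(64))
```

The stated 240 bytes also equal 128 entries of 15 bits each, which would be consistent with a 14-bit region number (32 - 18) plus one extra bit per entry; that split is a guess, not something the slides state.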

11 [Figure: the entry fields Target | Tag | Ctr | U, with the Target field stored as a region pointer and a region offset.]

12 The global history: -16 % MPPKI.

13 The global history (2). Should the history include all branches? Including only indirect branches and calls gives -2.5 % MPPKI. But there is no firm conclusion: without 2 branches on INT05 and INT06, the result goes just the other way.

14 + the other tricks (for TAGE): Immediate Update Mimicker, storage space interleaving, picking the best set of history lengths. Gain: -1 % MPPKI.

15 The Immediate Update Mimicker. Issue: some mispredictions are due to late updates at retirement. The Immediate Update Mimicker tries to catch these cases.
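A rough sketch of the idea follows: keep the in-flight predictions together with the table and entry that produced them, and when a later fetch hits the same table and entry as an already resolved in-flight branch, use the resolved target instead of the stale table content. The class name, the record layout, and the capacity of 32 are assumptions for illustration; the slides do not give these details.

```python
from collections import deque


class ImmediateUpdateMimicker:
    def __init__(self, capacity=32):           # capacity is hypothetical
        # Oldest records fall out automatically, standing in for retirement.
        self.inflight = deque(maxlen=capacity)

    def record(self, table, address, prediction):
        """Remember an in-flight prediction by the table and entry address
        that provided it."""
        rec = {"table": table, "address": address,
               "target": prediction, "resolved": False}
        self.inflight.append(rec)
        return rec

    def resolve(self, rec, actual_target):
        """Called when the branch executes: store its actual target."""
        rec["target"] = actual_target
        rec["resolved"] = True

    def override(self, table, address, prediction):
        """At fetch: if a resolved in-flight branch used the same table and
        entry, return its target; otherwise keep the table's prediction."""
        for rec in reversed(self.inflight):    # most recent first
            if (rec["table"] == table and rec["address"] == address
                    and rec["resolved"]):
                return rec["target"]
        return prediction
```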

16 The Immediate Update Mimicker. [Figure: a queue of in-flight branches between Fetch and a Misprediction. Each is labeled PTA: P(rediction), T(able), A(ddress in the table); some are labeled ETA. Caption: same table, same entry.]

17 For the competition: interleaving. [Figure labels: h[0,L1], Xbar, =?, prediction.]

18 For the competition. Guided selection of the best set of history lengths: 4K entries: 0; 4K entries: 0, 10; 4K entries: 16, 27, 44, 60, 96, 109, 219, 449; 2K entries: 487, 714, 1313, 2146, 3881. Remember: 10 bits per indirect, 5 per call.
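The reminder "10 bits per indirect, 5 per call" can be illustrated with a small history-update sketch. Only the bit counts and the restriction to indirect branches and calls come from the slides; the way PC and target are mixed, the function name, and the branch-kind strings are assumptions.

```python
MAX_HISTORY_BITS = 3881      # longest history length listed on this slide
HISTORY_MASK = (1 << MAX_HISTORY_BITS) - 1

# Bits inserted per branch kind, as stated on the slide.
BITS_PER_KIND = {"indirect": 10, "call": 5}


def update_history(history, pc, target, kind):
    """Shift path information into the global history (held as a Python int).

    Other branch kinds leave the history untouched. The XOR mix of pc and
    target is a hypothetical choice.
    """
    bits = BITS_PER_KIND.get(kind, 0)
    if bits == 0:
        return history
    info = (pc ^ (target >> 2)) & ((1 << bits) - 1)
    return ((history << bits) | info) & HISTORY_MASK
```

With these insertion widths, a history length of 3881 bits covers far fewer branches than bits, which is worth keeping in mind when reading the list of lengths.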

19 Where is the limit? Less than 3 % MPPKI. Why did you not use the « 12-bit pointer » trick? It wins just 0.5 % MPPKI.

20 Summary. ITTAGE is directly derived from TAGE. The history should include (PC + target) for indirect branches and calls. Locality on targets can be leveraged. The marginal tricks are not really worth it.

