1 A 64 Kbytes ITTAGE indirect branch predictor André Seznec INRIA/IRISA.

Presentation transcript:

1 A 64 Kbytes ITTAGE indirect branch predictor André Seznec INRIA/IRISA

2 Build on ITTAGE. ITTAGE: introduced at the same time as TAGE (JILP 2006). Derived directly from the TAGE predictor: target prediction instead of direction prediction.

3 ITTAGE: multiple tables, global history predictor. The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128}. What is important: L(i)-L(i-1) is drastically increasing, so most of the storage goes to short histories !! Capture correlation on very long histories.
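The geometric series on this slide can be reproduced with a few lines of Python. This is only a sketch: the function name and its parameters (`num_tagged`, `l_min`, `ratio`) are hypothetical, chosen to match the example set {0, 2, 4, 8, 16, 32, 64, 128}.

```python
def geometric_history_lengths(num_tagged, l_min, ratio):
    """Return [0] for the tagless base table, then l_min * ratio**i per tagged table."""
    lengths = [0]
    for i in range(num_tagged):
        lengths.append(int(round(l_min * ratio ** i)))
    return lengths


lengths = geometric_history_lengths(num_tagged=7, l_min=2, ratio=2.0)
# Gap between consecutive history lengths, L(i) - L(i-1).
gaps = [b - a for a, b in zip(lengths, lengths[1:])]

print(lengths)  # [0, 2, 4, 8, 16, 32, 64, 128]
print(gaps)     # [2, 2, 4, 8, 16, 32, 64]
```

The growing gaps show why most tables cover short histories while the last few reach very long ones.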

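4 The ITTAGE predictor [Figure: a tagless base predictor indexed with pc, plus tagged tables indexed with pc and h[0:L1], pc and h[0:L2], pc and h[0:L3]; a tag comparison (=?) on the tagged tables selects the prediction]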

5 Prediction computation. General case: the longest matching component provides the prediction. Special case: many mispredictions on newly allocated entries (weak Ctr); sometimes Altpred is (slightly) more accurate than Pred. This property is dynamically monitored through a single 4-bit counter: -2 % MPPKI.
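A minimal Python sketch of this selection rule follows. The names (`Hit`, `select_prediction`, `update_use_alt`), the encoding of a weak counter as 0, and the threshold of 8 on the 4-bit counter are assumptions made for illustration; the slide only states that a single 4-bit counter monitors the property.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    target: int
    ctr: int  # 2-bit hysteresis counter; 0 is treated as "weak" (assumption)


def select_prediction(hits, base_target, use_alt_on_weak):
    """hits[i] is the tag-matching entry of tagged table i, or None.

    Tables are ordered by increasing history length.
    use_alt_on_weak is the 4-bit monitoring counter (0..15).
    """
    matching = [h for h in hits if h is not None]
    if not matching:
        return base_target                # only the tagless base predictor answers
    pred = matching[-1]                   # longest matching component
    alt = matching[-2].target if len(matching) > 1 else base_target
    if pred.ctr == 0 and use_alt_on_weak >= 8:
        return alt                        # weak (newly allocated) entry: trust Altpred
    return pred.target


def update_use_alt(use_alt_on_weak, pred_target, alt_target, actual):
    """Train the 4-bit counter when the provider was weak and Pred != Altpred."""
    if pred_target == alt_target:
        return use_alt_on_weak
    if alt_target == actual:
        return min(15, use_alt_on_weak + 1)
    if pred_target == actual:
        return max(0, use_alt_on_weak - 1)
    return use_alt_on_weak


hits = [Hit(0x100, 3), None, Hit(0x200, 0)]
print(hex(select_prediction(hits, 0x50, use_alt_on_weak=8)))  # 0x100
print(hex(select_prediction(hits, 0x50, use_alt_on_weak=0)))  # 0x200
```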

6 A tagged table entry. Fields: Target | Tag | Ctr | U. Ctr: 2-bit hysteresis counter. U: 1-bit useful counter (was the entry recently useful?). Tag: partial tag. Target: the target, 32 bits or some way to reconstruct it.

7 Allocate entries on mispredictions. Allocate entries in longer history length tables, on tables with U unset. Set Ctr to Weak and U to 0. HUGE STORAGE BUDGET: up to 3 entries allocated in different tables → fast warming.
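The allocation policy can be sketched as below. This is a simplified illustration, not the predictor's actual code: the `Entry` class, the `allocate` signature, and the choice to scan candidate tables in order of increasing history length are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0
    target: int = 0
    ctr: int = 0  # 0 = weak
    u: int = 0    # useful bit


def allocate(tables, indices, tags, provider, target, max_alloc=3):
    """Allocate up to max_alloc entries after a misprediction.

    tables   : tagged tables ordered by increasing history length
    indices  : per-table index computed for the mispredicted branch
    tags     : per-table partial tag computed for the mispredicted branch
    provider : index of the table that gave the prediction, -1 for the base
    Returns the list of tables in which an entry was allocated.
    """
    allocated = []
    for t in range(provider + 1, len(tables)):   # longer histories only
        entry = tables[t][indices[t]]
        if entry.u == 0:                         # only replace non-useful entries
            entry.tag, entry.target = tags[t], target
            entry.ctr, entry.u = 0, 0            # Ctr weak, U cleared
            allocated.append(t)
            if len(allocated) == max_alloc:
                break
    return allocated


tables = [[Entry() for _ in range(4)] for _ in range(4)]
tables[1][0].u = 1  # this entry is protected by its useful bit
print(allocate(tables, [0, 0, 0, 0], [7, 7, 7, 7], provider=-1, target=0x400))
# [0, 2, 3]
```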

8 Managing the (U)seful bit. Set it when the entry avoids a misprediction → (Pred = target) & (Alt ≠ target). Global reset when there are « difficulties » to allocate: dynamically monitor whether there are more failures than successes on allocations.
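One possible reading of this slide, sketched in Python. The saturating `tick` counter, its threshold, and the names `update_useful` and `UsefulMonitor` are assumptions; the slide only says that allocation failures and successes are monitored and that a global reset follows when allocation becomes difficult.

```python
def update_useful(u, pred_target, alt_target, actual):
    """Set U when the entry avoided a misprediction: Pred correct, Alt wrong."""
    if pred_target == actual and alt_target != actual:
        return 1
    return u


class UsefulMonitor:
    """Tracks allocation failures vs. successes; signals a global U reset."""

    def __init__(self, threshold=256):
        self.threshold = threshold  # hypothetical value
        self.tick = 0

    def record(self, successes, failures):
        """Call once per misprediction; returns True when U bits should be reset."""
        self.tick = max(0, self.tick + failures - successes)
        if self.tick >= self.threshold:
            self.tick = 0
            return True
        return False


def reset_useful(tables):
    """Clear every U bit; tables is a list of lists of dicts with a 'u' key."""
    for table in tables:
        for entry in table:
            entry["u"] = 0


monitor = UsefulMonitor(threshold=2)
tables = [[{"u": 1}, {"u": 1}]]
for _ in range(2):
    if monitor.record(successes=0, failures=1):
        reset_useful(tables)
print(tables)  # [[{'u': 0}, {'u': 0}]]
```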

9 Most of the storage space is for targets: 32 bits per entry !! More than 12K (PC,target) pairs on CLIENT05, but only a maximum of 4038 different targets. Use 12-bit pointers + a 4K table.

10 Let us be realistic: leverage target locality. All targets in at most KB regions. Use a 128-entry region table: fully associative, 240 bytes. Saves 7 bits per ITTAGE entry. Would have saved 39 bits on a 64-bit architecture !!
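The region-table encoding can be sketched as follows. The 18-bit offset is not stated on the slide; it is inferred here from the slide's own arithmetic (a 32-bit target that shrinks by 7 bits leaves 25 bits, of which 7 address the 128-entry table) and should be treated as an assumption. The `RegionTable` class and its least-recently-used replacement are likewise hypothetical.

```python
class RegionTable:
    """Fully associative table mapping a small pointer to the high target bits."""

    def __init__(self, entries=128, offset_bits=18):
        self.offset_bits = offset_bits       # assumption, see lead-in
        self.regions = [None] * entries      # high-order bits of each region
        self.last_use = [0] * entries        # timestamps for LRU replacement
        self.clock = 0

    def encode(self, target):
        """Return (region pointer, region offset) for a target address."""
        self.clock += 1
        region = target >> self.offset_bits
        offset = target & ((1 << self.offset_bits) - 1)
        if region in self.regions:
            ptr = self.regions.index(region)
        else:
            # Miss: replace the least recently used region.
            ptr = min(range(len(self.regions)), key=lambda i: self.last_use[i])
            self.regions[ptr] = region
        self.last_use[ptr] = self.clock
        return ptr, offset

    def decode(self, ptr, offset):
        """Rebuild the full target from the stored pointer and offset."""
        return (self.regions[ptr] << self.offset_bits) | offset


rt = RegionTable()
ptr, off = rt.encode(0x08041234)
print(ptr, hex(rt.decode(ptr, off)))  # 0 0x8041234
print(rt.encode(0x08041300)[0])       # 0 (same region, same pointer)
```

Note that a predictor entry holding a pointer to a region that has since been replaced would decode to a wrong target; the sketch does not handle that case.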

11 [Figure: tagged entry layout Target | Tag | Ctr | U, with the Target field split into a region pointer and a region offset]

12 The global history -16 % MPPKI

13 The global history (2). Including all branches ? Only indirect and calls: -2.5 % MPPKI. But no firm conclusion: without 2 branches on INT05 and INT06, the result goes just the other way.

14 + the other tricks (for TAGE): Immediate Update Mimicker, storage space interleaving, picking the best set of history lengths. -1 % MPPKI.

15 The Immediate Update Mimicker. Issue: some mispredictions are due to late updates at retirement. Immediate Update Mimicker: try to catch these cases.
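A rough Python sketch of the idea, under stated assumptions: each in-flight branch is recorded with the table and entry that provided its prediction, the record is corrected when the branch executes, and a later fetch that hits the same table and entry reuses the youngest such record instead of the stale table content. The class name, the record layout, and the capacity are hypothetical.

```python
from collections import deque


class ImmediateUpdateMimicker:
    def __init__(self, capacity=64):
        # Oldest in-flight branch on the left, youngest on the right.
        self.inflight = deque(maxlen=capacity)

    def on_fetch(self, table, index, predicted_target):
        """Return (final prediction, record) for a newly fetched branch."""
        for rec in reversed(self.inflight):      # youngest first
            if rec["table"] == table and rec["index"] == index:
                predicted_target = rec["target"]  # same table, same entry
                break
        rec = {"table": table, "index": index, "target": predicted_target}
        self.inflight.append(rec)
        return predicted_target, rec

    def on_execute(self, rec, actual_target):
        """The branch resolved: fix its record before the table is updated."""
        rec["target"] = actual_target

    def on_retire(self):
        """The oldest branch retires; the predictor table is updated elsewhere."""
        if self.inflight:
            self.inflight.popleft()


ium = ImmediateUpdateMimicker()
pred, rec = ium.on_fetch(table=2, index=5, predicted_target=0x100)
ium.on_execute(rec, actual_target=0x200)          # mispredicted, not yet retired
print(hex(ium.on_fetch(2, 5, 0x100)[0]))          # 0x200
print(hex(ium.on_fetch(3, 5, 0x100)[0]))          # 0x100
```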

16 The Immediate Update Mimicker [Figure: in-flight branches between Fetch and Misprediction, each recorded as PTA = P(rediction), T(able), A(ddress in the table); some records are labeled ETA and marked "Same table, same entry"]

17 For the competition: interleaving [Figure: tables indexed with h[0,L1] and routed through a crossbar (Xbar); a tag comparison (=?) yields the prediction]

18 For the competition Guided selection of the best set of history lengths: 4Kentries: 0, 4Kentries: 0, 10, 4Kentries: 16, 27, 44, 60, 96, 109, 219, 449, 2Kentries: 487, 714, 1313, 2146, 3881 Remember: 10 bits per indirect, 5 per call
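The reminder "10 bits per indirect, 5 per call" can be illustrated with a small history-update sketch. Only the bit counts come from the slide; the mixing of PC and target bits, the 4096-bit history width, and the function name are assumptions made for illustration.

```python
def update_history(history, pc, target, kind, max_bits=4096):
    """Shift path information into the global history (held as a Python int).

    kind is "indirect", "call", or anything else (ignored).
    """
    if kind == "indirect":
        nbits = 10
    elif kind == "call":
        nbits = 5
    else:
        return history                    # other branches do not update it
    # Hypothetical hash mixing PC and target bits.
    mixed = (pc ^ (pc >> 2) ^ target ^ (target >> 3)) & ((1 << nbits) - 1)
    return ((history << nbits) | mixed) & ((1 << max_bits) - 1)


h = 1
h = update_history(h, pc=0, target=0, kind="call")
print(h)  # 32
h = update_history(h, pc=0, target=0, kind="indirect")
print(h)  # 32768
```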

19 Where is the limit ? Less than 3 % MPPKI. Why did you not use the « 12-bit pointer » trick ? It would win just 0.5 % MPPKI.

20 Summary. ITTAGE is directly derived from TAGE. History should include (PC+target) for indirect branches and calls. Locality on targets can be leveraged. Marginal tricks are not really worth it.