André Seznec Caps Team IRISA/INRIA 1 A 256 Kbits L-TAGE branch predictor André Seznec IRISA/INRIA/HIPEAC.


2. Directly derived from:
- "A case for (partially) tagged branch predictors", A. Seznec and P. Michaud, JILP, Feb. 2006
- Plus tricks: loop predictor, kernel/user histories

3. TAGE: TAgged GEometric history length predictors
The genesis

4. Back around 2003
- 2bcgskew was state-of-the-art, but it was lagging behind neural-inspired predictors on a few benchmarks
- Just wanted to get the best of both behaviors and maintain:
  - Reasonable implementation cost: use only global history, medium number of tables
  - In-time response

5. The basis: a multiple-length global history predictor
[Figure: tables T0, T1, T2, T3, T4 indexed with history lengths L(0), L(1), L(2), L(3), L(4); how their outputs are combined is left open ("?")]

6. GEometric History Length predictor
- The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128}
- What is important: L(i)-L(i-1) increases drastically with i, so most of the storage is devoted to short histories!
- Captures correlation on very long histories
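The geometric series on the slide above can be sketched in a few lines. This is a minimal illustration, assuming the usual rounding rule L(i) = int(alpha^(i-1) * L(1) + 0.5) with L(0) = 0 for the base table; the function name and parameters are hypothetical, not taken from the deck.

```python
def geometric_history_lengths(l_min, l_max, n_tagged):
    """Return [0, L(1), ..., L(n_tagged)] with L(1) = l_min and L(n_tagged) = l_max.

    Assumption: lengths grow by a constant ratio alpha and are rounded
    to the nearest integer; index 0 is the base table with no history.
    """
    if n_tagged < 2:
        raise ValueError("need at least two history-indexed tables")
    alpha = (l_max / l_min) ** (1.0 / (n_tagged - 1))
    lengths = [0]
    for i in range(1, n_tagged + 1):
        lengths.append(int(l_min * alpha ** (i - 1) + 0.5))
    return lengths


# The series shown on the slide: ratio 2, from 2 up to 128.
lengths = geometric_history_lengths(2, 128, 7)
print(lengths)

# Gap between consecutive lengths, L(i) - L(i-1).
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
print(gaps)
```

The gaps grow with i, which is why a fixed number of tables can cover both short and very long histories.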

7. Combining multiple predictions?
- Classical solution: use of a meta-predictor
  - "Wasting" storage!?!
  - Choosing among 5 or 10 predictions??
- Neural-inspired predictors (Jimenez and Lin 2001): use an adder tree instead of a meta-predictor
- Partial matching (Chen et al. 96, Michaud 2005): use tagged tables and the longest matching history

8. CBP-1 (2004): OGEHL
[Figure: tables T0 to T4 indexed with history lengths L(0) to L(4), outputs feeding an adder (∑)]
- Final computation through a sum
- Prediction = Sign
- 12 components: 3.670 misp/KI
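The "sum, then sign" computation can be illustrated as follows. This is a simplified sketch, assuming each table returns a signed saturating counter and that a non-negative sum means "taken"; any bias term or update threshold used by the real predictor is left out, and `ogehl_predict` is a hypothetical name.

```python
def ogehl_predict(counters):
    """Combine one signed counter per table into a single prediction.

    Assumption: counters are small signed integers (negative leans
    not-taken, non-negative leans taken). Returns (taken, total).
    """
    total = sum(counters)
    return total >= 0, total


# Five tables; the long-history tables disagree with the short ones.
taken, total = ogehl_predict([-1, -2, 3, 2, 1])
print(taken, total)
```

The magnitude of the sum also acts as a rough confidence: a total near zero means the tables disagree.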

9. TAGE: geometric history length + PPM-like + optimized update policy
[Figure: a tagless base predictor plus tagged tables; each tagged table is indexed by a hash of pc and h[0:L1], h[0:L2] or h[0:L3], each entry holds ctr, u and tag, and a tag comparison (=?) on each table selects the prediction]


11. Prediction computation
- General case: the longest matching component provides the prediction
- Special case:
  - Many mispredictions on newly allocated entries (weak Ctr)
  - On many applications, Altpred is more accurate than Pred
  - Property dynamically monitored through a single 4-bit counter
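The selection rule above can be sketched as below. Everything beyond the slide text is an assumption made for illustration: the signed counter encoding (3-bit counter in [-4, 3], weak meaning -1 or 0), the 4-bit monitor stored as a signed value in [-8, 7] with threshold 0, and the names `Hit`, `tage_predict`, `update_use_alt` and `use_alt_on_na`.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    table: int  # tagged table index; a higher index means a longer history
    ctr: int    # assumed signed 3-bit counter in [-4, 3]; taken if ctr >= 0


def tage_predict(base_taken, hits, use_alt_on_na):
    """Return the final prediction.

    hits: entries whose tag matched. use_alt_on_na: the single 4-bit
    monitoring counter, assumed signed in [-8, 7].
    """
    if not hits:
        return base_taken
    ordered = sorted(hits, key=lambda h: h.table, reverse=True)
    provider = ordered[0]
    pred = provider.ctr >= 0
    # Altpred: the next longest match, or the base predictor.
    altpred = ordered[1].ctr >= 0 if len(ordered) > 1 else base_taken
    weak = provider.ctr in (-1, 0)  # looks like a newly allocated entry
    if weak and use_alt_on_na >= 0:
        return altpred
    return pred


def update_use_alt(counter, pred, altpred, taken):
    """Train the monitor; meant to be called only when the provider was weak."""
    if pred != altpred:
        if altpred == taken:
            return min(counter + 1, 7)
        return max(counter - 1, -8)
    return counter


hits = [Hit(table=1, ctr=3), Hit(table=3, ctr=-1)]
print(tage_predict(True, hits, use_alt_on_na=0))
print(tage_predict(True, hits, use_alt_on_na=-1))
```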

12. TAGE update policy
- General principle: minimize the footprint of the prediction
- Just update the longest history matching component and allocate at most one entry on mispredictions

13. A tagged table entry
[Entry layout: Tag | Ctr | U]
- Ctr: 3-bit prediction counter
- U: 2-bit useful counter (was the entry recently useful?)
- Tag: partial tag
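As a concrete picture of such an entry, here is a small sketch. The field widths come from the slide; the signed encoding of Ctr, the class name `TaggedEntry` and the helper `entry_bits` are assumptions for illustration.

```python
from dataclasses import dataclass

CTR_BITS = 3  # from the slide
U_BITS = 2    # from the slide
CTR_MIN, CTR_MAX = -4, 3  # assumed signed encoding of the 3-bit counter


@dataclass
class TaggedEntry:
    tag: int = 0  # partial tag
    ctr: int = 0  # prediction counter; taken if ctr >= 0
    u: int = 0    # useful counter in [0, 3]

    def predicts_taken(self):
        return self.ctr >= 0

    def train(self, taken):
        """Saturating update of the prediction counter."""
        if taken:
            self.ctr = min(self.ctr + 1, CTR_MAX)
        else:
            self.ctr = max(self.ctr - 1, CTR_MIN)


def entry_bits(tag_bits):
    """Storage cost of one entry for a given partial tag width."""
    return CTR_BITS + U_BITS + tag_bits


e = TaggedEntry(tag=0x2A)
e.train(False)
print(e.predicts_taken(), entry_bits(7))
```

With a 7-bit tag, one entry costs 12 bits; wider tags on longer-history tables raise that cost accordingly.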

14. Updating the U counter
- If (Altpred ≠ Pred) then:
  - Pred = taken (correct prediction): U = U + 1
  - Pred ≠ taken (misprediction): U = U - 1
- Graceful aging: periodic shift of all U counters, implemented through the reset of a single bit
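A minimal sketch of this rule follows. The saturation bounds follow from the 2-bit width; modelling the graceful aging as a one-bit right shift of every counter is an interpretation of the slide, and the function names are hypothetical.

```python
U_MAX = 3  # 2-bit useful counter


def update_useful(u, pred, altpred, taken):
    """Update U only when the provider and the alternate prediction differ."""
    if pred != altpred:
        if pred == taken:
            return min(u + 1, U_MAX)  # the entry made the difference
        return max(u - 1, 0)          # the entry was harmful
    return u


def age_useful(u_counters):
    """Assumed aging step: halve every U counter (one-bit right shift)."""
    return [u >> 1 for u in u_counters]


u = update_useful(1, pred=True, altpred=False, taken=True)
print(u)
print(age_useful([3, 2, 1, 0]))
```

Shifting rather than clearing keeps some memory of which entries were strongly useful, so they are not all freed at once.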

15. Allocating a new entry on a misprediction
- Find a single "useless" entry with a longer history:
  - Privilege the smallest possible history, to minimize footprint
  - But not too much, to avoid ping-pong phenomena
- Initialize Ctr as weak and U as zero
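One way to read "privilege the smallest history, but not too much" is a randomized choice biased toward shorter histories. The sketch below assumes a probability of 1/2 of stopping at each candidate and a signed counter whose weak values are 0 (taken) and -1 (not taken); these choices and all names are illustrative assumptions, not the deck's stated policy.

```python
import random


def choose_allocation(u_values, provider, rng=random.random):
    """Pick the tagged table in which to allocate, or None.

    u_values[i]: U counter of the candidate entry in tagged table i,
    ordered from shortest to longest history.
    provider: index of the table that gave the prediction, or -1 when
    the base predictor did.
    """
    candidates = [i for i in range(provider + 1, len(u_values))
                  if u_values[i] == 0]  # "useless" entries only
    if not candidates:
        return None
    for i in candidates[:-1]:
        if rng() < 0.5:  # assumed bias toward shorter histories
            return i
    return candidates[-1]


def new_entry(tag, taken):
    """Freshly allocated entry: weak counter in the outcome direction, U = 0."""
    return {"tag": tag, "ctr": 0 if taken else -1, "u": 0}


print(choose_allocation([0, 2, 0, 0], provider=0))
print(new_entry(0x15, taken=False))
```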

16. Improve the global history
- Address + conditional branch history: path confusion on short histories
- Address + path: direct hashing leads to path confusion
1. Represent all branches in branch history
2. Also use path history (1 bit per branch, limited to 16 bits)
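The two numbered points can be sketched as a pair of shift registers. The 16-bit path limit is from the slide; recording unconditional branches as "taken", taking the lowest address bit for the path, the 640-bit history cap, and the function name are assumptions for this illustration.

```python
PATH_BITS = 16   # from the slide
HIST_BITS = 640  # assumed cap, matching the longest history in this deck


def update_histories(ghist, phist, pc, taken, is_conditional):
    """Shift one branch into the global history and the path history.

    Every branch contributes a bit to ghist (assumption: unconditional
    branches are recorded as taken). phist receives one address bit per
    branch (assumption: the lowest bit of pc).
    """
    bit = 1 if (taken or not is_conditional) else 0
    ghist = ((ghist << 1) | bit) & ((1 << HIST_BITS) - 1)
    phist = ((phist << 1) | (pc & 1)) & ((1 << PATH_BITS) - 1)
    return ghist, phist


g, p = 0, 0
g, p = update_histories(g, p, pc=0x401, taken=True, is_conditional=True)
g, p = update_histories(g, p, pc=0x40A, taken=False, is_conditional=True)
g, p = update_histories(g, p, pc=0x413, taken=False, is_conditional=False)
print(bin(g), bin(p))
```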

17. Design tradeoff for CBP2 (1)
- 13 components:
  - Bring the best accuracy on distributed traces
  - 8 components not very far!
- History length:
  - Min = 4, Max = 640
  - Could use any Min in [2,6] and any Max in [300, 2000]

18. Design tradeoff for CBP2 (2)
- Tag width tradeoff:
  - (Destructive) false match is better tolerated on shorter history
  - 7 bits on T1 to 15 bits on T12
- Tuning the number of table entries:
  - Smaller number for very long histories
  - Smaller number for very short histories

19. Adding a loop predictor
- The loop predictor captures the number of iterations of a loop
- When it successively encounters the same number of iterations 4 times, the loop predictor provides the prediction
- Advantages:
  - Very reliable
  - Small storage budget: 256 52-bit entries
- Complexity?
  - Might be difficult to manage speculative iteration numbers on deep pipelines
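The behavior of one loop-predictor entry can be sketched as below. The threshold of 4 identical iteration counts is from the slide; the field names, the convention that the loop branch is taken to continue and not taken to exit, and the absence of tags or speculative state are simplifying assumptions.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 4  # same iteration count seen 4 times in a row


@dataclass
class LoopEntry:
    past_iter: int = 0     # taken count of the last completed execution
    current_iter: int = 0  # taken count so far in the current execution
    confidence: int = 0    # consecutive executions with the same count

    def predict(self):
        """Return True/False when confident, None otherwise."""
        if self.confidence < CONFIDENCE_THRESHOLD:
            return None
        return self.current_iter < self.past_iter

    def update(self, taken):
        if taken:
            self.current_iter += 1
            return
        # Loop exit: compare this trip count with the previous one.
        if self.current_iter == self.past_iter:
            self.confidence = min(self.confidence + 1, CONFIDENCE_THRESHOLD)
        else:
            self.past_iter = self.current_iter
            self.confidence = 1
        self.current_iter = 0


entry = LoopEntry()
for _ in range(4):                        # four executions of the loop
    for outcome in [True] * 5 + [False]:  # 5 taken, then exit
        entry.update(outcome)
print(entry.predict())
```

After the fourth identical execution the entry starts predicting, including the exit, which a plain counter-based predictor tends to miss.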

20. Using a kernel history and a user history
- Traces mix user and kernel activities:
  - Kernel activity after exception
  - Global history pollution
- Solution: use two separate global histories
  - User history is updated only in user mode
  - Kernel history is updated in both modes
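The two update rules can be sketched with a small class. The update conditions are from the slide; the selection rule (read the kernel history in kernel mode, the user history otherwise), the 640-bit length and the class name `DualHistory` are assumptions for illustration.

```python
class DualHistory:
    """Two global history registers, one per privilege view."""

    def __init__(self, length=640):
        self.mask = (1 << length) - 1
        self.user = 0
        self.kernel = 0

    def update(self, taken, kernel_mode):
        bit = int(taken)
        # Kernel history sees every branch, in both modes.
        self.kernel = ((self.kernel << 1) | bit) & self.mask
        # User history is untouched by kernel activity.
        if not kernel_mode:
            self.user = ((self.user << 1) | bit) & self.mask

    def select(self, kernel_mode):
        """Assumed rule: index the predictor with the history of the current mode."""
        return self.kernel if kernel_mode else self.user


h = DualHistory()
h.update(True, kernel_mode=False)
h.update(False, kernel_mode=True)   # exception handler branch
h.update(True, kernel_mode=True)
h.update(True, kernel_mode=False)
print(bin(h.user), bin(h.kernel))
```

When execution returns to user mode, the user history looks exactly as it did before the exception, so the kernel branches do not pollute it.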

21. L-TAGE submission accuracy (distributed traces)
3.314 misp/KI

22. Reducing L-TAGE complexity
- Included 241.5 Kbits TAGE predictor: 3.368 misp/KI
- Loop predictor beneficial only on gzip: might not be worth the extra complexity
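The figures quoted in this deck can be cross-checked with a little arithmetic. The sketch below assumes 1 Kbit = 1024 bits and only adds up the two components the slides give sizes for (the TAGE tables and the loop predictor); any other state is not counted.

```python
KBIT = 1024  # assumption: 1 Kbit = 1024 bits

loop_bits = 256 * 52      # loop predictor: 256 entries of 52 bits
tage_bits = 241.5 * KBIT  # embedded TAGE predictor

loop_kbits = loop_bits / KBIT
total_kbits = (loop_bits + tage_bits) / KBIT
print(loop_kbits, total_kbits)

# Accuracy cost of dropping the loop predictor, in misp/KI.
delta = round(3.368 - 3.314, 3)
print(delta)
```

The loop predictor accounts for 13 of the 254.5 Kbits counted here, for a gain of 0.054 misp/KI on the distributed traces.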

23. Using fewer tables
- 8-component 256 Kbits TAGE predictor: 3.446 misp/KI

24. TAGE prediction computation time?
- 3 successive steps:
  - Index computation
  - Table read
  - Partial match + multiplexor
- Does not fit in a single cycle, but can be ahead pipelined!

25. Ahead pipelining a global history branch predictor (principle)
- Initiate branch prediction X+1 cycles in advance to provide the prediction in time
- Use the information available:
  - X-block ahead instruction address
  - X-block ahead history
- To ensure accuracy: use intermediate path information

26. Practice
- Ahead pipelined TAGE: 4 parallel prediction computations
[Figure residue: labels "bc", "Ha", "A", "A", "B", "C"]

27. 3-branch ahead pipelined 8-component 256 Kbits TAGE
3.552 misp/KI

28. A final case for the Geometric History Length predictors
- Delivers state-of-the-art accuracy
- Uses only global information: very long history, 300+ bits!!
- Can be ahead pipelined
- Many effective design points:
  - OGEHL or TAGE
  - Number of tables, history lengths

29. The End
