Presentation is loading. Please wait.

Presentation is loading. Please wait.

The O-GEHL branch predictor

Similar presentations


Presentation on theme: "The O-GEHL branch predictor"— Presentation transcript:

1 The O-GEHL branch predictor
Optimized GEometric History Length André Seznec IRISA/INRIA/HIPEAC

2 Branch prediction is still an issue
Long pipelines Multiple instruction per cycle Any gain in branch prediction accuracy results in: Performance gain Power consumption gain

3 What must be predicted:
Directions and targets: Indirect branch target Conditional branch direction This presentation: Only conditional branch direction

4 Conditional branch predictors
Two main sources of information: Local history the past behavior of this particular branch Global history the (recent) past behavior of all branches

5 Speculative history must be managed !?
Global history: Append a bit on a single history register Use of a circular buffer and just a pointer to speculatively manage the history Local history: table of histories (unspeculatively updated) must maintain a speculative history per inflight branch: Associative search, etc ?!?

6 From my experience with EV8 team
Designers hate maintaining speculative local history

7 “Classic” stuff with the OGEHL predictor 
Global history based: Yeh and Patt 91, Pan and So 91 Multiple history lengths to index multiple tables McFarling 93, Evers et al. 96, EV8 predictor

8 Selecting between multiple predictions
Classic solution: Use of a meta predictor “wasting” storage !?! chosing among 5 or 10 predictions ?? Neural inspired predictors: Use an adder tree instead of a meta-predictor Vintan and Iridon 99, Jimènez and Lin 01 Let’s use the adder tree

9 Perceptron predictor One weight per history bit X Sign=prediction

10 What makes such predictor working ?
Use of signed saturated counters: 8-bit on perceptron predictors Partial update policy: Update all the counters on misprediction Update all the counters when sum absolute value is under a threshold For perceptron predictors: Θ= 2.14 N

11 Perceptron predictors: strong points and shortcomings
Effective solution for selecting the prediction Hardware cost scales linearly with the history length ! Shortcomings: High number of adders Long latency Good accuracy, but no better than conventional hybrid predictors Hardware cost scales linearly with the history length ?

12 Multiple history length predictor
L(4) L(3) L(2) L(1) L(0) TO T1 T2 T3 T4 Multiple history length predictor Prediction=Sign

13 GEometric History Length predictor
The set of history lengths forms a geometric series {0, 2, 4, 8, 16, 32, 64, 128} What is important: L(i)-L(i-1) is drastically increasing Spends most of the storage for short history !!

14 Counter width on O-GEHL predictors
8 bits are just overkilling  4 bits are sufficient  Mixing 5 bits for short histories and 4 bits for long histories is slightly better  3 bits are not so bad !!

15 Dynamic update threshold fitting
Reasonable fixed threshold= Number of tables On an O-GEHL predictor, best threshold depends on the application  the predictor size  the counter width  By chance, on most applications, for the best fixed threshold, updates on mispredictions ≈ updates on correct predictions Monitor the difference and adapt the update threshold

16 Adaptative history length fitting (inspired by Juan et al 98)
(½ applications: L(7) < 50) (½ applications: L(7) > 150) Let us adapt some history lengths to the behavior of each application 8 tables: T2: L(2) and L(8) T4: L(4) and L(9) T6: L(6) and L(10)

17 Adaptative history length fitting (2)
Intuition: if high degree of conflicts on T7, stick with short history Implementation: monitoring of aliasing on updates on T7 through a tag bit and a counter Simple is sufficient: Flipping from short to long histories and vice-versa

18 TO T1 T2 L(0) T3 L(1) or L(5) L(2) T4 L(3) or L(6) L(4) Tag bits

19 What is global history ? Address + conditional branch history:
path confusion on short histories  Address + path: Direct hashing leads to path confusion  Represent all branches in branch history Use also path history ( 1 bit per branch, limited to 16 bits)

20 Hashing 200+ bits for indexing !!
Need to compute 11 bits indexes : Full hashing is unrealistic Just regularly pick at most 33 bits in: address+branch history +path history A single 3-entry exclusive-OR stage

21 Evaluation framework 1st Championship Branch Prediction traces:
20 traces including system activity Floating point apps : loop dominated Integer apps Multimedia apps Server workload apps: very large footprint

22 CBP A single rule: The storage budget must fit 64K + 256 bits.
Rule artefacts Don’t care about implementability Initialize the predictor with what is convenient Adapt the configuration to each benchmark, but don’t care about transition between applications

23 Configuration presented for Championship Branch Prediction
8 tables: 2 Kentries except T1, 1Kentries 5 bit counters for T0 and T1, 4 bit counters otherwise 1 Kbits of one bit tags associated with T7 10K + 5K + 6x8K + 1K = 64K L(1) =3 and L(10)= 200 {0,3,5,8,12,19,31,49,75,125,200}

24 A case for the OGEHL predictor
2nd at CBP: 2.82 misp/KI Best practice award: The predictor the closest to a possible hardware implementation Chaining simulations: 2.84 misp/KI The winner: 4.34 misp/KI (but respecting the rule is respecting the rule )

25 A case for the OGEHL predictor (2)
High accuracy Half size 32Kbits (3,150): 3.41 misp/KI better than any “implementable” 64Kbits predictor before CBP Robustness to variations of history lengths choices: L(1) in [2,6], L(10) in [125,300] misp. rate < 2.96 misp/KI Geometric series: not a bad formula !! best geometric L(1)=3, L(10)=223, misp/KI best overall {0, 2, 4, 9, 12, 18, 31, 54, 114, 145, 266} misp/KI

26 A case for the OGEHL predictor (3)
Reduce counter width by 1 bit: 49 Kbits would have been a finalist at CBP (3.09)  64 Kbits 4 components OGEHL predictor would have been a finalist at CBP (3.02)  50 Kbits 4 components OGEHL predictor (3-bit) would have been a finalist at CBP (3.21)  768 Kbits 12 components OGEHL predictor 2.25 misp/KI

27 Prediction computation time
3 components: Index computation Counter read Adder tree Does not fit on a single cycle: But may be ahead pipelined !

28 Ahead pipelining a global history branch predictor (principle)
Initiate branch prediction X+1 cycles in advance to provide the prediction in time Use information available: X-block ahead instruction address X-block ahead history To ensure accuracy: Use intermediate path information

29 8 // prediction computations
Practice A B C D bcd Ahead OGEHL: 8 // prediction computations Ha A

30 Ahead Pipelined 64 Kbits OGEHL
3-block: 2.94 misp/KI 4-block: 2.99 misp/KI 5-block: 3.04 misp/KI Not such a huge accuracy loss 

31 A final case for the O-GEHL predictor
delivers state-of-the-art accuracy uses only global information can be ahead pipelined many effective design points Nb of tables, counter width, .. prediction computation logic complexity is low (compared with concurrent predictors) (The End)


Download ppt "The O-GEHL branch predictor"

Similar presentations


Ads by Google