
Slide 1: A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction
Gabriel H. Loh, College of Computing, Georgia Tech

Slide 2: aren't we done with branch predictors yet?
Branch predictors are still important:
- Performance: needed for large instruction windows, e.g., CPR [Akkary et al., MICRO'03] and CFP [Srinivasan et al., ASPLOS'04]
- Power: a better branch predictor reduces wrong-path instructions
- Throughput: wrong-path instructions steal resources from other threads in SMT/SOEMT

Slide 3: recent bpred research
"Neural-inspired" predictors: perceptron, piecewise-linear, O-GEHL, ...
Very high accuracy, but relatively high complexity is a barrier to industrial adoption.

Slide 4: outline
- quick synopsis of neural techniques
- gDAC predictor: idea, specifics, ahead-pipelining
- results
- why gDAC works

Slide 5: gshare
Records previous outcomes given a branch identifier (PC) and a context (BHR).
Different contexts may lead to different predictions for the same branch.
Assumes correlation between the context and the outcome.
[Figure: the PC of a branch ("foobar") and the branch history register (BHR) are hashed into an index for the Pattern History Table (PHT) of two-bit counters, yielding a taken/not-taken prediction.]
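
A minimal gshare sketch of the lookup and update just described; this is a hedged illustration, not the talk's code, and the 12-bit history length, table size, and XOR hash are assumptions:

```c
#include <stdint.h>

#define HIST_BITS 12
#define PHT_SIZE  (1u << HIST_BITS)

static uint8_t  pht[PHT_SIZE];  /* 2-bit saturating counters */
static uint32_t bhr;            /* global branch history register */

/* Predict: hash the PC with the history, read the counter. */
int gshare_predict(uint32_t pc) {
    uint32_t idx = (pc ^ bhr) & (PHT_SIZE - 1);
    return pht[idx] >= 2;       /* counter >= 2 means predict taken */
}

/* Update: train the same counter, then shift in the new outcome. */
void gshare_update(uint32_t pc, int taken) {
    uint32_t idx = (pc ^ bhr) & (PHT_SIZE - 1);
    if (taken  && pht[idx] < 3) pht[idx]++;   /* saturate at 3 */
    if (!taken && pht[idx] > 0) pht[idx]--;   /* saturate at 0 */
    bhr = ((bhr << 1) | (taken ? 1u : 0u)) & (PHT_SIZE - 1);
}
```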

Slide 6: gshare pros and cons
Pros: simple to implement! Variants exist in multiple real processors.
Cons: not scalable to longer history lengths:
- the number of PHT entries grows exponentially: 2^h entries for h history bits (going from 12 to 20 bits grows the PHT from 4K to 1M entries)
- learning time increases: even if the branch is correlated with only one previous branch, the predictor still needs to train 2^h PHT counters

Slide 7: perceptron
Explicitly locate the source(s) of correlation.
[Figure: for an outcome that is simply !h_1, the table-based approach must enumerate every (h_2, h_1, h_0) history pattern, while the perceptron approach learns one weight per history bit.]
Inputs: x_i = h_i ? 1 : -1
Prediction: f(X) = (0*x_0 - 1*x_1 + 0*x_2) >= 0
The weights w_0, w_1, w_2 track correlation.

Slide 8: perceptron predictor
The PC selects a row of weights; the weights are combined with the BHR bits by an adder tree, and the branch is predicted taken if the sum >= 0.
Updating the weights: if the branch outcome agrees with h_i, increment w_i; if it disagrees, decrement w_i. The magnitude of a weight reflects its degree of correlation; no correlation drives w_i toward 0.
Downsides:
1. latency (SRAM lookup, adder tree)
2. few entries in the table, so aliasing
3. linearly separable functions only
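
A sketch of this lookup and the increment/decrement training rule. The table size, the 1.93h + 14 training-threshold heuristic, the in-order update, and the omission of weight saturation are assumptions/simplifications, not the slide's exact design:

```c
#include <stdint.h>

#define NUM_PERC 256
#define HLEN     16
#define THETA    ((int)(1.93 * HLEN + 14))  /* training threshold heuristic */

static int weights[NUM_PERC][HLEN + 1];     /* w[0] is the bias weight */
static int hist[HLEN];                      /* h_i encoded as +1 / -1 */

int perceptron_predict(uint32_t pc, int *sum_out) {
    int *w = weights[pc % NUM_PERC];
    int sum = w[0];                         /* bias input is always 1 */
    for (int i = 0; i < HLEN; i++)
        sum += w[i + 1] * hist[i];          /* the adder tree in hardware */
    *sum_out = sum;
    return sum >= 0;                        /* predict taken if sum >= 0 */
}

/* Assumes update happens right after prediction (in-order sketch). */
void perceptron_update(uint32_t pc, int taken, int sum) {
    int t = taken ? 1 : -1;
    int *w = weights[pc % NUM_PERC];
    /* train on a mispredict, or when the sum's magnitude is weak */
    if ((sum >= 0) != taken || (sum < THETA && sum > -THETA)) {
        w[0] += t;
        for (int i = 0; i < HLEN; i++)
            w[i + 1] += (hist[i] == t) ? 1 : -1;  /* agree: ++, else -- */
    }
    for (int i = HLEN - 1; i > 0; i--) hist[i] = hist[i - 1];
    hist[0] = t;                            /* shift in the newest outcome */
}
```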

Slide 9: path-based neural predictor
Perceptron: all weights are chosen by PC_0. PBNP: w_i is selected by PC_i (the i-th oldest PC).
This naturally leads to pipelined access, and the different indexing reduces aliasing.
Downsides:
1. latency (SRAM lookup, one adder)
2. complexity (30-50 stage bpred pipeline)
3. linearly separable functions only
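
A sketch of only the indexing difference, reusing the +1/-1 history encoding from the perceptron sketch above; the pipelined partial-sum organization and the update are omitted, and the sizes and modulo hash are assumptions:

```c
#include <stdint.h>

#define ROWS 256
#define HLEN 16

static int      weights[HLEN + 1][ROWS]; /* one weight table per history position */
static int      hist[HLEN];              /* past outcomes as +1 / -1 */
static uint32_t path[HLEN];              /* PCs of the HLEN most recent branches */

/* PBNP: weight i is indexed by the i-th oldest PC on the path, not by the
 * current PC. Hardware accumulates this sum incrementally, one weight per
 * pipeline stage, which is what creates the 30-50 stage predictor pipe;
 * it is flattened into a loop here. */
int pbnp_predict(uint32_t pc) {
    int sum = weights[0][pc % ROWS];
    for (int i = 0; i < HLEN; i++)
        sum += weights[i + 1][path[i] % ROWS] * hist[i];
    return sum >= 0;
}
```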

Slide 10: piecewise-linear predictor
Compute m different linear functions in parallel; the PC selects one for the final prediction. Some linearly inseparable functions can be learned.
Downsides:
1. latency (SRAM lookup, one adder, one mux)
2. complexity (m copies of a 50+ stage bpred pipeline)
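
A sketch of the m-parallel-functions idea, reusing hist/path from the PBNP sketch above; M = 8 and the final PC-based mux are assumptions about how this slide's diagram maps to code, not the published algorithm's exact indexing:

```c
#define M 8  /* number of linear functions computed in parallel */

static int pwl_weights[M][HLEN + 1][ROWS];

int pwl_predict(uint32_t pc) {
    int sums[M];
    for (int f = 0; f < M; f++) {       /* m adder pipelines run in parallel */
        sums[f] = pwl_weights[f][0][pc % ROWS];
        for (int i = 0; i < HLEN; i++)
            sums[f] += pwl_weights[f][i + 1][path[i] % ROWS] * hist[i];
    }
    return sums[pc % M] >= 0;           /* final mux: the PC picks one function */
}
```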

Slide 11: goal/scope
Neural predictors are very accurate; we want the same level of performance.
Neural predictors are complex (large numbers of adders, very deep pipelines); we want to avoid adders and keep the pipeline short.
Preferable to use PHTs only.

Slide 12: idea
One neural predictor "swallows" one very long branch history.
[Image: Google Images, "hot dog kobayashi" — the 2004 world record of 53 1/2 hot dogs.]

Slide 13: idea
Make "digesting" a very long branch history easier by dividing up the responsibility!
[Figure: the very long branch history is split across several predictors whose outputs feed a meta-predictor. Image: Google Images, "hot dog eating".]

Slide 14: unoptimized gDAC
gDAC = global-history Divide And Conquer.
The history is split into segments BHR[1:s_1], BHR[s_1+1:s_2], BHR[s_2+1:s_3], BHR[s_3+1:s_4]; each segment, together with the PC, indexes its own gshare-styled predictor (PHT_1 ... PHT_4), and a meta-predictor selects one per-segment prediction.
Downside: utilizes correlation from only a single history segment.
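
A sketch of this selection-based organization; the equal 16-bit segments, table sizes, and meta-table encoding are assumptions, and update logic is omitted:

```c
#include <stdint.h>

#define SEGS     4
#define IDX_BITS 12
#define TBL_SIZE (1u << IDX_BITS)

static uint8_t  seg_pht[SEGS][TBL_SIZE]; /* 2-bit counters, one PHT per segment */
static uint8_t  meta[TBL_SIZE];          /* id (0..SEGS-1) of the segment to trust */
static uint64_t long_bhr;                /* the very long global history */

/* Extract segment s of the history (assumed equal 16-bit segments). */
static uint32_t segment(int s) {
    return (uint32_t)(long_bhr >> (16 * s)) & 0xFFFF;
}

int gdac_select_predict(uint32_t pc) {
    int pred[SEGS];
    for (int s = 0; s < SEGS; s++) {     /* gshare-styled lookup per segment */
        uint32_t idx = (pc ^ segment(s)) & (TBL_SIZE - 1);
        pred[s] = seg_pht[s][idx] >= 2;
    }
    /* the meta-predictor picks ONE segment's prediction; correlation
     * found by the other segments is discarded */
    return pred[meta[pc & (TBL_SIZE - 1)]];
}
```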

Slide 15: fusion gDAC
Replace the meta-predictor with a fusion table indexed by the per-segment predictions: this can combine correlations from multiple segments.
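
A sketch of the fusion variant, reusing seg_pht/segment()/TBL_SIZE from the previous sketch; the exact fusion-table index format (prediction bits concatenated with PC bits) is an assumption:

```c
static uint8_t fusion_tbl[1u << (SEGS + 8)];  /* 2-bit counters */

int gdac_fusion_predict(uint32_t pc) {
    uint32_t preds = 0;
    for (int s = 0; s < SEGS; s++) {
        uint32_t idx = (pc ^ segment(s)) & (TBL_SIZE - 1);
        preds = (preds << 1) | (seg_pht[s][idx] >= 2);  /* one bit per segment */
    }
    /* all per-segment predictions, plus PC bits, index the fusion table,
     * so the final counter can learn cross-segment combinations */
    uint32_t fidx = (preds << 8) | (pc & 0xFF);
    return fusion_tbl[fidx] >= 2;
}
```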

Slide 16: gDAC with Bi-Mode components
Replace each per-segment PHT with a Bi-Mode style predictor (per-segment direction tables BM_1 ... BM_4 plus a shared choice PHT).
Better per-segment predictions lead to a better final prediction.

Slide 17: ahead-pipelined gDAC
Cycle t-3: PC_{-3} performs the initial hashing and PHT bank select for segments 1-3. Branch history from cycles t, t-1 and t-2 does not exist yet.
Cycle t-2: row decoder.
Cycle t-1: SRAM array access; PC_{-1} is used for SRAM column mux selection.
Cycle t: each PHT SRAM is organized to output multiple counters (think "cache line"); the current PC selects one, now that the branch history from cycles t, t-1 and t-2 is available, producing the prediction.
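
A flattened sketch of the staged lookup, reusing TBL_SIZE from the earlier gDAC sketch. In hardware each step happens in a different cycle as laid out above; the names, line width, and exact late-select bits here are assumptions:

```c
#define LINE 8  /* counters fetched per SRAM access ("cache line") */

int gdac_ahead_predict(uint32_t pc_m3,      /* PC from cycle t-3 */
                       uint32_t old_seg,    /* history segment known at t-3 */
                       uint32_t pc_m1,      /* PC from cycle t-1 */
                       uint32_t pc,         /* current PC, cycle t */
                       const uint8_t *pht) {
    /* cycles t-3 .. t-1: hash with stale information, decode the row,
     * and read a whole line of counters from the SRAM */
    uint32_t row = (pc_m3 ^ old_seg) & (TBL_SIZE / LINE - 1);
    const uint8_t *line = &pht[row * LINE];

    /* cycle t-1: pc_m1 steers the column mux; cycle t: the current PC,
     * now available, makes the final selection of one counter */
    uint32_t col = (pc_m1 ^ pc) & (LINE - 1);
    return line[col] >= 2;
}
```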

Slide 18: comment on ahead-pipelining
Branch predictors composed of only PHTs are simple SRAMs and are easily ahead-pipelined.
Seznec showed ahead-pipelining of 2bc-gskew [ISCA'02] and of instruction fetch in general [ISCA'03]; Jiménez showed the ahead-pipelining-like gshare.fast [HPCA'03].

Slide 19: simulation/configs
Standard stuff: SimpleScalar/Alpha (MASE), SPEC2k-INT, SimPoint; CPU config similar to the piecewise-linear study [Jiménez, ISCA'05].
gDAC vs. gshare, perceptron, PBNP, PWL.
gDAC configs vary: 2-3 segments; history length from 21 @ 2KB to 86 @ 128KB.
Neural advantage: gDAC tables are constrained to power-of-two entries, while the neural predictors can use arbitrary sizes.

Slide 20: misprediction rates
2KB: about as accurate as the original perceptron.
8KB: beats the original perceptron.
32KB: as accurate as the path-based neural predictor.
The piecewise-linear predictor just does really well.

Slide 21: performance
As accurate as the perceptron, but with better latency, so higher IPC.
gDAC is less accurate than the path-based neural predictor @ 16KB, but latency is starting to matter.
The latency difference allows gDAC to even catch up with PWL in performance.
Goal achieved: neural-class performance with PHT-only complexity.

Slide 22: so it works, but why?
Three phenomena: correlation locality, correlation redundancy, correlation recovery.
We use the perceptron as the vehicle of analysis, since it explicitly assigns a correlation strength to each branch.

Slide 23: correlation locality (parser)
[Plot: perceptron weight magnitudes by history position for parser.]
Distinct clusters/bands of correlation appear; segmenting (at the right places) should not disrupt clusters of correlation.

Slide 24: correlation locality (gcc)
[The same plot for gcc.]

Slide 25: correlation redundancy
Using only the correlation from a few branches yields almost as much information as using all branches; therefore the correlations detected in the other weights are redundant.

Slide 26: correlation recovery
Cross-segment correlation may exist.
Selection-based meta-prediction (a tree of choosers, e.g., M_{2,3} over P_2 and P_3, then M_{1,(2,3)} over P_1 and that result) can only use correlation from one segment.
Fusion (a single combiner over P_1, P_2, P_3) can (indirectly) use correlation from all segments.
Fusion gDAC beats selection gDAC by 4%.

Slide 27: orthogonality
These ideas could be used in other predictors:
- a segmented-history PPM predictor
- segmented, geometric history lengths
- some "segments" could use local history, prophet "future" history, or anything else
There may be other ways to exploit the general phenomena of correlation locality, redundancy and recovery.

Slide 28: summary of contributions
- a PHT-based long-history predictor that achieves the goals of neural accuracy with PHT complexity
- an ahead-pipelined organization
- an analysis of the effects of segmentation + fusion on correlation
Contact: loh@cc.gatech.edu, http://www.cc.gatech.edu/~loh

Slide 29: BACKUP SLIDES

Slide 30: power
Neural predictor update: lots of separate small tables, extra decoders, harder to bank.
All of the adders: timing-critical for the perceptron (power hungry); not as bad for PBNP (which can use small ripple-carry adders); PWL multiplies the number of adders considerably.
Checkpointing overhead for PBNP and PWL: need to store 30-50+ partial sums, per branch!

Slide 31: power density/thermals
gDAC can break up its tables between prediction bits and hysteresis bits (like the EV8); a neural predictor must use all of its bits together.
[Figure: Fetch - Decode - Rename - ... - Commit pipeline, with the split tables placed at physically separate stages.]
Physical separation reduces power density/thermals. Similar splits apply to O-GEHL and PPM.

Slide 32: linear (in)separability
[Chart: branches broken into four classes — linearly separable only, linearly separable between segments, linearly separable within segments, and linearly inseparable — with the best-performing class annotated "this does the best".]

Slide 33: per-benchmark accuracy (128KB)
[Chart: per-benchmark misprediction rates at 128KB.]

