A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction Gabriel H. Loh College of Computing Georgia Tech.


2005 Sep 20, PACT. Loh: A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction

aren't we done with branch predictors yet?
Branch predictors are still important. Performance: needed for large instruction windows, e.g. CPR [Akkary et al./MICRO'03] and CFP [Srinivasan et al./ASPLOS'04]. Power: better branch prediction reduces wrong-path instructions. Throughput: wrong-path instructions steal resources from other threads in SMT/SOEMT.

recent bpred research
"Neural-inspired" predictors (perceptron, piecewise-linear, O-GEHL, ...) achieve very high accuracy, but their relatively high complexity is a barrier to industrial adoption.

outline
Quick synopsis of neural techniques; the gDAC predictor (idea, specifics, ahead-pipelining); results; why gDAC works.

gshare
Records previous outcomes given a branch identifier (PC) and a context (the branch history register, BHR). The PC and BHR are hashed together to index a Pattern History Table (PHT) of taken/not-taken counters. Different contexts may lead to different predictions for the same branch; the scheme assumes correlation between the context and the outcome.
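The gshare scheme above can be sketched in a few lines. This is a minimal model, not the exact hardware: it assumes a PC-XOR-BHR index, 2-bit saturating counters initialized weakly not-taken, and an illustrative history length.

```python
# Minimal gshare sketch: PC XOR global history indexes a table of
# 2-bit saturating counters; predict taken when the counter >= 2.
class Gshare:
    def __init__(self, history_bits=12):
        self.mask = (1 << history_bits) - 1
        self.bhr = 0                              # branch history register
        self.pht = [1] * (1 << history_bits)      # weakly not-taken

    def _index(self, pc):
        return (pc ^ self.bhr) & self.mask        # the "share" hash

    def predict(self, pc):
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask
```

After a few occurrences of a strongly biased branch in a stable context, the indexed counter saturates and the prediction locks in.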

gshare pros and cons
Simple to implement: variants exist in multiple real processors. But it does not scale to longer history lengths: the number of PHT entries grows exponentially with history length, and learning time increases. Even if a branch is correlated with only one previous branch, gshare still needs to train 2^h PHT counters.

perceptron
Explicitly locate the source(s) of correlation. Instead of a table indexed by the whole history, map each history bit to x_i = h_i ? 1 : -1 and keep a weight w_i per bit; the prediction is f(X) = (w_0*x_0 + w_1*x_1 + w_2*x_2) >= 0. For example, the weight vector (0, -1, 0) computes !h_1, isolating the single history bit that matters. The weights track correlation.

perceptron predictor
The PC selects a row of weights; an adder sums the weights signed by the BHR bits, and the sign of the sum (>= 0) gives the final prediction. Updating the weights: if the branch outcome agrees with h_i, increment w_i; if it disagrees, decrement w_i. The magnitude of a weight reflects the degree of correlation; with no correlation, w_i tends toward 0. Downsides: (1) latency (SRAM lookup, adder tree); (2) few entries in the table, hence aliasing; (3) only linearly separable functions can be learned.

path-based neural predictor
In the perceptron, all weights are chosen by PC_0; in the PBNP, w_i is selected by PC_i, the i-th oldest PC on the path. This naturally leads to pipelined access, and the different indexing reduces aliasing. Downsides: (1) latency (SRAM lookup, one adder); (2) complexity (a 30-50 stage bpred pipeline); (3) linearly separable functions only.

piecewise-linear predictor
Compute m different linear functions in parallel and use the PC to select among them; some linearly inseparable functions can now be learned. Downsides: (1) latency (SRAM lookup, one adder, one mux); (2) complexity (m copies of a 50+ stage bpred pipeline).
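A rough functional sketch of the piecewise-linear idea: each static-branch hash owns a small weight matrix indexed by history position and path PC, so the effective linear function changes with the path. Matrix sizes are hypothetical; the theta formula follows Jiménez's piecewise-linear paper.

```python
# Piecewise-linear sketch: weights are selected by (branch PC, history
# position, path PC), giving each branch a family of linear functions.
class PiecewiseLinear:
    def __init__(self, n=16, m=16, hist_len=8):
        self.n, self.m, self.h = n, m, hist_len
        self.w = [[[0] * m for _ in range(hist_len + 1)] for _ in range(n)]
        self.history = [1] * hist_len
        self.path = [0] * hist_len
        self.theta = int(2.14 * (hist_len + 1) + 20.58)

    def output(self, pc):
        row = self.w[pc % self.n]
        y = row[0][pc % self.m]
        for i in range(self.h):
            y += row[i + 1][self.path[i] % self.m] * self.history[i]
        return y

    def predict(self, pc):
        return self.output(pc) >= 0

    def update(self, pc, taken):
        t = 1 if taken else -1
        y = self.output(pc)
        row = self.w[pc % self.n]
        if (y >= 0) != taken or abs(y) <= self.theta:
            row[0][pc % self.m] += t
            for i in range(self.h):
                row[i + 1][self.path[i] % self.m] += t * self.history[i]
        self.history = self.history[1:] + [t]
        self.path = self.path[1:] + [pc]
```

When the path varies, different weight columns are trained for the same branch, which is what lets some linearly inseparable behaviors be captured.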

goal/scope
Neural predictors are very accurate; we want the same level of performance. Neural predictors are complex (a large number of adders, very deep pipelines); we want to avoid adders and keep the pipeline short. Preferably, use PHTs only.

idea
A single neural predictor must "digest" the very long branch history all by itself. (Slide illustration: Google Images, "hot dog kobayashi", the 2004 world record of 53½ hot dogs.)

idea
Make "digesting" a very long branch history easier by dividing up the responsibility: split the very long history among several predictors and combine their outputs with a meta-predictor. (Slide illustration: Google Images, "hot dog eating".)

unoptimized gDAC
gDAC: global-history Divide And Conquer. The BHR is divided into segments BHR[1:s1], BHR[s1+1:s2], BHR[s2+1:s3], BHR[s3+1:s4]; each segment is hashed with the PC to index its own gshare-styled PHT (PHT1..PHT4), and a meta-predictor selects among the per-segment predictions. Downside: the final prediction utilizes correlation from only a single history segment.

fusion gDAC
Replace the meta selector with a fusion table: the per-segment predictions from PHT1..PHT4, together with the PC, index a table that produces the final prediction. Fusion can combine correlations from multiple segments.
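The segment-then-fuse structure can be sketched as follows. This is a simplified model with hypothetical sizes and hashes: it uses plain gshare-style components per segment (the paper's optimized gDAC uses Bi-Mode-style components with a shared choice PHT) and a fusion table of 2-bit counters indexed by the per-segment predictions concatenated with PC bits.

```python
# Fusion-gDAC sketch: split the long global history into segments, give
# each segment its own gshare-style PHT, and fuse the per-segment
# predictions through a counter table indexed by (PC bits, predictions).
class FusionGDAC:
    def __init__(self, seg_bits=(8, 8, 8), pht_bits=10, fusion_bits=8):
        self.seg_bits = seg_bits
        self.total = sum(seg_bits)
        self.bhr = 0
        self.phts = [[1] * (1 << pht_bits) for _ in seg_bits]
        self.pht_mask = (1 << pht_bits) - 1
        self.fusion = [1] * (1 << fusion_bits)
        self.fusion_mask = (1 << fusion_bits) - 1

    def _segments(self):
        segs, h = [], self.bhr
        for b in self.seg_bits:
            segs.append(h & ((1 << b) - 1))
            h >>= b
        return segs

    def _per_segment(self, pc):
        return [int(pht[(pc ^ seg) & self.pht_mask] >= 2)
                for pht, seg in zip(self.phts, self._segments())]

    def _fusion_index(self, pc, preds):
        idx = 0
        for p in preds:
            idx = (idx << 1) | p
        return ((pc << len(preds)) | idx) & self.fusion_mask

    def predict(self, pc):
        return self.fusion[self._fusion_index(pc, self._per_segment(pc))] >= 2

    def update(self, pc, taken):
        preds = self._per_segment(pc)
        self._bump(self.fusion, self._fusion_index(pc, preds), taken)
        for pht, seg in zip(self.phts, self._segments()):
            self._bump(pht, (pc ^ seg) & self.pht_mask, taken)
        self.bhr = ((self.bhr << 1) | int(taken)) & ((1 << self.total) - 1)

    @staticmethod
    def _bump(table, i, taken):
        table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
```

Because the fusion table sees all per-segment predictions at once, it can learn combinations (e.g. "trust segment 2 only when segment 1 also says taken") that a select-one meta-predictor cannot express.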

gDAC
Replace each gshare-styled component with a Bi-Mode-style predictor (BM1..BM4), all sharing a single choice PHT. Better per-segment predictions lead to a better final prediction.

ahead-pipelined gDAC
Cycle t-3: initial hashing and PHT bank select, using PC_-3 and the older history segments 1-3 (the branch history from cycles t, t-1 and t-2 does not exist yet). Cycle t-2: row decode. Cycle t-1: SRAM array access; PC_-1 drives the SRAM column MUX selection. Cycle t: each PHT SRAM is organized to output multiple counters (think "cache line"); the current PC, together with the branch history from cycles t, t-1 and t-2 that is now available, selects one counter as the prediction.
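The key trick, starting the SRAM access with stale information and resolving the last few bits at prediction time, can be shown with a small functional model. This is not timing-accurate and the sizes are hypothetical; it only demonstrates that the row is chosen by a 3-cycle-old PC while the column is chosen by the current PC and the newest history bits.

```python
# Ahead-pipelining sketch: the PHT row is selected with the PC from three
# cycles ago; the row yields a "line" of counters, and the current PC plus
# the three newest history bits (which only exist now) pick one of them.
class AheadPipelinedPHT:
    def __init__(self, n_rows=256, line=8):
        self.rows = [[1] * line for _ in range(n_rows)]
        self.n_rows, self.line = n_rows, line
        self.pc_queue = [0, 0, 0]   # PCs from cycles t-3, t-2, t-1
        self.recent = 0             # history bits from the last 3 cycles

    def _locate(self, pc):
        row = self.rows[self.pc_queue[0] % self.n_rows]  # access began at t-3
        col = (pc ^ self.recent) % self.line             # resolved at cycle t
        return row, col

    def predict(self, pc):
        row, col = self._locate(pc)
        return row[col] >= 2

    def update(self, pc, taken):
        row, col = self._locate(pc)
        row[col] = min(3, row[col] + 1) if taken else max(0, row[col] - 1)
        self.pc_queue = self.pc_queue[1:] + [pc]
        self.recent = ((self.recent << 1) | int(taken)) & 0x7
```

Only the final column mux depends on up-to-the-cycle state, which is why a PHT-only predictor tolerates ahead-pipelining so gracefully.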

comment on ahead-pipelining
Branch predictors composed of only PHTs are simple SRAMs and are easily ahead-pipelined. Seznec showed ahead-pipelining of 2bcgskew [ISCA'02] and of instruction fetch in general [ISCA'03]; Jiménez showed the ahead-pipelining-like gshare.fast [HPCA'03].

simulation/configs
Standard setup: SimpleScalar/Alpha (MASE), SPEC2000 integer benchmarks, SimPoint; CPU configuration similar to the piecewise-linear study [Jiménez/ISCA'05]. gDAC is compared against gshare, perceptron, PBNP, and PWL. The gDAC configurations vary the number of segments (2-3) and the history lengths, at hardware budgets from 2KB to 128KB. The neural predictors get an advantage: gDAC tables are constrained to power-of-two entry counts, while the neural predictors may use arbitrary sizes.

misprediction rates
At 2KB, gDAC is about as accurate as the original perceptron; at 8KB it beats the original perceptron; at 32KB it is as accurate as the path-based neural predictor. The piecewise-linear predictor just does really well.

performance
gDAC is as accurate as the perceptron but has better latency, and therefore higher IPC. At 16KB gDAC is less accurate, but latency is starting to matter; the latency difference allows gDAC to catch up with even PWL in performance. Goal achieved: neural-class performance with PHT-only complexity.

so it works, but why?
Three phenomena: correlation locality, correlation redundancy, and correlation recovery. The perceptron is used as the vehicle of analysis because it explicitly assigns a correlation strength to each branch.

correlation locality: parser
The weights form distinct clusters/bands of correlation. Segmenting the history (at the right places) should not disrupt these clusters of correlation.

correlation locality: gcc
The same clustering of correlation appears for gcc.

correlation redundancy
Using only the correlation from a few branches yields almost as much information as using all branches; the correlations detected in the other weights are therefore redundant.

correlation recovery
Cross-segment correlation may exist. A selection-based meta-predictor can only use the correlation from one segment; fusion can (indirectly) use correlation from all segments. The fusion gDAC beats the selection gDAC by 4%.

orthogonality
These ideas could be used in other predictors: a segmented-history PPM predictor; segmented, geometric history lengths; some "segments" could use local history, prophet-style "future" history, or anything else. There may also be other ways to exploit the general phenomena of correlation locality, redundancy, and recovery.

summary
Contributions: a PHT-based long-history predictor that achieves the goals of neural accuracy with PHT complexity; an ahead-pipelined organization; an analysis of the effect of segmentation + fusion on correlation. Contact:

BACKUP SLIDES

Power
Neural predictor update: lots of separate small tables with extra decoders, harder to bank. All of the adders: timing-critical for the perceptron, hence power hungry; not as bad for the PBNP (which can use small ripple-carry adders); PWL multiplies the number of adders considerably. Checkpointing overhead for PBNP and PWL: the partial sums must be stored, per branch.

Power density/thermals
gDAC can break up its tables between prediction bits and hysteresis bits (like the EV8) and physically separate them along the pipeline (fetch, decode, rename, commit); this physical separation reduces power density and thermals. A neural predictor must use all bits together. Similar observations hold for O-GEHL and PPM.

linear (in)separability
Branches fall into four classes: linearly separable only; linearly separable between segments; linearly separable within segments; and linearly inseparable. (Chart, with the annotation "this does the best".)

per-benchmark accuracy (128KB)