Dynamic History-Length Fitting: A third level of adaptivity for branch prediction Toni Juan Sanji Sanjeevan Juan J. Navarro Department of Computer Architecture.


1 Dynamic History-Length Fitting: A third level of adaptivity for branch prediction
Toni Juan, Sanji Sanjeevan, Juan J. Navarro
Department of Computer Architecture, Universitat Politècnica de Catalunya
ISCA '98
Presented by Danyao Wang, ECE1718, Fall 2008

2 Overview
Branch prediction background
Dynamic branch predictors
Dynamic history-length fitting (DHLF)
  – Without context switches
  – With context switches
Results
Conclusion

3 Why branch prediction?
Superscalar processors with deep pipelines
  – Intel Core 2 Duo: 14 stages
  – AMD Athlon 64: 12 stages
  – Intel Pentium 4: 31 stages
Many cycles pass before a branch is resolved
  – Waiting wastes time…
  – It would be better to do useful work in the meantime…
Branch prediction!

4 What does it do?
    sub r1, r2, r3
    bne r1, r0, L1
    add r4, r5, r6
    …
L1: add r4, r7, r8
    sub r9, r4, r2
[Pipeline timing diagram: instructions pass through fetch, decode and execute over time. When the branch is fetched it is predicted taken, so fetch continues from L1 and the instructions there execute speculatively. When the branch is resolved the prediction is validated: correct.]

5 What happens when mispredicted?
    sub r1, r2, r3
    bne r1, r0, L1
    add r4, r5, r6
    …
L1: add r4, r7, r8
    sub r9, r4, r2
[Pipeline timing diagram: instructions pass through fetch, decode and execute over time. When the branch is fetched it is predicted taken, so fetch continues from L1 and the instructions there execute speculatively. When the branch is resolved the prediction is validated: incorrect! The speculative instructions are squashed.]

6 How to predict branches?
Statically at compile time
  – Simple hardware
  – Not accurate enough…
Dynamically at execution time
  – Hardware predictors, from simpler to more complex and more accurate:
      Last-outcome predictor
      Saturation counter
      Pattern predictor
      Tournament predictor

7 Last-Outcome Branch Predictor
Simplest dynamic branch predictor
Branch prediction table with 1-bit entries
Intuition: history repeats itself
[Diagram: the lower N bits of the PC index a branch prediction table of 2^N entries; each 1-bit entry gives the prediction, T or NT. The table is read at fetch and written on a misprediction.]
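The table described on this slide can be sketched in a few lines of Python. This is only an illustrative model, not code from the presentation: the class name, the initial "not taken" state and the example loop branch are assumptions.

```python
class LastOutcomePredictor:
    """Branch prediction table of 1-bit entries, indexed by the low bits of the PC."""

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        self.table = [False] * (1 << index_bits)  # assumed initial state: not taken

    def predict(self, pc):
        # Read at fetch: the stored bit is the prediction.
        return self.table[pc & self.mask]

    def update(self, pc, taken):
        # Write on misprediction: a correct entry already holds the outcome.
        if self.table[pc & self.mask] != taken:
            self.table[pc & self.mask] = taken


def count_mispredictions(predictor, pc, outcomes):
    """Replay the outcomes of one branch and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch: taken three times, then not taken, run twice.
loop_branch = [True, True, True, False] * 2
misses = count_mispredictions(LastOutcomePredictor(index_bits=4), 0x40, loop_branch)
print(misses)  # 4
```

In this toy trace the predictor misses twice per loop execution: once on the loop exit and once more on re-entry, because the single bit flips immediately.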

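8 Saturation Counter Predictor
Observation: branches are highly bimodal
n-bit saturation counter
  – Hysteresis
  – n-bit entries in branch prediction table
[State diagram, e.g. 2-bit bimodal predictor: states range from strong bias to weak bias on both the predict-taken and predict-not-taken sides. A taken outcome (T) moves the counter toward predict-taken; a not-taken outcome (N) moves it toward predict-not-taken.]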
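A minimal Python sketch of the n-bit saturation counter scheme follows. The class name, the weakly-not-taken starting value and the example trace are assumptions made for illustration.

```python
class BimodalPredictor:
    """Table of n-bit saturation counters indexed by the low bits of the PC."""

    def __init__(self, index_bits, counter_bits=2):
        self.mask = (1 << index_bits) - 1
        self.max_count = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)  # upper half predicts taken
        # Assumed initial state: weakly not taken.
        self.table = [self.threshold - 1] * (1 << index_bits)

    def predict(self, pc):
        return self.table[pc & self.mask] >= self.threshold

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = max(self.table[i] - 1, 0)


# Hypothetical loop branch: taken three times, then not taken, run twice.
predictor = BimodalPredictor(index_bits=4)
bimodal_misses = 0
for taken in [True, True, True, False] * 2:
    if predictor.predict(0x40) != taken:
        bimodal_misses += 1
    predictor.update(0x40, taken)
print(bimodal_misses)  # 3
```

The hysteresis shows up at the second loop entry: a single not-taken outcome only weakens the counter, so the branch is still predicted taken when the loop restarts.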

9 Pattern Predictors
Near-by branches often correlate
Looks for patterns in branch history
  – Branch History Register (BHR): m most recent branch outcomes
[Diagram, two-level predictor: a function f combines the lower n bits of the PC with the m-bit history from the BHR into an n-bit index into a branch prediction table of 2^n saturation counters.]
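To show why history helps, here is a deliberately simplified Python sketch in which f ignores the PC and indexes the counters with the BHR alone. That simplification, the class name and the alternating test pattern are assumptions, not part of the slides.

```python
class HistoryIndexedPredictor:
    """Two-level sketch: the BHR alone selects a 2-bit saturation counter."""

    def __init__(self, history_bits):
        self.history_mask = (1 << history_bits) - 1
        self.bhr = 0                               # m most recent outcomes
        self.counters = [1] * (1 << history_bits)  # assumed start: weakly not taken

    def predict(self):
        return self.counters[self.bhr] >= 2

    def update(self, taken):
        c = self.counters[self.bhr]
        self.counters[self.bhr] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Shift the newest outcome into the history register.
        self.bhr = ((self.bhr << 1) | int(taken)) & self.history_mask


# Hypothetical branch that strictly alternates taken / not taken.
predictor = HistoryIndexedPredictor(history_bits=2)
pattern_misses = 0
for taken in [True, False] * 10:
    if predictor.predict() != taken:
        pattern_misses += 1
    predictor.update(taken)
print(pattern_misses)  # 2
```

After two warm-up misses every outcome is predicted correctly, because each history value always precedes the same outcome. A single 2-bit counter started in the same weakly-not-taken state would mispredict every outcome of this sequence.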

10 Tournament Predictor
No one-size-fits-all predictor
Dynamically choose among different predictors
[Diagram: the PC feeds Predictor A, Predictor B and Predictor C; a chooser (metapredictor) selects among their predictions.]
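The chooser idea can be sketched with two component predictors and a 2-bit selection counter per entry. The two trivial static components, the class names and the chooser update rule are assumptions chosen to keep the example short; the slide shows three predictors and does not specify the chooser policy.

```python
class AlwaysTaken:
    def predict(self, pc):
        return True

    def update(self, pc, taken):
        pass  # static predictor: nothing to learn


class AlwaysNotTaken:
    def predict(self, pc):
        return False

    def update(self, pc, taken):
        pass


class TournamentPredictor:
    """Chooser table of 2-bit counters: values 0-1 select A, 2-3 select B."""

    def __init__(self, pred_a, pred_b, index_bits):
        self.a, self.b = pred_a, pred_b
        self.mask = (1 << index_bits) - 1
        self.chooser = [1] * (1 << index_bits)  # assumed start: weakly prefer A

    def predict(self, pc):
        use_b = self.chooser[pc & self.mask] >= 2
        return self.b.predict(pc) if use_b else self.a.predict(pc)

    def update(self, pc, taken):
        i = pc & self.mask
        a_ok = self.a.predict(pc) == taken
        b_ok = self.b.predict(pc) == taken
        # Move the chooser only when exactly one component was right.
        if b_ok and not a_ok:
            self.chooser[i] = min(self.chooser[i] + 1, 3)
        elif a_ok and not b_ok:
            self.chooser[i] = max(self.chooser[i] - 1, 0)
        self.a.update(pc, taken)
        self.b.update(pc, taken)


# Hypothetical branch that is never taken.
tournament = TournamentPredictor(AlwaysTaken(), AlwaysNotTaken(), index_bits=4)
tournament_misses = 0
for taken in [False] * 10:
    if tournament.predict(0x40) != taken:
        tournament_misses += 1
    tournament.update(0x40, taken)
print(tournament_misses)  # 1
```

Both components are evaluated on every branch, which is the area cost of this spatial approach: every candidate predictor has to exist in hardware at once.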

11 What is the best predictor?
[Figure omitted. Surviving labels: "Optimal", "Better".]

12 Observations
Predictor performance depends on history length
Optimal history length differs between programs
Predictors with a fixed history length fall short of their potential
… dynamic history length?

13 Dynamic History-Length Fitting (DHLF)

14 Intuition
Tournament predictor
  – Picks best out of many predictors
  – Spatial multiplexing
  – Area cost …
DHLF: time multiplexing
  – Try different history lengths during execution
  – Adapt history length to code
  – Hope to find the best one

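15 2-Level Predictor Revisited
Index = f(PC, BHR)
gshare: f = xor, m < n
2-bit saturation counter
[Diagram, as on slide 9: f combines the lower n bits of the PC with the m-bit history from the BHR into an n-bit index into a branch prediction table of 2^n saturation counters. Annotations: "Predetermined" and "Figure out dynamically" (the latter presumably refers to the history length m).]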
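A small Python sketch of a gshare-style index with an adjustable history length m may help here. The function name and the choice to XOR the history into the low-order index bits are assumptions; the slide only states f = xor.

```python
def gshare_index(pc, bhr, index_bits, history_bits):
    """XOR the m most recent history bits into the low n bits of the PC."""
    if not 0 <= history_bits <= index_bits:
        raise ValueError("history length must be between 0 and the index width")
    pc_part = pc & ((1 << index_bits) - 1)
    history_part = bhr & ((1 << history_bits) - 1)
    return pc_part ^ history_part


pc, bhr = 0b1011_0110, 0b1111
print(gshare_index(pc, bhr, index_bits=4, history_bits=0))  # 6: PC bits only
print(gshare_index(pc, bhr, index_bits=4, history_bits=2))  # 5
print(gshare_index(pc, bhr, index_bits=4, history_bits=4))  # 9
```

With history_bits = 0 the index degenerates to a plain PC-indexed table, so a single parameter spans the range from a bimodal-style predictor to one using as many history bits as the index allows.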

16 DHLF Approach
Current history length
Best length so far
Misprediction counter
Branch counter
Table of measured misprediction rates per length
  – Initialized to zero
Sampling at fixed intervals (step size)
  – Try a new length: measure its misprediction rate (MR)
  – Adjust if worse than the best seen before
  – Move to a random length if the length has not changed for a while (avoids local minima)
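The bookkeeping listed on this slide can be sketched as a small controller. This is one plausible reading, not the authors' exact algorithm: jumping straight to the best recorded length, the `patience` threshold for "a while", the initial length of 0 and the synthetic `cost` model are all assumptions, and the warm-up interval is omitted.

```python
import random


class DHLFController:
    """Chooses a history length once per interval of `step` branches."""

    def __init__(self, index_bits, step, patience=8, seed=0):
        self.step = step
        self.patience = patience             # hypothetical "a while", in intervals
        self.table = [0] * (index_bits + 1)  # mispredictions per length, zeroed
        self.current = 0                     # assumed initial history length
        self.branches = 0                    # branch counter
        self.mispredictions = 0              # misprediction counter
        self.unchanged = 0
        self.rng = random.Random(seed)

    def best(self):
        # Length with the lowest recorded count; untried lengths still hold 0,
        # so they look attractive and get sampled first.
        return min(range(len(self.table)), key=self.table.__getitem__)

    def record(self, mispredicted):
        self.branches += 1
        self.mispredictions += int(mispredicted)
        if self.branches == self.step:
            self._end_interval()

    def _end_interval(self):
        self.table[self.current] = self.mispredictions
        self.branches = self.mispredictions = 0
        best = self.best()
        if self.table[self.current] > self.table[best]:
            self.current, self.unchanged = best, 0  # worse than best: adjust
        else:
            self.unchanged += 1
            if self.unchanged >= self.patience:     # stuck: try a random length
                self.current = self.rng.randrange(len(self.table))
                self.unchanged = 0


def run(controller, cost, intervals):
    """Synthetic driver: cost[m] mispredictions per interval at length m."""
    for _ in range(intervals):
        length = controller.current
        for k in range(controller.step):
            controller.record(k < cost[length])


controller = DHLFController(index_bits=4, step=10)
run(controller, cost=[5, 4, 2, 3, 6], intervals=10)
print(controller.current, controller.table)  # 2 [5, 4, 2, 3, 6]
```

In this toy run the zero-initialized table makes the controller visit each length once before settling on length 2, the cheapest in the synthetic cost model.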

17 DHLF Examples
Index = 12 bits; step = 16K
[Figure omitted. Surviving label: "Optimal".]

18 Experimental Methodology
SPECint95
gshare and dhlf-gshare
Trace-driven simulation
Simulated up to 200M conditional branches
Branch history register and pattern history table updated immediately with the true outcome

19 DHLF Performance
Area overhead
  – Index length = 10; step size = 16K; overhead = 7%
  – Index length = 16; step size = 16K; overhead = 0.02%
[Figure omitted. Surviving label: "Better".]

20 Optimization Strategies
Step size
  – Small: learns faster, but has to be big enough for meaningful misprediction statistics
  – Big: learns slower
Change length incrementally
  – Test as many lengths as possible
Warm-up period
  – No MR count for 1 interval after a length change

21 Context Switches
Branch prediction table trashed periodically
Lower prediction accuracy immediately after a context switch
Context switch frequency affects optimal history length

22 Impact on Misprediction Rate
gshare, index = 16 bits
Context-switch distance: number of branches executed between context switches
[Figure omitted. Surviving label: "Better".]

23 Coping with Context Switches
Upon context switch
  – Discard current misprediction counter
  – Save current predictor data: misprediction table and current history length
    (approx. 221 bits for a 16-bit index, step = 16K, 13-bit misprediction counter)
Returning from a context switch
  – Warm-up: no MR count for 1 interval
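The 221-bit figure is consistent with a misprediction table holding one 13-bit entry per history length from 0 to 16. The short Python check below makes that arithmetic explicit; the function names and the assumption that lengths 0 through n are all tracked are mine, not stated on the slide.

```python
def dhlf_table_bits(index_bits, counter_bits):
    """Bits in the misprediction table, assuming one entry per length 0..index_bits."""
    entries = index_bits + 1
    return entries * counter_bits


def length_pointer_bits(index_bits):
    """Bits needed to encode a history length in the range 0..index_bits."""
    return index_bits.bit_length()


print(dhlf_table_bits(16, 13))      # 221
print(length_pointer_bits(16))      # 5
```

Under the same assumption, a 10-bit index gives 11 × 13 = 143 bits, which is about 7% of a 2^10-entry table of 2-bit counters (2048 bits) and so agrees with the overhead quoted on slide 19 for that configuration.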

24 DHLF with Context Switches
[Figure omitted. Axis: misprediction rate. Legend: "x" marks dhlf-gshare with step value = 16K, compared against gshare with all possible history lengths. Surviving label: "Better".]
Branch prediction table flushed every 70K instructions to simulate context switches.

25 Contributions
Dynamically finds near-optimal history lengths
Performs well for programs with different branch behaviours
Performs well under context switches
Can be applied to any two-level branch predictor
Small area overhead

26 Backup Slides

27 DHLF Performance: SPECint95
dhlf-gshare; step size = 16K. Compared to all possible history lengths (no context switch)
[Figure omitted. Surviving label: "Better".]

28 DHLF with Context Switches
dhlf-gshare; step size = 16K; context-switch distance = 70K
[Figure omitted. Surviving label: "Better".]

29 dhlf-gskew
Step value = 16K. Compared to all history lengths for gskew.
[Figure omitted. Surviving label: "Better".]

30 dhlf-gskew with Context Switch
Step size = 16K; context-switch distance = 70K.
[Figure omitted. Surviving label: "Better".]

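31 DHLF Structure
[Flowchart, captioned "DHLF Data Structure": a misprediction table with N entries (rows labelled 0, 1, …, N), a pointer to the entry for the current history length, and a pointer to the minimum misprediction count. Starting from an initial history length, each interval runs for step dynamic branches while the branch counter and misprediction counter are updated. At the end of the interval the test "current misprediction > min achieved?" is made. Yes: adjust history length. No: run next interval.]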

32 Questions
Is a fixed context-switch distance realistic?
Does updating the PHT with true branch data immediately affect results?
  – Previous studies show little impact due to this

