Download presentation

Presentation is loading. Please wait.

Published byDeanna Etchells Modified about 1 year ago

1
Computer Science Department University of Central Florida Adaptive Information Processing: An Effective Way to Improve Perceptron Predictors Hongliang GaoHuiyang Zhou

2
University of Central Florida2 Information system model Source (program) Information: Br address Br history Br type Other run time info. Information vector Information processor Predictor FSMs Perceptrons [2] Correlation feedback Information system model by Chen et. al. [ASPLOS-VII]. Key observations –Shortcomings: Fixed information vector while different workloads/branches need different information data. –Perceptron weights Correlation Assemble information vector to maximize correlation Our contribution –Re-assemble the information vector based on correlation (weights) –Performed at a coarse grain, so it is not latency critical

3
University of Central Florida3 Adaptive Information Processing Profile-directed adaptation Correlation-directed adaptation Perceptron Predictor Fixed GHRLHRPC Information Output: n bits n < m MUX … Controls Inputs: m bits

4
University of Central Florida4 Profile-directed adaptation … w0 table wt table 1 wt table 2 wt table N Prediction = sign(y) y GHR LHR table Index 0 Index 1 Index 2 Index N weights PC Information Information vector 4 4 … type MUX type Workload Detector type Update Logic 4 MAC-RHSP predictor [5]

5
University of Central Florida5 P[0:3] P[4:7] P[8:11] P[12:15] P[16:19] G[0:3] G[4:7] …… L[0:3] P[0:3] L[0:3] G[0:3] G[4:7] G[8:11] G[12:P15] G[16:19] …… L[4:7] P[0:3] L[0:3] G[0:3] G[4:7] G[8:11] G[12:P15] G[16:19] …… G[56:59^60:63] L[0:3] L[4:7] G[0:3^4:7] G[8:11^12:15] G[16:19^20:23] G[24:27^28:31] G[32:35^36:39] …… G[80:83^84:87] Profile-directed adaptation Workload Detector type MUX type L[0:3] P[0:3] P[0:3] P[0:3] … FP INT MM SERV INT FP MM SERV MUX type L[4:7] L[0:3] L[0:3] P[4:7] MUX type G[ ^ ] G[^] L[4:7] L[0:3] Table 1Table 2Table N G[0:3] XOR G[4:7]

6
University of Central Florida6 Workload Detector Detection criteria – SERV : a large number of static branches – FP : a small number of static branches, a high number of floating point operation, and a high number of instructions using XMM registers – MM : a medium number of static branches, a medium number of floating point operation, and a medium number of instructions using XMM registers – INT : default Workload Detector type Source (program) Run time info: # of fp insts XMM accesses # of static br

7
University of Central Florida7 Correlation-directed adaptation … w0 table wt table 1 wt table 2 wt table N Prediction = sign(y) y GHR LHR table Index 0 Index 1 Index 2 Index N weights PC Information Information vector type c1 Information feeding logic MUX2 c2 cN (c1, c2, c3, …) c1c2 cN Adaptation Logic MUX type Workload Detector type Update Logic MAC-RHSP predictor

8
University of Central Florida8 Correlation-directed adaptation … GHR LHR table Index 1 Index 2 Index N weights PC Information Information vector 4 4 type 4 4 Information feeding logic MUX2 4 4 c1 c2 (c1, c2, c3, …) c1c2 cN Adaptation Logic MUX type (INT) Workload Detector type PCLHR Sum1Sum2SumN Large Sum1 => strong correlation with PC bits Small SumN => weak correlation with certain GHR bits cN ‘PC’ => more PC bits Table 1 Table 2Table N cN ’PC’ GHR PC PC

9
University of Central Florida9 Overall scheme … Loop branch predictor PC tagloop counttaken count confidenceLRU PC Bias branch predictor GHR LHR table PC MAC perceptron predictor workload detector runtime info. initial (predict NT) always NT (predict NT) always Taken (predict T) Not biased (use other predictors) loop hit loop prediction loop hit NT Taken loop prediction prediction

10
University of Central Florida10 Summary Observations –Different workloads/branches need different information. –Perceptron weights Correlation Contributions –Profile-directed adaptation –Correlation-directed adaptation –Reducing aliasing from bias and loop branches Result –Significant improvement

11
Computer Science Department University of Central Florida Thank you and Questions?

12
University of Central Florida12 References [1] I. K. Chen, J. T. Coffey, and T. N. Mudge, “Analysis of branch prediction via data compression”, Proc. of the 7 th Int. Conf. on Arch. Support for Programming Languages and Operating Systems (ASPLOS-VII), [2] D. Jimenez and C. Lin, “Dynamic branch prediction with perceptrons”, Proc. of the 7 th Int. Symp. on High Perf. Comp. Arch (HPCA-7), [3] D. Jimenez and C. Lin, “Neural methods for dynamic branch prediction”, ACM Trans. on Computer Systems, [4] S. MacFarling, “Combining branch predictors”, Technical Report, DEC, [5] A. Seznec, “Revisiting the perceptron predictor”, Technical Report, IRISA, [6] T.-Y. Yeh and Y. Patt, “Alternative implementations of two-level adaptive branch prediction”, Proc. of the 22nd Int. Symp. on Comp. Arch (ISCA-22), 1995.

13
University of Central Florida13 Predictor configuration COMPONENTCONFIGURATIONCOST Bias branch predictor2293 entries2293 x 2 = 4586 bits. Loop branch predictor24 entries, 8-way set associative1344 bits InformationPC GHR: 100 bits LHR: 8 bits, 63 entries * 63 = 636 bits MAC perceptron predictor W0 table : 61 entries, 8 bits each Other table sizes: 63, 55, 53, 53, 51, 49, 43, 41, 41, 39, 37, 37, and 35. Total MAC entries : 597 Each entry: 16 weights, 6 bits each Control bits : 2 bits each entry 61 * * 16 * * 2 = bits AdaptationAdaptation interval : * 2 conditional branches 22 bits Workload detectionInterval: (instructions / conditional branches) 82 bits Total : bits (less than 64 x = bits)

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google