
1 MACHINE LEARNING AND BRANCH PREDICTION
You can follow along at:
In order of speaking: Guru Nishok Radhakrishnan, Crystin Rodrick, and Anuj Wali

2 OVERVIEW
Introduction
Previous research
Simulation
Machine learning
Design and implementation
Future work

3 INTRODUCTION
Many different branch predictor designs are in use.
Decreasing hardware costs make more sophisticated branch predictors feasible.
Machine learning: recognizing patterns within the data.

4 PERFORMANCE OVERVIEW
Bimodal branch prediction:
Used in the non-MMX Intel Pentium processor.
Saturates at 93.5% correct. [4]
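As a rough sketch of the idea (not the Pentium's actual circuit), a bimodal predictor keeps one small saturating counter per branch-address slot; the 1024-entry table size here is an arbitrary assumption:

```python
# Bimodal predictor sketch: one 2-bit saturating counter per entry,
# indexed by the branch address. Table size is an arbitrary choice.
TABLE_SIZE = 1024
counters = [1] * TABLE_SIZE          # values 0..3; start at "weakly not taken"

def predict(pc):
    return counters[pc % TABLE_SIZE] >= 2    # 2 or 3 means "predict taken"

def update(pc, taken):
    i = pc % TABLE_SIZE
    counters[i] = min(counters[i] + 1, 3) if taken else max(counters[i] - 1, 0)
```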

5 LOCAL BRANCH PREDICTION
Separate history buffer for each conditional jump.
The pattern history table may or may not be shared.
Used in the Pentium II and Pentium III.
Saturates at 97.1% correct. [4]
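A minimal sketch of the local scheme, assuming a shared pattern history table and arbitrary sizes:

```python
# Local two-level predictor sketch: each branch slot keeps its own
# history register; the history pattern indexes a shared table of
# 2-bit saturating counters.
HIST_BITS = 8
N_BRANCHES = 512
histories = [0] * N_BRANCHES            # per-branch local histories
pht = [1] * (2 ** HIST_BITS)            # shared pattern history table

def predict(pc):
    h = histories[pc % N_BRANCHES]
    return pht[h] >= 2

def update(pc, taken):
    slot = pc % N_BRANCHES
    h = histories[slot]
    pht[h] = min(pht[h] + 1, 3) if taken else max(pht[h] - 1, 0)
    histories[slot] = ((h << 1) | int(taken)) & (2 ** HIST_BITS - 1)
```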

6 GLOBAL BRANCH PREDICTION
Keeps a shared history of all conditional jumps.
Advantage: correlation between different conditional jumps contributes to the predictions.
Disadvantage: the history holds irrelevant information if the conditional jumps are uncorrelated.
Used in the Intel Pentium M, Core, and Core 2.

7 Saturates at 96% correct. [4]
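One common realization of global prediction is McFarling's gshare [4]; a minimal sketch, with arbitrary sizes:

```python
# Global-history (gshare-style) sketch: a single global history register
# is XORed with the branch address to index a counter table.
HIST_BITS = 12
ghr = 0                                  # global history register
pht = [1] * (2 ** HIST_BITS)             # 2-bit saturating counters

def predict(pc):
    return pht[(pc ^ ghr) % (2 ** HIST_BITS)] >= 2

def update(pc, taken):
    global ghr
    i = (pc ^ ghr) % (2 ** HIST_BITS)
    pht[i] = min(pht[i] + 1, 3) if taken else max(pht[i] - 1, 0)
    ghr = ((ghr << 1) | int(taken)) & (2 ** HIST_BITS - 1)
```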

8 NEURAL BRANCH PREDICTION
Advantage: exploits long histories while requiring only linear resource growth.
Disadvantage: the perceptron predictor has high latency.
Used in the AMD Ryzen processor.
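A simplified sketch of a perceptron predictor in the style of the academic proposals (not Ryzen's actual design); the training threshold follows the commonly cited 1.93·h + 14 formula:

```python
# Simplified perceptron predictor: one signed weight per global-history
# bit; predict taken when the weighted sum is non-negative.
HIST_LEN = 30
THETA = int(1.93 * HIST_LEN + 14)     # training threshold
weights = [0] * (HIST_LEN + 1)        # weights[0] is the bias weight
history = [1] * HIST_LEN              # entries are +1 (taken) or -1 (not taken)

def output():
    return weights[0] + sum(w * h for w, h in zip(weights[1:], history))

def predict():
    return output() >= 0

def update(taken):
    t = 1 if taken else -1
    y = output()
    if (y >= 0) != taken or abs(y) <= THETA:   # train on mispredict or weak output
        weights[0] += t
        for j in range(HIST_LEN):
            weights[j + 1] += t * history[j]
    history.insert(0, t)              # shift in the newest outcome
    history.pop()
```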

9 PREVIOUS RESEARCH
"Dynamic Branch Prediction using Machine Learning Algorithms", Kedar Bellare, Pallika Kanani, and Shiraj Sen, Department of Computer Science, University of Massachusetts Amherst, May 17, 2006.
"Branch Prediction with Bayesian Networks", Jeremy Singer, Gavin Brown, and Ian Watson, University of Manchester, UK.

10 "A 2-Clock-Cycle Naïve Bayes Classifier for Dynamic Branch Prediction in Pipelined RISC Microprocessors", Itaru Hida, Masayuki Ikebe, Tetsuya Asai, and Masato Motomura, Graduate School of Information Science and Technology.

11 MOTIVATION
The two-level predictor has high overhead.
Neural predictors are already commercially available.
Goal: overcome these issues through machine-learning concepts.

12 SIMULATION
PARSING THE TRACE FILES → BRANCH HISTORY → BRANCH PREDICTION
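A hedged sketch of this simulation loop; the trace format here (one "<hex pc> <T|N>" pair per line) is an assumption, since the slides don't specify one:

```python
# Hypothetical trace format: one branch per line, "<hex pc> <T|N>".
def parse_trace(path):
    with open(path) as f:
        for line in f:
            pc, outcome = line.split()
            yield int(pc, 16), outcome == "T"

# Drive any predict/update pair over a trace and report accuracy.
def simulate(path, predict, update):
    correct = total = 0
    for pc, taken in parse_trace(path):
        correct += (predict(pc) == taken)
        update(pc, taken)
        total += 1
    return correct / total
```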

13 WHAT IS MACHINE LEARNING?
ARTIFICIAL INTELLIGENCE: no fixed rule set; systems create their own rules.
MACHINE LEARNING: learns from training sets.
DEEP LEARNING: uses multiple layers.

14 TYPES OF MACHINE LEARNING ALGORITHMS
Supervised learning
Semi-supervised learning
Unsupervised learning
Reinforcement learning

15 ALGORITHM: NAIVE BAYES

16 LOGISTIC REGRESSION
Created in the 1950s. Directly learns P(Y|X).

17 LOGISTIC REGRESSION
h_w(x) = g(w^T x) = g(w0·x0 + w1·x1 + ··· + wn·xn), where g(z) = 1 / (1 + e^(−z))
Thus h_w(x) = 1 / (1 + e^(−w^T x)).
w^T x should be a large negative value for negative instances, and a large positive value for positive instances.
With a threshold of 0.5: if h_w(x) ≥ 0.5 we predict 1, else 0.
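The hypothesis translates directly into code; a minimal sketch, assuming x[0] is the constant 1 so that w0 acts as the intercept:

```python
import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def h(w, x):
    # h_w(x) = g(w^T x), with x[0] = 1 as the bias input
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def classify(w, x):
    return 1 if h(w, x) >= 0.5 else 0    # threshold of 0.5
```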

18 LOGISTIC COST FUNCTION
J(w) = −Σ_i [ y^(i) log h_w(x^(i)) + (1 − y^(i)) log(1 − h_w(x^(i))) ]
Gradient-descent update: wj := wj − α · ∂J(w)/∂wj
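Using the fact that ∂J/∂wj = Σ_i (h_w(x^(i)) − y^(i)) · xj^(i) for this loss, one batch gradient-descent step looks like this (reusing h from the previous sketch):

```python
# One batch gradient-descent step on the logistic loss.
def gradient_step(w, X, y, alpha):
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = h(w, xi) - yi              # (h_w(x_i) - y_i)
        for j in range(len(w)):
            grad[j] += err * xi[j]       # accumulate dJ/dw_j
    return [wj - alpha * gj for wj, gj in zip(w, grad)]
```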

19 DUCK VS TOY

20 EXAMPLE DATASET
Attributes: QUACK (0/1) and WADDLE (0/1); label TYPE: DUCK or TOY. Ten examples: five ducks, five toys.

21 PROBABILITIES
P(QUACK=1|DUCK) = 5/7 = 0.71    P(QUACK=1|TOY) = 2/7 = 0.29
P(WADDLE=1|DUCK) = 2/5 = 0.4    P(WADDLE=1|TOY) = 3/5 = 0.6
P(DUCK) = 5/10 = 0.5            P(TOY) = 5/10 = 0.5

22 NAIVE BAYES SOLUTION
Predict by finding the argmax of P(Y|WADDLE=1, QUACK=1):
DUCK: 0.5 × 0.71 × 0.4 = 0.14
TOY: 0.5 × 0.29 × 0.6 = 0.09
Therefore, we predict DUCK.
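The same computation in a few lines, using the slide's numbers:

```python
# Score each class as prior * product of attribute likelihoods,
# then take the argmax over the two classes.
priors   = {"DUCK": 0.5,  "TOY": 0.5}
p_quack  = {"DUCK": 0.71, "TOY": 0.29}
p_waddle = {"DUCK": 0.4,  "TOY": 0.6}

scores = {c: priors[c] * p_quack[c] * p_waddle[c] for c in priors}
print(scores)                        # DUCK: 0.142, TOY: 0.087
print(max(scores, key=scores.get))   # -> DUCK
```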

23 LOGISTIC REGRESSION RUN-THROUGH
Initialize w: spread 0.33 over all weights.
After the first iteration: 1 / (1 + e^(−(… + 0.3·(1))))
wj := wj − α · ∂J(w)/∂wj
Once you get the optimal weights, plug in the x values to get your y.
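The slide's intermediate numbers are incomplete, so this sketch only mirrors the shape of the run-through with a hypothetical three-feature input, reusing classify and gradient_step from the earlier sketches:

```python
w = [0.33, 0.33, 0.33]       # initial mass of 0.33 spread over all weights
x = [1.0, 1.0, 1.0]          # hypothetical input; x[0] = 1 is the bias term
y = [1]                      # hypothetical label

w = gradient_step(w, [x], y, alpha=0.1)   # one iteration of the update rule
print(classify(w, x))        # plug x back in to read off the predicted y
```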

24 DESIGN: NAIVE BAYES
Why naive Bayes?
Faster than other regression/classification models.
Relatively simple circuitry.
Requires less physical space.

25 DESIGN: NAIVE BAYES
What do we need to build one?
A set of known attributes
A conditional probability table (CPT)
Posterior probabilities

26 DESIGN: NAIVE BAYES
Set of known attributes: the last 30 branch outcomes, maintained as a queue.

27 DESIGN: NAIVE BAYES
CPT: bimodal counters instead of probabilities, updated after each branch.

28 DESIGN: NAIVE BAYES
Posterior probabilities P(y=0|X) and P(y=1|X), made faster using a look-up table. A combined sketch of these design pieces follows below.
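A combined sketch of slides 25-28, under stated assumptions: plain counts stand in for the paper's bimodal counters, and the posterior comparison that the hardware would do with a look-up table is done in floating point here:

```python
from collections import deque

HIST_LEN = 30
history = deque([0] * HIST_LEN, maxlen=HIST_LEN)   # last 30 outcomes (queue)
# cpt[j][h][y]: count of history bit j having value h when the outcome was y
cpt = [[[1, 1], [1, 1]] for _ in range(HIST_LEN)]  # add-one initialisation
class_count = [1, 1]

def posterior(y):
    # Unnormalised P(y | X) = P(y) * prod_j P(x_j | y)
    p = class_count[y]
    for j, h in enumerate(history):
        p *= cpt[j][h][y] / class_count[y]
    return p

def predict():
    return posterior(1) >= posterior(0)   # argmax over the two classes

def update(taken):
    y = int(taken)
    class_count[y] += 1
    for j, h in enumerate(history):
        cpt[j][h][y] += 1
    history.append(y)                     # oldest outcome drops off the queue
```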

29 DESIGN: NAIVE BAYES PROCESS FLOW

30 DESIGN: PROCESSOR ARCHITECTURE
LatticeMico32, 6-stage pipeline.
Where does the predictor fit in the pipeline?

31 DESIGN: NAIVE BAYES, EXPECTED MODEL PARAMETERS
Counter size and history length. Figures taken from [1].

32 DESIGN: NAIVE BAYES, EXPECTED RESULTS
Performance measured as misprediction rate (%); expected to perform better than the other predictors. Figure taken from [1].

33 FUTURE WORK: LOGISTIC REGRESSION
Issues: not much research available; feasibility is uncertain.
Possible implementation model: same attribute set; performance similar to the perceptron.

34 REFERENCES
1) "A 2-Clock-Cycle Naïve Bayes Classifier for Dynamic Branch Prediction in Pipelined RISC Microprocessors", Itaru Hida, Masayuki Ikebe, Tetsuya Asai, and Masato Motomura, Graduate School of Information Science and Technology.
2) "Dynamic Branch Prediction using Machine Learning Algorithms", Kedar Bellare, Pallika Kanani, and Shiraj Sen, Department of Computer Science, University of Massachusetts Amherst, May 17, 2006.
3) "Branch Prediction with Bayesian Networks", Jeremy Singer, Gavin Brown, and Ian Watson, University of Manchester, UK.
4) "Combining Branch Predictors", Scott McFarling, June 1993.

35 THANK YOU

