MACHINE LEARNING AND BRANCH PREDICTION
Speakers, in order: Guru Nishok Radhakrishnan, Crystin Rodrick, and Anuj Wali
You can follow along at:
OVERVIEW
Introduction
Previous research
Simulation
Machine learning
Design and implementation
Future work
INTRODUCTION
A number of different branch predictors exist.
Decreasing hardware costs make more sophisticated branch predictors feasible.
Machine learning: recognizing patterns within the data.
PERFORMANCE OVERVIEW
Bimodal branch prediction:
Used in the non-MMX Intel Pentium processor.
Saturates at 93.5% correct [4].
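To make the mechanism concrete, here is a minimal C sketch of a bimodal predictor: a table of 2-bit saturating counters indexed by the branch address. The table size and modulo indexing are illustrative assumptions, not details from the slides.

```c
#include <stdint.h>
#include <stdbool.h>

#define BIMODAL_ENTRIES 4096            /* assumed table size */

static uint8_t counters[BIMODAL_ENTRIES];   /* 2-bit counters: 0..3 */

/* Predict taken when the counter is in one of the two "taken" states. */
bool bimodal_predict(uint32_t pc) {
    return counters[pc % BIMODAL_ENTRIES] >= 2;
}

/* Saturating update: move toward 3 on taken, toward 0 on not-taken. */
void bimodal_update(uint32_t pc, bool taken) {
    uint8_t *c = &counters[pc % BIMODAL_ENTRIES];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}
```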
LOCAL BRANCH PREDICTION
Separate history buffer per branch.
The pattern history table may or may not be shared.
Used in the Pentium II and Pentium III.
Saturates at 97.1% correct [4].
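A sketch of the local scheme in the same style: per-branch history registers index a shared pattern history table of 2-bit counters. All sizes here are assumptions for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define HIST_ENTRIES 1024               /* assumed: per-branch history registers */
#define HIST_BITS    10                 /* assumed local history length */
#define PHT_ENTRIES  (1 << HIST_BITS)

static uint16_t local_hist[HIST_ENTRIES];   /* last HIST_BITS outcomes per branch */
static uint8_t  pht[PHT_ENTRIES];           /* shared pattern table of 2-bit counters */

bool local_predict(uint32_t pc) {
    uint16_t h = local_hist[pc % HIST_ENTRIES] & (PHT_ENTRIES - 1);
    return pht[h] >= 2;
}

void local_update(uint32_t pc, bool taken) {
    uint16_t *h = &local_hist[pc % HIST_ENTRIES];
    uint8_t  *c = &pht[*h & (PHT_ENTRIES - 1)];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
    *h = (uint16_t)((*h << 1) | taken);     /* shift the newest outcome into the history */
}
```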
GLOBAL BRANCH PREDICTION
Shared history of all recent branches.
Advantage: correlation between different conditional jumps contributes to the predictions.
Disadvantage: the history contributes irrelevant information when the conditional jumps are uncorrelated.
Used in the Intel Pentium M, Core, and Core 2.
Saturates at 96% correct [4].
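One common realization of global prediction is gshare, from McFarling [4], which XORs the global history with the branch address to index a counter table. A minimal sketch, with assumed sizes:

```c
#include <stdint.h>
#include <stdbool.h>

#define GHIST_BITS   12                 /* assumed global history length */
#define GPHT_ENTRIES (1 << GHIST_BITS)

static uint32_t ghist;                  /* single shared history of recent outcomes */
static uint8_t  gpht[GPHT_ENTRIES];     /* 2-bit saturating counters */

/* gshare-style indexing: XOR the global history with the branch address,
   so correlated branches share history but still get distinct entries. */
bool global_predict(uint32_t pc) {
    uint32_t idx = (pc ^ ghist) & (GPHT_ENTRIES - 1);
    return gpht[idx] >= 2;
}

void global_update(uint32_t pc, bool taken) {
    uint32_t idx = (pc ^ ghist) & (GPHT_ENTRIES - 1);
    if (taken  && gpht[idx] < 3) gpht[idx]++;
    if (!taken && gpht[idx] > 0) gpht[idx]--;
    ghist = (ghist << 1) | taken;
}
```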
NEURAL BRANCH PREDICTION
Advantage: exploits long histories while requiring only linear resource growth.
Disadvantage: the perceptron predictor has high latency.
Used in the AMD Ryzen processor.
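A software sketch of a perceptron predictor: one signed weight per history bit, so storage grows linearly with history length. The training threshold formula is the one commonly used in the perceptron-predictor literature; the table and history sizes are assumptions, and weight saturation is omitted for brevity.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>

#define P_HIST   32                          /* assumed history length */
#define P_TABLES 1024                        /* assumed number of perceptrons */
#define THETA    ((int)(1.93 * P_HIST + 14)) /* training threshold */

static int8_t weights[P_TABLES][P_HIST + 1]; /* w[0] is the bias weight */
static int    hist[P_HIST];                  /* +1 = taken, -1 = not taken */

/* Output y = w0 + sum(wi * hi); predict taken when y >= 0. */
int perceptron_output(uint32_t pc) {
    int8_t *w = weights[pc % P_TABLES];
    int y = w[0];
    for (int i = 0; i < P_HIST; i++) y += w[i + 1] * hist[i];
    return y;
}

/* Train on a misprediction or a low-confidence output, then shift the history. */
void perceptron_update(uint32_t pc, bool taken, int y) {
    int t = taken ? 1 : -1;
    int8_t *w = weights[pc % P_TABLES];
    if ((y >= 0) != taken || abs(y) <= THETA) {
        w[0] += t;
        for (int i = 0; i < P_HIST; i++) w[i + 1] += t * hist[i];
    }
    for (int i = P_HIST - 1; i > 0; i--) hist[i] = hist[i - 1];
    hist[0] = t;
}
```

The latency disadvantage on the slide comes from the wide dot product: every prediction sums P_HIST weight terms, which is slow relative to a single table lookup.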
PREVIOUS RESEARCH
“Dynamic Branch Prediction using Machine Learning Algorithms”, Kedar Bellare, Pallika Kanani, and Shiraj Sen, Department of Computer Science, University of Massachusetts Amherst, May 17, 2006.
“Branch Prediction with Bayesian Networks”, Jeremy Singer, Gavin Brown, and Ian Watson, University of Manchester, UK.
“A 2-Clock-Cycle Naïve Bayes Classifier for Dynamic Branch Prediction in Pipelined RISC Microprocessors”, Itaru Hida, Masayuki Ikebe, Tetsuya Asai, and Masato Motomura, Graduate School of Information Science and Technology.
MOTIVATION
The two-level predictor has high overhead.
Neural predictors are already commercially available.
Goal: overcome these issues through machine-learning concepts.
SIMULATION
Parsing the trace files
Branch history
Branch prediction
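The simulation stage reduces to a loop of this shape: parse each trace record, predict, compare against the recorded outcome, and update the history. The one-record-per-line trace format and the file name are assumptions, since the slides do not specify them; the stub predictor stands in for any of the sketches above.

```c
#include <stdio.h>
#include <inttypes.h>

/* Stubs for the predictor under test: swap in one of the sketches above. */
static int  predict(uint32_t pc)            { (void)pc; return 1; }  /* always taken */
static void update(uint32_t pc, int taken)  { (void)pc; (void)taken; }

int main(void) {
    FILE *f = fopen("branch.trace", "r");   /* hypothetical trace file name */
    if (!f) return 1;

    uint32_t pc;
    int taken;
    long total = 0, correct = 0;
    /* Assumed record format: "<hex branch address> <0|1 outcome>" per line. */
    while (fscanf(f, "%" SCNx32 " %d", &pc, &taken) == 2) {
        correct += (predict(pc) == taken);
        update(pc, taken);                  /* feed the real outcome back as history */
        total++;
    }
    if (total) printf("accuracy: %.2f%%\n", 100.0 * correct / total);
    fclose(f);
    return 0;
}
```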
WHAT IS MACHINE LEARNING?
Artificial intelligence: no fixed rule set; systems create their own rules.
Machine learning: learns from training sets.
Deep learning: uses multiple layers.
TYPES OF MACHINE LEARNING ALGORITHMS
Supervised learning
Semi-supervised learning
Unsupervised learning
Reinforcement learning
ALGORITHM: NAIVE BAYES
LOGISTIC REGRESSION
Developed in the 1950s.
Directly learns P(Y|X).
LOGISTIC REGRESSION
h_w(x) = g(wᵀx) = g(w₀x₀ + w₁x₁ + ··· + wₙxₙ)
g(z) = 1 / (1 + e^(−z))
Thus h_w(x) = 1 / (1 + e^(−wᵀx)).
wᵀx should be a large negative value for negative instances and a large positive value for positive instances.
With a threshold of 0.5: if h_w(x) ≥ 0.5 we predict 1, else 0.
LOGISTIC REGRESSION: COST FUNCTION
J(w) = −Σᵢ [ y⁽ⁱ⁾ log h_w(x⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − h_w(x⁽ⁱ⁾)) ]
Gradient-descent update: wⱼ = wⱼ − α ∂J(w)/∂wⱼ
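A small end-to-end sketch of these equations in C, training by gradient descent on a made-up four-sample dataset (a simple AND). The learning rate, iteration count, and data are illustrative; the weights start at 0.33 as in the run-through slide later in the deck.

```c
#include <stdio.h>
#include <math.h>

#define N_FEATURES 3    /* x0 = 1 for the bias, plus two inputs */
#define N_SAMPLES  4

static double sigmoid(double z) { return 1.0 / (1.0 + exp(-z)); }

int main(void) {
    double X[N_SAMPLES][N_FEATURES] = { {1,0,0}, {1,0,1}, {1,1,0}, {1,1,1} };
    double y[N_SAMPLES] = {0, 0, 0, 1};          /* made-up labels: AND of the inputs */
    double w[N_FEATURES] = {0.33, 0.33, 0.33};   /* initialize as on the slides */
    double alpha = 0.5;                          /* assumed learning rate */

    for (int iter = 0; iter < 5000; iter++) {
        for (int i = 0; i < N_SAMPLES; i++) {
            double z = 0;
            for (int j = 0; j < N_FEATURES; j++) z += w[j] * X[i][j];
            double err = sigmoid(z) - y[i];      /* gradient of the log loss w.r.t. z */
            for (int j = 0; j < N_FEATURES; j++)
                w[j] -= alpha * err * X[i][j];   /* wj = wj - alpha * dJ/dwj */
        }
    }
    for (int i = 0; i < N_SAMPLES; i++) {
        double z = w[0]*X[i][0] + w[1]*X[i][1] + w[2]*X[i][2];
        printf("x=(%g,%g) -> P(y=1)=%.3f\n", X[i][1], X[i][2], sigmoid(z));
    }
    return 0;
}
```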
EXAMPLE: DUCK VS. TOY
EXAMPLE DATASET
(Table of ten items with binary attributes QUACK and WADDLE and label TYPE: DUCK or TOY.)
PROBABILITIES
P(QUACK=1|DUCK) = 5/7 = 0.71    P(QUACK=1|TOY) = 2/7 = 0.29
P(WADDLE=1|DUCK) = 2/5 = 0.4    P(WADDLE=1|TOY) = 3/5 = 0.6
P(DUCK) = 5/10 = 0.5    P(TOY) = 5/10 = 0.5
NAIVE BAYES SOLUTION
Predict by finding the argmax of P(y | WADDLE=1, QUACK=1):
DUCK: P(DUCK) × P(QUACK=1|DUCK) × P(WADDLE=1|DUCK) = 0.5 × 0.71 × 0.4 = 0.14
TOY: P(TOY) × P(QUACK=1|TOY) × P(WADDLE=1|TOY) = 0.5 × 0.29 × 0.6 = 0.09
Therefore, we predict DUCK.
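The same argmax, done in a few lines of C with the probabilities from the previous slide:

```c
#include <stdio.h>

/* Naive Bayes on the duck/toy example: score each class as
   P(class) * P(quack=1|class) * P(waddle=1|class) and pick the argmax. */
int main(void) {
    double p_duck = 0.5 * 0.71 * 0.4;   /* = 0.14 */
    double p_toy  = 0.5 * 0.29 * 0.6;   /* = 0.09 */
    printf("score(DUCK)=%.3f score(TOY)=%.3f -> predict %s\n",
           p_duck, p_toy, p_duck > p_toy ? "DUCK" : "TOY");
    return 0;
}
```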
LOGISTIC REGRESSION: RUN-THROUGH
Initialize w with 0.33 spread over all weights.
After the first iteration: h_w(x) = 1 / (1 + e^(−(0.33·(1) + 0.33·(1) + ···)))
Update: wⱼ = wⱼ − α ∂J(w)/∂wⱼ
Once you have the optimal weights, plug in the x values to get your prediction y.
DESIGN: NAIVE BAYES
Why naive Bayes?
Faster than other regression/classification models
Relatively less complex circuitry
Requires less physical space
DESIGN: NAIVE BAYES
What do we need to build one?
A set of known attributes
A conditional probability table (CPT)
Posterior probabilities
DESIGN: ATTRIBUTES
Set of known attributes: the last 30 branch outcomes, maintained as a queue.
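A minimal sketch of the attribute queue, assuming the outcomes are kept as bits in a shift register:

```c
#include <stdint.h>
#include <stdbool.h>

#define HIST_LEN 30   /* last 30 branch outcomes, as on the slide */

/* A 32-bit shift register is enough for a queue of 30 one-bit outcomes:
   shift in the newest outcome, mask off anything older than 30 branches. */
static uint32_t history;

void history_push(bool taken) {
    history = ((history << 1) | taken) & ((1u << HIST_LEN) - 1);
}

bool history_bit(int i) {   /* outcome of the i-th most recent branch */
    return (history >> i) & 1u;
}
```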
DESIGN: CPT
Bimodal counters instead of probabilities, updated after each branch (see the sketch after the posterior-probability slide).
DESIGN: POSTERIOR PROBABILITIES
P(y = 0 | X) and P(y = 1 | X)
Made faster using a look-up table.
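Putting the attribute queue, the per-attribute counters, and the posterior comparison together, here is a software sketch of the predictor. Floating-point logs with Laplace smoothing stand in for the saturating counters and log look-up table of the hardware design in [1]; the sizes and smoothing are assumptions.

```c
#include <stdint.h>
#include <stdbool.h>
#include <math.h>

#define HIST_LEN 30

/* cnt[i][v][y]: how often history bit i had value v when the branch outcome
   was y. The hardware keeps small counters here instead of probabilities. */
static unsigned cnt[HIST_LEN][2][2];
static unsigned class_cnt[2];
static uint32_t history;

/* Compare log-posteriors: log P(y) + sum_i log P(x_i | y). In hardware the
   log() would come from a small look-up table, which is what makes the
   2-clock-cycle design of [1] feasible. */
bool nb_predict(void) {
    double score[2];
    for (int y = 0; y < 2; y++) {
        score[y] = log((class_cnt[y] + 1.0) / (class_cnt[0] + class_cnt[1] + 2.0));
        for (int i = 0; i < HIST_LEN; i++) {
            int v = (history >> i) & 1;
            /* Laplace smoothing keeps zero counts from producing log(0). */
            score[y] += log((cnt[i][v][y] + 1.0) / (class_cnt[y] + 2.0));
        }
    }
    return score[1] > score[0];
}

/* Update the counters after each branch, then shift in the new outcome. */
void nb_update(bool taken) {
    class_cnt[taken]++;
    for (int i = 0; i < HIST_LEN; i++)
        cnt[i][(history >> i) & 1][taken]++;
    history = ((history << 1) | taken) & ((1u << HIST_LEN) - 1);
}
```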
DESIGN: NAIVE BAYES PROCESS FLOW
(Process-flow diagram.)
DESIGN: PROCESSOR ARCHITECTURE
LatticeMico32, a 6-stage pipeline.
Where does the predictor fit in the pipeline?
DESIGN: EXPECTED MODEL PARAMETERS
Counter size and history length (figures taken from [1]).
DESIGN: EXPECTED RESULTS
Performance measurement: misprediction rate (%).
The naive Bayes predictor performs better than the others (figure taken from [1]).
FUTURE WORK: LOGISTIC REGRESSION
Issues: not much research available; feasibility is uncertain.
Possible implementation model: same attribute set; performance expected to be similar to the perceptron.
REFERENCES
[1] “A 2-Clock-Cycle Naïve Bayes Classifier for Dynamic Branch Prediction in Pipelined RISC Microprocessors”, Itaru Hida, Masayuki Ikebe, Tetsuya Asai, and Masato Motomura, Graduate School of Information Science and Technology.
[2] “Dynamic Branch Prediction using Machine Learning Algorithms”, Kedar Bellare, Pallika Kanani, and Shiraj Sen, Department of Computer Science, University of Massachusetts Amherst, May 17, 2006.
[3] “Branch Prediction with Bayesian Networks”, Jeremy Singer, Gavin Brown, and Ian Watson, University of Manchester, UK.
[4] “Combining Branch Predictors”, Scott McFarling, June 1993.
THANK YOU