1
Perceptron-based Global Confidence Estimation for Value Prediction Master’s Thesis Michael Black June 26, 2003
2
Thesis Objectives To present a viable global confidence estimator using perceptrons To quantify predictability relationships between instructions To study the performance of the global confidence estimator when used with common value prediction methods
3
Presentation Outline Background: –Data Value Prediction –Confidence Estimation Predictability Relationships Perceptrons Perceptron-based Confidence Estimator Experimental Results and Conclusions
4
Value Locality Suppose instruction 1 has been executed several times before: I 1: 5 (A) = 3 (B) + 2 (C) ... I 1: 6 (A) = 4 (B) + 2 (C) ... I 1: 7 (A) = 5 (B) + 2 (C). Next time, its outcome A will probably be 8
5
Data Value Prediction A data value predictor predicts A from instruction 1’s past outcomes. Instruction 2 speculatively executes using the prediction. Previous execution: 1. ADD 7 (A) = 5 (B) + 2 (C). Current: 1. ADD A = 6 (B) + 2 (C); 2. ADD D = 5 (E) + 8 (A, predicted). Predictor: +1
6
Types of Value Predictors Computational: Performs a mathematical operation on past values –Last-Value: 5, 5, 5, 5 → 5 –Stride: 1, 3, 5, 7 → 9 Context: Learns repeating sequences of numbers: 3, 6, 5, 3, 6, 5, 3 → 6
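As a rough illustration of these predictor types, here is a minimal Python sketch using the sequences from the slide above. The class and method names are mine, not from the thesis, and a real predictor would use hardware tables rather than Python objects.

class LastValuePredictor:
    """Predicts the value seen last time (5, 5, 5, 5 -> 5)."""
    def __init__(self):
        self.last = None
    def predict(self):
        return self.last
    def update(self, actual):
        self.last = actual

class StridePredictor:
    """Predicts last value plus the last observed stride (1, 3, 5, 7 -> 9)."""
    def __init__(self):
        self.last = None
        self.stride = 0
    def predict(self):
        return None if self.last is None else self.last + self.stride
    def update(self, actual):
        if self.last is not None:
            self.stride = actual - self.last
        self.last = actual

class ContextPredictor:
    """Learns which value followed the recent history (3, 6, 5, 3, 6, 5, 3 -> 6)."""
    def __init__(self, order=3):
        self.order = order
        self.history = ()
        self.table = {}           # maps recent-value context -> value that followed it
    def predict(self):
        return self.table.get(self.history)
    def update(self, actual):
        if len(self.history) == self.order:
            self.table[self.history] = actual
        self.history = (self.history + (actual,))[-self.order:]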
7
Types of Value History Local History: Predicts using data from past instances of instructions Global History: Predicts using data from other instructions Local value prediction is more conventional
8
Are mispredictions a problem? If a prediction is incorrect, speculatively executed instructions must be re-executed This can result in: –Cycle penalties for detecting the misprediction –Cycle penalties for restarting dependent instructions –Incorrect resolution of dependent branch instructions It is better to not predict at all than to mispredict
9
Confidence Estimator Decides whether to make a prediction for an instruction Bases decisions on the accuracy of past predictions Common confidence estimation method: Saturating Up-Down Counter
10
Up-Down Counter [State-machine diagram: the counter starts in the “Don’t Predict” region; correct predictions move it up and incorrect predictions move it down, and predictions are made once the count crosses the threshold into the “Predict” region.]
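A minimal sketch of such a saturating up-down counter follows. The 2-bit width matches the baseline described later under simulation methodology, but the threshold and starting value are assumptions.

class UpDownCounter:
    """Saturating up-down confidence counter (sketch; width and threshold assumed)."""
    def __init__(self, bits=2, threshold=2):
        self.max = (1 << bits) - 1    # saturates at 3 for a 2-bit counter
        self.threshold = threshold
        self.count = 0                # start in the don't-predict region
    def should_predict(self):
        return self.count >= self.threshold
    def update(self, prediction_was_correct):
        if prediction_was_correct:
            self.count = min(self.count + 1, self.max)   # move toward "Predict"
        else:
            self.count = max(self.count - 1, 0)          # move toward "Don't Predict"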
11
Local vs. Global Up-Down counter is local –Only past instances of an instruction affect its counter Global confidence estimation uses the prediction accuracy (“predictability”) of past dynamic instructions Problem with global: –Not every past instruction affects the predictability of the current instruction
12
Example I 1: A = B + C; I 2: F = G – H; I 3: E = A + A. Instruction 3 depends on 1 but not on 2 –Instruction 3’s predictability is related to 1 but not 2. If instruction 1 is predicted incorrectly, instruction 3 will also be predicted incorrectly
13
Is global confidence worthwhile? Fewer mispredictions than local –If an instruction mispredicts, its dependent instructions know not to predict Less warm-up time than local –Instructions need not be executed several times before accurate confidence decisions can be made
14
How common are predictability relationships? Simulation study: –How many instructions in a program predict correctly only when a previous instruction predicts correctly? –Which past instructions have the most influence?
15
Predictability Relationships For Stride and Last-Value, over 70% of instructions (over 90% for Context) have the same prediction accuracy as some past instruction at least 90% of the time!
16
Predictability Relationships The most recent 10 instructions have the most influence
17
Global Confidence Estimation A global confidence estimator must: 1. Identify for each instruction which past instructions have similar predictability 2. Use their prediction accuracy to decide whether to predict or not predict
18
Neural Network Used to iteratively learn unknown functions from examples Consists of nodes and links Each link has a numeric weight Data is fed to input nodes and propagated to output nodes by the links Desired output used to adjust (“train”) the weights
19
Perceptron Perceptrons only have input and output nodes They are much easier to implement and train than larger neural networks Can only learn linearly separable functions
20
Perceptron Computation Each bit of input data sourced to an input node Dot product calculated between input data and weights Output is “1” if dot product exceeds a threshold; otherwise “0”
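A one-function sketch of this computation, assuming inputs of +1/–1 and a threshold of 0 (the slide does not give the actual threshold value):

def perceptron_output(weights, inputs, threshold=0):
    """Dot product of inputs and weights, thresholded to a 0/1 output."""
    dot = sum(w * x for w, x in zip(weights, inputs))
    return 1 if dot > threshold else 0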
21
Perceptron Training Weights adjusted so that the perceptron output matches the desired output for the given input Error value (ε) = desired value – perceptron output ε times each input bit is added to the corresponding weight
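The same training rule as a self-contained sketch; the weight saturation a hardware implementation would need (discussed later under weight range limitations) is not shown here:

def perceptron_train(weights, inputs, desired, threshold=0):
    """One training step: error = desired - output, then error * input bit is
    added to each weight (illustrative sketch, function names are mine)."""
    dot = sum(w * x for w, x in zip(weights, inputs))
    output = 1 if dot > threshold else 0
    error = desired - output            # 0 if already correct, otherwise +/-1
    for i, x in enumerate(inputs):
        weights[i] += error * x
    return weights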
22
Weights Weights determine the effect of each input on the output Positive weight: Output varies directly with input bit Negative weight: Output varies inversely with input bit Large weight: Input has strong effect on output Zero weight: Input bit has no effect on output
23
Linear Separability An input may have a direct influence on the output An input may instead have an inverse influence on the output But an input cannot have a direct influence sometimes and an inverse influence at other times
24
Perceptron Confidence Estimator Each input node is a past instruction’s prediction outcome: (1 = correct, –1 = incorrect) The output is the decision to predict: (1 = predict, 0 = don’t predict) Weights determine past instruction’s predictability influence on the current instruction: –Positive weight: current instruction mispredicts when past instruction mispredicts –Negative weight: current instruction mispredicts when past instruction predicts correctly –Zero weight: past instruction does not affect current
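One possible way to put these pieces together, shown purely as a sketch: one perceptron per static instruction, selected here by PC, with inputs drawn from a global history of the most recent prediction outcomes. The table size, indexing scheme, bias handling, and threshold of 0 are assumptions; the thesis’s actual organization is shown on the “Confidence Estimator Organization” and “Perceptron Implementation” slides, whose figures are not reproduced here.

HISTORY_LEN = 10          # slide: the most recent ~10 instructions have the most influence
TABLE_SIZE = 1024         # assumed number of perceptron entries

weights = [[0] * (HISTORY_LEN + 1) for _ in range(TABLE_SIZE)]   # +1 for the bias weight
gph = [1] * HISTORY_LEN   # global history of recent prediction outcomes (+1 correct, -1 incorrect)

def should_predict(pc):
    w = weights[pc % TABLE_SIZE]
    dot = w[0] + sum(wi * xi for wi, xi in zip(w[1:], gph))     # w[0] is the bias weight
    return dot > 0                                              # output 1 = predict

def record_outcome(pc, prediction_was_correct):
    """Train the instruction's perceptron and shift the global history."""
    w = weights[pc % TABLE_SIZE]
    desired = 1 if prediction_was_correct else 0
    output = 1 if (w[0] + sum(wi * xi for wi, xi in zip(w[1:], gph))) > 0 else 0
    error = desired - output
    w[0] += error                      # bias input is implicitly 1
    for i, x in enumerate(gph):
        w[i + 1] += error * x
    gph.pop(0)
    gph.append(1 if prediction_was_correct else -1)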
25
Perceptron Confidence Estimator Example weights: bias weight = –1; I 1: A = B C, weight = 1; I 2: D = E + F, weight = 1; I 3: P = Q R, weight = 0; I 4: G = A + D (current instruction). Instruction 4 predicts correctly only when 1 and 2 predict correctly
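Plugging these example weights into the dot-product computation (assuming a constant bias input of 1 and a threshold of 0, which the slide does not state) shows why instruction 4 is predicted only when instructions 1 and 2 both predicted correctly:

bias, w1, w2, w3 = -1, 1, 1, 0

def decide(x1, x2, x3):
    """xi = +1 if instruction i predicted correctly, -1 otherwise."""
    return (bias + w1 * x1 + w2 * x2 + w3 * x3) > 0

print(decide(+1, +1, +1))   # -1 + 1 + 1 + 0 =  1 > 0 -> True: predict instruction 4
print(decide(+1, +1, -1))   # still True: instruction 3's outcome has no effect (weight 0)
print(decide(-1, +1, +1))   # -1 - 1 + 1 + 0 = -1    -> False: don't predict
print(decide(+1, -1, +1))   # -1 + 1 - 1 + 0 = -1    -> False: don't predict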
26
Confidence Estimator Organization
27
Perceptron Implementation
28
Weight Value Distribution Simulation Study: –What are typical perceptron weight values? –How does the type of predictor influence the weight distribution? –What minimum range do the weights need to have?
29
Weight Value Distribution
30
Simulation Methodology Measurements simulated using SimpleScalar 2.0a SPEC2000 benchmarks: bzip2, gcc, gzip, perlbmk, twolf, vortex Each benchmark is run for 500 million instructions Value predictors: Stride, Last-Value, Context Baseline confidence estimator: 2-bit up-down counter
31
Simulation Metrics P_CORRECT: # of correct predictions P_INCORRECT: # of incorrect predictions N: # of cases where no prediction was made
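The coverage and accuracy figures quoted in the result slides can be computed from these counts. The functions below use the standard definitions for value-prediction confidence estimation; the thesis may define them slightly differently.

def coverage(p_correct, p_incorrect, n):
    """Fraction of eligible instructions for which a prediction was made."""
    return (p_correct + p_incorrect) / (p_correct + p_incorrect + n)

def accuracy(p_correct, p_incorrect):
    """Fraction of the predictions made that were correct."""
    return p_correct / (p_correct + p_incorrect)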
32
Stride Results Perceptron estimator shows a coverage increase of 8.2% and an accuracy increase of 2.7% over the up-down counter
33
Last-Value Results Perceptron estimator shows a coverage increase of 10.2% and an accuracy increase of 5.9% over the up-down counter
34
Context Results Perceptron estimator shows a coverage increase of 6.1% and an accuracy decrease of 2.9% over the up-down counter
35
Sensitivity to GPH size
36
Coverage Sensitivity to the Unavailability of Past Instructions
37
Accuracy Sensitivity to the Unavailability of Past Instructions
38
Coverage Sensitivity to Weight Range Limitations
39
Accuracy Sensitivity to Weight Range Limitations
40
Conclusions Mispredictions are a problem in data value prediction Benchmark programs exhibit strong predictability relationships between instructions Perceptrons enable confidence estimators to exploit these predictability relationships Perceptron-based confidence estimation tends to show significant improvement over up-down counter confidence estimation