
1 Perceptron-based Global Confidence Estimation for Value Prediction Master’s Thesis Michael Black June 26, 2003

2 Thesis Objectives
To present a viable global confidence estimator using perceptrons
To quantify predictability relationships between instructions
To study the performance of the global confidence estimator when used with common value prediction methods

3 Presentation Outline
Background:
–Data Value Prediction
–Confidence Estimation
Predictability Relationships
Perceptrons
Perceptron-based Confidence Estimator
Experimental Results and Conclusions

4 Value Locality
Suppose instruction 1 has been executed several times before:
I1: 5 (A) = 3 (B) + 2 (C)
I1: 6 (A) = 4 (B) + 2 (C)
I1: 7 (A) = 5 (B) + 2 (C)
Next time, its outcome A will probably be 8

5 Data Value Prediction
A data value predictor predicts A from instruction 1's past outcomes
Instruction 2 speculatively executes using the prediction
1. ADD 7 (A) = 5 (B) + 2 (C)    (past outcome)
1. ADD A = 6 (B) + 2 (C)        (current instance; predictor: +1 stride, so A is predicted to be 8)
2. ADD D = 5 (E) + 8 (A)        (speculatively executes with the predicted A)

6 Types of Value Predictors
Computational: Performs a mathematical operation on past values
–Last-Value: 5, 5, 5, 5 → 5
–Stride: 1, 3, 5, 7 → 9
Context: Learns repeating sequences of numbers
–3, 6, 5, 3, 6, 5, 3 → 6
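These three predictor families map onto very small pieces of state. Below is a minimal Python sketch of each; the class names, the update interfaces, and the use of a two-value context are illustrative assumptions, not code from the thesis.

```python
class LastValuePredictor:
    """Predicts that the last observed value will repeat: 5, 5, 5, 5 -> 5."""
    def __init__(self):
        self.last = None

    def predict(self):
        return self.last

    def update(self, actual):
        self.last = actual


class StridePredictor:
    """Adds the last observed stride to the last value: 1, 3, 5, 7 -> 9."""
    def __init__(self):
        self.last = None
        self.stride = 0

    def predict(self):
        return None if self.last is None else self.last + self.stride

    def update(self, actual):
        if self.last is not None:
            self.stride = actual - self.last
        self.last = actual


class ContextPredictor:
    """Looks up the value that previously followed the current context
    (here, the last two values): 3, 6, 5, 3, 6, 5, 3 -> 6."""
    def __init__(self, order=2):
        self.order = order
        self.history = []
        self.table = {}  # context tuple -> value that followed it

    def predict(self):
        return self.table.get(tuple(self.history[-self.order:]))

    def update(self, actual):
        if len(self.history) >= self.order:
            self.table[tuple(self.history[-self.order:])] = actual
        self.history.append(actual)
```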

7 Types of Value History
Local History: Predicts using data from past instances of the same instruction
Global History: Predicts using data from other instructions
Local value prediction is the more conventional approach

8 Are mispredictions a problem?
If a prediction is incorrect, speculatively executed instructions must be re-executed
This can result in:
–Cycle penalties for detecting the misprediction
–Cycle penalties for restarting dependent instructions
–Incorrect resolution of dependent branch instructions
It is better not to predict at all than to mispredict

9 Confidence Estimator
Decides whether to make a prediction for an instruction
Bases decisions on the accuracy of past predictions
Common confidence estimation method: saturating up-down counter

10 Up-Down Counter
[State diagram: the counter starts in the "Don't Predict" region; correct predictions move it up and incorrect predictions move it down, and predictions are made only once the counter crosses the threshold into the "Predict" region]
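A saturating up-down counter like the one in this diagram takes only a few lines; a minimal sketch follows, where the specific threshold and starting count are illustrative assumptions rather than details from the thesis.

```python
class UpDownCounter:
    """2-bit saturating up-down confidence counter (the baseline used later)."""
    def __init__(self, bits=2, threshold=2):
        self.max = (1 << bits) - 1   # saturates at 3 for a 2-bit counter
        self.threshold = threshold
        self.count = 0               # start in the "don't predict" region

    def should_predict(self):
        return self.count >= self.threshold

    def update(self, prediction_correct):
        if prediction_correct:
            self.count = min(self.count + 1, self.max)  # move toward "predict"
        else:
            self.count = max(self.count - 1, 0)         # move toward "don't predict"
```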

11 Local vs. Global
The up-down counter is local
–Only past instances of an instruction affect its counter
Global confidence estimation uses the prediction accuracy ("predictability") of past dynamic instructions
Problem with global:
–Not every past instruction affects the predictability of the current instruction

12 Example
I1. A = B + C
I2. F = G – H
I3. E = A + A
Instruction 3 depends on 1 but not on 2
–Instruction 3's predictability is related to 1 but not 2
If instruction 1 is predicted incorrectly, instruction 3 will also be predicted incorrectly

13 Is global confidence worthwhile?
Fewer mispredictions than local
–If an instruction mispredicts, its dependent instructions know not to predict
Less warm-up time than local
–Instructions need not be executed several times before accurate confidence decisions can be made

14 How common are predictability relationships?
Simulation study:
–How many instructions in a program predict correctly only when a previous instruction predicts correctly?
–Which past instructions have the most influence?

15 Predictability Relationships
Over 70% of instructions for Stride and Last-Value, and over 90% for Context, have the same prediction outcome as some past instruction at least 90% of the time!

16 Predictability Relationships The most recent 10 instructions have the most influence

17 Global Confidence Estimation
A global confidence estimator must:
1. Identify, for each instruction, which past instructions have similar predictability
2. Use their prediction accuracy to decide whether or not to predict

18 Neural Network
Used to iteratively learn unknown functions from examples
Consists of nodes and links; each link has a numeric weight
Data is fed to input nodes and propagated to output nodes by the links
The desired output is used to adjust ("train") the weights

19 Perceptron
Perceptrons have only input and output nodes
They are much easier to implement and train than larger neural networks
They can only learn linearly separable functions

20 Perceptron Computation
Each bit of input data is sourced to an input node
The dot product is calculated between the input data and the weights
Output is "1" if the dot product exceeds a threshold; otherwise "0"

21 Perceptron Training
Weights are adjusted so that the perceptron output matches the desired output for the given input
Error value (ε) = desired value – perceptron output
ε times each input bit is added to the corresponding weight
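Slides 20 and 21 together describe the entire perceptron. A minimal sketch of both the computation and the training rule follows; a threshold of zero and a bias weight trained alongside the others are common conventions assumed here, not details taken from the thesis.

```python
class Perceptron:
    def __init__(self, n_inputs):
        self.weights = [0] * n_inputs
        self.bias = 0

    def output(self, inputs):
        # Dot product between the input bits and the weights;
        # output 1 if it exceeds the threshold (0 here), otherwise 0.
        dot = self.bias + sum(w * x for w, x in zip(self.weights, inputs))
        return 1 if dot > 0 else 0

    def train(self, inputs, desired):
        # Error = desired output - perceptron output; add error times
        # each input bit to the corresponding weight (the bias behaves
        # like a weight whose input is always 1).
        error = desired - self.output(inputs)
        if error != 0:
            self.bias += error
            self.weights = [w + error * x
                            for w, x in zip(self.weights, inputs)]
```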

22 Weights
Weights determine the effect of each input on the output
–Positive weight: output varies directly with the input bit
–Negative weight: output varies inversely with the input bit
–Large weight: input has a strong effect on the output
–Zero weight: input bit has no effect on the output

23 Linear Separability
An input may have a direct influence on the output
An input may instead have an inverse influence on the output
But an input cannot have a direct influence sometimes and an inverse influence at other times (XOR, for example, is not learnable by a perceptron)

24 Perceptron Confidence Estimator
Each input node is a past instruction's prediction outcome (1 = correct, –1 = incorrect)
The output is the decision to predict (1 = predict, 0 = don't predict)
Weights determine a past instruction's predictability influence on the current instruction:
–Positive weight: the current instruction mispredicts when the past instruction mispredicts
–Negative weight: the current instruction mispredicts when the past instruction predicts correctly
–Zero weight: the past instruction does not affect the current one

25 Perceptron Confidence Estimator
Example weights: bias weight = –1
I1: A = B × C    weight = 1
I2: D = E + F    weight = 1
I3: P = Q × R    weight = 0
I4: G = A + D    (current instruction)
Instruction 4 predicts correctly only when 1 and 2 predict correctly
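Running this slide's example through the Perceptron sketch above reproduces the behavior claimed here; the threshold convention (predict when the dot product exceeds 0) is carried over from that sketch as an assumption.

```python
# Inputs are the prediction outcomes of I1, I2, I3 (+1 = correct, -1 = incorrect).
p = Perceptron(3)
p.bias = -1
p.weights = [1, 1, 0]          # I1 and I2 matter, I3 does not

print(p.output([+1, +1, +1]))  # 1: I1 and I2 both correct -> predict
print(p.output([+1, +1, -1]))  # 1: I3's outcome is irrelevant (weight 0)
print(p.output([+1, -1, +1]))  # 0: I2 mispredicted -> don't predict
print(p.output([-1, -1, +1]))  # 0: both mispredicted -> don't predict
```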

26 Confidence Estimator Organization

27 Perceptron Implementation

28 Weight Value Distribution
Simulation study:
–What are typical perceptron weight values?
–How does the type of predictor influence the weight distribution?
–What minimum range do the weights need to have?

29 Weight Value Distribution

30 Simulation Methodology
Measurements simulated using SimpleScalar 2.0a
SPEC2000 benchmarks: bzip2, gcc, gzip, perlbmk, twolf, vortex
Each benchmark is run for 500 million instructions
Value predictors: Stride, Last-Value, Context
Baseline confidence estimator: 2-bit up-down counter

31 Simulation Metrics
P_CORRECT: # of correct predictions
P_INCORRECT: # of incorrect predictions
N: # of cases where no prediction was made
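The coverage and accuracy figures on the following slides are presumably the standard ratios built from these three counts; a small helper assuming those conventional definitions (the thesis's exact formulas are not quoted on the slides):

```python
def coverage(p_correct, p_incorrect, n):
    # Fraction of eligible instructions for which a prediction was made.
    return (p_correct + p_incorrect) / (p_correct + p_incorrect + n)

def accuracy(p_correct, p_incorrect):
    # Fraction of the predictions made that were correct.
    return p_correct / (p_correct + p_incorrect)
```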

32 Stride Results Perceptron estimator shows a coverage increase of 8.2% and an accuracy increase of 2.7% over the up-down counter

33 Last-Value Results Perceptron estimator shows a coverage increase of 10.2% and an accuracy increase of 5.9% over the up-down counter

34 Context Results Perceptron estimator shows a coverage increase of 6.1% and an accuracy decrease of 2.9% over the up-down counter

35 Sensitivity to GPH size

36 Coverage Sensitivity to the Unavailability of Past Instructions

37 Accuracy Sensitivity to the Unavailability of Past Instructions

38 Coverage Sensitivity to Weight Range Limitations

39 Accuracy Sensitivity to Weight Range Limitations

40 Conclusions
Mispredictions are a problem in data value prediction
Benchmark programs exhibit strong predictability relationships between instructions
Perceptrons enable confidence estimators to exploit these predictability relationships
Perceptron-based confidence estimation tends to show significant improvement over up-down counter confidence estimation

