1 Lecture 4: Branch Predictors

2 Direction: 0 or 1. Target: 32- or 64-bit value. Targets turn out to be generally easier to predict:
–Don’t need to predict the NT target (it is simply the next sequential PC)
–The T target usually doesn’t change, or follows a “nice” pattern like subroutine returns

3 If a branch was previously taken, there’s a good chance it’ll be taken again in the future:

    for(i=0; i < 100000; i++) {
        /* do stuff */
    }

The loop-back branch here will be taken 99,999 times in a row.

4 Always predict NT
–no fetch bubbles (always just fetch the next line)
–does horribly on the previous for-loop example
Always predict T
–does pretty well on the previous example
–but what if you have other control besides loops?

    p = calloc(num, sizeof(*p));
    if(p == NULL)
        error_handler( );

The error-checking branch here is practically never taken.

5 Do what you did last time:

    0xDC08: for(i=0; i < 100000; i++) {
    0xDC44:     if( (i % 100) == 0 )
                    tick( );
    0xDC50:     if( (i & 1) == 1 )
                    odd( );
            }

Predict T if the branch was taken last time, N if it was not taken last time.

6 Over the 100,000 iterations, how often is the branch outcome != its previous outcome?

    DC08: TTTTTTTTTT...TTTTTTTTTTNTTTTTTTTT...   2 / 100,000  ->  99.998% prediction rate
    DC44: TTTTT...TNTTTTT...TNTTTTT...           2 / 100      ->  98.0%
    DC50: TNTNTNTNTNTNTNTNTNTNTNTNTNTNT...       2 / 2        ->  0.0%

7 FSM for Last-Outcome Prediction: two states (0 = predict NT, 1 = predict T); a T outcome transitions to state 1, an NT outcome transitions to state 0.
FSM for 2bC (2-bit Counter): four states 0–3; states 0 and 1 predict NT, states 2 and 3 predict T; a T outcome increments the counter (saturating at 3), an NT outcome decrements it (saturating at 0).
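A minimal sketch of the 2bC in C (the type and function names here are my own, not from the slides):

    /* 2-bit saturating counter: states 0,1 predict NT; states 2,3 predict T */
    typedef unsigned char ctr2_t;

    int predict_2bc(ctr2_t c) {
        return c >= 2;             /* 1 = predict taken */
    }

    ctr2_t update_2bc(ctr2_t c, int taken) {
        if (taken  && c < 3) c++;  /* T outcome: increment, saturate at 3 */
        if (!taken && c > 0) c--;  /* NT outcome: decrement, saturate at 0 */
        return c;
    }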

8 Training/warm-up on the DC08 loop branch:
1bC: starts untrained, then predicts T until the loop-exit N flips it to 0, so it mispredicts both the exit N and the first T of the next pass — 2 mispredicts per loop.
2bC: warms up to state 3; the single loop-exit N only drops it to state 2, which still predicts T on the next pass. Only 1 mispredict per N branches now!
DC08: 99.999%  DC44: 99.0%

9 98% → 99%
–Whoop-Dee-Do!
–Actually, it’s a 2% misprediction rate → 1%
–That’s a halving of the number of mispredictions
So what?
–If the misprediction rate equals 50%, and 1 in 5 instructions is a branch, then the number of useful instructions that we can fetch is: 5×(1 + ½ + (½)^2 + (½)^3 + …) = 10
–If we halve the miss rate down to 25%: 5×(1 + ¾ + (¾)^2 + (¾)^3 + …) = 20
–Halving the miss rate doubles the number of useful instructions that we can try to extract ILP from
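A quick check of that arithmetic in C (a sketch; the 5-instructions-per-branch figure comes from the slide, and 1/(1−s) is the closed form of the geometric series):

    #include <stdio.h>

    /* useful fetches = insts_per_branch * (1 + s + s^2 + ...) where
       s = 1 - mispredict_rate; the series sums to 1/(1-s) = 1/m */
    double useful_insts(double mispredict_rate, double insts_per_branch) {
        return insts_per_branch / mispredict_rate;
    }

    int main(void) {
        printf("%.0f\n", useful_insts(0.50, 5));   /* prints 10 */
        printf("%.0f\n", useful_insts(0.25, 5));   /* prints 20 */
        return 0;
    }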

10 …back to predictors. The basic table-based predictor: the 32- or 64-bit PC is hashed down to log2(n) bits, which index a table of n FSM entries/counters. The indexed entry supplies the prediction; once the actual outcome is known, the FSM update logic writes the updated entry back into the table.

11 Just take the log2(n) least significant bits of the PC
May need to ignore a few bits
–In a 32-bit RISC ISA, all instructions are 4 bytes wide, and all instruction addresses are 4-byte aligned → the two least significant bits of the PC are always zero and so are not included; equivalent to right-shifting the PC by two positions before hashing
–In a variable-length CISC ISA (ex. x86), instructions may start on arbitrary byte boundaries, so you probably don’t want to shift
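A minimal sketch of this indexing in C (assuming the table size n is a power of two; the names are mine):

    #include <stdint.h>

    /* fixed-width 4-byte RISC ISA: drop the two always-zero offset bits */
    uint32_t pht_index_risc(uint32_t pc, uint32_t n) {
        return (pc >> 2) & (n - 1);
    }

    /* variable-length CISC ISA: keep the low bits, no shift */
    uint32_t pht_index_cisc(uint32_t pc, uint32_t n) {
        return pc & (n - 1);
    }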

12 1bC and 2bC don’t do too well on some branches (50% at best)
But the behavior is still obviously predictable
Why?
–It has a repeating pattern: (NT)*
–How about other patterns? (TTNTN)*
Use branch correlation
–The outcome of a branch is often related to previous outcome(s)

13 Add one bit of history per branch: the PC selects an entry holding the previous outcome plus two counters, one used when prev = 0 and one used when prev = 1. On an alternating (NT)* branch, the prev = 0 counter trains up to 3 (predict T) and the prev = 1 counter trains down to 0 (predict N), so after warm-up every prediction is correct.

14 What pattern has this branch predictor entry learned?
The PC selects an entry holding the last 3 outcomes plus one counter per 3-bit history (counter if prev = 000, counter if prev = 001, …, counter if prev = 111).
The trained counters predict 001 → 1, 011 → 0, 110 → 0, 100 → 1, which generates the stream 00110011001… = (0011)*
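A minimal sketch of one such local-history entry in C (3 bits of history, one 2-bit counter per history pattern; the struct and function names are mine):

    #include <stdint.h>

    #define HIST_BITS 3
    #define HIST_MASK ((1u << HIST_BITS) - 1)

    typedef struct {
        uint8_t history;                 /* last 3 outcomes, newest in bit 0 */
        uint8_t ctr[1u << HIST_BITS];    /* one 2-bit counter per pattern */
    } local_entry_t;

    int local_predict(const local_entry_t *e) {
        return e->ctr[e->history] >= 2;  /* MSB of the selected counter */
    }

    void local_update(local_entry_t *e, int taken) {
        uint8_t *c = &e->ctr[e->history];
        if (taken  && *c < 3) (*c)++;
        if (!taken && *c > 0) (*c)--;
        e->history = ((e->history << 1) | (taken & 1)) & HIST_MASK;
    }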

15 Three ways to organize the counters:
–PC → hash → private counters: a different set of patterns for each branch
–PC → hash → one shared set of patterns for all branches
–PC → hash → a mix of both: each set of patterns is shared by a group of branches

16 1024 counters (2^10)
–32 sets (2^5): a 5-bit PC hash chooses a set
–Each set has 32 counters: 32 × 32 = 1024; history length of 5 (log2(32) = 5)
Branch collisions
–1000’s of branches collapsed into only 32 sets

17 1024 counters (2^10)
–128 sets (2^7): a 7-bit PC hash chooses a set
–Each set has 8 counters: 128 × 8 = 1024; history length of 3 (log2(8) = 3)
Limited patterns/correlation
–Can now only handle a history length of three

18 Branch History Table (BHT)
–2^a entries (indexed by a bits of PC hash)
–h-bit history per entry
Pattern History Table (PHT)
–2^b sets (selected by b bits of PC hash)
–2^h counters per set; each entry is a 2-bit counter
Total size in bits: h × 2^a + 2^(b+h) × 2 (see the sizing sketch below)
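A minimal sketch of that sizing formula in C (just the arithmetic from the slide; the example parameters are mine):

    #include <stdio.h>

    /* total storage in bits: h-bit histories plus 2-bit counters */
    unsigned long predictor_bits(unsigned a, unsigned b, unsigned h) {
        return (unsigned long)h * (1ul << a) + (1ul << (b + h)) * 2;
    }

    int main(void) {
        /* e.g. a=10, b=7, h=3: 1024 3-bit histories + 128 sets of 8 counters */
        printf("%lu bits\n", predictor_bits(10, 7, 3));  /* 3072 + 2048 = 5120 */
        return 0;
    }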

19 h = 0 or a = 0 (degenerate case)
–Regular table of 2bC’s (b = log2 of the number of counters)
h > 0, a > 1
–“Local history” 2-level predictor
h > 0, a = 1
–“Global history” 2-level predictor

20 Local behavior
–What is the predicted direction of branch A given the outcomes of previous instances of branch A?
Global behavior
–What is the predicted direction of branch Z given the outcomes of all* previous branches A, B, …, X and Y?
*the number of previous branches tracked is limited by the history length

21 Example: related branch conditions

       p = findNode(foo);
    A: if ( p is parent )
           do something;
       do other stuff;  /* may contain more branches */
    B: if ( p is a child )
           do something else;

The outcome of the second branch (B) is always the opposite of the first branch (A).

22 Testing same/similar conditions
–code might test for NULL before a function call, and the function might test for NULL again
–in some cases it may be faster to recompute a condition rather than save a previous computation in memory and re-load it
–partial correlations: one branch could test for cond1, and another branch could test for cond1 && cond2 (if cond1 is false, then the second branch can be predicted as false)
–multiple correlations: one branch tests cond1, a second tests cond2, and a third tests a combination such as cond1 ⊕ cond2 (which can always be predicted if the first two branches are known)

23 Global history: a single global branch history register (BHR) is shared by all branches. Its h bits can be concatenated with the b-bit PC hash to form a (b+h)-bit index into one large PHT.
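A minimal sketch of this concatenated (“gselect”-style, in the terminology used later) indexing in C (names are mine):

    #include <stdint.h>

    static uint32_t bhr;                    /* global branch history register */

    /* b bits of PC hash concatenated with h bits of global history */
    uint32_t gselect_index(uint32_t pc, unsigned b, unsigned h) {
        uint32_t pc_bits   = (pc >> 2) & ((1u << b) - 1);
        uint32_t hist_bits = bhr & ((1u << h) - 1);
        return (pc_bits << h) | hist_bits;
    }

    void bhr_update(int taken) {
        bhr = (bhr << 1) | (taken & 1);     /* shift in the newest outcome */
    }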

24 For a fixed number of counters
–Larger h → smaller b
Larger h → longer history
–able to capture more patterns
–longer warm-up/training time
Smaller b → more branches map to the same set of counters
–more interference
Larger b → smaller h
–just the opposite…

25 Not all 2^h “states” are used
–(TTNN)* only uses half of the states for a history length of 3, and only ¼ of the states for a history length of 4
–(TN)* only uses two states no matter how long the history length is
Not all bits of the PC are uniformly distributed
Not all bits of the history are uniformly likely to be correlated
–more recent history is more likely to be strongly correlated

26 gshare — S. McFarling (DEC-WRL TR, 1993): hash the PC down to k bits, XOR with k bits of global history, and use the k-bit result to index the counters (k = log2 of the number of counters).
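A minimal gshare sketch in C (a toy model in the spirit of the slide, not McFarling’s exact implementation; the table size and names are mine):

    #include <stdint.h>

    #define K        12                     /* k = log2(counters) */
    #define ENTRIES  (1u << K)

    static uint8_t  pht[ENTRIES];           /* 2-bit counters */
    static uint32_t ghist;                  /* global history register */

    uint32_t gshare_index(uint32_t pc) {
        return ((pc >> 2) ^ ghist) & (ENTRIES - 1);   /* PC XOR history */
    }

    int gshare_predict(uint32_t pc) {
        return pht[gshare_index(pc)] >= 2;
    }

    void gshare_update(uint32_t pc, int taken) {
        uint8_t *c = &pht[gshare_index(pc)];
        if (taken  && *c < 3) (*c)++;
        if (!taken && *c > 0) (*c)--;
        ghist = (ghist << 1) | (taken & 1);
    }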

27 Why XOR? With gselect, insufficient history bits lead to a conflict that gshare avoids:

    Branch Address    Global History    gselect 4/4    gshare 8/8
    00000000          00000001          00000001       00000001
    11111111          00000000          11110000       11111111
    11111111          10000000          11110000       01111111

gselect 4/4 maps the last two (different) cases to the same index; gshare 8/8 keeps them apart.

28 Different branches can share one history-indexed PHT without destructive interference when their patterns exercise different counters:
–Branch A: always not-taken
–Branch B: always taken
–Branch C: TNTNTN…
–Branch D: TTNNTTNN…
[Figure: one 8-entry PHT indexed by 3-bit history (000 … 111); each branch’s pattern trains a distinct subset of the counters to 0 or 3]

29 But sharing can also cause conflicts:
–Branch X: TTTNTTTN…
–Branch Y: TNTNTN…
–Branch Z: TTTT…
[Figure: the same 8-entry history-indexed PHT; some history patterns are now reached by more than one branch with different outcomes, leaving those counters contested (“?”)]

30 There are patterns and asymmetries in branches
Not all patterns occur with the same frequency
Branches have biases
This lecture:
–Bi-Mode (Lee et al., MICRO ’97)
–gskewed (Michaud et al., ISCA ’97)
These are global-history predictors, but the ideas can be applied to other types of predictors

31 Interference occurs because two (or more) branches hash to the same index
A different hash function can prevent this collision
–but may cause other collisions
Use multiple hash functions such that a collision can only occur in a few cases
–use a majority vote to make the final decision

32 gskewed: the (PC, global history) pair is hashed three different ways (hash1, hash2, hash3) into three separate PHTs, and the final prediction is the majority of the three counters’ predictions. The hash functions are chosen so that if hash1(x) = hash1(y), then hash2(x) ≠ hash2(y) and hash3(x) ≠ hash3(y): two branches can collide in at most one table, and the other two tables outvote the damaged entry.
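A minimal majority-vote sketch in C (the skewing hashes here are placeholders of my own, not Michaud’s published functions):

    #include <stdint.h>

    #define K 10
    #define N (1u << K)

    static uint8_t pht[3][N];                /* three banks of 2-bit counters */

    /* hypothetical skewing hashes -- assumptions, not from the paper */
    static uint32_t h0(uint32_t x) { return x & (N - 1); }
    static uint32_t h1(uint32_t x) { return (x ^ (x >> K)) & (N - 1); }
    static uint32_t h2(uint32_t x) { return (x ^ (x >> (K / 2))) & (N - 1); }

    int gskewed_predict(uint32_t pc, uint32_t ghist) {
        uint32_t x = (pc >> 2) ^ (ghist << 1);
        int votes = (pht[0][h0(x)] >= 2)
                  + (pht[1][h1(x)] >= 2)
                  + (pht[2][h2(x)] >= 2);
        return votes >= 2;                   /* majority of three banks */
    }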

33 [Figure: branches A and B collide in one PHT bank, but each has a clean entry in the other two banks, so the majority vote still yields the right prediction for both]

34 Some branches exhibit local-history correlations
–ex. loop branches
While others exhibit global-history correlations
–“spaghetti logic”, ex. if-elsif-elsif-elsif-else branches
Using a global-history predictor prevents accurate prediction of branches exhibiting local-history correlations, and vice versa

35 Tournament selection: run two predictors side by side and use a table of 2-/3-bit meta-counters to choose between them. If the meta-counter MSB = 0, use pred0, else use pred1. Meta-counter update:

    pred0 correct?    pred1 correct?    Meta update
    yes               yes               ---
    no                yes               Inc (toward pred1)
    yes               no                Dec (toward pred0)
    no                no                ---
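A minimal selection sketch in C (a sketch under the MSB convention above; the component predictions p0 and p1 are assumed to come from predictors defined elsewhere):

    #include <stdint.h>

    static uint8_t meta[1u << 10];           /* 2-bit meta-counters */

    int tournament_predict(uint32_t idx, int p0, int p1) {
        return (meta[idx] >= 2) ? p1 : p0;   /* MSB = 0 -> use pred0 */
    }

    void tournament_update(uint32_t idx, int p0, int p1, int taken) {
        int c0 = (p0 == taken), c1 = (p1 == taken);
        if (c1 && !c0 && meta[idx] < 3) meta[idx]++;   /* only pred1 right */
        if (c0 && !c1 && meta[idx] > 0) meta[idx]--;   /* only pred0 right */
    }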

36 Global history + local history
“Easy” branches + global history
–2bC and gshare
Short history + long history
Many types of behavior, many combinations

37 Why only combine two predictors?
[Figure: a tree of meta-predictors — M01 chooses between P0 and P1, M23 chooses between P2 and P3, and a top-level M chooses between the two winners]
Tradeoff between making good individual predictions (P’s) vs. making good meta-predictions (M’s)
–for a fixed hardware budget, improving one may hurt the other

38 Selection discards the information from n−1 predictors
Fusion attempts to synthesize all of the information
–more info to work with
–possibly more junk to sort through
[Figure: selection picks one of P0 … P3 via a meta-predictor M; fusion feeds all of P0 … P3 into a single combining predictor M]

39 A long global history provides more context for branch prediction/pattern matching
–more potential sources of correlation
Costs
–For a PHT-based approach, HW cost increases exponentially: O(2^h) counters
–Training time increases, which may decrease overall accuracy

40 Ex: the prediction equals the opposite of the 2nd most recent outcome

    Hist Len = 2          Hist Len = 3
    4 states to train:    8 states to train:
    NN -> T               NNN -> T
    NT -> T               NNT -> T
    TN -> N               NTN -> N
    TT -> N               NTT -> N
                          TNN -> T
                          …

Each extra history bit doubles the number of states to train, even though the added bit is irrelevant to this branch.

41 Uses the “perceptron” from classical machine-learning theory
–simplest form of a neural net (single-layer, single-node)
Inputs are past branch outcomes
Compute a weighted sum of the inputs
–the output is a linear function of the inputs
–the sign of the output is used for the final prediction

42 [Figure: inputs x1 … xn (past outcomes, encoded ±1) are multiplied by weights w1 … wn and summed by an adder, together with a “bias” weight w0 (whose input x0 is a constant 1); predict taken if the sum ≥ 0]

    y = w0 + Σ wi·xi ;  prediction = (y ≥ 0)

43 The magnitude of weight wi determines how correlated branch i is to the current branch
The sign of the weight determines positive or negative correlation
Ex. the outcome is usually the opposite of the 5th oldest branch
–w5 has a large magnitude (L), but is negative
–if x5 is taken, then w5·x5 = −L·1 = −L, which tends to make the sum more negative (toward a NT prediction)
–if x5 is not taken, then w5·x5 = −L·−1 = L

44 When the actual branch outcome is known:
–if xi = outcome, then increment wi (positive correlation)
–if xi ≠ outcome, then decrement wi (negative correlation)
–for x0, increment if the branch was taken, decrement if NT
“Done with training”
–if |Σ wi·xi| > θ, then don’t update the weights unless there was a misprediction
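A minimal perceptron predict/train sketch in C (one perceptron; outcomes encoded as ±1; the history length and threshold value are assumptions of mine, not the tuned values from the literature):

    #include <stdint.h>
    #include <stdlib.h>

    #define H      16              /* history length (assumed) */
    #define THETA  37              /* training threshold (assumed) */

    static int8_t w[H + 1];        /* w[0] is the bias weight */
    static int8_t x[H + 1];        /* x[0] = 1; x[i] = +1 taken, -1 not taken */

    int perceptron_predict(int *sum_out) {
        int sum = 0;
        x[0] = 1;
        for (int i = 0; i <= H; i++)
            sum += w[i] * x[i];    /* weighted sum of past outcomes */
        *sum_out = sum;
        return sum >= 0;           /* sign of the sum is the prediction */
    }

    void perceptron_train(int sum, int taken) {
        int t = taken ? 1 : -1;
        int mispredicted = ((sum >= 0) != taken);
        if (mispredicted || abs(sum) <= THETA) {
            for (int i = 0; i <= H; i++)
                w[i] += t * x[i];  /* agree -> increment, disagree -> decrement */
        }
        for (int i = H; i > 1; i--)   /* shift the outcome into the history */
            x[i] = x[i - 1];
        x[1] = t;
    }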

45 If no correlation exists with branch i, then wi will just get incremented and decremented back and forth, so wi ≈ 0
If correlation exists with branch j, then wj will be consistently incremented (or decremented) until it has a large influence on the overall sum

46 The perceptron computes a linear combination of its inputs, so it can only learn linearly separable functions.
[Figure: two 2-input truth tables over xi, xj. The left function is linearly separable — e.g. f() = −3·xi − 4·xj − 5 draws a straight line separating the T from the N’s. The right function is XOR-like: no straight line separates the T’s from the N’s, so no values of wi, wj, w0 exist to satisfy those outputs.]

47 Hardware organization: the PC hash selects one set of weights from a table of n perceptrons; each weight is multiplied by the corresponding BHR bit (±1), the products are summed by an adder tree, and prediction = sign(sum).
Size = (h+1) × k × n + h + Area(multipliers) + Area(adder)
–h = history length, k = counter (weight) width, n = number of perceptrons in the table

48 GEometric History Length (GEHL) predictor: several tables of k-bit weights are indexed with hashes of the PC and increasingly long slices h1 < h2 < h3 < h4 … of a very long global branch history; the selected weights are summed and prediction = sign(sum). The history lengths form a geometric progression: L(i) = a^(i−1) × L(1).
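For example, with L(1) = 2 and a = 2, the lengths would be 2, 4, 8, 16, … — a quick sketch of the progression (the constants are illustrative, not Seznec’s tuned values):

    /* geometric history lengths: L(i) = a^(i-1) * L(1), i is 1-based */
    unsigned gehl_length(unsigned i, unsigned a, unsigned L1) {
        unsigned len = L1;
        while (--i) len *= a;
        return len;
    }
    /* gehl_length(1,2,2)=2, gehl_length(2,2,2)=4, gehl_length(3,2,2)=8, ... */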

49 PPM = Partial Pattern Matching
–Used in data compression
–Idea: use the longest history necessary, but no longer
[Figure: a base 2bC table plus several tagged tables indexed with hashes of the PC and progressively longer history slices h1 < h2 < h3 < h4 (most recent to oldest); each tagged entry holds a partial tag and a 2bC, the tags are compared (=) against the lookup, and the matching table with the longest history supplies the prediction, falling back to the base 2bC]
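A minimal longest-match lookup sketch in C (the tag widths, table sizes, and hash functions are all placeholders of my own; real PPM/TAGE designs use carefully chosen hashes):

    #include <stdint.h>

    #define TABLES 4
    #define K      10
    #define N      (1u << K)

    typedef struct { uint16_t tag; uint8_t ctr; } tagged_t;

    static uint8_t  base[N];            /* fallback 2bC table */
    static tagged_t t[TABLES][N];       /* tagged tables; higher index = longer history */

    /* hypothetical index/tag hashes over the PC and a history slice */
    static uint32_t idx_hash(int i, uint32_t pc, uint64_t hist) {
        return ((pc >> 2) ^ (uint32_t)(hist >> i)) & (N - 1);
    }
    static uint16_t tag_hash(int i, uint32_t pc, uint64_t hist) {
        return (uint16_t)((pc >> 5) ^ (uint32_t)(hist >> (2 * i)));
    }

    int ppm_predict(uint32_t pc, uint64_t hist) {
        /* scan from the longest history down; first tag match wins */
        for (int i = TABLES - 1; i >= 0; i--) {
            tagged_t *e = &t[i][idx_hash(i, pc, hist)];
            if (e->tag == tag_hash(i, pc, hist))
                return e->ctr >= 2;
        }
        return base[(pc >> 2) & (N - 1)] >= 2;   /* no match: base predictor */
    }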

50 TAGE: similar to PPM, but uses geometric history lengths
–Currently the most accurate type of branch prediction algorithm
References (www.jilp.org):
–PPM: Michaud (CBP-1)
–O-GEHL: Seznec (CBP-1)
–TAGE: Seznec & Michaud (JILP)
–L-TAGE: Seznec (CBP-2)

