H-Pattern: A Hybrid Pattern Based Dynamic Branch Predictor with Performance Based Adaptation
Samir Otiv (Second Year Undergraduate), Kaushik Garikipati (Second Year Undergraduate), Milan Patnaik (MTech), Dr. V Kamakoti (Professor)
Indian Institute of Technology Madras, Department of Computer Science & Engineering
Approach: Conditional branch instructions often follow patterns that repeat periodically. If a branch instruction follows a repeating pattern, a predictor should predict its outcome accurately for as long as the pattern persists. Predicting ALL patterns with periods of ANY length is impossible given a fixed storage budget.
Approach (strategy): Restrict the predictor to capturing patterns whose period is at most a predetermined length.
Objective: Create a predictor that captures patterns with periods of up to n bits.
Challenges:
1. Use minimum space.
2. The pattern a branch follows can change, so the predictor must relearn dynamically.
Solution: For every branch, store a local history of 2n bits. If a branch instruction follows a pattern of execution with a period p, where p ≤ n, then the most recent n bits must be identical to the n bits that occurred p executions earlier:
outcome(h_i) = outcome(h_{i+p}), where h_i is the i-th most recent execution.
To predict, compare the most recent n bits with successively older History Patterns (n-bit substrings of the local history) and stop at the first match. The bit just after this matching substring (that is, the next more recent bit) is the prediction for the next execution. (The illustration on the next slide should clarify.)
Illustration: Here, with n = 8, we store a local history of 16 bits. The branch instruction follows the repeating pattern (110), which has a period of 3. The bit string h_0 to h_7 (Current Pattern) matches the bit string h_3 to h_10 (Matched Current Pattern) exactly. The prediction returned is the bit just after the matched pattern: h_2.
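The matching step described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `nbpat_predict` and the list representation (index 0 holds h_0, the most recent outcome) are assumptions made for the example.

```python
def nbpat_predict(history, n):
    """Return the pattern-based prediction, or None if no pattern matches.

    history: list of 2n outcome bits, history[0] = h_0 (most recent).
    Assumption for this sketch: bits are plain ints 0/1 in a Python list.
    """
    if len(history) != 2 * n:
        raise ValueError("local history must hold exactly 2n bits")
    current = history[:n]                   # h_0 .. h_{n-1}
    for i in range(1, n + 1):               # candidate period i, oldest window ends at h_{2n-1}
        if history[i:i + n] == current:     # h_i .. h_{i+n-1} equals the current pattern
            return history[i - 1]           # bit just after the matched window
    return None                             # no period <= n detected


# The slide's example: pattern (110), n = 8, 16 bits of history.
outcomes_in_time_order = [[1, 1, 0][t % 3] for t in range(16)]
history = outcomes_in_time_order[::-1]      # reverse so that index 0 is h_0
prediction = nbpat_predict(history, 8)
```

For this history the first match is found at i = 3, so the function returns h_2, as in the illustration.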
H-Pattern: nBPAT + AltPred
nBPAT: n-Bit Pattern Predictor.
AltPred: any other (alternate) branch predictor.
When no pattern is detected (i.e. no pattern match occurs), AltPred is used. When a pattern is detected, the better-performing predictor is used.
The nBPAT Predictor: Every entry of the predictor comprises a 2n-bit shift register for local history and a saturating counter that tracks the better-performing predictor (as described in ‘Combining Branch Predictors’ by Scott McFarling).
Storage: various configurations are possible (tagged or tagless, direct-mapped or associative).
The nBPAT Algorithm
To predict:
1. Match the current pattern (h_0 to h_{n-1}) with successively older history patterns.
2. If the first match is found starting at h_i, then h_{i-1} is nBPAT's predicted outcome. Return h_{i-1} if the most significant bit of the saturating selection counter is 1.
3. If there is no match, or if the most significant bit is 0, use AltPred.
To update:
1. If AltPred mispredicted and nBPAT predicted correctly, increment the saturating selection counter.
2. If AltPred predicted correctly and nBPAT mispredicted, decrement the saturating selection counter.
3. If nBPAT was not ready, leave the saturating counter unchanged.
4. Update the local history by inserting the outcome of the branch into the local history shift register.
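The predict and update steps above could be combined into a single table entry roughly as follows. This is a hedged sketch, not the authors' code: the class name `NBPATEntry`, the 2-bit default counter width, the all-zero initial history, the initial counter value of 0, and the reading of "not ready" as "no pattern match" are all assumptions of this example.

```python
class NBPATEntry:
    """One nBPAT entry: 2n-bit local history plus a saturating selection counter."""

    def __init__(self, n, counter_bits=2):
        self.n = n
        self.history = [0] * (2 * n)             # history[0] is h_0 (assumed zero-initialised)
        self.counter_max = (1 << counter_bits) - 1
        self.msb_threshold = 1 << (counter_bits - 1)
        self.counter = 0                         # assumption: start by trusting AltPred

    def pattern_prediction(self):
        """First-match search; None means nBPAT has no prediction."""
        current = self.history[:self.n]
        for i in range(1, self.n + 1):
            if self.history[i:i + self.n] == current:
                return self.history[i - 1]
        return None

    def predict(self, alt_prediction):
        pat = self.pattern_prediction()
        # Use nBPAT only if a pattern matched and the counter's MSB is 1.
        if pat is not None and self.counter >= self.msb_threshold:
            return pat
        return alt_prediction

    def update(self, outcome, alt_prediction):
        pat = self.pattern_prediction()          # same value that predict() saw
        if pat is not None:                      # no match: counter left unchanged
            if pat == outcome and alt_prediction != outcome:
                self.counter = min(self.counter_max, self.counter + 1)
            elif pat != outcome and alt_prediction == outcome:
                self.counter = max(0, self.counter - 1)
        # Shift the outcome in as the new h_0, dropping the oldest bit.
        self.history = [outcome] + self.history[:-1]
```

In use, a caller would invoke `predict(alt)` before the branch resolves and `update(outcome, alt)` afterwards, passing the same AltPred prediction to both.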
Combinations of H-Pattern: H-Pattern involves several configuration decisions.
AltPred component: several options are possible, for instance Gshare, TAGE, or ISL-TAGE.
nBPAT storage structure: tagged or tagless; associative or direct-mapped.
H-Pattern with Gshare
Configuration: a tagless, direct-mapped table is used for nBPAT, indexed by a few of the least significant bits of the PC. 50% of the storage budget is assigned to nBPAT.
Outcome: a distinct improvement in accuracy was observed, as will be shown soon.
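To make the storage arithmetic concrete, the sketch below sizes a tagless, direct-mapped nBPAT table from a byte budget and indexes it with low PC bits. The slides do not give n, the counter width, or the exact index bits for the Gshare configuration, so `n`, `counter_bits`, the power-of-two rounding, and both function names are assumptions chosen only for illustration.

```python
def nbpat_table_geometry(budget_bytes, n, counter_bits=2, share=0.5):
    """Return (bits per entry, index bits) for a tagless direct-mapped table.

    Each entry holds a 2n-bit history and a selection counter; `share` is the
    fraction of the total budget given to nBPAT (50% on the slide).
    """
    entry_bits = 2 * n + counter_bits
    max_entries = int(budget_bytes * 8 * share) // entry_bits
    if max_entries < 1:
        raise ValueError("budget too small for a single entry")
    # Round down to a power of two so that plain bit masking can index the table.
    index_bits = max_entries.bit_length() - 1
    return entry_bits, index_bits


def nbpat_index(pc, index_bits):
    """Index the table with the least significant bits of the PC."""
    return pc & ((1 << index_bits) - 1)


entry_bits, index_bits = nbpat_table_geometry(4096, n=8)
slot = nbpat_index(0x4005F3, index_bits)
```

With a 4 KB total budget and the assumed n = 8 and 2-bit counter, each entry takes 18 bits and the half-budget table holds 512 entries (9 index bits).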
H-Pattern with TAGE/ISL-TAGE: Only a minimal portion of storage is allocated to nBPAT, so the storage structure must let nBPAT reach maximum accuracy in very small storage spaces. The proportion of the storage budget allocated to nBPAT differed across budgets. The improvement in accuracy was smaller than that achieved with Gshare.
H-Pattern with TAGE/ISL-TAGE, configuration of nBPAT storage: partially tagged, 2-way set-associative.
Selection counter: 4 bits.
Useful counter: included in every entry. It serves as a measure of the effectiveness of an entry in the table.
Decremented if:
1. No pattern match is found.
2. nBPAT mispredicts and AltPred predicts correctly.
Incremented if AltPred mispredicts and nBPAT predicts correctly.
All useful counters are reset periodically using a global reset counter. This captures the notion of an entry in the table being effective or ineffective, and aids the entry replacement policy.
H-Pattern with TAGE/ISL-TAGE, update algorithm:
1. If the TAGE predictor MISPREDICTED, there is no tag match in the nBPAT 2-way associative table, and either of the two candidate entry locations has Useful = 0, then set Tag = [BranchTag] and Useful = [Maximum] in that entry.
2. If the entry ALREADY exists in the nBPAT 2-way associative table, then:
2.1. If nBPAT was not ready, OR nBPAT mispredicted and TAGE predicted correctly, decrease Useful.
2.2. If nBPAT predicted correctly and TAGE mispredicted, increase Useful.
3. Update the nBPAT entry as described earlier in the nBPAT algorithm.
4. Update the TAGE/ISL-TAGE predictor.
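The allocation and Useful-counter rules above might look like this for one 2-way set. This is a sketch under stated assumptions: the names `Entry`, `allocate_on_tage_mispredict`, and `update_useful` are hypothetical, the 3-bit Useful width is borrowed from the 4KB configuration, and the slides do not say how the history and selection counter of a newly allocated entry are initialised, so that part is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

USEFUL_MAX = 7  # assumption: 3-bit useful counter, as in the 4KB configuration


@dataclass
class Entry:
    tag: Optional[int] = None   # partial tag of the owning branch
    useful: int = 0             # effectiveness measure for replacement


def allocate_on_tage_mispredict(ways: List[Entry], branch_tag: int) -> Optional[Entry]:
    """Step 1: on a TAGE misprediction with no tag match, claim a way with Useful = 0."""
    if any(e.tag == branch_tag for e in ways):
        return None                      # entry already exists; nothing to allocate
    for e in ways:
        if e.useful == 0:
            e.tag = branch_tag
            e.useful = USEFUL_MAX
            return e
    return None                          # both ways still useful; no replacement


def update_useful(entry: Entry, nbpat_ready: bool,
                  nbpat_correct: bool, tage_correct: bool) -> None:
    """Step 2: adjust Useful for an entry that already exists."""
    if not nbpat_ready or (not nbpat_correct and tage_correct):
        entry.useful = max(0, entry.useful - 1)
    elif nbpat_correct and not tage_correct:
        entry.useful = min(USEFUL_MAX, entry.useful + 1)
```

The periodic global reset of all Useful counters is not modelled here, since its period and reset value are not given on the slides.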
Reference TAGE Configurations: The optimized configuration for an 8-table TAGE predictor, as specified in the paper “A case for (partially) Tagged Geometric history length branch prediction” by André Seznec and Pierre Michaud, was used.
4KB: history lengths = 5 to 127.
32KB: history lengths = 5 to 450.
Unlimited: 18 tagged tables were used, with history lengths = 3 to 2000.
H-Pattern with TAGE Configurations
4KB: the tag length was reduced by 1 bit in every alternate table starting from T2. A 4-BPAT predictor was used with 7-bit tagged entries and 3-bit useful counters.
32KB: table T6 of TAGE was halved in size. An 8-BPAT predictor was used with 8-bit tagged entries and 4-bit useful counters.
Unlimited: an 8-BPAT predictor was used with a 16-bit tag width.
H-Pattern with TAGE Mispredictions per Kilo Instructions – CBP 2014 Framework
Reference ISL-TAGE Configurations
4KB: the configuration was the same as the 8-component predictor specified in the paper “A case for (partially) Tagged Geometric history length branch prediction” by André Seznec and Pierre Michaud, with space freed from the base bimodal predictor (only 2K prediction entries and 1K hysteresis entries) to accommodate the statistical corrector and the loop predictor. History lengths = 5 to 126.
32KB: the configuration (including history lengths) was identical to the one specified in the paper “A 64KBit ISL-TAGE branch predictor” by André Seznec, with all storage tables halved.
Unlimited: 18 tagged tables were used, with history lengths = 3 to 2000.
H-Pattern with ISL-TAGE Configurations
4KB: starting from the reference 4KB ISL-TAGE, one tag bit was freed from every alternate table starting from T2. A 4-BPAT predictor was used with 7-bit tagged entries and 3-bit useful counters.
32KB: starting from the reference 32KB ISL-TAGE, the last shared table was halved and the statistical corrector and loop predictor were reduced in size. A 4-BPAT predictor was used with 6-bit tagged entries and 3-bit useful counters.
Unlimited: in combination with the reference Unlimited ISL-TAGE predictor, an 8-BPAT predictor was used with 16-bit tagged entries and 4-bit useful counters.
H-Pattern with ISL-TAGE Mispredictions per Kilo Instructions – CBP 2014 Framework