CS 7810 Lecture 13 Pipeline Gating: Speculation Control For Energy Reduction S. Manne, A. Klauser, D. Grunwald Proceedings of ISCA-25 June 1998.


1 CS 7810 Lecture 13 Pipeline Gating: Speculation Control For Energy Reduction S. Manne, A. Klauser, D. Grunwald Proceedings of ISCA-25 June 1998

2 Cost of Speculation
[Figure: mispredict rates: 9.9, 12.2, 23.9, 10.4, 6.9, 4.6, 11.3, 1.7]

3 Pipeline Gating
Low-confidence branches throttle instruction fetch until they are resolved.
Pipeline gating usually lasts for fewer than five cycles.

4 Metrics
SPEC (specificity): the fraction of all mispredicted branches that the confidence estimator flags as low-confidence (coverage).
PVN (predictive value of a negative test): the probability that a low-confidence branch is in fact mispredicted (accuracy).
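The two metrics can be computed from three branch counts. The following is a minimal sketch; the function name `spec_and_pvn` and the example counts are hypothetical and are not taken from the paper.

```python
def spec_and_pvn(low_conf_mispredicted, low_conf_correct, high_conf_mispredicted):
    """Return (SPEC, PVN) as defined on the Metrics slide.

    All three arguments are branch counts; the names are illustrative.
    """
    mispredicted = low_conf_mispredicted + high_conf_mispredicted
    low_conf = low_conf_mispredicted + low_conf_correct
    # SPEC: share of mispredicted branches that were flagged low-confidence.
    spec = low_conf_mispredicted / mispredicted
    # PVN: share of low-confidence branches that were actually mispredicted.
    pvn = low_conf_mispredicted / low_conf
    return spec, pvn


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
spec, pvn = spec_and_pvn(
    low_conf_mispredicted=60, low_conf_correct=140, high_conf_mispredicted=40
)
print(f"SPEC = {spec:.2f}, PVN = {pvn:.2f}")
```

With these made-up counts the estimator catches 60 of 100 mispredictions (SPEC 0.60), but only 60 of the 200 branches it flags are mispredicted (PVN 0.30).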

5 Confidence Estimators
Perfect: used to gauge the potential benefits.
Static: branches that have low prediction rates are low-confidence.
JRS: a branch that has yielded N successive correct predictions has high confidence.
Saturating counters: an unbiased counter value, or disagreement between two predictors, indicates low confidence.
Distance: mispredictions are clustered, so the first 4 branches after a mispredict have low confidence.
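The JRS rule above (N successive correct predictions means high confidence) can be sketched as a resetting counter per branch. This is a simplified illustration, not the estimator's actual hardware organization: the class name, the dictionary keyed by branch address, and the `threshold` and `max_count` values are all assumptions.

```python
class ResettingCounterEstimator:
    """Simplified JRS-style estimator: one resetting counter per branch."""

    def __init__(self, threshold, max_count=15):
        self.threshold = threshold  # N successive correct predictions
        self.max_count = max_count  # saturation limit (assumed value)
        self.counts = {}            # branch address -> correct-prediction streak

    def is_high_confidence(self, pc):
        return self.counts.get(pc, 0) >= self.threshold

    def update(self, pc, prediction_correct):
        if prediction_correct:
            # Extend the streak, saturating at max_count.
            self.counts[pc] = min(self.counts.get(pc, 0) + 1, self.max_count)
        else:
            # Any misprediction resets the streak to zero.
            self.counts[pc] = 0


est = ResettingCounterEstimator(threshold=3)
branch = 0x400  # hypothetical branch address
for _ in range(3):
    est.update(branch, prediction_correct=True)
print(est.is_high_confidence(branch))  # True after three correct predictions
est.update(branch, prediction_correct=False)
print(est.is_high_confidence(branch))  # False after a misprediction
```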

6 SPEC and PVN
It is easier to achieve a high SPEC value than a high PVN value.
A high PVN can be achieved by requiring N low-confidence branches to invoke gating: if PVN is 30%, redefining low confidence as two low-confidence branches increases PVN to 51%.
SPEC (coverage): fraction of mispredicted branches detected by the low-confidence estimator.
PVN (accuracy): percentage of low-confidence branches that are mispredicted.
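The 30% to 51% figure is consistent with asking for the probability that at least one of N low-confidence branches is mispredicted, treating the branches as independent. The independence assumption and the function name below are mine; the slide does not state how the number was derived.

```python
def pvn_with_n_branches(pvn_single, n):
    """Probability that at least one of n low-confidence branches is mispredicted.

    Assumes the n branches mispredict independently (an assumption, not
    something the slide states).
    """
    return 1.0 - (1.0 - pvn_single) ** n


for n in (1, 2, 3):
    print(n, f"{pvn_with_n_branches(0.30, n):.2f}")
```

For N = 2 this gives 1 - 0.7 * 0.7 = 0.51, matching the slide.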

7 Perfect

8 Gating Results

9 Results
Can gating improve performance?
  – only if cache pollution is significant
Less than 1% performance loss and up to 38% reduction in extra work.
Energy consumption could go up:
  – some work is independent of the number of executed instructions (clock distribution)
  – increased execution time can increase energy
Pipeline gating should reduce power consumption.

10 Results

11 CS 7810 Lecture 13 Cache Decay: Exploiting Generational Behavior to Reduce Cache Leakage Power S. Kaxiras, Z. Hu, M. Martonosi Proceedings of ISCA-28 July 2001

12 Leakage Power Trends
Circuit delay ∝ 1/(V − V_th)
Leakage ∝ number of transistors (increasing), supply voltage (decreasing), and (exponentially) low threshold voltage (increasing)
L1 and L2 caches are the biggest contributors (high transistor budgets).

13 Vdd-Gating
Leakage can be reduced by gating off the supply voltage to the circuit.
When applied to a cache, the contents of the SRAM cell are lost.
Cache decay: apply Vdd-gating when you do not care about the cache contents.

14 Lifetime of a Cache Line

15 Overheads
Hardware to determine when to decay.
Introduces additional cache misses.
Normalized cache leakage power =
    active ratio (fraction of the cache that is powered on)
  + (counter overhead : leak) × activity
  + (L2 access energy : leak) × number of misses
Increased execution time (< 0.7%).
The L2 access/leakage ratio is ~9.
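The normalized-leakage expression above is a sum of three terms and can be evaluated directly. In this sketch only the L2 access/leakage ratio of 9 comes from the slide; the function name, the other input values, and the assumption that activity and the miss count are already normalized to match the ratios are illustrative.

```python
def normalized_leakage(active_ratio, counter_to_leak, activity,
                       l2_to_leak, extra_misses):
    """Evaluate the three-term expression from the Overheads slide.

    active_ratio:     fraction of the cache that is powered on
    counter_to_leak:  counter overhead relative to leakage
    activity:         counter activity (assumed normalized)
    l2_to_leak:       L2 access energy relative to leakage
    extra_misses:     decay-induced misses (assumed normalized)
    """
    return (active_ratio
            + counter_to_leak * activity
            + l2_to_leak * extra_misses)


# Hypothetical inputs, except l2_to_leak = 9 which is the slide's ratio.
value = normalized_leakage(
    active_ratio=0.40, counter_to_leak=0.01, activity=0.5,
    l2_to_leak=9, extra_misses=0.001,
)
print(f"{value:.3f}")
```

A result below 1.0 means decay saves leakage energy overall despite the counter and extra-miss overheads.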

16 Skier's Dilemma
New skis: $400. Ski rentals: $20.
Heuristic: buy skis once the accumulated rental cost equals the purchase price.
Ski trips:   5     10    15    20    25    50
Optimal:     $100  $200  $300  $400  $400  $400
Heuristic:   $100  $200  $300  $800  $800  $800
Likewise, decay a cache line when the cost of an additional miss equals the leakage dissipated so far.
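The table can be reproduced with a few lines of arithmetic. This sketch uses the slide's prices; the function names are hypothetical, and the heuristic is modeled as buying right after the rental that brings the rental total to the purchase price, which is what the $800 entry at 20 trips implies.

```python
import math

BUY_PRICE = 400   # new skis, from the slide
RENT_PRICE = 20   # one rental, from the slide


def optimal_cost(trips):
    """Cost with perfect knowledge of the number of trips."""
    return min(trips * RENT_PRICE, BUY_PRICE)


def heuristic_cost(trips):
    """Rent until rental spending equals the purchase price, then buy."""
    rentals_before_buying = math.ceil(BUY_PRICE / RENT_PRICE)  # 20 rentals
    if trips < rentals_before_buying:
        return trips * RENT_PRICE
    return rentals_before_buying * RENT_PRICE + BUY_PRICE


for trips in (5, 10, 15, 20, 25, 50):
    print(trips, optimal_cost(trips), heuristic_cost(trips))
```

In this table the heuristic never costs more than twice the optimal amount, which is the property that motivates applying the same rule to cache decay.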

17 Tracking Dead Time
Each line has a 2-bit counter that is reset on every access and incremented every 2500 cycles by a global signal (negligible overhead).
After 10,000 clock cycles, the counter reaches its maximum value and triggers a decay.
Adaptive decay: start with a short decay period; if a miss follows quickly, double the period; if there is no miss, halve the period.
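The counter mechanism can be sketched as a small simulation of a single cache line. This is an illustration under assumptions: the class and function names are hypothetical, decay is modeled as happening on the fourth tick without an intervening access (4 x 2500 = 10,000 cycles), and the exact counter encoding used in hardware is not specified by the slide.

```python
TICK_CYCLES = 2500  # global tick interval, from the slide
DECAY_TICKS = 4     # 4 ticks x 2500 cycles = 10,000 cycles (assumed encoding)


class DecayLine:
    """One cache line with a per-line decay counter."""

    def __init__(self):
        self.counter = 0
        self.powered = True

    def access(self):
        """Reset the counter; return True if the line had decayed (extra miss)."""
        decayed = not self.powered
        self.counter = 0
        self.powered = True
        return decayed

    def tick(self):
        """Global tick: advance the counter and decay once it saturates."""
        if self.powered:
            self.counter += 1
            if self.counter >= DECAY_TICKS:
                self.powered = False


def simulate(access_cycles, total_cycles):
    """Return (decay-induced misses, whether the line ends powered on)."""
    line = DecayLine()
    accesses = set(access_cycles)
    decay_misses = 0
    for cycle in range(1, total_cycles + 1):
        if cycle in accesses and line.access():
            decay_misses += 1
        if cycle % TICK_CYCLES == 0:
            line.tick()
    return decay_misses, line.powered


def next_decay_period(period, quick_miss):
    """Adaptive rule from the slide: double on a quick miss, else halve."""
    return period * 2 if quick_miss else period // 2


print(simulate([1000, 3000, 20000], 30000))
print(next_decay_period(10000, quick_miss=True))
```

In the example trace, the line decays during the long gap after cycle 3000, so the access at cycle 20000 counts as one decay-induced miss, and the line has decayed again by cycle 30000.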

18 Results

19 Overheads

20 Other Results
The L2 cache is equally suitable for decay techniques: lifetimes are scaled by a factor of 10, but an extra miss also costs a lot more.
In their experiments, there is little interference from multiprogramming.
Some instructions can easily be identified as last touches to a cache block, which gives potential for early cache decay.
Can this apply to the branch predictor or the register file?
