Hot-and-Cold: Using Criticality in the Design of Energy-Efficient Caches
Rajeev Balasubramonian, University of Utah
Viji Srinivasan, IBM T.J. Watson
Sandhya Dwarkadas, University of Rochester
Alper Buyuktosunoglu, IBM T.J. Watson
All Instructions Are Not Created Equal
– Critical instructions lie on the program critical path
– Non-critical instructions can be slowed without increasing execution time
– Potential to improve cache performance? [Srinivasan '01] [Fisk '99]
– Prioritization policies [Fields '01] [Tune '01]
– Energy-efficient ALUs [Seng '01]
Energy-Delay Trade-Offs
– Example energy-delay trade-off techniques: voltage scaling, transistor sizing, way prediction, serial access
– Gated-ground cells, high-Vt transistors
[Figure: normalized leakage vs. normalized delay for transistor sizing and variable threshold voltage, across low/nominal/high Vt]
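The leakage/delay trade-off in the figure can be sketched with standard first-order device equations: delay grows as Vdd/(Vdd-Vt)^alpha (the alpha-power law) while subthreshold leakage falls exponentially with Vt. All constants below (alpha, Vdd, the subthreshold slope term, and the two Vt values) are illustrative assumptions, not data from the talk.

```python
# Toy model of the leakage/delay trade-off: raising Vt slows a gate
# but cuts its subthreshold leakage exponentially. Constants assumed.
import math

ALPHA = 1.3          # velocity-saturation exponent (assumed)
VDD = 1.0            # normalized supply voltage (assumed)

def gate_delay(vt):
    """Alpha-power law: delay ~ Vdd / (Vdd - Vt)^alpha."""
    return VDD / (VDD - vt) ** ALPHA

def leakage(vt, n_kt_q=0.035):
    """Subthreshold leakage ~ exp(-Vt / (n*kT/q)); slope term assumed."""
    return math.exp(-vt / n_kt_q)

for vt in (0.20, 0.35):  # hypothetical low-Vt and high-Vt devices
    print(f"Vt={vt:.2f}: delay={gate_delay(vt):.2f}, "
          f"leakage={leakage(vt):.2e}")
```

The two printed points reproduce the shape of the slide's curve: the high-Vt device is slower but leaks orders of magnitude less.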
Exploiting Criticality
– Design two static banks: a hot bank (fast and high power) and a cold bank (slow and low power)
– Instructions must be classified as critical or non-critical, and data must be placed in one of the two banks
– Energy-efficient ALUs are easier to handle, as they have no associated storage
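The two-bank idea above can be sketched as a simple steering function: accesses from critical instructions go to the fast, high-power hot bank, and the rest go to the slow, low-power cold bank. The latency and energy values are made-up illustrative units, not figures from the paper.

```python
# Minimal sketch of hot/cold bank steering by criticality.
# Latency and energy numbers are assumptions for illustration.
HOT = {"latency": 1, "energy": 1.0}    # fast, high power (assumed)
COLD = {"latency": 3, "energy": 0.4}   # slow, low power (assumed)

def serve(access_is_critical):
    """Steer by criticality; return (latency, energy) for the access."""
    bank = HOT if access_is_critical else COLD
    return bank["latency"], bank["energy"]

# A hypothetical access stream: True = issued by a critical instruction.
stream = [True, False, False, True, False]
total_lat = sum(serve(c)[0] for c in stream)
total_en = sum(serve(c)[1] for c in stream)
```

Steering the three non-critical accesses to the cold bank trades their latency for a lower total cache energy, which is the design point the slide describes.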
Criticality Metric
– Oldest-N: the N oldest instructions in the issue queue are critical
– Younger instructions are likely to be on mispredicted paths or can tolerate latencies
– N can be varied based on program needs
– Minimal hardware overhead
– Behavior comparable to more complex metrics
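The Oldest-N metric can be sketched in a few lines: walk the issue queue oldest-first and tag the first N entries as critical. The instruction names and the queue representation below are made up for illustration.

```python
def classify_oldest_n(issue_queue, n):
    """Oldest-N metric: the n oldest queue entries are critical,
    the rest are not. Queue is ordered oldest-first; returns a
    {instruction: is_critical} map."""
    return {inst: i < n for i, inst in enumerate(issue_queue)}

# Hypothetical issue-queue contents, oldest entry first.
q = ["ld r1", "add r2", "mul r3", "ld r4", "sub r5"]
tags = classify_oldest_n(q, n=2)
```

Because the classification only depends on queue position, the hardware cost is a position compare against N, consistent with the slide's "minimal hardware overhead" claim.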
Instruction Classification
Data Classification
[Figure: data classified as exclusively non-critical vs. exclusively critical]
Hot-and-Cold Microarchitecture
[Block diagram: dispatch, issue queue with criticality counters, bank predictor, placement predictor, hot bank and cold bank, L2]
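One plausible realization of the bank predictor in the diagram is a table of saturating criticality counters indexed by load PC, trained once an access's criticality is known. The table size, counter width, and threshold below are assumptions for illustration, not the paper's parameters.

```python
# Sketch of a bank predictor: per-PC saturating criticality counters.
# Table size, counter width, and threshold are assumed values.
TABLE_SIZE = 1024
MAX_CTR = 3                      # 2-bit saturating counter
counters = [0] * TABLE_SIZE

def train(pc, was_critical):
    """Nudge the counter for this PC once criticality is resolved."""
    i = pc % TABLE_SIZE
    if was_critical:
        counters[i] = min(MAX_CTR, counters[i] + 1)
    else:
        counters[i] = max(0, counters[i] - 1)

def predict_hot(pc):
    """Predict the hot bank when the counter is in its upper half."""
    return counters[pc % TABLE_SIZE] >= 2

# Hypothetical training: one load PC repeatedly critical, one not.
for _ in range(2):
    train(0x40, True)
train(0x80, False)
```

After training, `predict_hot(0x40)` steers that load to the hot bank while `predict_hot(0x80)` keeps the other in the cold bank; the hysteresis in the counter absorbs occasional criticality mismatches.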
Performance Results
Energy Results
Results Summary
– Bank mispredict rate of 9.5%
– Criticality mismatch rate of 26%
– Performance loss = 2.7% (data reorganization) + (0.8 × slowdown)
– L1 cache energy savings of 37%
Related Work
– Recent split-cache organization by Abella and Gonzalez [ICCD '03]
– Data allocation based on the criticality of the accessing instruction
[Figure: base cache vs. fast/slow split caches]
Conclusions
– Data and instruction classification is reasonably accurate
– Overhead from contention is non-trivial
– Results are worthwhile in limited settings
– The use of criticality for data cache reorganization yields little benefit