Fine-Grain CAM-Tag Cache Resizing Using Miss Tags

1 Fine-Grain CAM-Tag Cache Resizing Using Miss Tags
Michael Zhang and Krste Asanović
ISLPED '02, August 12-14, Monterey, CA

2 Motivation
- Caches consume 30-60% of processor energy in embedded systems
  - 43% for StrongARM-1 (vs. 16% for Alpha 21264)
- The working set of many applications is smaller than the cache size
- The cache size can be reduced adaptively to match the current working set
- Deactivating unused portions of the cache circuitry saves power
  - Active power
  - Leakage power

3 Related Work
- Off-line techniques – Selective Ways
  - Statically deactivates cache ways according to profiling information gathered before application execution [Albonesi '99]
- On-line techniques – DRI-Cache
  - Dynamically keeps the instruction cache miss rate under a preset bound [Powell et al. '01]
- Line deactivation – Cache Decay
  - A per-cache-line counter tracks and turns off not-recently-used cache lines to reduce leakage [Kaxiras et al. '01]
- Line deactivation – Adaptive Mode Control
  - Deactivates only lines, not tags [Zhou et al. '01]
  - A hit in the tag of a deactivated line is a sleep miss; a miss in the tag is a real miss
  - The ratio of sleep misses to real misses is used to adjust resizing intervals
- Limit study
  - Various design choices for RAM-tag caches are studied and compared in [Yang et al. '02]

4 Miss Tags – The Idea
[Figure: resizable tag and data arrays alongside a non-resizable miss tag array]
- Miss tags: a second set of tags used as predictors
- Fixed-size: acts as the tag array of the full-sized cache
- Checked only during a cache miss, to see whether the full-sized cache would have avoided the miss
- Lives on the non-critical miss path, so it can be implemented with smaller, slower, non-leaky transistors

5 Miss Tags – The Idea
[Figure: a hit in the regular tags; the miss tags are not consulted]
- If there are many hits in the regular tags, there are few accesses to the miss tags
- The application has a small working set, so a smaller cache is probably sufficient
- Hint: downsize the cache

6 Miss Tags – The Idea
[Figure: a miss in the regular tags and a miss in the miss tags]
- If an access misses in both the regular tags and the miss tags, a larger cache is unlikely to help
- Example: streaming applications with no temporal locality
- Hint: downsize the cache

7 Miss Tags – The Idea
[Figure: a miss in the regular tags but a hit in the miss tags]
- If an access misses in the regular tags but hits in the miss tags, a larger cache is likely to help
- Upsize the cache
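The three cases above can be sketched as one small decision function. This is a minimal Python sketch of the lookup logic; the function and variable names are hypothetical (the paper describes the policy, not this code):

```python
def access(addr, regular_tags, miss_tags):
    """Classify one cache access and return (outcome, resize hint).

    regular_tags: tags of the currently active (resizable) cache
    miss_tags:    fixed-size duplicate tags of the full-sized cache
    """
    if addr in regular_tags:
        # Hit in the active cache: the miss tags are not consulted.
        # Frequent hits suggest a small working set -> downsize hint.
        return "hit", "downsize"
    if addr in miss_tags:
        # Miss in the active cache, but the full-sized cache would
        # have hit: a larger cache is likely to help -> upsize hint.
        return "miss", "upsize"
    # Miss in both: even the full-sized cache would miss
    # (e.g. streaming data with no temporal locality) -> downsize hint.
    return "miss", "downsize"
```

In practice the hints are counted per interval rather than acted on per access, as the next slide illustrates.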

8 Resizing Illustration
[Figure: cache size vs. execution time, showing an initial period followed by evenly spaced resizing points]
- Resizing point 1: downsizing (# of miss tag hits < lower bound)
- Resizing point 2: no action (lower bound <= # of miss tag hits <= upper bound)
- Resizing point 3: upsizing (# of miss tag hits > upper bound)
- Measurement: # of miss tag hits
- Parameters: resizing interval, upper bound, lower bound
- Algorithm: adjust the cache size so that the number of miss tag hits per resizing interval stays between the lower and upper bounds
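The interval-end decision described above can be sketched in a few lines of Python. Names and the fixed resizing step are illustrative assumptions, not the paper's implementation:

```python
def resize_decision(miss_tag_hits, lower_bound, upper_bound,
                    size, min_size, max_size, step):
    """At the end of each resizing interval, adjust the active cache
    size so the miss-tag hit count stays within [lower, upper] bounds.
    `step` is a hypothetical fixed resize granularity."""
    if miss_tag_hits < lower_bound:
        return max(size - step, min_size)   # too few hits: downsize
    if miss_tag_hits > upper_bound:
        return min(size + step, max_size)   # too many hits: upsize
    return size                             # within bounds: no action
```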

9 CAM-Tag Cache
- Popular among low-power processors: ARM3 ['89], StrongARM ['98], XScale ['01]
- Generally sub-banked: only one sub-bank is activated per access
- Each sub-bank is a set; each line within a sub-bank is a way
- All tags in the sub-bank are searched in parallel
- The matching tag asserts the appropriate word line for the data read/write
- Address layout: Tag | Bank | Offset
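The Tag | Bank | Offset split can be illustrated with the baseline geometry used later in the talk (32 sub-banks of 1 KB, 32-byte lines). This sketch assumes the bank-select bits sit directly above the line offset, which the slides do not state explicitly:

```python
LINE_BYTES = 32   # 32-byte cache lines -> 5 offset bits
NUM_BANKS  = 32   # 32 sub-banks of 1 KB each -> 5 bank-select bits

def decompose(addr):
    """Split an address into (tag, bank, offset) for a sub-banked
    CAM-tag cache: the bank bits select one sub-bank, and the tag is
    then compared in parallel against every line (way) in that bank."""
    offset = addr % LINE_BYTES
    bank   = (addr // LINE_BYTES) % NUM_BANKS
    tag    = addr // (LINE_BYTES * NUM_BANKS)
    return tag, bank, offset
```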

10 MTR with CAM-Tag Cache
[Figure: sub-bank structure with resizable tag and data arrays and a non-resizable miss tag array]
- MTR cache sub-bank configuration
  - Each sub-bank has 8 equal partitions
  - To upsize, turn on an entire partition
  - To downsize, turn off only the last active line (conservative downsizing)
- Sub-bank resizing
  - Each sub-bank is resized individually
  - Resizing operations are spaced out evenly in time, so there is no burst of dirty writebacks
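The asymmetric per-sub-bank policy (upsize by a whole partition, downsize by a single line) can be sketched as follows, assuming the baseline's 32 lines per sub-bank; the function names are hypothetical:

```python
LINES_PER_BANK = 32                            # 32 ways per sub-bank
PARTITIONS     = 8                             # 8 equal partitions
PART_SIZE      = LINES_PER_BANK // PARTITIONS  # 4 lines per partition

def upsize(active_lines):
    """Turn on an entire partition: round the active line count up
    to the next partition boundary, capped at the full sub-bank."""
    return min(((active_lines // PART_SIZE) + 1) * PART_SIZE,
               LINES_PER_BANK)

def downsize(active_lines):
    """Conservative downsizing: turn off only the last active line."""
    return max(active_lines - 1, 0)
```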

11 Hardware Modifications
- The miss tags
  - Accessed only during a miss, so not on the critical path
  - Can be implemented with slow, non-leaky transistors, using alternative serial/parallel RAM/CAM structures
- Turning off cache lines (leakage reduction)
  - Gated-Vdd: a stacked N-type transistor added to reduce leakage energy [Powell et al. '01]
- Leakage-biased bitlines (leakage reduction)
  - Turn off the precharge of CAM/RAM bitlines and CAM match lines; the voltage automatically biases itself to minimize leakage [Heo et al. '02]
- Hierarchical bitlines (active energy reduction)
  - Turn off portions of the cache block to reduce active energy [Ghose & Kamble '99]
- Minimal cycle time impact: < 1.5% (from Gated-Vdd); no cycle-time penalty without Gated-Vdd
- Small area impact: ~10%, depending on implementation

12 Hardware Modifications, Cont'd
[Figures: leakage-biased bitlines, Gated-Vdd, hierarchical bitlines]

13 Experimental Setup
- Modified SimpleScalar 3.0 simulator
  - Single-issue, in-order processor
- Baseline cache similar to the Intel XScale
  - 32 KB, implemented as 32 1 KB sub-banks
  - 32-way set-associative with 32-byte cache lines
  - FIFO replacement policy per sub-bank
- Benchmarks: SPECint2000 and SPECfp2000
  - 1.5 billion cycles of reference inputs
- Baseline resizing scheme implemented for performance comparison
  - Compares against a fixed miss-rate threshold: if the current miss rate > threshold, upsize; otherwise downsize
  - Similar to DRI-Cache
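The baseline comparison scheme (fixed miss-rate threshold, DRI-style) can be sketched alongside MTR's policy; names and the fixed step size are illustrative assumptions:

```python
def baseline_resize(misses, accesses, threshold,
                    size, min_size, max_size, step):
    """DRI-style baseline: if the miss rate measured over the interval
    exceeds a preset threshold, upsize; otherwise downsize."""
    miss_rate = misses / accesses if accesses else 0.0
    if miss_rate > threshold:
        return min(size + step, max_size)   # missing too often: upsize
    return max(size - step, min_size)       # miss rate acceptable: downsize
```

Unlike MTR, this scheme sees only the active cache's miss rate; it cannot distinguish misses a larger cache would fix from misses it would not.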

14 Dynamic Resizing Illustration
[Figures: D-cache miss rate vs. time; D-cache active size vs. time]
- Miss rate and average active cache size obtained by applying MTR to the D-cache

15 Small Working Set Example
- The working set is very small, so a small cache is sufficient!

16 Low Temporal Locality Examples
- High miss rate with no temporal locality
- A small cache performs about as well as a large cache

17 Adaptive Examples

18 CPI Comparison
- Each MTR data point (active cache size, CPI) is obtained by varying the resizing interval and the upper and lower bounds
- Each baseline data point (active cache size, CPI) is obtained by varying the preset miss-rate threshold
- For optimal results with the MTR cache, parameters were picked to yield an average active I-cache size of 12 KB and an average active D-cache size of 8 KB

19 Energy Savings
[Figures: energy savings at the corners of the sensitivity space, from (L2 refill = 16x L1, leakage = 50% of total) to (L2 refill = 128x L1, leakage = 0% of total)]
- Sensitivity analysis over two factors
  - Fraction of leakage energy in total energy (0% to 50%)
  - L2 refill energy as a multiple of L1 access energy (16x to 128x)
- Energy figures from simulation of extracted layout in TSMC 0.25 µm technology
- Writeback energy included

20 Performance
- Performance is affected by:
  - The ratio of the upper and lower bounds to the resizing interval: [5, 10] / 32K ~= [10, 20] / 64K
  - The length of the resizing interval
    - A larger resizing interval yields fewer writebacks
    - If the resizing interval is too large, the cache loses its ability to resize; if it is too small, it thrashes
- Performance is consistent across all benchmarks
  - The duplicated tags act as a predictor for each benchmark
  - Easy parameter tuning

21 MTR – A Summary
- The CAM-tag cache offers very fine-grained resizing
  - One cache line at a time
  - Avoids writeback bursts
- The Miss Tag Resizing (MTR) algorithm
  - Resizes dynamically
  - No delay overhead: resizing operations fall entirely on the miss path
  - Resizing is determined by the difference between the actual miss rate and the predicted miss rate of the full-sized cache
  - Resizing parameters can be tuned to work well for all benchmarks; no need for application-specific parameter tuning
  - Reduces both active and leakage energy

22 Conclusion
- Proposed MTR, a dynamic cache resizing technique for CAM-tag caches
- Uses a fixed-size duplicate tag array to track the miss rate of the full-sized cache
- Negligible delay overhead (accessed only on a miss)
- Negligible energy overhead (slow, non-leaky transistors used)
- Achieves 28% to 56% D-cache energy savings, depending on the operating point
- Achieves 34% to 49% I-cache energy savings, depending on the operating point
- Minimal cycle time impact: < 1.5%
- Small area impact: ~10%

