1 Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches through Programmable Decoders. ISCA 2006, IEEE. By Chuanjun Zhang. Speaker: Wei Zeng

2 Outline: Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

3 Outline: Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

4 Background: the increasing gap between memory latency and processor speed is a bottleneck to achieving high performance, addressed with a multilevel memory hierarchy. The cache acts as an intermediary between the fast processor and the much slower main memory. Two cache mapping schemes: direct-mapped and set-associative.
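As a hedged illustration of the direct-mapped scheme (a sketch, not code from the paper), the following splits an address into tag, index, and offset using the paper's baseline parameters (16 kB cache, 32-byte lines); the helper name and the address value are arbitrary:

```python
# Address decomposition for a direct-mapped cache.
# Parameters match the paper's baseline (16 kB, 32-byte lines);
# the helper itself is an illustrative sketch, not the paper's code.
LINE_SIZE = 32                 # bytes per cache line
NUM_SETS = 512                 # 16 kB / 32 B = 512 sets

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 5 offset bits
INDEX_BITS = NUM_SETS.bit_length() - 1     # 9 index bits

def decompose(addr):
    """Split a byte address into (tag, index, offset)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Two addresses with equal index but different tags compete for the same set; that competition is the source of the conflict misses the talk targets.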

5 Comparison. Direct-mapped cache: faster access time, less power per access, less area, easy to implement, simple to design, but a higher miss rate. Set-associative cache: longer access time, more power per access, more area, but reduced conflict misses thanks to a replacement policy. Desirable cache: the access time of a direct-mapped cache with the low miss rate of a set-associative cache.

6 What is the B-Cache? The Balanced Cache (B-Cache): a mechanism that provides the benefit of cache block replacement while maintaining the constant access time of a direct-mapped cache.

7 New features of the B-Cache: the decoder length of the direct-mapped cache is increased by 3 bits, so accesses to heavily used sets can be reduced to 1/8th of the original design; a replacement policy is added; a programmable decoder (PD) is used.
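The programmable-decoder idea can be modelled in software roughly as follows. This is a simplified sketch under assumptions of my own (one PD entry per physical set holding the extra index bits, LRU replacement); the class name and structure are hypothetical, not the paper's hardware design:

```python
class BCacheGroup:
    """Toy model of a group of physical sets sharing a programmable
    decoder: each set's PD entry records which extra-index-bit pattern
    (the lengthened part of the index) it currently decodes."""

    def __init__(self, ways):
        self.pd = [None] * ways     # programmed extra-index patterns (None = invalid)
        self.tags = [None] * ways   # tag stored in each physical set
        self.lru = [0] * ways       # last-use timestamps for LRU replacement
        self.clock = 0

    def access(self, extra_bits, tag):
        """Return True on a hit; on a miss, reprogram the LRU set's PD entry."""
        self.clock += 1
        for way in range(len(self.pd)):
            if self.pd[way] == extra_bits and self.tags[way] == tag:
                self.lru[way] = self.clock
                return True
        victim = min(range(len(self.pd)), key=lambda w: self.lru[w])
        self.pd[victim] = extra_bits       # reprogram the decoder entry
        self.tags[victim] = tag
        self.lru[victim] = self.clock
        return False
```

Because any physical set in the group can be reprogrammed to decode any extra-index pattern, blocks that would collide in a plain direct-mapped cache can coexist.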

8 The problem (an example): with 8-bit addresses, the reference sequence 0, 1, 8, 9, 0, 1, 8, 9, … repeatedly conflicts in a direct-mapped cache.

9 The B-Cache solution: for the same 8-bit address sequence, the B-Cache behaves as a 2-way cache (X: invalid PD entry).
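A small simulation (a sketch with hypothetical cache sizes, not the paper's experiment) shows the effect described on these two slides: the repeating block sequence 0, 1, 8, 9 thrashes a small direct-mapped cache, while a 2-way organization of the same capacity, which is how the B-Cache behaves here, keeps only the compulsory misses:

```python
def misses_direct_mapped(trace, num_sets):
    """Count misses for a direct-mapped cache of num_sets one-block sets."""
    cache = [None] * num_sets
    misses = 0
    for blk in trace:
        idx = blk % num_sets
        if cache[idx] != blk:
            misses += 1
            cache[idx] = blk
    return misses

def misses_two_way(trace, num_sets):
    """Count misses for a 2-way cache with LRU (front of list = LRU)."""
    cache = [[] for _ in range(num_sets)]
    misses = 0
    for blk in trace:
        ways = cache[blk % num_sets]
        if blk in ways:
            ways.remove(blk)        # refresh recency on a hit
        else:
            misses += 1
            if len(ways) == 2:
                ways.pop(0)         # evict the least recently used block
        ways.append(blk)
    return misses

trace = [0, 1, 8, 9] * 4
# Direct-mapped, 8 sets: blocks 0/8 and 1/9 collide, so every access misses.
# 2-way, 4 sets (same capacity): only the 4 compulsory misses remain.
```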

10 Outline: Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

11 Terminology. Memory address mapping factor: MF = 2^(PI+NPI) / 2^OI, where MF ≥ 1. B-Cache associativity: BAS = 2^OI / 2^NPI, where BAS ≥ 1. (PI: index length of the PD; NPI: index length of the NPD; OI: index length of the original direct-mapped cache.)

12 B-Cache organization: MF = 2^(PI+NPI) / 2^OI = 2^(6+6) / 2^9 = 8; BAS = 2^OI / 2^NPI = 2^9 / 2^6 = 8.
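The slide's arithmetic can be checked directly from the two formulas (a trivial sketch; PI, NPI, and OI take the example values shown on the slide):

```python
PI, NPI, OI = 6, 6, 9               # example index lengths from the slide

MF = 2 ** (PI + NPI) // 2 ** OI     # mapping factor: 2^12 / 2^9 = 8
BAS = 2 ** OI // 2 ** NPI           # B-Cache associativity: 2^9 / 2^6 = 8
```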

13 Replacement policy. Random: simple to design and requires very little extra hardware. Least Recently Used (LRU): better hit rate but more area overhead.
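The two policies on this slide can be sketched as small set models (generic illustrations of random and LRU replacement from a software point of view, not the paper's hardware):

```python
import random
from collections import OrderedDict

class LRUSet:
    """One cache set with LRU replacement."""
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()          # insertion order tracks recency
    def access(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)     # refresh recency on a hit
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)  # evict the least recently used
        self.blocks[tag] = True
        return False

class RandomSet:
    """One cache set with random replacement (seeded for reproducibility)."""
    def __init__(self, ways, seed=0):
        self.ways = ways
        self.blocks = []
        self.rng = random.Random(seed)
    def access(self, tag):
        if tag in self.blocks:
            return True
        if len(self.blocks) == self.ways:
            self.blocks.pop(self.rng.randrange(self.ways))
        self.blocks.append(tag)
        return False
```

Random replacement needs no recency state at all, which is why the slide calls it cheap in hardware; LRU must track an ordering per set, paying area for a better hit rate.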

14 Outline: Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

15 Experimental Methodology. Primary metric: miss rate. Other metrics: latency, storage, power costs, overall performance, overall energy. Baseline: level-one cache (direct-mapped 16 kB caches with a 32-byte line size for instructions and data). 26 SPEC2K benchmarks run using the SimpleScalar tool set.

16 Data miss-rate reductions (chart comparing a 16-entry victim buffer, set-associative caches, and B-Caches with different MFs).

17 Latency

18 Storage overhead: the additional hardware for the B-Cache is the CAM-based PD; storage is 4.3% higher than the baseline.

19 Power overhead: extra power is consumed by the PD of each subarray; power is reduced by the 3-bit data-length reduction and the removal of 3-input NAND gates. Per-access power is 10.5% higher than the baseline.

20 Overall performance: outperforms the baseline by an average of 5.9%; only 0.3% less than an 8-way cache and 3.7% higher than a victim buffer.

21 Overall energy: the B-Cache consumes the least energy (2% less than the baseline). It reduces the miss rate and hence accesses to the more power-costly second-level cache. On a cache miss, the PD's miss prediction also avoids cache memory accesses, keeping the power overhead much lower.

22 Outline: Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

23 Related work. Reducing the miss rate of direct-mapped caches: page allocation; column-associative cache; adaptive group-associative cache; skewed-associative cache. Reducing the access time of set-associative caches: partial address matching (predicting the hit way); difference-bit cache.

24 Compared with previous techniques, the B-Cache: applies to both high-performance and low-power embedded systems; balances accesses without software intervention; is feasible and easy to implement.

25 Outline: Introduction • The B-Cache Organization • Experimental Results and Analysis • Related Work • Conclusion

26 Conclusion. The B-Cache balances accesses to cache sets by increasing the decoder length and incorporating a replacement policy into a direct-mapped cache design. Programmable decoders dynamically determine which memory addresses map to each cache set. A 16 kB level-one B-Cache outperforms a direct-mapped cache with 64.5% and 37.8% miss-rate reductions for the instruction and data caches, respectively. Average IPC improvement: 5.9%. Energy reduction: 2%. Access time: the same as a direct-mapped cache.

27 Thanks!

