Presentation is loading. Please wait.

Presentation is loading. Please wait.

Highly-Associative Caches for Low-Power Processors Michael Zhang Krste Asanovic

Similar presentations


Presentation on theme: "Highly-Associative Caches for Low-Power Processors Michael Zhang Krste Asanovic"— Presentation transcript:

1 Highly-Associative Caches for Low-Power Processors Michael Zhang Krste Asanovic {rzhang|krste}@lcs.mit.edu

2 Motivation Cache uses 30-60% processor energy in embedded systems.  Example: 43% for StrongArm-1 Many academic studies on cache [Albera, Bahar, ’98] – Power and performance trade-offs [Amrutur, Horowitz, ‘98,’00] – Speed and power scaling [Bellas, Hajj, Polychronopoulos, ’99] – Dynamic cache management [Ghose, Kamble,’99] – Power reduction through sub-banking, etc. [Inoue, Ishihara, Murakami,’99] – Way predicting set-associative cache [Kin,Gupta, Mangione-Smith, ’97] – Filter cache [Ko, Balsara, Nanda, ’98] – Multilevel caches for RISC and CISC [Wilton, Jouppi, ’94] – CACTI cache model Many Industrial Low-Power Processors use CAM (content- addressable-memory)  ARM3 – 64-way set-associative – [Furber et. al. ’89]  StrongArm – 32-way set-associative – [Santhanam et. al. ’98]  Intel XScale – 32-way set-associative – ’01 CAM: Fast and Energy-Efficient

3 Talk Outline Structural Comparison Area and Delay Comparison Energy Comparison Related work Conclusion

4 Set-Associative RAM-tag Cache Not energy-efficient  All ways are read out Two-phase approach  More energy-efficient  2X latency =? Tag Status Data Tag Index Offset

5 Set-Associative RAM-tag Sub-bank Not energy-efficient  All ways are read out Two-phase approach  More energy-efficient  2X latency Sub-banking 1 sub-bank = 1 way Low-swing Bitlines  Only for reads, writes performed full-swing Wordline Gating I/O BUS addr Address Decoder gwl lwl Offset Dec. offset Data SRAM Cells Sense Amps lwl Offset Dec. offset Data SRAM Cells Sense Amps 32 128 Tag SRAM Cells Tag Comp BUS Cache

6 CAM-tag Cache Only one sub-bank activated Associativity within sub-bank Easy to implement high associativity Tag Status Data HIT? Word Tag Offset Bank Tag Status Data HIT?

7 CAM-tag Cache Sub-bank Only one sub-bank activated Associativity within sub-bank Easy to implement high associativity I/O BUS tag CAM-tag Array gwl lwl Offset Dec. offset SRAM Cells Sense Amps lwl Offset Dec. offset SRAM Cells Sense Amps 32 128

8 CAM Functionality and Energy Usage WL BitBit_bSBitSBit_b match 10-T CAM Cell With Separate Write/Search Lines And Low-Swing Match Line WL BitBit_bSBitSBit_b match Match 1 01 1 1 0 WL BitBit_bSBitSBit_b match Mismatch 1 0 01 1 0 CAM Energy Dissipation  Search Lines  Match Lines  Drivers SRAM XOR

9 CAM-tag Cache Sub-bank Layout 10% area overhead over RAM-tag cache 2x12x32 CAM Array 1-KB Cache Sub-bank implemented in 0.25  m CMOS technology 32x64 RAM Array

10 Delay Comparison gwl Global Wordline Decoding lwl Decoded offset Local Wordline Decoding Tag readout Data readout Index Bits Tag bits Data out Tag bits Tag Comp. Data out RAM tag Cache Critical Path: CAM tag Cache Critical Path: Within 3% of each other Tag bits broadcasting Tag bits Tag Comp. gwl Data readout Local Wordline Decoding lwl Decoded offset

11 Hit Energy Comparison Hit Energy per Access for 8KB Cache in pJ Associativity and Implementation

12 Miss Rate Results LZW ijpeg m88ksim perl pegwit gcc

13 Total Access Energy (pegwit) Pegwit – High miss rate for high associativity Total Energy per Access for 8KB Cache in pJ Miss Energy Expressed in Multiples of 32-bit Read Access Energy

14 Total Access Energy (perl) Perl – Very low miss rate for high associativity Total Energy per Access for 8KB Cache in pJ Miss Energy Expressed in Multiples of 32-bit Read Access Energy

15 Other Advantages of CAM-tag Hit signal generated earlier  Simplifies pipelines Simplified store operation  Wordline only enabled during a hit  Stores can happen in a single cycle  No write buffer necessary

16 Related Work CACTI and CACTI2  [Wilton and Jouppi ’94],[Reinman and Jouppi, ’99]  Accurate delay and energy estimate Results within 10%  Energy estimate not suited for low-power designs  Typical Low-power features not included in CACTI Sub-banking Low-swing bitlines Wordline gating Separate CAM search line Low-swing match lines  Energy Estimation 10X greater than our model for one CAM-tag cache sub-bank Our results closely agree with [Amruthur and Horowitz, 98]

17 Conclusion CAM tags – high performance and low-power  Energy consumption of 32-way CAM < 2-way RAM  Easy to implement highly-associative tags  Low area overhead (10%)  Comparable access delay  Better CPI by reducing miss rate

18 Thank You! http://www.cag.lcs.mit.edu/scale/


Download ppt "Highly-Associative Caches for Low-Power Processors Michael Zhang Krste Asanovic"

Similar presentations


Ads by Google