1 Uppsala Architecture Research Team Mommy, mommy! I want a hardware cache with few conflicts and low power consumption that is easy to implement! But... that's three wishes in one!!!

2 Uppsala Architecture Research Team Refinement and Evaluation of the Elbow Cache, or: The Little Cache That Could. Mathias Spjuth.

3 Uppsala Architecture Research Team [Animated diagram: the memory references A-B-C-D-E-F-G-H from the address space are inserted one by one into a 2-way set-associative cache; references that index the same set conflict and evict each other.]

4 Uppsala Architecture Research Team Conflicts (cont.) The traditional way of reducing conflicts is to use set-associative caches. ++ Lower miss rate (than direct-mapped) -- Slower access -- More complexity (uses more chip area) -- Higher power consumption
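As a point of reference, here is a minimal sketch in C (made-up field widths; none of this is from the slides) of how a conventional set-associative cache derives its set index. Every way is indexed by the same address bits, which is exactly why addresses sharing those bits keep competing for the same set.

    #include <stdint.h>

    #define BLOCK_BITS 5   /* 32-byte blocks, as in the evaluated configurations */
    #define SET_BITS   7   /* e.g. 128 sets; purely illustrative                 */

    /* All ways of a set-associative cache use the same index bits, so any   */
    /* group of addresses sharing those bits contends for the same set, no   */
    /* matter how large the rest of the cache is.                            */
    static inline uint32_t set_index(uint32_t addr)
    {
        return (addr >> BLOCK_BITS) & ((1u << SET_BITS) - 1u);
    }

    static inline uint32_t tag(uint32_t addr)
    {
        return addr >> (BLOCK_BITS + SET_BITS);
    }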

5 Uppsala Architecture Research Team [Diagram: the same reference sequence A-B-C-D-E-F-G-H inserted into a 2-way skewed-associative cache; each of the two cache banks is indexed with a different function, so the references are spread differently in each bank.]

6 Uppsala Architecture Research Team [Diagram, continued: with the skewed indexing, all eight references A-H find a place in the two cache banks. No conflicts!]

7 Uppsala Architecture Research Team Skewed associative caches Use different hashing (skewing) functions for indexing each cache bank. ++ Lower miss rate (than set-assoc.) ++ More predictable -- Slightly slower (hashing) -- "Cannot" use LRU replacement -- "Cannot" use VI-PT
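A rough sketch of the indexing idea, with made-up XOR-based skewing functions (the slide does not give the actual functions used): each bank gets its own index function, so addresses that collide in one bank are usually spread apart in the other.

    #include <stdint.h>

    #define BLOCK_BITS 5
    #define BANK_BITS  8          /* 256 lines per bank; illustrative */
    #define BANK_MASK  ((1u << BANK_BITS) - 1u)

    /* Illustrative XOR-based skewing functions in the spirit of skewed-    */
    /* associative caches: each bank mixes a different set of address bits, */
    /* so a group of addresses that collides in one bank is spread out in   */
    /* the other.                                                           */
    static inline uint32_t index_bank1(uint32_t addr)
    {
        uint32_t a = addr >> BLOCK_BITS;
        return (a ^ (a >> BANK_BITS)) & BANK_MASK;
    }

    static inline uint32_t index_bank2(uint32_t addr)
    {
        uint32_t a = addr >> BLOCK_BITS;
        return (a ^ (a >> (2 * BANK_BITS))) & BANK_MASK;
    }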

8 Uppsala Architecture Research Team Elbow Cache Improve the performance of a skewed associative cache by reallocating blocks within the cache. By doing so we get a broader choice of which block to choose as the victim. Use timestamps as the replacement metric.

9 Uppsala Architecture Research Team Finding the victim Two methods: 1. Look-ahead: consider all possible placements before the first reallocation is made. 2. Feedback: only consider the immediate placements, then iterate.

10 Uppsala Architecture Research Team [Diagram: a new reference X arrives in a full 2-way elbow (lookahead) cache. Both lines X maps to are occupied, but each occupant can be relocated to its alternate line in the other bank, so the replacement paths F-B-A and E-D-H are considered before choosing a victim.]

11 Uppsala Architecture Research Team [Diagram: the same insertion of X in a 2-way elbow (feedback) cache. The displaced block is held in a temporary register and fed back into its alternate line in the other bank, displacing that occupant in turn.]

12 Uppsala Architecture Research Team Finding the victim (cont.) Look-ahead: ++ Closest to optimal -- Difficult to implement (>1 transformation). Feedback: ++ Easy to implement (feed the victim back to the write buffer) -- Needs extra space in the write buffer.

13 Uppsala Architecture Research Team Replacement Metrics Enhanced Not-Recently-Used (NRUE): the best replacement policy for skewed caches known so far. Each block has two extra bits, a recently-used bit and a very-recently-used bit, which are set on access to the block. The bits are cleared at regular intervals; the very-recently-used bit is cleared more often. To pick a victim, first look for a block with no bit set, then for one with only the recently-used bit set, and otherwise fall back to random replacement.
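A small sketch of how NRUE victim selection could look in a simulator; the field names ru and vru and the function itself are hypothetical, only the priority order comes from the slide.

    #include <stdlib.h>

    struct way { unsigned int ru : 1; unsigned int vru : 1; /* plus tag, data, ... */ };

    /* Pick a victim among the candidate ways of a skewed cache:            */
    /* 1) prefer a way with neither bit set, 2) then one with only the      */
    /* recently-used bit set, 3) otherwise fall back to random replacement. */
    int nrue_pick_victim(const struct way *ways, int n)
    {
        for (int i = 0; i < n; i++)
            if (!ways[i].ru && !ways[i].vru) return i;
        for (int i = 0; i < n; i++)
            if (!ways[i].vru) return i;     /* only recently-used set */
        return rand() % n;
    }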

14 Uppsala Architecture Research Team Timestamps [Diagram: each cached block A, B stores a timestamp T_A, T_B taken from a global counter T_curr; the counter is increased on every cache allocation and wraps around at T_max.] The age of block A is Dist(A) = T_max - T_A + T_curr if T_curr < T_A (the counter has wrapped), and Dist(A) = T_curr - T_A if T_curr >= T_A.
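A minimal sketch of this wrap-around distance, assuming the 5-bit timestamps used in the evaluated configurations.

    #include <stdint.h>

    #define TS_BITS 5
    #define T_MAX   (1u << TS_BITS)   /* timestamps wrap modulo T_MAX */

    /* Age of a block whose timestamp is t_block, given the current counter  */
    /* t_curr; both are kept modulo T_MAX, so the subtraction must           */
    /* compensate when the counter has wrapped past the block's timestamp.   */
    uint32_t dist(uint32_t t_block, uint32_t t_curr)
    {
        if (t_curr < t_block)
            return T_MAX - t_block + t_curr;   /* counter wrapped around */
        return t_curr - t_block;
    }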

15 Uppsala Architecture Research Team Timestamps [Diagram: blocks A and B placed on the timestamp axis from 0 to T_max relative to T_curr. When Dist(A) > Dist(B), A is older than B; when Dist(A) < Dist(B), B is older than A. The wrap-around distance preserves this ordering even after the counter wraps.]

16 Uppsala Architecture Research Team Implementation Lookahead: At most one transformation (4 possible victims) per replacement. Do the transformation and load the new data at the same time.
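A simulator-style sketch of the one-step lookahead replacement, under assumed data structures and made-up skewing functions (the hardware on the slide is not modelled): the incoming block maps to one line in each bank, and each of those occupants could instead be elbowed to its alternate line in the other bank, giving the four candidate victims.

    #include <stdint.h>

    #define SETS   256                       /* lines per bank; illustrative */
    #define TS_MAX 32                        /* 5-bit timestamp counter      */

    struct line { uint32_t blk; uint32_t ts; };
    static struct line bank0[SETS], bank1[SETS];
    static uint32_t t_curr;                  /* bumped on every allocation   */

    /* Hypothetical skewing functions, one per bank. */
    static uint32_t h0(uint32_t blk) { return blk & (SETS - 1); }
    static uint32_t h1(uint32_t blk) { return (blk ^ (blk >> 8)) & (SETS - 1); }

    /* Wrap-around age of a block allocated at timestamp ts. */
    static uint32_t dist(uint32_t ts)
    { return (t_curr >= ts) ? t_curr - ts : TS_MAX - ts + t_curr; }

    /* One-step lookahead for an incoming block 'blk' (cache assumed warm).
     * Candidates: the two lines blk maps to directly (d0, d1), plus the
     * alternate line of each of those occupants in the other bank (a0, a1).
     * Evicting a0 or a1 means elbowing d0 or d1 over to it, so at most one
     * block is relocated, and that move can overlap the refill of 'blk'.   */
    void elbow_lookahead_insert(uint32_t blk)
    {
        struct line *d0 = &bank0[h0(blk)], *d1 = &bank1[h1(blk)];
        struct line *a0 = &bank1[h1(d0->blk)], *a1 = &bank0[h0(d1->blk)];
        struct line *cand[4] = { d0, d1, a0, a1 };

        struct line *victim = cand[0];
        for (int i = 1; i < 4; i++)                    /* oldest block loses */
            if (dist(cand[i]->ts) > dist(victim->ts))
                victim = cand[i];

        if (victim == a0)      { *a0 = *d0; victim = d0; }  /* elbow d0 aside */
        else if (victim == a1) { *a1 = *d1; victim = d1; }  /* elbow d1 aside */

        victim->blk = blk;
        victim->ts  = t_curr;
        t_curr = (t_curr + 1) % TS_MAX;                /* allocation counter */
    }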

17 Uppsala Architecture Research Team Implementation Feedback: Up to 7 transformations (max. 8 possible victims) per replacement. Temporary victims are moved to the write buffer before reallocation. Extra control field in the write buffer.
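A corresponding sketch of the feedback scheme, reusing the structures and helpers from the lookahead sketch above. The write buffer is modelled by a plain local variable, and the choice of the initial slot (overwrite the older of the two direct occupants) is an assumption, not taken from the slides.

    #define MAX_STEPS 7    /* up to 7 relocations, at most 8 candidate victims */

    /* Iterative feedback insertion (cache assumed warm).  Each displaced    */
    /* block is parked as a temporary victim, modelling the extra entry in   */
    /* the write buffer, and fed back into its only alternate line in the    */
    /* other bank.  If that line holds an older block the two swap roles and */
    /* we iterate; otherwise the parked block is finally evicted.            */
    void elbow_feedback_insert(uint32_t blk)
    {
        /* Step 0: overwrite the older of the two lines blk maps to directly. */
        int from_bank0 = dist(bank0[h0(blk)].ts) >= dist(bank1[h1(blk)].ts);
        struct line *slot = from_bank0 ? &bank0[h0(blk)] : &bank1[h1(blk)];

        struct line parked = *slot;              /* temporary victim          */
        slot->blk = blk;
        slot->ts  = t_curr;

        for (int step = 1; step <= MAX_STEPS; step++) {
            struct line *alt = from_bank0 ? &bank1[h1(parked.blk)]
                                          : &bank0[h0(parked.blk)];
            if (dist(alt->ts) <= dist(parked.ts))
                break;                           /* parked block is the oldest */
            struct line next = *alt;             /* new temporary victim       */
            *alt = parked;                       /* reallocate parked block    */
            parked = next;
            from_bank0 = !from_bank0;
        }
        /* 'parked' now leaves the cache (written back to memory if dirty).   */
        t_curr = (t_curr + 1) % TS_MAX;
    }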

18 Uppsala Architecture Research Team Feedback [Hardware schematic of the feedback datapath: the two cache banks (Bank I, Bank II) with data+tag and timestamp (TmSt) fields, comparators on the timestamps, and a write buffer extended with id and step control fields that feed temporary victims back into the cache or out to memory.]

19 Uppsala Architecture Research Team Test Configurations: set associative (2-way, 4-way, 8-way, 16-way); fully associative cache; skewed associative, LRU; skewed associative, NRUE; skewed associative, 5-bit timestamp; elbow cache, 1-step lookahead, 5-bit timestamp; elbow cache, 7-step feedback, 5-bit timestamp.

20 Uppsala Architecture Research Team Test Configurations (2) General configuration: 8 KB, 16 KB, 32 KB cache size; L1 data cache with 32-byte block size; write back, no allocate on write, with an infinite write buffer (all writes ignored). Miss Rate Reduction (MRR): MRR = (MR_ref - MR) / MR_ref.
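For example (illustrative numbers only, not results from the study): if the reference cache misses on 10% of accesses and the evaluated cache on 7%, then MRR = (0.10 - 0.07) / 0.10 = 0.30, i.e. a 30% miss rate reduction.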

21 Uppsala Architecture Research Team

22

23 Conclusions I. For a 2-way skewed cache, timestamp replacement gives almost the same performance as LRU. II. Timestamps are useful. III. A 2-way elbow cache has roughly the same performance as an 8-way set associative cache of the same size.

24 Uppsala Architecture Research Team Conclusions (2) IV. The lookahead design is slightly better than the feedback. V. There are drawbacks with all skewed caches (skewing delays, VI-PT). VI. If the problems can be solved, the elbow cache is a good alternative to set associative caches.

25 Uppsala Architecture Research Team Future Work Power awareness: how does an elbow cache stand up against traditional set associative caches when power consumption is considered?

26 Uppsala Architecture Research Team Links UART web: www.it.uu.se/research/group/uart/

27 Uppsala Architecture Research Team ?

