
1/20 Benefits of Early Cache Miss Determination
Memik, G., Reinman, G., Mangione-Smith, W.H.
Proceedings of High Performance Computer Architecture, pp. 307–316, Feb. 2003
On seminar book: 254

2/20 Abstract
 As the performance gap between the processor and the memory subsystem increases, designers are forced to develop new techniques to hide latency. Arguably, the most common technique is to utilize multi-level caches. Each new generation of processors is equipped with higher levels of memory hierarchy, with increasing sizes at each level. In this paper, we propose 5 different techniques that reduce the data access times and power consumption in processors with multi-level caches. Using information about the blocks placed into and replaced from the caches, the techniques quickly determine whether an access at any cache level will be a miss. The accesses that are identified to miss are aborted. The structures used to recognize misses are much smaller than the cache structures; consequently, the data access times and power consumption are reduced. Using the SimpleScalar simulator, we study the performance of these techniques for a processor with 5 cache levels. The best technique is able to abort 53.1% of the misses on average in SPEC2000 applications. Using these techniques, the execution time of the applications is reduced by up to 12.4% (5.4% on average), and the power consumption of the caches is reduced by as much as 11.6% (3.8% on average).

3/20 What's the Problem
 The fraction of data access time and cache power consumption caused by cache misses increases as the number of levels in a multi-level cache system grows
 A great deal of the time and cache power is spent accessing caches that miss
On average, in a processor with 5 levels of cache:
 The misses cause 25.5% of the data access time
 The misses cause 18% of the cache power consumption
 This motivates the exploration of techniques to minimize the effects of cache misses

4/20 Introduction
 Motivating example
 If the data will be supplied by the nth-level cache, all the cache levels before n will be accessed, causing unnecessary delay and power consumption
 The proposed technique of this paper
 Identify misses and bypass accesses to caches that will miss
 Store partial information about the blocks in a cache to identify whether a cache access may hit or will definitely miss
 If these misses are known in advance and not performed
 The delay of data access is reduced, and the cache power consumed by the misses is avoided

5/20 Mostly No Machine (MNM) Overview
 When the address is given to the MNM
 Miss signals for each cache level (except L1) are generated
 The miss signals are propagated with the access through the cache levels
 The ith miss bit dictates whether the access at level i should be performed or bypassed
 Two possible locations where the MNM can be realized (a bypass sketch follows below)
 (a) Parallel MNM: the L1 cache and the MNM are accessed in parallel
- Advantage: no MNM delay
- Disadvantage: the MNM consumes more power
 (b) Serial MNM: the MNM is accessed only after the L1 cache misses
- Advantage: the MNM consumes less power
- Disadvantage: higher data access time (increased by the delay of the MNM)
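To make the bypass mechanism concrete, here is a minimal sketch of how per-level miss bits could gate a multi-level lookup. All names and latencies are illustrative assumptions, not the paper's implementation:

```python
# Sketch: MNM miss bits gate which cache levels are actually probed.
MEMORY_LATENCY = 200  # assumed main-memory latency (cycles)

class CacheLevel:
    def __init__(self, probe_time, contents):
        self.probe_time = probe_time   # cycles to probe this level
        self.contents = set(contents)  # block addresses currently cached

    def lookup(self, block_addr):
        return block_addr in self.contents

def access_time(block_addr, levels, miss_bits):
    """miss_bits[i] is True when the MNM guarantees a miss at levels[i].
    The MNM is conservative: it never flags a level that holds the block."""
    elapsed = 0
    for cache, certain_miss in zip(levels, miss_bits):
        if certain_miss:
            continue                   # bypass: skip the probe entirely
        elapsed += cache.probe_time
        if cache.lookup(block_addr):
            return elapsed             # hit at this level
    return elapsed + MEMORY_LATENCY    # missed everywhere

# Example: the block lives in L3; the MNM identified the L2 miss, so L2 is
# skipped. L1 has no miss bit, so its entry is always False.
levels = [CacheLevel(1, []), CacheLevel(4, []), CacheLevel(12, [0x2FC0])]
print(access_time(0x2FC0, levels, [False, True, False]))  # 1 + 12 = 13 cycles
```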

6/20 Modification of Cache to Incorporate the MNM
 Modification of the cache structure
 Extend each cache with logic to detect the miss signal and bypass the access if necessary
 Each cache has to send information to the MNM about the blocks that are replaced from the cache
 This is needed for the bookkeeping required at the MNM
 In the serial MNM, to synchronize the access and the miss signal, the request generated by the L1 is sent to the MNM, which forwards the request to the L2

7/20 Benefits of the MNM Technique
 Cache_hit_time: time to access data at a cache
 Cache_miss_time: time to detect a miss in a cache
 Average data access time without MNM, for an access served by cache level $n$:
$T_{\text{no MNM}} = \sum_{i=1}^{n-1} \text{Cache\_miss\_time}_i + \text{Cache\_hit\_time}_n$
 Average data access time with MNM: levels whose misses the MNM identifies contribute no miss-detection time, so in the ideal case (L1 is still probed, since it has no miss bit):
$T_{\text{MNM}} = \text{Cache\_miss\_time}_1 + \text{Cache\_hit\_time}_n$
 Abort the access to a cache when the MNM identifies a miss
 This prevents the time spent accessing caches that will miss => improves data access time
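As a small worked illustration with assumed latencies (not figures from the paper): suppose an access is served by L3, miss detection costs 1 cycle at L1 and 4 cycles at L2, and an L3 hit costs 12 cycles.

```latex
% Assumed latencies, for illustration only.
\begin{align*}
T_{\text{no MNM}} &= \underbrace{1}_{\text{L1 miss}} + \underbrace{4}_{\text{L2 miss}} + \underbrace{12}_{\text{L3 hit}} = 17 \text{ cycles} \\
T_{\text{MNM}}    &= 1 + 12 = 13 \text{ cycles} \qquad \text{(the identified L2 miss is bypassed)}
\end{align*}
```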

8/20 Assumptions of the MNM Techniques
 Portion of the address used by the MNM
 Block addresses are stored instead of the exact bytes stored in a cache
 The MNM does not assume the inclusion property of caches
 EX: if cache level i contains a block b, block b is not necessarily contained in cache level i+1
 The MNM checks for misses at cache level i+1 even if it cannot identify a miss at cache level i
 EX: if the MNM identifies the miss at L3 but could not identify one at L2, the L2 cache is accessed first

9/20 1. Replacements MNM (RMNM)
 Replacements MNM
 Stores the addresses that are replaced from the caches; an access to such an address will therefore miss
 Information about the replaced blocks is stored in an RMNM cache
 An RMNM cache block is (n−1) bits wide
 n: # of separate cache levels (the level 1 cache is excluded)
● Each bit in the block corresponds to one level of cache, except the L1 cache
 When the ith bit is set, the block has been replaced from the Li cache

10/20 1. Replacements MNM (RMNM)
 Scenario for a 2-level cache
 pl.: place block into cache
 repl.: replace block from cache
 Since there are only two levels of cache, each RMNM block contains a single bit indicating hit/miss for the L2 cache
 Block 0x2FC0 is replaced from the L2 cache, so its address is placed into the RMNM cache
 A later access finds block 0x2FC0 in the RMNM cache and is therefore identified as an L2 cache miss (see the sketch below)
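To make the RMNM bookkeeping concrete, here is a minimal sketch. The structure and names are assumptions for illustration; a real RMNM cache is a small, finite hardware structure, whereas this sketch uses an unbounded dictionary and omits RMNM evictions:

```python
# RMNM sketch: track, per block address, the cache levels it was replaced from.
class RMNM:
    def __init__(self, num_levels):
        self.num_levels = num_levels
        self.table = {}  # block address -> set of levels the block was replaced from

    def on_replace(self, block_addr, level):
        # A cache at `level` evicted this block: set the corresponding bit.
        self.table.setdefault(block_addr, set()).add(level)

    def on_place(self, block_addr, level):
        # The block was placed back into `level`: it can hit there again.
        if block_addr in self.table:
            self.table[block_addr].discard(level)

    def miss_bits(self, block_addr):
        # One bit per level from L2 up: True means a guaranteed miss there.
        replaced = self.table.get(block_addr, set())
        return {lvl: (lvl in replaced) for lvl in range(2, self.num_levels + 1)}

# Scenario from the slide: 0x2FC0 is replaced from L2, then accessed again.
rmnm = RMNM(num_levels=2)
rmnm.on_replace(0x2FC0, level=2)
print(rmnm.miss_bits(0x2FC0))  # {2: True} -> the L2 access can be bypassed
```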

11/20 2. Sum MNM (SMNM)
 Sum MNM
 Stores hash values of the block addresses in a cache
 When a block is placed into the cache, the block address is hashed and the resulting hash value is stored
 The specific hash function gathers information about which bit values in the address are high
 If the hash value of the access matches any of the hash values of the existing cache blocks
● Then: the access is performed
● Else: a miss is captured, and the cache access is bypassed

12/20 2. Sum MNM (SMNM)
 The SMNM configuration is denoted sum_width x replication
 sum_width: the width of the address portion examined by each checker
 replication: # of parallel checkers implemented
 SMNM example: SMNM_10x2
 2 parallel checkers, each checking a different 10-bit portion of the block address
 If there are multiple checkers (a sketch follows below)
 The first one examines the least significant bits
 The second one examines the bits starting from the 7th rightmost bit
 The third one examines the bits starting from the 13th rightmost bit
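Below is an illustrative SMNM sketch. The slides do not fully specify the sum function, so this assumes each checker hashes its 10-bit window by summing the positions of the high bits; the paper's exact function may differ, and replacement handling (removing stale hashes) is omitted:

```python
# SMNM sketch: a checker stores hash values of the cached block addresses;
# if an access's hash matches none of them, the access is a certain miss.

def window(addr, start, width):
    # Extract `width` bits of the address beginning at bit `start`.
    return (addr >> start) & ((1 << width) - 1)

def sum_hash(bits):
    # Assumed hash: sum of the positions of the high bits in the window.
    return sum(i for i in range(bits.bit_length()) if (bits >> i) & 1)

class SMNMChecker:
    def __init__(self, start, width):
        self.start, self.width = start, width
        self.hashes = set()  # hash values of the blocks currently cached

    def on_place(self, addr):
        self.hashes.add(sum_hash(window(addr, self.start, self.width)))

    def certain_miss(self, addr):
        return sum_hash(window(addr, self.start, self.width)) not in self.hashes

# SMNM_10x2: two checkers over different 10-bit portions of the address
# (the 7th rightmost bit is bit index 6).
checkers = [SMNMChecker(0, 10), SMNMChecker(6, 10)]
for c in checkers:
    c.on_place(0x2FC0)

addr = 0x1234
print(any(c.certain_miss(addr) for c in checkers))  # True => definite miss
```

The conservativeness comes from hashing: different addresses can share a hash value, so a match only means "maybe hit", while a mismatch at any checker is a definite miss.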

13/20 3. Table MNM (TMNM)
 Table MNM
 Stores the least significant N bits of the block addresses in a cache
 The values are stored in the TMNM table, an array of size 2^N
 Locations corresponding to the addresses stored in the cache are set to '0'; the remaining locations are set to '1'
 The least significant N bits of the access are used to address the TMNM table
 The value stored at the corresponding location is used as the miss signal
 Example TMNM for N = 6
 The cache in the example only has 2 blocks
 When a request comes to the MNM, the corresponding bit position is read
 If the location is high, the access will miss and can be bypassed

14/20 3. Table MNM (TMNM)
 Several block addresses can map to the same bit position in the TMNM table
 Therefore the values in the TMNM table are saturating counters instead of single bits (see the sketch below)
 When a block is placed into the cache, the corresponding counter is incremented, unless it is saturated
 When a block is replaced from the cache, the corresponding counter is decremented, unless it is saturated (a saturated counter's exact count is unknown, so it conservatively stays at the maximum)
 The TMNM configuration is denoted TMNM_N x replication
 N: # of bits checked by each table (the least significant N bits of the block address)
 replication: # of tables examining different portions of the address
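A minimal sketch of the counter-based TMNM table follows; the counter width and names are assumptions:

```python
# TMNM sketch: a 2^N-entry array of saturating counters indexed by the low
# N bits of a block address. A zero counter means no cached block maps to
# that slot, so an access with those low bits is a guaranteed miss.
class TMNMTable:
    def __init__(self, n_bits, counter_max=3):
        self.mask = (1 << n_bits) - 1
        self.max = counter_max
        self.counters = [0] * (1 << n_bits)

    def _index(self, addr):
        return addr & self.mask

    def on_place(self, addr):
        i = self._index(addr)
        if self.counters[i] < self.max:
            self.counters[i] += 1  # stop counting once saturated

    def on_replace(self, addr):
        i = self._index(addr)
        if 0 < self.counters[i] < self.max:
            # A saturated counter is never decremented: its true count is
            # unknown, so it stays at the maximum forever.
            self.counters[i] -= 1

    def certain_miss(self, addr):
        return self.counters[self._index(addr)] == 0

t = TMNMTable(n_bits=6)
t.on_place(0x2FC0)
print(t.certain_miss(0x2FC0))  # False: the access may hit
print(t.certain_miss(0x2FC1))  # True: its low 6 bits map to an empty slot
```

Pinning saturated counters at the maximum loses precision (those slots can never prove a miss again) but preserves safety: the table never reports a miss for a block that is actually cached.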

15/20 4. Common Address MNM (CMNM)
 Common address MNM
 Captures the common values in block addresses by examining the most significant bits of the address
 The virtual tag finder has K registers
 They store the most significant portions of the cached block addresses
 During an access
 The most significant (32−m) bits of the address are compared to the values in the virtual tag finder
 If they match any of the existing values, the index of the matching register is attached to the remaining m bits of the examined address
 The result is used to index the CMNM table

16/20 4. Common Address MNM (CMNM)
 When an address is checked, there are two ways to identify a miss (a sketch follows below)
 First, the (32−m) most significant bits of the address are presented to the virtual tag finder
 If they don't match any of the register values in the virtual tag finder, the access is marked as a miss
 Second, if a register matches the address, its index is attached to the remaining m bits of the address to access the CMNM table
 If the corresponding position has the value '1', again a miss is indicated
 The CMNM configuration is denoted CMNM_k x m
 k: # of registers in the virtual tag finder
 m: # of least significant bits of the examined address
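Here is a simplified CMNM sketch. The register-overflow and table-update policies are assumptions: real hardware would manage virtual tag finder evictions and replacements, which this sketch sidesteps with a conservative `overflowed` flag:

```python
# CMNM sketch: K registers hold the distinct (32-m)-bit upper address parts
# of cached blocks; a table indexed by (register index, low m bits) holds
# '1' for certain misses and '0' for possibly cached addresses.
class CMNM:
    def __init__(self, k, m):
        self.k, self.m = k, m
        self.registers = []            # up to K distinct upper address parts
        self.table = [1] * (k << m)    # 1 = miss, 0 = possibly cached
        self.overflowed = False        # True once an upper part went untracked

    def _split(self, addr):
        return addr >> self.m, addr & ((1 << self.m) - 1)

    def on_place(self, addr):
        upper, lower = self._split(addr)
        if upper not in self.registers:
            if len(self.registers) == self.k:
                self.overflowed = True  # can no longer prove misses for new uppers
                return
            self.registers.append(upper)
        idx = self.registers.index(upper)
        self.table[(idx << self.m) | lower] = 0  # mark as possibly cached

    def certain_miss(self, addr):
        upper, lower = self._split(addr)
        if upper not in self.registers:
            # Way 1: unknown upper bits. Safe to call it a miss only if every
            # placed block's upper part was tracked.
            return not self.overflowed
        # Way 2: a register matched; consult the corresponding table entry.
        idx = self.registers.index(upper)
        return self.table[(idx << self.m) | lower] == 1

cmnm = CMNM(k=4, m=8)
cmnm.on_place(0x2FC0)
print(cmnm.certain_miss(0x2FC0))  # False: register matches and table bit is 0
print(cmnm.certain_miss(0x1234))  # True: no register holds this upper part
```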

17/20 Discussion of the MNM Techniques
 The MNM techniques
 Never incorrectly indicate that bypassing should be used
 But do not detect every opportunity for bypassing
● If the MNM indicates a miss, the block certainly does not exist in the cache
● If the MNM output is "maybe hit", the access might still miss in the cache
 The miss signal must be reliable
 The cost of indicating that an access will miss when the data is actually in the cache is high: a redundant access to a higher level of the memory hierarchy must be performed
 The cost of a hit misindication is relatively low: a redundant tag comparison at the cache

18/20 Improvement in Execution Time
 To eliminate the delay of the MNM, we perform simulations with the parallel MNM
 The HMNM4 technique reduces the execution time by as much as 12.4%, and by 5.4% on average
 HMNM is the hybrid MNM, which combines all the techniques to increase the number of misses identified
 The perfect MNM reduces the execution time by as much as 25.0%, and by 10.0% on average
 The perfect MNM identifies all the misses, and hence bypasses all the cache misses

19/20 Reduction in Cache Power Consumption
 To achieve the maximum power reduction, we perform simulations with the serial MNM
 The HMNM4 reduces the cache power consumption by as much as 11.6%, and by 3.8% on average
 The perfect MNM reduces the cache power consumption by as much as 37.6%, and by 10.2% on average

20/20 Conclusions
 Proposed techniques to identify misses at different cache levels
 When an access is identified as a miss, the access is bypassed directly to the next cache level
 This reduces the delay and power consumption associated with the misses
 Presented 5 different techniques in total to recognize some of the cache misses
 For the hybrid MNM technique
 The execution time is reduced by 5.4% on average (ranging from 0.6% to 12.4%)
 The cache power consumption is reduced by 3.8% on average (ranging from 0.4% to 11.6%)

