Cache Miss-Aware Dynamic Stack Allocation Authors: S. Jang et al. Conference: International Symposium on Circuits and Systems (ISCAS), 2007 Presenter: Tareq Hasan Khan

1 Cache Miss-Aware Dynamic Stack Allocation Authors: S. Jang et al. Conference: International Symposium on Circuits and Systems (ISCAS), 2007 Presenter: Tareq Hasan Khan ID: 11083577 ECE, U of S Literature review-4 (EE 800)

2 Outline
- Introduction to Cache and Stack
- Proposed Dynamic Stack Allocator
- Cache Miss Predictor
- Stack Pointer Manager
- Results
- Conclusion

3 Introduction
Cache
- A small, high-speed on-chip memory
- Bridges the speed gap between the microprocessor and main memory
- For low-power embedded systems, cache misses must be reduced without increasing cache associativity
Stack
- A group of memory locations used for local variables, temporary data of an application, and the return locations of function calls
- Last In, First Out (LIFO) structure
- Nearly half (49%) of memory accesses are related to the stack

4 Dynamic Stack Allocator
Conventional stack allocation inserts and extracts data sequentially, without considering cache misses.
Proposed hardware: Dynamic Stack Allocator (DSA)
- Cache Miss Predictor (CMP): computes a cache miss probability for each cache line using the history of cache misses
- Stack Pointer Manager (SPM): selects a location for the stack pointer that has the lowest cache miss probability

5 Dynamic Stack Allocator

6 Outline
- Introduction to Cache and Stack
- Proposed Dynamic Stack Allocator
- Cache Miss Predictor
- Stack Pointer Manager
- Results
- Conclusion

7 Cache Miss Predictor (CMP)
- Cache Miss Controller (CMC)
- Cache Miss (CM) buffer: consists of "index" and "count" register pairs

8 Cache Miss Controller (CMC)
The cache controller detects cache misses by comparing the tags in the cache with the tag bits of the address requested by the processor. When a cache miss is detected, the cache controller asserts a cache miss signal to notify the CMP and supplies the index of the missing line. On a cache miss, the CMC saves the index in the CM buffer and increments its corresponding counter. When the CM buffer is full, an entry is replaced according to the interval-based LRU policy.
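The recording step the CMC performs on each miss can be modeled in software as follows. This is a minimal sketch, not the paper's hardware: the class name, buffer size, and dict-based storage are assumptions standing in for the "index"/"count" register pairs.

```python
# Hypothetical software model of the CMC's recording step: on each
# cache miss, look up the missing line's index in the CM buffer and
# increment its counter, or claim an empty register pair if absent.
class CMBuffer:
    def __init__(self, size):
        self.size = size          # number of (index, count) register pairs
        self.entries = {}         # index -> miss count

    def record_miss(self, index):
        if index in self.entries:              # associative lookup hit
            self.entries[index] += 1
        elif len(self.entries) < self.size:    # free register pair
            self.entries[index] = 1
        # when full, replacement is deferred to the interval-based
        # LRU policy described on the CM buffer slides

buf = CMBuffer(size=4)
for line in [3, 7, 3, 3, 9]:      # indices of missing cache lines
    buf.record_miss(line)
print(buf.entries)                # {3: 3, 7: 1, 9: 1}
```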

9 Cache Miss (CM) buffer
- Recent CM buffer (RCM buffer)
- History CM buffer (HCM buffer)

10 Cache Miss (CM) buffer
Recent CM buffer (RCM buffer): On a cache miss to cache line k, an associative lookup into the RCM buffer is performed using k. If there is an entry with index k, the counter for line k is incremented. If no match occurs and the RCM buffer is not full, the index is recorded in one of the empty lines and the corresponding counter is incremented.
History CM buffer (HCM buffer): When the RCM buffer becomes full, entries of the HCM buffer are replaced with the contents of the RCM buffer according to the interval-based LRU policy: indices in the HCM buffer are replaced by RCM indices with larger count values. In the interval-based LRU policy, the comparison for replacement does not occur until the RCM buffer is full.
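The interval-based replacement between the two buffers can be sketched as below. This is an illustrative software model under stated assumptions: the paper does not give the exact tie-breaking or merging rules, so the function name, the count-based victim choice, and the merge of repeated indices are guesses at a reasonable interpretation.

```python
# Hypothetical model of the interval-based LRU replacement: when the
# RCM buffer fills, its entries displace HCM entries with smaller
# miss counts, and the RCM buffer is cleared for the next interval.
def promote_rcm_to_hcm(rcm, hcm, hcm_size):
    """rcm, hcm: dicts mapping cache-line index -> miss count."""
    for idx, cnt in sorted(rcm.items(), key=lambda kv: -kv[1]):
        if idx in hcm:
            hcm[idx] = max(hcm[idx], cnt)    # merge a repeated index
        elif len(hcm) < hcm_size:
            hcm[idx] = cnt                   # free HCM register pair
        else:
            victim = min(hcm, key=hcm.get)   # smallest-count HCM entry
            if cnt > hcm[victim]:            # replace only if RCM count is larger
                del hcm[victim]
                hcm[idx] = cnt
    rcm.clear()                              # start a new interval

rcm = {3: 3, 7: 1}                           # RCM buffer is full
hcm = {5: 2, 9: 1}
promote_rcm_to_hcm(rcm, hcm, hcm_size=2)     # line 3 (count 3) evicts line 9
```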

11 Outline
- Introduction to Cache and Stack
- Proposed Dynamic Stack Allocator
- Cache Miss Predictor
- Stack Pointer Manager
- Results
- Conclusion

12 Stack Pointer Manager (SPM)
When an application requires a stack, the SPM looks for a location that has the lowest cache miss probability using the contents of the RCM and HCM buffers.

13 Stack Pointer Manager (SPM)
When a function is called, the SPM calculates the total cache miss probability within the searching window (R1, R2) of each sub-stack. To do so, the SPM searches the RCM and HCM buffers for indices that fall within the searching window; for each index found, it adds the corresponding count to the total cache miss probability. After this computation, the SPM compares the probability of each sub-stack with those of the other sub-stacks and dynamically selects the sub-stack with the lowest cache miss probability as the stack for the application.
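The selection step above can be sketched as follows. Again a hypothetical software model: the window bounds, buffer contents, and the summing of raw counts as the "probability" are illustrative assumptions, not the paper's exact hardware datapath.

```python
# Hypothetical model of the SPM: sum the miss counts of all cache-line
# indices inside each sub-stack's searching window (R1, R2), then pick
# the sub-stack whose total is lowest.
def select_sub_stack(windows, rcm, hcm):
    """windows: list of (R1, R2) index ranges, one per sub-stack."""
    def window_total(r1, r2):
        # look through both the RCM and HCM buffers for indices
        # falling within the searching window
        return sum(cnt for idx, cnt in list(rcm.items()) + list(hcm.items())
                   if r1 <= idx <= r2)
    totals = [window_total(r1, r2) for (r1, r2) in windows]
    return totals.index(min(totals))   # sub-stack with lowest miss probability

rcm = {3: 3, 7: 1}                     # cache-line index -> miss count
hcm = {12: 5}
windows = [(0, 7), (8, 15)]            # searching windows of two sub-stacks
print(select_sub_stack(windows, rcm, hcm))   # 0 (total 4 beats total 5)
```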

14 Outline
- Introduction to Cache and Stack
- Proposed Dynamic Stack Allocator
- Cache Miss Predictor
- Stack Pointer Manager
- Results
- Conclusion

15 Result
Implemented within the OpenRISC 1200 microprocessor with an 8 KB direct-mapped data cache and an 8 KB direct-mapped instruction cache, each with a 16-byte line size.
The amount of data traffic between the cache and main memory varies with the size of the RCM and HCM buffers; traffic is normalized to one for the conventional scheme.
The traffic of FFT is 42% smaller than that of the conventional scheme. In some cases traffic increases, e.g., DFT with the DSA configuration of RCM(5) and HCM(8).

16 Result…cont.
Variation of the amount of data traffic with the number of sub-stacks: in all cases, the more sub-stacks there are, the smaller the amount of traffic, but the improvement is not very significant.

17 Result…cont.
An ASIC implementation of the DSA was done. The maximum speed was 87 MHz. The size of the DSA is 0.3 mm × 0.4 mm, which is about 1% of the total core area.

18 Conclusion
Proposed hardware for cache miss-aware dynamic stack allocation to reduce cache misses. Based on the history of cache misses, the proposed scheme moves the stack pointer to a location expected to cause fewer cache misses. Across various benchmarks, the DSA reduced traffic between the cache and main memory by 4% to 42%.

19 Thanks

