CSE243: Introduction to Computer Architecture and Hardware/Software Interface
Topics covered: Memory subsystem
Example #1: Effect of Interleaving
Consider a cache that has 8 words per block. On a read miss, the block that contains the desired word must be copied from the main memory into the cache. Assume that the hardware has the following properties: it takes 1 clock cycle to send an address to the main memory; the first word is accessed in 8 clock cycles, and each subsequent word is accessed in 4 clock cycles; and 1 clock cycle is needed to send a word to the cache. How many clock cycles does it take to send the block of words to the cache?
The total time taken is 1 + 8 + (7 × 4) + 1 = 38 clock cycles: 1 cycle to send the address, 8 cycles for the first word, 4 cycles for each of the remaining 7 words, and 1 cycle to send the last word to the cache (sending each earlier word overlaps with the access of the next one).
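The arithmetic above can be captured in a small Python helper. This is only a sketch of the slide's accounting; the function name `block_transfer_cycles` and its parameter names are hypothetical, and the defaults are the timing values given in the example.

```python
def block_transfer_cycles(words_per_block, addr_cycles=1, first_access=8,
                          next_access=4, to_cache=1):
    """Clock cycles to copy one block from a single, non-interleaved memory.

    Mirrors the slide's accounting: send the address, access the first word,
    access each remaining word, then one final cycle to send the last word to
    the cache (earlier transfers are assumed to overlap later accesses).
    """
    return (addr_cycles
            + first_access
            + (words_per_block - 1) * next_access
            + to_cache)


print(block_transfer_cycles(8))  # 38
```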
Example #1 (continued)
If the memory is constructed as four interleaved modules, then when the starting address of the block arrives at the memory, all four modules begin accessing the required data using the high-order bits of the address (the low-order bits select the module). After 8 clock cycles, each module has one word of data in its DBR. These words are transferred to the cache one word at a time during the next 4 clock cycles, and during this time the next word in each module is accessed. It then takes another 4 clock cycles to transfer these words to the cache. Therefore, the total time taken is 1 + 8 + 4 + 4 = 17 clock cycles. The speedup obtained from interleaving is 38/17 ≈ 2.2.
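A minimal Python sketch of the interleaved case follows. It assumes a generalization of the slide's 1 + 8 + 4 + 4 accounting that is not stated in the slides: each batch of words (one per module) is sent to the cache one word per cycle while the modules access their next word, so a new batch is ready after max(modules, next_access) cycles. The name `interleaved_block_cycles` is hypothetical.

```python
def interleaved_block_cycles(words_per_block, modules, addr_cycles=1,
                             first_access=8, next_access=4):
    """Clock cycles to copy one block from `modules` interleaved modules.

    Assumed model: all modules start accessing together; while one batch of
    `modules` words is sent to the cache (one word per cycle), the modules
    access their next word, so each later batch is ready after
    max(modules, next_access) cycles.
    """
    assert words_per_block % modules == 0, "block must split evenly"
    batches = words_per_block // modules
    return (addr_cycles + first_access
            + (batches - 1) * max(modules, next_access)
            + modules)  # send the final batch to the cache


single = 1 + 8 + (8 - 1) * 4 + 1               # non-interleaved case: 38 cycles
interleaved = interleaved_block_cycles(8, 4)   # 17 cycles
print(interleaved, round(single / interleaved, 1))  # 17 2.2
```

With 8 words and 4 modules there are two batches, which reproduces the slide's 1 + 8 + 4 + 4.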
Example #2: Effect of the cache on the processor chip
Consider the impact of the cache on the overall performance of the computer. Let h be the hit rate, M the miss penalty, that is, the time to access information in the main memory, and C the time to access information in the cache. Then the average access time experienced by the processor is (see page 332 of the textbook):
$$t_{ave} = hC + (1 - h)M$$
Consider the following example. If the computer has no cache, it takes 10 clock cycles for every memory read access. For a computer that has a cache holding 8-word blocks and an interleaved main memory, it takes 17 clock cycles to transfer a block from the main memory to the cache (as computed in Example #1). Assume that 30% of the instructions require a data memory access in addition to the instruction fetch, so there are 130 memory accesses for every 100 instructions executed. Assume that the hit rates in the cache are 0.95 for instructions and 0.9 for data, and that a cache access takes 1 clock cycle. Then the improvement in performance is:
$$\frac{130 \times 10}{100(0.95 \times 1 + 0.05 \times 17) + 30(0.9 \times 1 + 0.1 \times 17)} = \frac{1300}{258} \approx 5.04$$
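The same calculation can be written directly from the formula t_ave = hC + (1 − h)M. The helper name `avg_access_time` is hypothetical; the numbers are the ones used in the example (C = 1 cycle, M = 17 cycles, 100 instruction fetches and 30 data accesses per 100 instructions).

```python
def avg_access_time(h, C, M):
    """Average access time for one cache level: t_ave = h*C + (1 - h)*M."""
    return h * C + (1 - h) * M


# Per 100 instructions: 100 instruction fetches plus 30 data accesses.
no_cache = 130 * 10
with_cache = (100 * avg_access_time(0.95, 1, 17)
              + 30 * avg_access_time(0.9, 1, 17))
print(round(no_cache / with_cache, 2))  # 5.04
```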
Example #3: Effect of L1 and L2 caches
Consider the impact of the L1 and L2 caches on the overall performance of the processor. Let h1 be the hit rate in the L1 cache, h2 the hit rate in the L2 cache, C1 the time to access information in the L1 cache, C2 the time to access information in the L2 cache, and M the time to access information in the main memory. Then the average access time experienced by the processor is (see page 335 of the textbook):
$$t_{ave} = h_1 C_1 + (1 - h_1) h_2 C_2 + (1 - h_1)(1 - h_2) M$$
Only the fraction (1 − h1)(1 − h2) of accesses has to go all the way to the main memory.
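A short Python sketch of the two-level formula is given below. The function name is hypothetical, and the sample values (h1 = 0.95, h2 = 0.9, C1 = 1, C2 = 10, M = 100 cycles) are illustrative assumptions, not figures from the slides.

```python
def avg_access_time_two_level(h1, h2, C1, C2, M):
    """t_ave = h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M."""
    return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M


# Hypothetical parameters, chosen only to illustrate the formula.
t = avg_access_time_two_level(h1=0.95, h2=0.9, C1=1, C2=10, M=100)
print(round(t, 2))  # 1.9
```

Setting h2 = 0 (an L2 cache that never hits) reduces the expression to the single-level formula h1·C1 + (1 − h1)·M, which is a quick sanity check.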
Example #4: Set-associative cache
A computer system has a main memory of 64K 16-bit words. It has a cache of 128 blocks with 16 words per block, organized in a set-associative manner with 2 blocks per set.
(a) Calculate the number of bits in each of the TAG, SET, and WORD fields of the main memory address format.
(b) Assume that the cache is initially empty. Suppose that the processor fetches 2080 words from locations 0, 1, ..., 2079, in that order, and then repeats this fetch sequence nine more times. If the cache is 10 times faster than the main memory, estimate the improvement factor resulting from the use of the cache. Assume that the LRU algorithm is used for block replacement.
(a) The main memory has 64K = 2^16 words, so the main memory address is 16 bits. With 16 = 2^4 words per block, the WORD field has 4 bits. The cache has 128/2 = 64 = 2^6 sets, so the SET field has 6 bits. The TAG field has 16 − (6 + 4) = 6 bits.
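Part (a) can be checked mechanically. The sketch below assumes, as in this example, that every size is a power of two; the helper name `field_widths` is hypothetical.

```python
def field_widths(memory_words, cache_blocks, words_per_block, ways):
    """Return (tag, set, word) bit widths for a set-associative cache."""
    def log2_exact(n):
        # All sizes in this example are powers of two, so the width is exact.
        assert n > 0 and n & (n - 1) == 0, f"{n} is not a power of two"
        return n.bit_length() - 1

    address_bits = log2_exact(memory_words)
    word_bits = log2_exact(words_per_block)
    set_bits = log2_exact(cache_blocks // ways)
    return address_bits - set_bits - word_bits, set_bits, word_bits


print(field_widths(64 * 1024, 128, 16, 2))  # (6, 6, 4)
```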
Example #4 (continued)
(b) Words 0, 1, 2, ..., 2079 occupy blocks 0 to 129 in the main memory (2080/16 = 130 blocks), and main memory block j maps to cache set j mod 64. After blocks 0 to 127 have been read from the main memory into the cache on the first pass, the cache is full. Because the replacement algorithm is LRU, main memory blocks that map to the first two of the 64 cache sets are always overwritten before they can be used on a successive pass. In particular, main memory blocks 0, 64, and 128 continually displace each other in competing for the 2 block positions in cache set 0. Similarly, main memory blocks 1, 65, and 129 continually displace each other in competing for the 2 block positions in cache set 1. Main memory blocks that map to the last 62 sets are fetched once in the first pass and remain in the cache for the next 9 passes.
On the first pass, all 130 blocks must be fetched from the main memory. On each of the 9 remaining passes, the blocks in the last 62 sets of the cache (62 × 2 = 124) are found in the cache, and the remaining 6 blocks (130 − 124) must be fetched from the main memory. Let τ be the cache access time, so a main memory access takes 10τ, and a word in a block fetched from the main memory is charged 11τ (10τ for the main memory plus 1τ for the cache). The factor of 16 words per block appears in every term and cancels, so:
$$\text{Improvement factor} = \frac{\text{Time without cache}}{\text{Time with cache}} = \frac{10 \times 130 \times 10\tau}{1 \times 130 \times 11\tau + 9(124 \times 1\tau + 6 \times 11\tau)} = \frac{13000}{3140} \approx 4.14$$
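The LRU argument in part (b) can be checked with a small simulation. This is a sketch under stated assumptions: the cache is modeled at block granularity (the 16 consecutive words of a block count as one visit), block j goes to set j mod 64 with tag j // 64, and the slide's cost model is used (1τ per hit, 11τ per miss, 10τ per access without a cache). The names `simulate_passes` and `improvement_factor` are hypothetical.

```python
from collections import OrderedDict


def simulate_passes(num_blocks, passes, num_sets=64, ways=2):
    """Return a list of (hits, misses) per pass for an LRU set-associative cache."""
    # One OrderedDict per set; keys are tags, least recently used first.
    sets = [OrderedDict() for _ in range(num_sets)]
    results = []
    for _ in range(passes):
        hits = misses = 0
        for block in range(num_blocks):
            lru = sets[block % num_sets]
            tag = block // num_sets
            if tag in lru:
                lru.move_to_end(tag)          # mark as most recently used
                hits += 1
            else:
                if len(lru) == ways:
                    lru.popitem(last=False)   # evict the least recently used
                lru[tag] = True
                misses += 1
        results.append((hits, misses))
    return results


def improvement_factor(per_pass, num_blocks, hit_cost=1, miss_cost=11,
                       memory_cost=10):
    """Time without cache divided by time with cache (costs in units of tau)."""
    without_cache = len(per_pass) * num_blocks * memory_cost
    with_cache = sum(h * hit_cost + m * miss_cost for h, m in per_pass)
    return without_cache / with_cache


per_pass = simulate_passes(num_blocks=2080 // 16, passes=10)
factor = improvement_factor(per_pass, num_blocks=130)
print(per_pass[0], per_pass[1], round(factor, 2))  # (0, 130) (124, 6) 4.14
```

The simulated counts match the reasoning above: 130 misses on the first pass, then 124 hits and 6 misses on every later pass.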