# Lecture 41: Review Session #3


Reminders

- Office hours during final week
  - TA: as usual (Tuesday & Thursday, 12:50pm-2:50pm)
  - Hassan: Wednesday 1pm to 4pm, or email me for an appointment (hassan@eecs.wsu.edu)
- Final exam: Thursday 12/18/2014 @ 3:10pm, Sloan 150
- Course evaluation (Blue Course Evaluation): access through zzusis

Problem #9: How many total SRAM bits are required to implement a 256KB four-way set associative cache? The cache is physically indexed and has 64-byte blocks. Assume that there are 4 extra bits per entry: 1 valid bit, 1 dirty bit, and 2 LRU bits for the replacement policy. Assume that the physical address is 50 bits wide.

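Solution #9: The number of sets in the 256KB four-way set associative cache is (256 × 2^10) / (4 × 64) = 1024, so the index takes 10 bits. With 64-byte blocks the byte offset takes 6 bits, which leaves 50 - 10 - 6 = 34 tag bits per entry.

A set has four entries. Each entry stores 4 extra bits + 34 tag bits + 64 × 8 data bits = 550 bits.

The total number of SRAM bits required = 550 × 4 × 1024 = 2,252,800. (Counting only the extra bits and the data, without the tag, gives 516 bits per entry and 516 × 4 × 1024 = 2,113,536 bits.)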
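The bit count for Problem #9 can be checked with a short C sketch. The names `log2_exact` and `cache_sram_bits`, the parameter list, and the `include_tag` switch are illustrative assumptions rather than anything from the slides, and the code assumes that the block size and the number of sets are powers of two. The switch makes it possible to compare the per-entry cost with and without the tag bits derived from the 50-bit physical address.

```c
#include <stdint.h>

/* Hypothetical helper: log2 of an exact power of two. */
static unsigned log2_exact(uint64_t n)
{
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/*
 * Total SRAM bits for a set associative cache.
 * Each entry holds the extra (status) bits, the data block and,
 * when include_tag is nonzero, the tag bits.
 */
uint64_t cache_sram_bits(uint64_t capacity_bytes, unsigned ways,
                         unsigned block_bytes, unsigned addr_bits,
                         unsigned extra_bits, int include_tag)
{
    uint64_t sets = capacity_bytes / ((uint64_t)ways * block_bytes);
    unsigned offset_bits = log2_exact(block_bytes);
    unsigned index_bits = log2_exact(sets);
    unsigned tag_bits = addr_bits - index_bits - offset_bits;

    uint64_t entry_bits = extra_bits + 8ull * block_bytes;
    if (include_tag)
        entry_bits += tag_bits;

    return sets * ways * entry_bits;
}
```

For the cache in Problem #9, `cache_sram_bits(256 * 1024, 4, 64, 50, 4, 1)` evaluates to 2,252,800 with the 34-bit tag counted, and the same call with `include_tag` set to 0 evaluates to 2,113,536 (status and data bits only).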

Problem #10: Design a 128KB direct-mapped data cache that uses a 32-bit address and 16 bytes per block. Calculate the following:

- (a) How many bits are used for the byte offset?
- (b) How many bits are used for the set (index) field?
- (c) How many bits are used for the tag?

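Solution #10:

- (a) How many bits are used for the byte offset? 4 bits (log2 of the 16-byte block size)
- (b) How many bits are used for the set (index) field? 13 bits (128KB / 16 bytes = 2^13 blocks, one block per set)
- (c) How many bits are used for the tag? 15 bits (32 - 13 - 4)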
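The same field-width arithmetic can be written as a small C sketch. The struct `cache_fields` and the function `cache_field_widths` are hypothetical names chosen for illustration, and the code assumes that the block size and the number of sets are powers of two. A direct-mapped cache is simply the case `ways = 1`.

```c
#include <stdint.h>

/* Widths of the three address fields, in bits. */
struct cache_fields {
    unsigned offset_bits;
    unsigned index_bits;
    unsigned tag_bits;
};

/* Hypothetical helper: log2 of an exact power of two. */
static unsigned log2_exact(uint64_t n)
{
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/* Derive the offset, index and tag widths from the cache geometry. */
struct cache_fields cache_field_widths(uint64_t num_blocks, unsigned ways,
                                       unsigned block_bytes,
                                       unsigned addr_bits)
{
    struct cache_fields f;
    uint64_t sets = num_blocks / ways;   /* ways == 1 for direct-mapped */

    f.offset_bits = log2_exact(block_bytes);
    f.index_bits = log2_exact(sets);
    f.tag_bits = addr_bits - f.index_bits - f.offset_bits;
    return f;
}
```

Calling `cache_field_widths(128 * 1024 / 16, 1, 16, 32)` for the cache in Problem #10 gives an offset of 4 bits, an index of 13 bits and a tag of 15 bits.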

Problem #11: Design an 8-way set associative cache that has 16 blocks and 32 bytes per block. Assume a 32-bit address. Calculate the following:

- (a) How many bits are used for the byte offset?
- (b) How many bits are used for the set (index) field?
- (c) How many bits are used for the tag?

Solution #11:

- (a) How many bits are used for the byte offset? 5 bits (log2 of the 32-byte block size)
- (b) How many bits are used for the set (index) field? 1 bit (16 blocks / 8 ways = 2 sets)
- (c) How many bits are used for the tag? 26 bits (32 - 1 - 5)
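To see how these widths carve up an actual address, the sketch below splits a 32-bit address into tag, index and offset using shifts and masks. The names `addr_parts` and `split_address` are hypothetical, the example address is arbitrary, and the code assumes that the offset and index widths together are less than 32 bits.

```c
#include <stdint.h>

/* The three fields of a 32-bit address as seen by the cache. */
struct addr_parts {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
};

/* Split addr given the offset and index widths in bits. */
struct addr_parts split_address(uint32_t addr, unsigned offset_bits,
                                unsigned index_bits)
{
    struct addr_parts p;

    /* Low bits select the byte within the block. */
    p.offset = addr & ((1u << offset_bits) - 1u);
    /* Next bits select the set. */
    p.index = (addr >> offset_bits) & ((1u << index_bits) - 1u);
    /* Everything above is the tag compared on lookup. */
    p.tag = addr >> (offset_bits + index_bits);
    return p;
}
```

With the widths from Problem #11, `split_address(0x12345678, 5, 1)` yields offset 0x18, index 1 and tag 0x48D159, so the address belongs to set 1 of the two sets.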

Problem #12: Consider the following code snippet.

    int i;
    int a[1024*1024];
    int x = 0;
    for (i = 0; i < 1024; i++) {
        x += a[i] + a[1024*i];
    }

Suppose that it is executed on a system with a 2-way set associative 16KB data cache with 32-byte blocks, 32-bit words, and an LRU replacement policy. Assume that int is word-sized. Also assume that the address of ‘a’ is 0x0, that ‘i’ and ‘x’ are in registers, and that the cache is initially empty. How many data cache misses are there?

Solution #12: The number of sets in the cache = (16 × 2^10) / (2 × 32) = 256.

Since a word is 4 bytes, int is word-sized, and a cache block is 32 bytes, 8 ints fit in a cache block. Therefore the ints from a[0] to a[1023] map to cache lines in sets 0 to 127, while the ints from a[1024] to a[1024*2 - 1] map to sets 128 to 255. Similarly, a[1024*2] to a[1024*3 - 1] map to sets 0 to 127, a[1024*3] to a[1024*4 - 1] map to sets 128 to 255, and so on.

In the loop, each access to a[i] where ‘i’ is a multiple of 8 is a miss. Therefore the number of misses due to a[i] accesses is 1024/8 = 128.

All accesses to a[1024*i] within the loop are misses except the very first one (a[0] has already been brought into the cache). This is because these accesses map alternately to sets 0 and 128, and each one references a block that has not been touched before, so it is a cold miss. They also never evict a block that a[i] still needs: LRU keeps the block holding a[0] to a[7] in set 0 while it is in use, and all later a[i] blocks fall in sets 1 to 127.

The total number of misses = 1023 + 128 = 1151.
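The miss count can be cross-checked by simulating the cache. The following is a minimal sketch, not a general cache model: the names `cache`, `cache_access` and `count_misses` are hypothetical, the LRU bookkeeping relies on there being exactly two ways, and the addresses are generated from the assumptions in Problem #12 (‘a’ starts at 0x0, ints are 4 bytes, only the accesses to ‘a’ touch the data cache).

```c
#include <stdint.h>
#include <string.h>

#define NUM_SETS    256   /* 16KB / (2 ways * 32 bytes) */
#define NUM_WAYS    2
#define BLOCK_BYTES 32

struct cache {
    uint32_t block[NUM_SETS][NUM_WAYS]; /* block number held by each way */
    int valid[NUM_SETS][NUM_WAYS];
    int lru[NUM_SETS];                  /* way to replace next in each set */
    unsigned long misses;
};

/* Look up one byte address; on a miss, replace the LRU way of the set. */
static void cache_access(struct cache *c, uint32_t addr)
{
    uint32_t block = addr / BLOCK_BYTES;
    uint32_t set = block % NUM_SETS;

    for (int w = 0; w < NUM_WAYS; w++) {
        if (c->valid[set][w] && c->block[set][w] == block) {
            c->lru[set] = 1 - w;        /* hit: the other way is now LRU */
            return;
        }
    }

    c->misses++;
    int victim = c->lru[set];
    c->block[set][victim] = block;
    c->valid[set][victim] = 1;
    c->lru[set] = 1 - victim;
}

/* Replay the data accesses of the loop and return the number of misses. */
unsigned long count_misses(void)
{
    struct cache c;
    memset(&c, 0, sizeof c);            /* cache starts empty */

    for (uint32_t i = 0; i < 1024; i++) {
        cache_access(&c, 4u * i);            /* a[i]      */
        cache_access(&c, 4u * (1024u * i));  /* a[1024*i] */
    }
    return c.misses;
}
```

Under these assumptions `count_misses()` returns 1151, matching the 128 + 1023 breakdown in the solution.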

Problem #13: Give a concise answer to each of the following questions. Limit your answers to 20-30 words.

- (a) What is memory-mapped I/O?
- (b) Why is DMA an improvement over CPU programmed I/O?
- (c) When would a DMA transfer be a poor choice?
- (d) What are the two characteristics of program memory accesses that caches exploit?
- (e) What are the three types of cache misses?
- (f) In what pipeline stage is the branch target buffer checked?
- (g) What needs to be stored in a branch target buffer in order to eliminate the branch penalty for an unconditional branch: the address of the branch target, the address of the branch target and a branch prediction, or the instruction at the branch target?

Problem #14:

- (True/False) A virtual cache's access time is always faster than that of a physical cache.
- (True/False) High associativity in a cache reduces compulsory misses.
- (True/False) Both DRAM and SRAM must be refreshed periodically using a dummy read/write operation.
- (True/False) A write-through cache typically requires less bus bandwidth than a write-back cache.
- (True/False) Cache performance is of less importance in faster processors because the processor speed compensates for the high memory access time.
- (True/False) Memory interleaving is a technique for reducing memory access time through increased bandwidth utilization of the data bus.

What else?

- Midterm 2 & midterm 1 questions
- Homework assignments
  - Solutions for all assignments will be sent to your WSU email by Monday!
