10/20: Lecture Topics HW 3 Problem 2 Caches –Types of cache misses –Cache performance –Cache tradeoffs –Cache summary Input/Output –Types of I/O Devices –How devices communicate with the rest of the system communicating with the processor communicating with memory
Problem #2 on HW 3
      move $a0, $s0
      move $a1, $s1
      move $a2, $s2
      move $a3, $s3
      # Position A
      jal  Add4
      # Position B
      move $t0, $v0
      move $a0, $s4
      move $a1, $s5
      move $a2, $s6
      move $a3, $s7
      # Position C
      jal  Add4
      # Position D
      move $t1, $v0
      add  $t2, $t0, $t1
Add4: # Position E
      jal  Add2
      # Position F
      move $s0, $v0
      move $a0, $a2
      move $a1, $a3
      # Position G
      jal  Add2
      # Position H
      move $s1, $v0
      add  $v0, $s0, $s1
      # Position I
      jr   $ra
Add2: add  $v0, $a0, $a1
      jr   $ra
Tag, Index, Block Offset Recall an address can be decomposed into [tag,index,block offset] The general rule for determining this decomposition is to start from the right and work to the left Be careful of word vs. byte addresses
Steps to bits for tag,index,b.o.
Step 1: Determine how many bits are needed for the block offset. If the block size is 2^b bytes, then b bits are required for the block offset.
Step 2: Determine how many blocks fit in the cache: (Bytes in cache)/(Bytes in a block).
Step 3: Determine how many rows (unique indices) the cache has.
–For direct mapped, rows = number of blocks
–For fully associative, rows = 1
–For set associative, rows = (number of blocks)/associativity
Step 4: Determine how many bits are needed to represent the index. If there are 2^r rows, then you need r bits.
Step 5: Tag bits are whatever is left of the address after removing the block offset bits (Step 1) and the index bits (Step 4).
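The five steps above can be sketched as a small function. This is a minimal illustration, not part of the original slides: the name `split_address_bits` and the default 32-bit byte address width are assumptions, and all sizes are assumed to be powers of two.

```python
def split_address_bits(cache_bytes, block_bytes, associativity, addr_width=32):
    """Return (tag, index, offset) bit counts for a byte-addressed cache.

    Hypothetical helper for illustration. associativity=1 is direct mapped;
    associativity equal to the number of blocks is fully associative.
    """
    def log2_exact(n):
        # Exact log2 for a power of two, avoiding floating point.
        assert n > 0 and n & (n - 1) == 0, "sizes must be powers of two"
        return n.bit_length() - 1

    offset = log2_exact(block_bytes)        # Step 1: block offset bits
    blocks = cache_bytes // block_bytes     # Step 2: blocks in the cache
    rows = blocks // associativity          # Step 3: rows (unique indices)
    index = log2_exact(rows)                # Step 4: index bits
    tag = addr_width - index - offset       # Step 5: whatever is left
    return tag, index, offset


# 16 KB direct-mapped cache with 16-byte blocks (assumed example values).
print(split_address_bits(16 * 1024, 16, 1))  # (18, 10, 4)
```

Passing an associativity equal to the number of blocks gives one row and therefore zero index bits, which matches the fully associative case in Step 3.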
Cache Examples 4 KB, 8-way set-associative cache with 2 words per block –How do you split up the address?
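One way to work this example, assuming 32-bit byte addresses and 4-byte words (as on MIPS; the slide does not state either), is:

```latex
\begin{align*}
\text{block size} &= 2 \times 4 = 8 \text{ bytes} = 2^{3}
  &&\Rightarrow\ 3 \text{ block offset bits} \\
\text{blocks} &= 4096 / 8 = 512 \\
\text{rows (sets)} &= 512 / 8 = 64 = 2^{6}
  &&\Rightarrow\ 6 \text{ index bits} \\
\text{tag bits} &= 32 - 6 - 3 = 23
\end{align*}
```

Under those assumptions the address splits, from left to right, into a 23-bit tag, a 6-bit index, and a 3-bit block offset. With word addresses instead of byte addresses the offset would shrink accordingly, which is why the word vs. byte distinction matters.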
i-Cache and d-Cache There usually are two separate caches for instructions and data. Why? –Avoids structural hazards in pipelining –The combined cache is twice as big but still has an access time of a small cache –Allows both caches to operate in parallel, for twice the bandwidth
Handling i-Cache Misses
1. Stall the pipeline and send the address of the missed instruction to the memory
2. Instruct memory to perform a read; wait for the access to complete
3. Update the cache
4. Restart the instruction, this time fetching it successfully from the cache
d-Cache misses are even easier, but still require a pipeline stall
Cache Replacement How do you decide which cache block to replace? If the cache is direct-mapped, it’s easy Otherwise, common strategies: –Random –Least Recently Used (LRU) –Other strategies are used at lower levels of the hierarchy. More on those later.
LRU Replacement Replace the block that hasn’t been used for the longest time. Reference stream: A B C D B D E B A C B C E D C B
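The slide does not say how large the cache is, so the sketch below assumes a fully associative cache holding 4 blocks and replays the reference stream under LRU. The function name `simulate_lru` is hypothetical.

```python
from collections import OrderedDict


def simulate_lru(stream, capacity):
    """Replay a reference stream through a fully associative LRU cache.

    Returns (hits, misses, evicted), where evicted lists victims in order.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    evicted = []
    for block in stream:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) == capacity:
                victim, _ = cache.popitem(last=False)  # drop the LRU block
                evicted.append(victim)
            cache[block] = True
    return hits, len(stream) - hits, evicted


stream = "A B C D B D E B A C B C E D C B".split()
print(simulate_lru(stream, capacity=4))  # (8, 8, ['A', 'C', 'D', 'A'])
```

With the assumed capacity of 4, the first four references are misses that fill the cache, and the later misses (E, A, C, D) each evict the block that has gone unused the longest.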
LRU Implementations LRU is very difficult to implement for high degrees of associativity 4-way approximation: –1 bit to indicate least recently used pair –1 bit per pair to indicate least recently used item in this pair Much more complex approximations at lower levels of the hierarchy
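The 4-way approximation described above uses three bits per set. A minimal sketch follows, with a hypothetical class name and way numbering (ways 0 and 1 form pair 0, ways 2 and 3 form pair 1); real hardware encodings may differ.

```python
class PseudoLRU4:
    """Three-bit LRU approximation for one 4-way set (ways 0..3)."""

    def __init__(self):
        self.lru_pair = 0           # pair (0 or 1) used least recently
        self.lru_in_pair = [0, 0]   # per pair: item (0 or 1) used least recently

    def touch(self, way):
        """Record an access to the given way."""
        pair, item = divmod(way, 2)
        self.lru_pair = 1 - pair             # the other pair is now older
        self.lru_in_pair[pair] = 1 - item    # the sibling is now older

    def victim(self):
        """Way to replace: the older item of the older pair."""
        pair = self.lru_pair
        return 2 * pair + self.lru_in_pair[pair]


plru = PseudoLRU4()
for way in [0, 1, 2, 3, 0]:
    plru.touch(way)
print(plru.victim())  # 2
```

After the accesses 0, 1, 2, 3, 0, true LRU would replace way 1, but the three-bit scheme picks way 2 because it only remembers that pair 1 is the older pair. This is the sense in which the scheme is an approximation.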
The Three C’s of Caches Three reasons for cache misses: –Compulsory miss: item has never been in the cache –Capacity miss: item has been in the cache, but space was tight and it was forced out (occurs even with fully associative caches) –Conflict miss: item was in the cache, but the cache was not associative enough, so it was forced out (never occurs with fully associative caches)
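A common way to label misses with the three C's, sketched here under assumptions not in the slides, is to run the real cache alongside a fully associative LRU cache of the same size. A miss on a never-seen block is compulsory; a miss that the fully associative cache also takes is capacity; a miss that only the real cache takes is conflict. The function name is hypothetical, and the real cache is assumed to be direct mapped with block addresses given as integers.

```python
from collections import OrderedDict


def classify_misses(block_addrs, num_blocks):
    """Count hits and the three kinds of misses for a direct-mapped cache."""
    seen = set()             # every block referenced so far
    fully = OrderedDict()    # fully associative LRU cache of the same size
    direct = {}              # direct-mapped cache: index -> block address
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    for addr in block_addrs:
        # Update the fully associative reference cache.
        fa_hit = addr in fully
        if fa_hit:
            fully.move_to_end(addr)
        else:
            if len(fully) == num_blocks:
                fully.popitem(last=False)
            fully[addr] = True
        # Look up the direct-mapped cache and classify the outcome.
        index = addr % num_blocks
        if direct.get(index) == addr:
            counts["hit"] += 1
        elif addr not in seen:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
        direct[index] = addr
        seen.add(addr)
    return counts


print(classify_misses([0, 4, 0, 4], num_blocks=4))
# {'hit': 0, 'compulsory': 2, 'capacity': 0, 'conflict': 2}
```

In the example, blocks 0 and 4 map to the same row of a 4-block direct-mapped cache and keep evicting each other, even though a fully associative cache of the same size would hold both.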
Eliminating Cache Misses What cache parameters (cache size, block size, associativity) can you change to eliminate the following kinds of misses? –compulsory –capacity –conflict
Multi-Level Caches Use each level of the memory hierarchy as a cache over the next lowest level Inserting level 2 between levels 1 and 3 allows: –level 1 to have a higher miss rate (so can be smaller and cheaper) –level 3 to have a larger access time (so can be slower and cheaper) The new effective access time equation:
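One standard form of the effective access time for three levels is given below. The symbols are assumptions introduced here, not taken from the slides: $t_i$ is the access time of level $i$, and $m_i$ is the local miss rate of level $i$ (the fraction of accesses reaching level $i$ that miss there).

```latex
\[
  T_{\text{eff}} = t_1 + m_1 \bigl( t_2 + m_2 \, t_3 \bigr)
\]
```

Every access pays $t_1$; the fraction $m_1$ that misses in level 1 also pays $t_2$; and the fraction $m_1 m_2$ that misses in both levels also pays $t_3$. Without level 2 the expression reduces to $t_1 + m_1 t_3$, so level 2 helps whenever $t_2 + m_2 t_3 < t_3$.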
Which cache system is better? 32 KB unified data and instruction cache –hit rate of 97% 16 KB data cache –hit rate of 92% And 16 KB instruction cache –hit rate of 98% Assume –20% of instructions are loads or stores
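One way to compare the two systems is by overall hit rate per memory access. The sketch below assumes every instruction makes exactly one instruction fetch, and that loads and stores each make one additional data access; the function name is hypothetical.

```python
def split_hit_rate(i_hit, d_hit, mem_frac):
    """Overall hit rate of split caches, weighted by access counts.

    Per instruction there is 1 instruction-cache access and mem_frac
    data-cache accesses (the fraction of instructions that are loads/stores).
    """
    accesses = 1.0 + mem_frac
    hits = i_hit * 1.0 + d_hit * mem_frac
    return hits / accesses


unified = 0.97
split = split_hit_rate(i_hit=0.98, d_hit=0.92, mem_frac=0.20)
print(f"unified {unified:.3f}  split {split:.3f}")  # unified 0.970  split 0.970
```

Under these assumptions the two designs have the same overall hit rate, (0.98 + 0.2 × 0.92) / 1.2 = 0.97, so hit rate alone does not decide the question. The split design can still come out ahead because the two caches operate in parallel and avoid structural hazards in the pipeline.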
Cache Parameters and Tradeoffs If you are designing a cache, what choices do you have and what are their tradeoffs?
Summary: Classifying Caches Where can a block be placed? –Direct mapped, Set/Fully associative How is a block found? –Direct mapped: by index –Set associative: by index and search –Fully associative: by search What happens on a write access? –Write-back or Write-through Which block should be replaced? –Random –LRU (Least Recently Used)