Announcements
Midterm exam: Tuesday, November 30
Cache
Placement strategies: direct mapped, fully associative, set-associative
Replacement strategies: random, FIFO, LRU
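The replacement strategies can be illustrated with a tiny software model. The sketch below is a minimal LRU replacement policy for a fully associative cache of two blocks; the names (`access`, `CAPACITY`) and the use of a Python dict are illustrative only, since real caches implement this in hardware.

```python
# Minimal LRU sketch for a fully associative cache with 2 blocks.
from collections import OrderedDict

CAPACITY = 2
cache = OrderedDict()  # block number -> data, least recently used first

def access(block):
    """Return True on a hit, False on a miss (filling the cache, evicting LRU)."""
    if block in cache:
        cache.move_to_end(block)       # hit: mark as most recently used
        return True
    if len(cache) == CAPACITY:
        cache.popitem(last=False)      # miss with full cache: evict the LRU block
    cache[block] = f"data for block {block}"
    return False

for b in [1, 2, 1, 3, 2]:
    print(b, "hit" if access(b) else "miss")
```

Accessing blocks 1, 2, 1, 3, 2 in order gives miss, miss, hit, miss, miss: block 3 evicts block 2 (the least recently used), so the final access to 2 misses again.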
Direct Mapped Cache
Mapping: address modulo the number of blocks in the cache, x -> x mod B
Set Associative Caches
Each block maps to a unique set; the block can be placed into any element of that set.
The position is given by (block number) modulo (# of sets in cache).
If the sets contain n elements, then the cache is called n-way set associative.
Direct Mapped Cache
Cache with 1024 = 2^10 words; the index is determined by address mod 1024.
The tag from the cache is compared against the upper portion of the address.
If the tag equals the upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss.
What kind of locality are we taking advantage of?
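A lookup in this 1024-word direct-mapped cache can be sketched as follows. This is a minimal Python model, assuming 32-bit byte addresses and one-word (4-byte) blocks; the helper names (`split_address`, `access`) and the string stand-in for block data are illustrative.

```python
# Sketch of a direct-mapped cache lookup: 1024 = 2^10 one-word blocks,
# 32-bit byte addresses, so tag = upper 20 bits, index = 10 bits, byte offset = 2 bits.

NUM_BLOCKS = 1024
INDEX_BITS = 10

def split_address(addr):
    """Split a 32-bit byte address into (tag, index, byte_offset)."""
    byte_offset = addr & 0b11
    index = (addr >> 2) & (NUM_BLOCKS - 1)  # word address mod 1024
    tag = addr >> (2 + INDEX_BITS)          # upper 20 bits
    return tag, index, byte_offset

# Each cache line is (valid, tag, data); everything starts invalid.
cache = [(False, 0, None)] * NUM_BLOCKS

def access(addr):
    """Return True on a cache hit, False on a miss (and fill the line)."""
    tag, index, _ = split_address(addr)
    valid, stored_tag, _data = cache[index]
    if valid and stored_tag == tag:
        return True                              # tag matches and valid bit set: hit
    cache[index] = (True, tag, f"mem[{addr}]")   # miss: fetch the block from memory
    return False

print(access(0x1234))   # first access: miss
print(access(0x1234))   # same address again: hit (temporal locality)
```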
Direct Mapped Cache
Taking advantage of spatial locality: multiword blocks, where a block offset selects the word within the block.
Address Determination
Reconstruction of the memory address = tag bits || set index bits || block offset || byte offset
Example: 32-bit words, cache capacity 2^12 = 4096 words, blocks of 8 words, direct mapped:
byte offset = 2 bits, block offset = 3 bits, set index = 9 bits, tag = 18 bits
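The field widths in this example can be checked by decomposing and reassembling an address in code. This is a sketch for exactly the parameters above (2 + 3 + 9 + 18 = 32 bits); the function names are illustrative.

```python
# Address fields for the example cache: 4096-word capacity, 8-word blocks, direct mapped.
BYTE_OFFSET_BITS = 2    # 4-byte words
BLOCK_OFFSET_BITS = 3   # 8 words per block
INDEX_BITS = 9          # 512 sets
# tag = 32 - 9 - 3 - 2 = 18 bits

def decompose(addr):
    """Split a 32-bit byte address into (tag, index, block_offset, byte_offset)."""
    byte_offset = addr & ((1 << BYTE_OFFSET_BITS) - 1)
    block_offset = (addr >> BYTE_OFFSET_BITS) & ((1 << BLOCK_OFFSET_BITS) - 1)
    index = (addr >> (BYTE_OFFSET_BITS + BLOCK_OFFSET_BITS)) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_OFFSET_BITS + BLOCK_OFFSET_BITS + INDEX_BITS)
    return tag, index, block_offset, byte_offset

def reconstruct(tag, index, block_offset, byte_offset):
    """Reassemble: tag || set index || block offset || byte offset."""
    return ((((tag << INDEX_BITS) | index) << BLOCK_OFFSET_BITS | block_offset)
            << BYTE_OFFSET_BITS) | byte_offset

# Round-tripping any address recovers it exactly.
addr = 0xDEADBEEF
assert reconstruct(*decompose(addr)) == addr
```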
Example
Suppose you want to realize a cache with a capacity for 8 KB of data (32-bit addresses). Assume that the block size is 4 words and a word consists of 4 bytes.
How many bits are needed to realize a direct mapped cache?
8 KB = 2K words = 512 blocks = 2^9 blocks; direct mapped => # index bits = log2(2^9) = 9.
Total = number of blocks x (data bits per block + tag bits + valid bit) = 2^9 x (128 + (32 - 9 - 2 - 2) + 1) = 2^9 x 148 bits.
How many bits are needed to realize an 8-way set associative cache? The number of tag bits increases by 3. Why? (The number of sets drops from 2^9 to 2^6, so the index shrinks by 3 bits and the tag grows by 3.)
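The bit counts in this example can be recomputed for any associativity. The sketch below follows the slide's formula, total = #blocks x (data bits + tag bits + valid bit), for the stated parameters (8 KB of data, 4-word blocks, 32-bit addresses); the function name is illustrative.

```python
# Recompute the storage-bit counts from the example.
ADDRESS_BITS = 32
WORD_BITS = 32
BLOCK_WORDS = 4              # 4 words per block
CAPACITY_WORDS = 2 * 1024    # 8 KB of data = 2K words

def total_bits(associativity):
    blocks = CAPACITY_WORDS // BLOCK_WORDS          # 512 = 2^9 blocks
    sets = blocks // associativity
    index_bits = sets.bit_length() - 1              # log2(# sets)
    block_offset = BLOCK_WORDS.bit_length() - 1     # 2 bits
    byte_offset = 2                                 # 4-byte words
    tag_bits = ADDRESS_BITS - index_bits - block_offset - byte_offset
    data_bits = BLOCK_WORDS * WORD_BITS             # 128 bits per block
    return blocks * (data_bits + tag_bits + 1)      # +1 for the valid bit

print(total_bits(1))   # direct mapped: 2^9 x 148 = 75776 bits
print(total_bits(8))   # 8-way: tag grows by 3 bits, 2^9 x 151 = 77312 bits
```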
Typical Questions
Show the evolution of a cache
Determine the number of bits needed in an implementation of a cache
Know the placement and replacement strategies
Be able to design a cache according to specifications
Determine the number of cache misses
Measure cache performance
Typical Questions
What kind of placement is typically used in virtual memory systems?
What is a translation lookaside buffer? Why is a TLB used?
Pages: virtual memory blocks
Page faults: if data is not in memory, retrieve it from disk
- huge miss penalty, thus pages should be fairly large (e.g., 4 KB)
- reducing page faults is important (LRU is worth the price)
- the faults can be handled in software instead of hardware
- write-through takes too long, so we use write-back
Example: page size 2^12 = 4 KB; 2^18 physical pages; main memory <= 1 GB; virtual memory <= 4 GB
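The sizes in the example are consistent with each other, which a few lines of arithmetic confirm: 2^18 physical pages of 4 KB each give exactly 1 GB, and a 4 GB (32-bit) virtual address space holds 2^20 virtual pages.

```python
# Check the example's virtual memory parameters.
PAGE_SIZE = 2**12          # 4 KB pages
PHYSICAL_PAGES = 2**18
VIRTUAL_SPACE = 2**32      # 4 GB virtual memory

print(PHYSICAL_PAGES * PAGE_SIZE == 2**30)   # 1 GB of physical memory
print(VIRTUAL_SPACE // PAGE_SIZE)            # 2^20 = 1048576 virtual pages
```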
Page Faults
Incredibly high penalty for a page fault
Reduce the number of page faults by optimizing page placement
Use fully associative placement:
- a full search of pages is impractical
- pages are located by a full table that indexes the memory, called the page table
- the page table resides within the memory
Page Tables
The page table maps each page either to a page in main memory or to a page stored on disk
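This mapping can be sketched as a lookup that either returns a physical address or raises a page fault. The sketch assumes the example's 4 KB pages and 32-bit virtual addresses; the page-table entries, frame numbers, and the `(valid, frame)` representation are hypothetical, chosen only to illustrate the resident-vs-on-disk distinction.

```python
# Page-table lookup sketch: 4 KB pages, 32-bit virtual addresses.
PAGE_OFFSET_BITS = 12

# Hypothetical entries: (valid, frame). valid=False means the page is on disk.
page_table = {
    0x00401: (True, 0x2AB),   # resident in physical frame 0x2AB
    0x00402: (False, None),   # stored on disk: touching it faults
}

def translate(vaddr):
    """Translate a virtual address, or raise on a page fault."""
    vpn = vaddr >> PAGE_OFFSET_BITS                 # virtual page number
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)  # offset within the page
    entry = page_table.get(vpn)
    if entry is None or not entry[0]:
        raise RuntimeError("page fault")            # OS would fetch the page from disk
    frame = entry[1]
    return (frame << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x00401A2C)))   # frame 0x2AB + offset 0xA2C -> 0x2aba2c
```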
Making Memory Access Fast
Page tables slow us down: memory access will take at least twice as long
- access the page table in memory
- access the page itself
What can we do? Memory access is local => use a cache that keeps track of recently used address translations, called the translation lookaside buffer
Making Address Translation Fast
A cache for address translations: the translation lookaside buffer
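The TLB's role in front of the page table can be sketched in a few lines. This is a minimal model, assuming 4 KB pages as in the earlier example; the dict-based TLB, the single page-table entry, and the frame number are illustrative, and a real TLB is a small hardware cache with its own replacement policy.

```python
# Minimal TLB sketch: a small cache of recent translations in front of
# the (slow, in-memory) page table.
PAGE_OFFSET_BITS = 12

tlb = {}                            # vpn -> frame: recently used translations
page_table = {0x00401: 0x2AB}       # the full page table (one illustrative entry)

def translate(vaddr):
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn in tlb:
        frame = tlb[vpn]            # TLB hit: no page-table access needed
    else:
        frame = page_table[vpn]     # TLB miss: walk the page table ...
        tlb[vpn] = frame            # ... and cache the translation
    return (frame << PAGE_OFFSET_BITS) | offset

translate(0x00401000)   # first access to the page: TLB miss, fills the TLB
translate(0x00401A2C)   # same page: TLB hit, page table is not consulted
```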
Obstacles to Pipelining
Structural hazards: the hardware cannot support the combination of instructions in the same clock cycle
Control hazards: need to make a decision based on the result of one instruction while others are still executing
Data hazards: an instruction depends on the result of an instruction still in the pipeline
Control Hazards
Resolution (for branches): stall the pipeline, predict the result, or use a delayed branch
Stall on Branch
Assume that all branch computations are done in stage 2; delay by one cycle to wait for the result
Branch Prediction
Predict the branch result: for example, always predict that the branch is not taken (reasonable, e.g., for while loops)
If the choice is correct, the pipeline runs at full speed; if the choice is incorrect, the pipeline stalls
Data Hazards
A data hazard results if an instruction depends on the result of a previous instruction:
add $s0, $t0, $t1
sub $t2, $s0, $t3   // $s0 still to be determined
These dependencies happen often, so it is not possible to avoid them completely.
Use forwarding to get the missing data from internal resources as soon as it is available.
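Detecting such a dependence between two adjacent instructions can be sketched as a simple register comparison. The model below represents an instruction by its `(dest, src1, src2)` register names; this representation and the function name are illustrative, and real hazard detection is done in hardware by comparing pipeline register fields.

```python
# Sketch of read-after-write hazard detection between adjacent instructions.
# An instruction is modeled as (dest, src1, src2) register names.

def has_data_hazard(first, second):
    """True if `second` reads the register that `first` writes."""
    dest, _, _ = first
    _, src1, src2 = second
    return dest in (src1, src2)

add_instr = ("$s0", "$t0", "$t1")   # add $s0, $t0, $t1
sub_instr = ("$t2", "$s0", "$t3")   # sub $t2, $s0, $t3  (reads $s0)

print(has_data_hazard(add_instr, sub_instr))   # True: forward $s0 to the sub
```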
Typical Questions
Given a brief specification of the processor and a sequence of instructions, determine all pipeline hazards.
Most typical question: fill in some steps in a timing diagram (almost every exam has such a question; google for examples).