Caching I Andreas Klappenecker CPSC321 Computer Architecture.
Memory
Current memory is largely implemented in CMOS technology. There are two alternatives:
SRAM: fast, but not area efficient; the value is stored in a pair of inverting gates.
DRAM: slower, but more area efficient; the value is stored as charge on a capacitor (and must be refreshed).
Memory
Users want memories that are both large and fast. SRAM is too expensive for main memory, and DRAM is too slow for many purposes. The compromise is to build a memory hierarchy.
Locality
If an item is referenced, then it tends to be referenced again soon (temporal locality), and nearby data tends to be referenced soon (spatial locality). Why does code have locality?
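One way to approach the question is to look at an ordinary loop. The sketch below is only an illustration: `sum_array` and `word_addresses` are hypothetical names, and the 4-byte word size and contiguous array layout are assumptions borrowed from how a language like C would store the data.

```python
def sum_array(values):
    """Sum a list; the loop shows both kinds of locality."""
    total = 0                      # 'total' is reused on every iteration: temporal locality
    for i in range(len(values)):
        total += values[i]         # values[0], values[1], ... are neighbours: spatial locality
    return total


def word_addresses(base, count, word_size=4):
    """Byte addresses touched by a sequential scan over 'count' words.

    Assumes a contiguous array starting at 'base' (hypothetical layout).
    """
    return [base + i * word_size for i in range(count)]


print(sum_array([1, 2, 3, 4]))
print(word_addresses(1000, 4))
```

The loop instructions themselves are also fetched over and over, which is a further source of temporal locality in code.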
Memory Hierarchy
The memory is organized as a hierarchy. The data in a level closer to the processor is a subset of the data in any level further away. The memory can consist of multiple levels, but data is typically copied between only two adjacent levels at a time. Initially, we focus on two levels.
Two Level Hierarchy
Upper level: smaller and faster. Lower level: slower.
A unit of information that is either present or not present within a level is called a block. If the data requested by the processor is in the upper level, this is called a hit; otherwise it is called a miss. If a miss occurs, the data is retrieved from the lower level. Typically, an entire block is transferred.
Cache
A cache represents some level of memory between the CPU and main memory. [More general definitions are often used.]
A Toy Example
Assumptions: each processor request is for one word, and each block consists of one word.
Example: before the request, C = [X1, X2, …, Xn-1]. The processor requests Xn, which is not contained in C, so item Xn is brought from memory into the cache. After the request, C = [X1, X2, …, Xn-1, Xn].
Issue: what happens if the cache is full?
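The toy example can be modelled as a list, as a minimal sketch. The function name `request` and the `capacity` parameter are assumptions made for illustration. The sketch deliberately leaves the "cache full" case open, since that requires a placement or replacement decision.

```python
def request(cache, item, capacity):
    """Return (hit, new_cache) after the processor requests 'item'.

    'cache' is a list of one-word blocks; 'capacity' is a hypothetical
    limit on how many blocks the cache can hold.
    """
    if item in cache:
        return True, cache                  # hit: the cache is unchanged
    if len(cache) >= capacity:
        # The open question from the slide: something must be evicted.
        raise RuntimeError("cache full: a replacement decision is needed")
    return False, cache + [item]            # miss: bring the item in from memory


hit, c = request(["X1", "X2"], "X3", capacity=4)
print(hit, c)
```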
Issues
How do we know whether a data item is in the cache? If it is, how do we find it?
Simple strategy: a direct mapped cache, in which there is exactly one location where a given data item might be in the cache.
Direct Mapped Cache
Mapping: the address modulo the number of blocks B in the cache, x -> x mod B.
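The mapping can be sketched in a few lines. The function names are hypothetical. The second version assumes the number of blocks is a power of two, in which case the modulo reduces to keeping the low-order bits of the address.

```python
def cache_index(block_address, num_blocks):
    """Direct mapped placement: x -> x mod B."""
    return block_address % num_blocks


def cache_index_pow2(block_address, num_blocks):
    """Same mapping when B is a power of two: keep the low log2(B) bits."""
    assert num_blocks > 0 and num_blocks & (num_blocks - 1) == 0
    return block_address & (num_blocks - 1)


# With B = 8, addresses 5, 13, 21 and 29 all compete for cache location 5.
print([cache_index(x, 8) for x in (5, 13, 21, 29)])
```

Because several addresses share one location, the cache needs extra information to tell which of them is currently stored there.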
Direct Mapped Cache
Cache with 1024 = 2^10 words. The tag from the cache is compared against the upper portion of the address. If the tag equals the upper 20 bits of the address and the valid bit is set, then we have a cache hit; otherwise it is a cache miss. What kind of locality are we taking advantage of?
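The tag comparison can be sketched as follows. The sketch assumes 32-bit byte addresses and 4-byte words, so that an address splits into a 20-bit tag, a 10-bit index and a 2-bit byte offset. The slide states only the 1024 words and the 20-bit tag; the remaining field widths and all names below are assumptions.

```python
INDEX_BITS = 10        # 1024 = 2^10 one-word blocks
BYTE_OFFSET_BITS = 2   # assumed 4-byte words


def split_address(addr):
    """Split a 32-bit byte address into (tag, index)."""
    index = (addr >> BYTE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_OFFSET_BITS + INDEX_BITS)   # upper 20 bits
    return tag, index


class DirectMappedCache:
    """Tracks only valid bits and tags; the data array is omitted."""

    def __init__(self):
        n = 1 << INDEX_BITS
        self.valid = [False] * n
        self.tags = [0] * n

    def access(self, addr):
        """Return True on a hit; on a miss, install the new tag."""
        tag, index = split_address(addr)
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit


cache = DirectMappedCache()
print(cache.access(0x1000), cache.access(0x1000), cache.access(0x2000))
```

In this sketch, addresses 0x1000 and 0x2000 share index 0 but carry different tags, so the third access is a miss that replaces the earlier entry.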
Direct Mapped Cache
Taking advantage of spatial locality.
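The figure for this slide is not part of the transcript. A common way for a direct mapped cache to exploit spatial locality is to use blocks of several words, so that one miss also brings in the neighbouring words. The sketch below assumes that design; `count_misses` and its parameters are hypothetical names, and addresses are word addresses.

```python
def count_misses(word_addresses, num_blocks, words_per_block):
    """Count misses of a direct mapped cache that starts out empty."""
    valid = [False] * num_blocks
    tags = [0] * num_blocks
    misses = 0
    for w in word_addresses:
        block = w // words_per_block      # which memory block holds this word
        index = block % num_blocks        # direct mapped placement
        tag = block // num_blocks         # remaining upper part of the address
        if not (valid[index] and tags[index] == tag):
            misses += 1                   # miss: fetch the whole block
            valid[index] = True
            tags[index] = tag
    return misses


scan = range(64)                          # sequential scan over 64 words
print(count_misses(scan, num_blocks=16, words_per_block=1))
print(count_misses(scan, num_blocks=4, words_per_block=4))
```

Both configurations hold 16 words in total. For this sequential scan, the one-word blocks miss on every access, while the four-word blocks miss only on the first word of each block.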
Hits vs. Misses
Read hits: this is what we want!
Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart.
Write hits: either replace the data in both the cache and memory (write-through), or write the data only into the cache and write it back to memory later (write-back).
Write misses: read the entire block into the cache, then write the word.
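The difference between the two write-hit policies can be seen by counting memory writes. The following is a minimal sketch under simplifying assumptions: one-word blocks, a direct mapped cache, and memory modelled as a dictionary. The class `TinyCache`, the dirty flag and the `flush` method are illustrative names and are not taken from the slides.

```python
class TinyCache:
    """Direct mapped, one-word blocks; counts writes that reach memory."""

    def __init__(self, num_blocks, write_back):
        self.num_blocks = num_blocks
        self.write_back = write_back
        self.lines = [None] * num_blocks   # each line: [address, value, dirty]
        self.memory = {}
        self.memory_writes = 0

    def _store_to_memory(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

    def write(self, addr, value):
        index = addr % self.num_blocks
        line = self.lines[index]
        if line is None or line[0] != addr:
            # Write miss: write back a dirty victim, then read the block in.
            if line is not None and line[2]:
                self._store_to_memory(line[0], line[1])
            line = [addr, self.memory.get(addr, 0), False]
            self.lines[index] = line
        line[1] = value                    # write the word into the cache
        if self.write_back:
            line[2] = True                 # memory is updated later
        else:
            self._store_to_memory(addr, value)   # write-through

    def flush(self):
        """Write every dirty line back to memory."""
        for line in self.lines:
            if line is not None and line[2]:
                self._store_to_memory(line[0], line[1])
                line[2] = False


wt = TinyCache(4, write_back=False)
wb = TinyCache(4, write_back=True)
for v in range(10):
    wt.write(0, v)
    wb.write(0, v)
wb.flush()
print(wt.memory_writes, wb.memory_writes)
```

In this model, ten writes to the same word cost ten memory writes under write-through but only one under write-back, at the price of memory being out of date until the line is written back.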