The Memory Hierarchy CPSC 321 Andreas Klappenecker.
Some Results from the Survey
Issues with the CS curriculum:
CPSC 111 Computer Science Concepts & Prg
CPSC 310 Databases
CPSC 431 Software Engineering
Something from the wish list:
More C++
More Software Engineering
More focus on industry needs
Less focus on industry needs
Some Results from the Survey
Why (MIPS) assembly language?
More detailed explanations of programming language xyz
Implement a slightly reduced version of the Pentium 4 or Athlon processors
Have another computer architecture class
Lack of information on the CS website about specialization...
Follow Up
CPSC 462 Microcomputer Systems
CPSC 410 Operating Systems
Go to seminars/lectures by Bjarne Stroustrup, Jaakko Jarvi, or Gabriel Dos Reis
Memory
Current memory is largely implemented in CMOS technology. Two alternatives:
SRAM: fast, but not area efficient; the value is stored in a pair of inverting gates
DRAM: slower, but more area efficient; the value is stored as charge on a capacitor (must be refreshed)
Memory
Users want large and fast memories
SRAM is too expensive for main memory
DRAM is too slow for many purposes
Compromise: build a memory hierarchy
Locality
If an item is referenced, then
it will be referenced again soon (temporal locality)
nearby data will be referenced soon (spatial locality)
Why does code have locality?
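The question above can be made concrete with a short sketch (the function below is purely illustrative). The running total and the loop body itself are reused on every iteration, while the array elements are touched at consecutive addresses:

```python
# Hypothetical illustration: a simple summation loop exhibits both
# kinds of locality that caches exploit.
def sum_array(a):
    total = 0                # 'total' and the loop code are referenced
    for i in range(len(a)):  # on every iteration: temporal locality
        total += a[i]        # a[0], a[1], ... lie at adjacent addresses:
    return total             # spatial locality

print(sum_array([1, 2, 3, 4]))  # → 10
```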
Memory Hierarchy
The memory is organized as a hierarchy
each level closer to the processor holds a subset of any level further away
the memory can consist of multiple levels, but data is typically copied between two adjacent levels at a time
initially, we focus on two levels
Two Level Hierarchy
Upper level: smaller and faster
Lower level: slower
A unit of information that is present or not within a level is called a block
If data requested by the processor is in the upper level, this is called a hit; otherwise it is a miss
If a miss occurs, the data is retrieved from the lower level. Typically, an entire block is transferred
Cache A cache represents some level of memory between CPU and main memory [More general definitions are often used]
A Toy Example
Assumptions: processor requests are each one word, and each block consists of one word
Example:
Before the request: C = [X1, X2, …, Xn-1]
The processor requests Xn, which is not contained in C
Item Xn is brought from memory into the cache
After the request: C = [X1, X2, …, Xn-1, Xn]
Issue: what happens if the cache is full?
Issues
How do we know whether the data item is in the cache?
If it is, how do we find it?
Simple strategy: direct mapped cache
exactly one location where the data might be in the cache
Direct Mapped Cache
Mapping: cache index = address modulo the number of blocks in the cache, x -> x mod B
Direct Mapped Cache
Cache with 1024 = 2^10 words
The tag stored in the cache is compared against the upper portion of the address
If the tag equals the upper 20 bits and the valid bit is set, we have a cache hit; otherwise it is a cache miss
What kind of locality are we taking advantage of?
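For this 1024-word cache, the address split can be sketched as follows, assuming 32-bit byte addresses and 4-byte words (the function name is illustrative): bits 1:0 are the byte offset within the word, bits 11:2 select one of the 1024 blocks, and the remaining upper 20 bits form the tag.

```python
def decompose(addr):
    """Split a 32-bit byte address for a 1024-block direct-mapped cache
    with one 4-byte word per block (illustrative sketch)."""
    byte_offset = addr & 0x3        # bits 1:0  (a word is 4 bytes)
    index = (addr >> 2) & 0x3FF     # bits 11:2 (10 bits -> 1024 blocks)
    tag = addr >> 12                # remaining upper 20 bits
    return tag, index, byte_offset

print(decompose(0x1234))  # → (1, 141, 0)
```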
Direct Mapped Cache
Taking advantage of spatial locality: blocks of multiple words
Hits vs. Misses
Read hits: this is what we want!
Read misses: stall the CPU, fetch the block from memory, deliver it to the cache, restart
Write hits: either
write the data into both cache and memory (write-through)
write the data only into the cache, and write it back to memory later (write-back)
Write misses: read the entire block into the cache, then write the word
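The hit/miss behaviour above can be sketched with a toy direct-mapped cache of one-word blocks (class and method names are made up for illustration); on a miss, the block is fetched from memory and replaces whatever occupied its slot:

```python
class DirectMappedCache:
    """Toy direct-mapped cache with one-word blocks (illustrative sketch)."""
    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.valid = [False] * num_blocks   # valid bit per block
        self.tags = [None] * num_blocks     # tag per block

    def access(self, block_addr):
        index = block_addr % self.num_blocks   # exactly one possible slot
        tag = block_addr // self.num_blocks
        if self.valid[index] and self.tags[index] == tag:
            return "hit"
        # miss: fetch the block and replace the previous occupant
        self.valid[index] = True
        self.tags[index] = tag
        return "miss"
```

Note how two addresses that share an index conflict: with 8 blocks, accessing block 5, then 13, then 5 again gives miss, miss, miss, since 13 mod 8 = 5 evicts block 5.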
What Block Size?
A larger block size reduces cache misses (spatial locality), but the cache miss penalty increases
We need to balance these two constraints
How can we measure cache performance?
How can we improve cache performance?
The performance of a cache depends on many parameters:
memory stall clock cycles
read stall clock cycles
write stall clock cycles
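These parameters are often combined in a simplified model (assuming read and write misses cost the same penalty, so read and write stalls can be lumped together): memory stall cycles = memory accesses × miss rate × miss penalty. A hypothetical numeric sketch:

```python
def memory_stall_cycles(accesses, miss_rate, miss_penalty):
    # simplified model: every miss stalls the CPU for the full miss penalty
    return accesses * miss_rate * miss_penalty

# e.g. 1,000,000 memory accesses, 2% miss rate, 100-cycle miss penalty
print(memory_stall_cycles(1_000_000, 0.02, 100))  # → 2000000.0
```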
Cache Block Mapping
Direct mapped cache: a block goes in exactly one place in the cache
Fully associative: a block can go anywhere in the cache
it is difficult to find a block
parallel comparison of all tags speeds up the search
Cache Block Mapping
Set associative: each block maps to a unique set, and the block can be placed into any element of that set
Position is given by (block number) modulo (# of sets in the cache)
If each set contains n elements, the cache is called n-way set associative
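A minimal n-way set-associative sketch, assuming one-word blocks and an LRU replacement policy (the slide does not specify a policy; names here are illustrative):

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Sketch of an n-way set-associative cache with LRU replacement."""
    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # one ordered tag store per set; insertion/refresh order tracks LRU
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, block_number):
        s = self.sets[block_number % self.num_sets]  # set is fixed ...
        tag = block_number // self.num_sets
        if tag in s:
            s.move_to_end(tag)        # refresh LRU position
            return "hit"
        if len(s) == self.ways:       # ... but any of the 'ways' slots may hold the block
            s.popitem(last=False)     # evict the least recently used block
        s[tag] = None
        return "miss"
```

With 4 sets and 2 ways, blocks 5 and 13 both map to set 1 but can reside there together, so accessing 5, then 13, then 5 gives miss, miss, hit, unlike the direct-mapped case.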