
Slide 1: CS 161, Ch. 7: Memory Hierarchy (Lecture 14)
Instructor: L.N. Bhuyan
www.cs.ucr.edu/~bhuyan
1999 ©UCB

Slide 2: Recap: Machine Organization
The five classic components of any computer:
- Processor (CPU), the active part:
  - Control (the "brain")
  - Datapath (the "brawn")
- Memory (passive): where programs and data live when running
- Devices: Input and Output
Every component of a computer belongs to one of these five categories.

Slide 3: Memory Trends (2004)
Users want large and fast memories!
- SRAM access times are 0.5-5 ns, at a cost of $4,000 to $10,000 per GB.
- DRAM access times are 50-70 ns, at a cost of $100 to $200 per GB.
- Disk access times are 5 to 20 million ns, at a cost of $0.50 to $2 per GB.

Slide 4: The Memory Latency Problem
[Figure: processor vs. DRAM performance over 1980-2000, log scale. Processor performance improves ~60%/yr (2x every 1.5 years); DRAM improves ~5%/yr (2x every 15 years). The resulting processor-memory performance gap grows about 50% per year.]
This gap is the motivation for the memory hierarchy.

Slide 5: The Goal: The Illusion of a Large, Fast, Cheap Memory
- Fact: large memories are slow; fast memories are small.
- How do we create a memory that is large, cheap, and fast (most of the time)?
- With a hierarchy of levels:
  - Smaller, faster memory technologies sit close to the processor.
  - The highest level of the hierarchy has the fastest access time.
  - Cheap, slow memory sits furthest from the processor.
- The aim of memory hierarchy design: an access time close to that of the highest level, with a size equal to that of the lowest level.

Slide 6: Recap: Memory Hierarchy Pyramid
[Figure: pyramid with the processor (CPU) at the apex and levels 1, 2, 3, ..., n below it, connected by a transfer datapath (bus). Moving down the pyramid, distance from the CPU and the size of memory at each level increase while cost/MB decreases; moving toward the CPU, access time (memory latency) decreases.]

Slide 7: Why the Hierarchy Works: Natural Locality
- The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
[Figure: probability of reference vs. memory address (0 to 2^n - 1), showing accesses concentrated in a few small regions.]
- Temporal locality (locality in time): recently accessed data tend to be referenced again soon.
- Spatial locality (locality in space): items near a referenced item tend to be referenced soon.
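For instance (an illustrative sketch, not from the slides), the loop below exhibits both kinds of locality: the scalar sum is reused on every iteration (temporal), and the array is walked sequentially (spatial).

    /* Temporal locality: 'sum' is touched every iteration.
       Spatial locality: a[0], a[1], ... are adjacent in memory. */
    #include <stdio.h>

    #define N 1024

    int main(void) {
        static int a[N];
        long sum = 0;
        for (int i = 0; i < N; i++)
            a[i] = i;
        for (int i = 0; i < N; i++)
            sum += a[i];        /* sequential walk through one small region */
        printf("sum = %ld\n", sum);
        return 0;
    }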

Slide 8: Memory Hierarchy: Terminology
- Hit: the data is found in the upper level (level X).
  - Hit rate: the fraction of memory accesses found in the upper level.
- Miss: the data must be retrieved from a block in the lower level (block Y).
  - Miss rate = 1 - hit rate.
- Hit time: time to access the upper level = time to determine hit/miss + memory access time.
- Miss penalty: time to replace a block in the upper level + time to deliver the block to the processor.
- Note: hit time << miss penalty.
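These definitions combine into the usual figure of merit, average memory access time: AMAT = hit time + miss rate x miss penalty. The name does not appear on the slide, but the formula follows directly from the terms above. A minimal sketch with invented numbers:

    /* AMAT = hit_time + miss_rate * miss_penalty (all numbers assumed). */
    #include <stdio.h>

    static double amat(double hit_time, double miss_rate, double miss_penalty) {
        return hit_time + miss_rate * miss_penalty;
    }

    int main(void) {
        /* Assumed: 2 ns hit time, 2% miss rate, 100 ns miss penalty. */
        printf("AMAT = %.2f ns\n", amat(2.0, 0.02, 100.0));  /* 4.00 ns */
        return 0;
    }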

Slide 9: Current Memory Hierarchy
The processor (control, datapath, registers) is backed by an L1 cache, an L2 cache, main memory, and secondary memory:

                 Regs     L1 cache  L2 cache  Main memory  Secondary memory
    Speed (ns)   1        2         6         100          10,000,000
    Size (MB)    0.0005   0.1       1-4       100-1,000    100,000
    Cost ($/MB)  --       $100      $30       $1           $0.05
    Technology   Regs     SRAM      SRAM      DRAM         Disk

Slide 10: Memory Hierarchy Technology
- Random access: access time is the same for all locations (a hardware decoder is used).
- Sequential access: very slow; data are accessed sequentially, so access time is location dependent. Treated as I/O (examples: disks and tapes).
- DRAM (Dynamic Random Access Memory): high density, low power, cheap, slow. Dynamic: must be "refreshed" regularly.
- SRAM (Static Random Access Memory): low density, high power, expensive, fast. Static: contents last "forever" (until power is lost).

Slide 11: Memories: Review
- SRAM: the value is stored on a pair of inverting gates. Very fast, but takes up more space than DRAM (4 to 6 transistors per bit).
- DRAM: the value is stored as charge on a capacitor (which must be refreshed). Very small, but slower than SRAM (by a factor of 5 to 10).

Slide 12: How Is the Hierarchy Managed?
- Registers <-> memory: by the compiler (or the assembly language programmer).
- Cache <-> main memory: by hardware.
- Main memory <-> disks:
  - by a combination of hardware and the operating system (virtual memory, covered next)
  - by the programmer (files)

Slide 13: Measuring Cache Performance
- CPU time = execution cycles x clock cycle time
- With cache misses: CPU time = (execution cycles + memory-stall cycles) x clock cycle time
- Read-stall cycles = #reads x read miss rate x read miss penalty
- Write-stall cycles = #writes x write miss rate x write miss penalty
- Memory-stall cycles = read-stall cycles + write-stall cycles
  = memory accesses x miss rate x miss penalty
  = #instructions x misses/instruction x miss penalty
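A sketch of these formulas in C (the variable names and workload numbers are mine, purely for illustration):

    /* Illustrative use of the stall-cycle formulas above (assumed workload). */
    #include <stdio.h>

    int main(void) {
        double reads = 1e6, writes = 5e5;          /* assumed access counts */
        double read_miss_rate = 0.04, write_miss_rate = 0.02;
        double miss_penalty = 50.0;                /* cycles, assumed */

        double read_stalls  = reads  * read_miss_rate  * miss_penalty;
        double write_stalls = writes * write_miss_rate * miss_penalty;
        double mem_stalls   = read_stalls + write_stalls;

        double exec_cycles = 3e6;                  /* assumed */
        double cycle_time  = 1e-9;                 /* 1 GHz clock, assumed */
        printf("memory-stall cycles = %.0f\n", mem_stalls);
        printf("CPU time = %.6f s\n", (exec_cycles + mem_stalls) * cycle_time);
        return 0;
    }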

Slide 14: Example
Q: The cache miss penalty is 50 cycles, and all instructions take 2.0 cycles without memory stalls. Assume a cache miss rate of 2% and 1.33 memory references per instruction (why 1.33? each instruction fetch is one memory reference, and roughly one instruction in three also references data). What is the impact of the cache?
Ans: CPU time = IC x (CPI + memory-stall cycles per instruction) x cycle time t.
Performance including cache misses:
CPU time = IC x (2.0 + 1.33 x 0.02 x 50) x t = IC x 3.33 x t
For a perfect cache that never misses:
CPU time = IC x 2.0 x t
Hence, including the memory hierarchy stretches CPU time by a factor of 3.33/2.0 = 1.67.
But without a memory hierarchy at all, the CPI would increase to 2.0 + 50 x 1.33 = 68.5, a factor of more than 30 (68.5/2.0 ≈ 34) longer.
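To check the slide's arithmetic (a throwaway sketch; the inputs are the slide's own numbers):

    /* Verifying the example: CPI, slowdown, and the no-cache case. */
    #include <stdio.h>

    int main(void) {
        double cpi_base = 2.0;          /* CPI with no memory stalls */
        double refs_per_instn = 1.33;   /* memory references per instruction */
        double miss_rate = 0.02;
        double miss_penalty = 50.0;     /* cycles */

        double stall_cpi = refs_per_instn * miss_rate * miss_penalty;
        printf("stall cycles/instn = %.2f\n", stall_cpi);              /* 1.33 */
        printf("CPI with cache     = %.2f\n", cpi_base + stall_cpi);   /* 3.33 */
        printf("slowdown vs. perfect cache = %.2fx\n",
               (cpi_base + stall_cpi) / cpi_base);                     /* ~1.67x */

        /* With no cache at all, every reference pays the full penalty. */
        double cpi_no_cache = cpi_base + refs_per_instn * miss_penalty;
        printf("CPI without cache  = %.1f (%.1fx slower)\n",
               cpi_no_cache, cpi_no_cache / cpi_base);                 /* 68.5, ~34x */
        return 0;
    }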

Slide 15: Cache Organization
(1) How do we know if something is in the cache? (2) If it is in the cache, how do we find it?
- The answers to (1) and (2) depend on the type, or organization, of the cache.
- In a direct-mapped cache, each memory address is associated with exactly one possible block within the cache.
  - Therefore, we only need to look in a single location in the cache to see if the data is there.

Slide 16: Simplest Cache: Direct Mapped
[Figure: a 16-block memory (block addresses 0-15) mapped onto a 4-block direct-mapped cache (indices 0-3).]
- Cache block 0 can be occupied by data from memory blocks 0, 4, 8, 12 (0000, 0100, 1000, 1100 in binary).
- Cache block 1 can be occupied by data from memory blocks 1, 5, 9, 13.
- Block size = 32/64 bytes.

Slide 17: Simplest Cache: Direct Mapped (Index)
[Figure: memory blocks 2, 6, 10, 14 (0010, 0110, 1010, 1110 in binary) all map to cache index 2 of the 4-block cache.]
- The index determines the block's location in the cache: index = (block address) mod (# blocks).
- If the number of cache blocks is a power of 2, the cache index is just the lower n bits of the memory block address, where n = log2(# blocks).
- The memory block address thus splits into a tag and an index.
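A sketch of that index/tag split for the 4-block cache in the figure (the helper is mine, not the slides'):

    /* Map each of 16 memory blocks onto a 4-block direct-mapped cache. */
    #include <stdio.h>

    #define NUM_BLOCKS 4   /* power of 2, so n = log2(4) = 2 index bits */

    int main(void) {
        for (unsigned block = 0; block < 16; block++) {
            unsigned index = block % NUM_BLOCKS;  /* == block & (NUM_BLOCKS - 1) */
            unsigned tag   = block / NUM_BLOCKS;  /* the remaining upper bits */
            printf("memory block %2u -> cache index %u, tag %u\n", block, index, tag);
        }
        return 0;
    }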

Slide 18: Simplest Cache: Direct Mapped with Tag
[Figure: the same 4-block cache, now with a tag field stored alongside each cache block's data. Memory block 14 (1110) sits at cache index 2 with tag 11.]
- The tag determines which memory block currently occupies a cache block.
- Tag = the left-hand (upper) bits of the address.
- Hit: the cache block's tag field equals the tag bits of the address.
- Miss: the tag field does not equal the tag bits of the address.

Slide 19: Finding an Item Within a Block
- In reality, a cache block consists of a number of bytes/words (32 or 64 bytes) to (1) increase cache hits via the locality property and (2) reduce the cache miss time.
- Mapping: memory block i is mapped to cache block frame (i mod x), where x is the number of blocks in the cache. This is called congruent mapping.
- Given the address of an item, the index tells which block of the cache to look in.
- Then how do we find the requested item within the cache block? Or, equivalently: what is the byte offset of the item within the cache block?

Slide 20: Issues with Direct-Mapped
If the block size is greater than 1, the rightmost bits of the "index" are really the offset within the indexed block. The address therefore splits into three fields:

    ttttttttttttttttt iiiiiiiiii oooo
- tag: checked to see if we have the correct block
- index: selects the block
- offset: byte offset within the block
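A sketch of that three-way split, under an assumed geometry (32-byte blocks, 1024 blocks, 32-bit addresses; none of these parameters come from the slide):

    /* Hypothetical tag/index/offset extraction for a direct-mapped cache. */
    #include <stdio.h>
    #include <stdint.h>

    #define OFFSET_BITS 5    /* 32-byte blocks */
    #define INDEX_BITS  10   /* 1024 blocks    */

    int main(void) {
        uint32_t addr   = 0x12345678;  /* arbitrary example address */
        uint32_t offset =  addr & ((1u << OFFSET_BITS) - 1);
        uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
        uint32_t tag    =  addr >> (OFFSET_BITS + INDEX_BITS);
        printf("addr 0x%08x -> tag 0x%x, index %u, offset %u\n",
               (unsigned)addr, (unsigned)tag, (unsigned)index, (unsigned)offset);
        return 0;
    }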

Slide 21: Accessing Data in a Direct-Mapped Cache
Three types of events:
- Cache miss: nothing is in the cache at the appropriate block, so fetch from memory.
- Cache hit: the cache block is valid and contains the proper address, so read the desired word.
- Cache miss, block replacement: the wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory.
Cache access procedure:
(1) Use the index bits to select the cache block.
(2) If the valid bit is 1, compare the tag bits of the address with the cache block's tag bits.
(3) If they match, use the offset to read out the word/byte.
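Putting the three steps together, a minimal lookup sketch (assumed geometry: 8 lines of one 4-byte word each; the struct and names are mine):

    /* Minimal direct-mapped lookup following steps (1)-(3) above. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>

    #define OFFSET_BITS 2   /* 4-byte blocks */
    #define INDEX_BITS  3   /* 8 lines       */

    struct line { bool valid; uint32_t tag; uint32_t data; };
    static struct line cache[1 << INDEX_BITS];

    /* Returns true on a hit; a real cache would fetch from memory on a miss. */
    static bool lookup(uint32_t addr, uint32_t *word) {
        uint32_t index = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);  /* (1) */
        uint32_t tag   =  addr >> (OFFSET_BITS + INDEX_BITS);
        if (cache[index].valid && cache[index].tag == tag) {                /* (2) */
            *word = cache[index].data;                                      /* (3) */
            return true;
        }
        return false;
    }

    int main(void) {
        /* Preload the line that holds address 0x100 (index 0, tag 8). */
        cache[0] = (struct line){ true, 0x100 >> (OFFSET_BITS + INDEX_BITS), 42 };
        uint32_t w;
        printf("0x100 -> %s\n", lookup(0x100, &w) ? "hit" : "miss");
        printf("0x200 -> %s\n", lookup(0x200, &w) ? "hit" : "miss");
        return 0;
    }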

Slide 22: Direct-Mapped Cache Access Example
[Figure: a 1024-entry cache, each entry holding a valid bit, a tag, and four words (bytes 0x0-3, 0x4-7, 0x8-b, 0xc-f). The address 000000000000000000 0000000001 1100 (binary) decomposes into tag = 0, index = 1, offset = 1100. The entry at index 1 is valid (valid bit 1, tag 0, data words a, b, c, d), so: data valid, tag OK; read the offset and return word d.]

Slide 23: An Example Cache: The DecStation 3100
- Commercial workstation, ~1985.
- MIPS R2000 processor (similar to the pipelined machine of chapter 6).
- Separate instruction and data caches: direct mapped, 64 KB (16K words) each.
- Block size: 1 word, so it exploits little spatial locality.
  - Solution: increase the block size (the second example, slide 25).

Slide 24: DecStation 3100 Cache
[Figure: address (showing bit positions 31-0) split into a 16-bit tag (bits 31-16), a 14-bit index (bits 15-2), and a 2-bit byte offset (bits 1-0). The index selects one of 16K entries, each holding a valid bit, a 16-bit tag, and 32 bits of data; a comparator on the tag produces the hit signal.]
On a miss, the cache controller stalls the processor and loads the data from main memory.
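The tag/index/offset widths follow directly from the cache geometry. A small sketch deriving them for this cache and for the 4-word-block variant on the next slide (32-bit byte addresses assumed; the helpers are mine):

    /* Derive field widths for a direct-mapped cache from its geometry. */
    #include <stdio.h>

    static unsigned log2u(unsigned x) {      /* x must be a power of 2 */
        unsigned n = 0;
        while (x >>= 1) n++;
        return n;
    }

    static void widths(const char *name, unsigned cache_bytes, unsigned block_bytes) {
        unsigned entries     = cache_bytes / block_bytes;
        unsigned offset_bits = log2u(block_bytes);
        unsigned index_bits  = log2u(entries);
        printf("%s: %u entries, tag %u, index %u, offset %u bits\n",
               name, entries, 32 - index_bits - offset_bits, index_bits, offset_bits);
    }

    int main(void) {
        widths("64KB, 1-word blocks", 64 * 1024, 4);   /* 16K entries: 16/14/2 */
        widths("64KB, 4-word blocks", 64 * 1024, 16);  /* 4K entries:  16/12/4 */
        return 0;
    }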

Slide 25: 64 KB Cache with 4-Word (16-Byte) Blocks
[Figure: address (showing bit positions 31-0) split into a 16-bit tag (bits 31-16), a 12-bit index (bits 15-4), a 2-bit block offset (bits 3-2), and a byte offset (bits 1-0). The index selects one of 4K entries, each holding a valid bit, a 16-bit tag, and 128 bits of data; a 4-to-1 multiplexer driven by the block offset selects one of the four 32-bit words, and a tag comparator produces the hit signal.]

Slide 26: Miss Rates: 1-Word vs. 4-Word Blocks
(cache similar to the DecStation 3100)

                            I-cache     D-cache     Combined
    Block size   Program    miss rate   miss rate   miss rate
    1-word       gcc        6.1%        2.1%        5.4%
    1-word       spice      1.2%        1.3%        1.2%
    4-word       gcc        2.0%        1.7%        1.9%
    4-word       spice      0.3%        0.6%        0.4%

Slide 27: Miss Rate Versus Block Size
[Figure 7.12: miss rate (0% to 40%) versus block size (4 to 256 bytes) for direct-mapped caches with total sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB.]

Slide 28: Extreme Example: A 1-Block Cache
- Suppose we choose block size = cache size. Then there is only one block in the cache.
- Temporal locality says that if an item is accessed, it is likely to be accessed again soon.
  - But it is unlikely to be accessed again immediately!
  - The next access is therefore likely to be a miss.
  - The cache continually loads data but is forced to discard it before it is used again.
  - The worst nightmare of a cache designer: the ping-pong effect.

Slide 29: Block Size and Miss Penalty
- As block size increases, the cost of a miss also increases.
- Miss penalty: the time to fetch the block from the next lower level of the hierarchy and load it into the cache.
- With very large blocks, the increase in miss penalty overwhelms the decrease in miss rate.
- Average access time can be minimized if the memory system is designed right.
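A toy model of that tradeoff (every number below is invented for illustration): if the miss penalty grows with block size while the miss-rate improvement flattens out, the average access time bottoms out at an intermediate block size.

    /* Invented model: AMAT = hit_time + miss_rate * miss_penalty, where the
       penalty grows linearly with block size and miss-rate gains flatten. */
    #include <stdio.h>

    int main(void) {
        double hit_time    = 1.0;                        /* cycles, assumed */
        int    sizes[]     = { 4, 16, 64, 256 };         /* block size in bytes */
        double miss_rate[] = { 0.10, 0.05, 0.04, 0.038 };/* invented miss rates */
        for (int i = 0; i < 4; i++) {
            double penalty = 10.0 + sizes[i] / 4.0;      /* invented: setup + transfer */
            double amat = hit_time + miss_rate[i] * penalty;
            printf("block %3dB: miss rate %.1f%%, penalty %5.1f cycles, AMAT %.2f\n",
                   sizes[i], miss_rate[i] * 100, penalty, amat);
        }
        return 0;   /* minimum AMAT lands at the 16-byte block in this model */
    }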

Slide 30: Block Size Tradeoff
[Figure: three sketches versus block size. Miss rate first falls as larger blocks exploit spatial locality, then rises once there are too few blocks and temporal locality is compromised. Miss penalty rises with block size. Average access time therefore improves at first, then degrades as the increased miss penalty and miss rate take over.]

