
Slide 1: ENGS 116 Lecture 13 — Caches and Virtual Memory
Vincent H. Berk, October 31st, 2008
Reading for today: Sections C.1 – C.3 (Jouppi article)
Reading for Monday: Sections C.4 – C.7
Reading for Wednesday: Sections 5.1 – 5.3

Slide 2: Improving Cache Performance
Average memory-access time: AMAT = Hit time + Miss rate × Miss penalty (ns or clocks)
Improve performance by:
1. Reducing the miss rate
2. Reducing the miss penalty
3. Reducing the time to hit in the cache
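A minimal sketch of the AMAT formula in C; the numbers (1-cycle hit, 5% miss rate, 100-cycle penalty) are illustrative assumptions, not values from the lecture:

```c
#include <stdio.h>

/* AMAT = hit time + miss rate * miss penalty (units: cycles or ns) */
double amat(double hit_time, double miss_rate, double miss_penalty)
{
    return hit_time + miss_rate * miss_penalty;
}

int main(void)
{
    /* Assumed example: 1-cycle hit, 5% miss rate, 100-cycle penalty */
    printf("AMAT = %.2f cycles\n", amat(1.0, 0.05, 100.0));  /* prints 6.00 */
    return 0;
}
```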

Slide 3: Reducing Miss Rate
– Larger blocks
– Larger cache
– Higher associativity

Slide 4: Classifying Misses — the 3 Cs
– Compulsory: the first access to a block cannot be in the cache, so the block must be brought in. Also called cold-start or first-reference misses. (These occur even in an infinite cache.)
– Capacity: if the cache cannot contain all the blocks needed during execution of a program, capacity misses occur as blocks are discarded and later retrieved. (Misses in a fully associative cache of size X.)
– Conflict: if the block-placement strategy is set-associative or direct-mapped, conflict misses (in addition to compulsory and capacity misses) occur because a block can be discarded and later retrieved when too many blocks map to its set. Also called collision or interference misses. (Misses in an N-way set-associative cache of size X.)

Slide 5: 3Cs Absolute Miss Rate (SPEC92)
[Chart: miss rate per type vs. cache size (1 KB – 128 KB) for 1-, 2-, 4-, and 8-way associativity, split into conflict, capacity, and compulsory components. The compulsory component is vanishingly small.]

Slide 6: 2:1 Cache Rule
[Chart: the same miss-rate data as slide 5.]
Conflict miss rate of a 1-way (direct-mapped) cache of size X = miss rate of a 2-way set-associative cache of size X/2.

Slide 7: 3Cs Relative Miss Rate
[Chart: the same data as slide 5, with each bar normalized to 100% of total miss rate.]
Flaw: holds block size fixed. Good: insight.

Slide 8: How Can We Reduce Misses?
3 Cs: compulsory, capacity, conflict. In all cases, assume the total cache size is unchanged. What happens if we:
1) Change block size: which of the 3 Cs is obviously affected?
2) Change associativity: which of the 3 Cs is obviously affected?
3) Change the compiler: which of the 3 Cs is obviously affected?

Slide 9: 1. Reduce Misses via Larger Block Size

Slide 10: 2. Reduce Misses: Larger Cache Size
An obvious improvement, but:
– Longer hit time
– Higher cost
Each cache size favors a particular block size, based on memory bandwidth.
AMAT = Hit time + Miss rate × Miss penalty (ns or clocks)

Slide 11: 3. Reduce Misses via Higher Associativity
2:1 cache rule: miss rate of a direct-mapped cache of size N ≈ miss rate of a 2-way set-associative cache of size N/2.
Beware: execution time is the final measure! Will clock cycle time increase?
8-way is almost as good as fully associative.

Slide 12: Example — Average Memory Access Time vs. Miss Rate
Assume the clock cycle time (CCT) is 1.10× for 2-way, 1.12× for 4-way, and 1.14× for 8-way, relative to direct-mapped.

Cache Size (KB)   1-way   2-way   4-way   8-way
      1           2.33    2.15    2.07    2.01
      2           1.98    1.86    1.76    1.68
      4           1.72    1.67    1.61    1.53
      8           1.46    1.48    1.47    1.43
     16           1.29    1.32    1.32    1.32
     32           1.20    1.24    1.25    1.27
     64           1.14    1.20    1.21    1.23
    128           1.10    1.17    1.18    1.20

(Red in the original slide marked entries where A.M.A.T. is not improved by more associativity.)
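A sketch of why higher associativity can lose, as in the lower rows of the table: the CCT factor multiplies the hit time while the miss rate barely shrinks. The miss rates and the 25-cycle penalty here are assumptions chosen to mimic a large cache, not the SPEC92 data behind the table:

```c
#include <stdio.h>

int main(void)
{
    const char *ways[] = {"1-way", "2-way", "4-way", "8-way"};
    double cct[]       = {1.00, 1.10, 1.12, 1.14};      /* CCT factors from the slide */
    double miss_rate[] = {0.008, 0.007, 0.007, 0.007};  /* assumed, large-cache-like */
    double hit = 1.0, penalty = 25.0;                   /* assumed */

    /* AMAT = (CCT-scaled hit time) + miss rate * miss penalty */
    for (int i = 0; i < 4; i++)
        printf("%s: AMAT = %.3f cycles\n", ways[i],
               cct[i] * hit + miss_rate[i] * penalty);
    return 0;   /* prints 1.200, 1.275, 1.295, 1.315: associativity hurts here */
}
```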

Slide 13: Reducing Miss Penalty
– Multilevel caches
– Read priority over write
AMAT = Hit time + Miss rate × Miss penalty (ns or clocks)

Slide 14: 1. Reduce Miss Penalty: L2 Caches
L2 equations:
AMAT = Hit time(L1) + Miss rate(L1) × Miss penalty(L1)
Miss penalty(L1) = Hit time(L2) + Miss rate(L2) × Miss penalty(L2)
AMAT = Hit time(L1) + Miss rate(L1) × (Hit time(L2) + Miss rate(L2) × Miss penalty(L2))
Definitions:
– Local miss rate: misses in this cache divided by the total number of memory accesses to this cache (Miss rate(L2)).
– Global miss rate: misses in this cache divided by the total number of memory accesses generated by the CPU (Miss rate(L1) × Miss rate(L2)).
– The global miss rate is what matters: it indicates what fraction of CPU memory accesses go all the way to main memory.
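A sketch of the L2 equations above in C; all the rates and latencies are illustrative assumptions:

```c
#include <stdio.h>

int main(void)
{
    double hit_l1 = 1.0,  miss_rate_l1 = 0.05;   /* local = global for L1 */
    double hit_l2 = 10.0, miss_rate_l2 = 0.4;    /* local L2 miss rate (assumed) */
    double penalty_l2 = 100.0;                   /* main-memory access (assumed) */

    /* Miss penalty(L1) = Hit time(L2) + Miss rate(L2) * Miss penalty(L2) */
    double penalty_l1 = hit_l2 + miss_rate_l2 * penalty_l2;
    double amat       = hit_l1 + miss_rate_l1 * penalty_l1;
    double global_l2  = miss_rate_l1 * miss_rate_l2;  /* fraction reaching memory */

    printf("Miss penalty(L1)  = %.1f cycles\n", penalty_l1);  /* 50.0  */
    printf("AMAT              = %.1f cycles\n", amat);        /* 3.5   */
    printf("Global L2 miss    = %.3f\n", global_l2);          /* 0.020 */
    return 0;
}
```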

Slide 15: Comparing Local and Global Miss Rates
– 32 KB first-level cache; increasing second-level cache size.
– The global miss rate is close to the single-level cache miss rate, provided L2 >> L1.
– Don't use the local miss rate.
– L2 is not tied to the CPU clock cycle!
– What matters: cost and A.M.A.T.
– Generally: fast hit times and fewer misses. Since L2 hits are few, target miss reduction.

Slide 16: L2 Cache Block Size & A.M.A.T.
[Chart: A.M.A.T. vs. L2 cache block size; 32 KB L1, 8-byte path to memory.]

Slide 17: 2. Reduce Miss Penalty: Read Priority over Write on Miss
Write-through caches with write buffers risk RAW conflicts between buffered writes and main-memory reads on cache misses. Simply waiting for the write buffer to empty can increase the read-miss penalty (by 50% on the old MIPS 1000). Instead, check the write-buffer contents before the read; if there are no conflicts, let the memory access continue.
Write back?
– A read miss may replace a dirty block.
– Normal: write the dirty block to memory, then do the read.
– Instead: copy the dirty block to a write buffer, do the read, then do the write.
– The CPU stalls less frequently, since it restarts as soon as the read finishes.
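A minimal sketch of the "check write buffer before read" idea, assuming a small buffer of block-aligned pending writes; the 4-entry size and all names are illustrative, not a specific machine's design:

```c
#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 4

typedef struct {
    bool     valid;
    uint32_t addr;   /* block-aligned address of the pending write */
} wb_entry_t;

static wb_entry_t write_buffer[WB_ENTRIES];

/* On a read miss, scan the write buffer. If no buffered write matches
   the miss address, the memory read may safely bypass the queued
   writes; otherwise it must wait (or forward the buffered data). */
bool read_conflicts_with_write_buffer(uint32_t block_addr)
{
    for (int i = 0; i < WB_ENTRIES; i++)
        if (write_buffer[i].valid && write_buffer[i].addr == block_addr)
            return true;
    return false;
}
```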

Slide 18: Reducing Hit Time
Avoiding address translation in the cache index.
AMAT = Hit time + Miss rate × Miss penalty (ns or clocks)

Slide 19: 1. Fast Hits by Avoiding Address Translation
Send the virtual address to the cache? This is called a virtually addressed cache, or virtual cache, vs. a physical cache.
– Every time the process is switched, the cache logically must be flushed; otherwise we get false hits. The cost is the time to flush plus "compulsory" misses from an empty cache.
– Must handle aliases (sometimes called synonyms): two different virtual addresses that map to the same physical address.
Solution to aliases: hardware guarantees each block a unique physical address, OR page coloring is used to ensure that virtual and physical addresses match in the last x bits.
Solution to cache flushing: add a process-identifier tag that identifies the process as well as the address within the process; the wrong process cannot get a hit.

Slide 20: Virtually Addressed Caches
[Diagram: three organizations. (1) Conventional: CPU → TLB → cache → memory; the virtual address is translated to a physical address before every cache access. (2) Virtually addressed cache: CPU → cache → TLB → memory; translation happens only on a miss, but this suffers the synonym problem. (3) Overlapped: the cache, holding physical tags, is accessed in parallel with TLB translation; this requires the cache index to remain invariant across translation. An L2 cache sits below.]

Slide 21: 2. Fast Cache Hits by Avoiding Translation: Index with the Physical Portion of the Address
If the index lies in the physical part of the address (the page offset), tag access can start in parallel with translation, and the comparison then uses the physical tag.
[Address layout: bits 31–12 are the page address, bits 11–0 the page offset; the cache splits the address into tag, index, and block offset fields.]
This limits the cache to the page size: what if we want bigger caches and still use the same trick?
– Higher associativity moves the tag/index boundary to the right.
– Page coloring: the OS requires that all aliases share their low-order address bits, which in effect makes pages set-associative!
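A sketch of the trick: with 4 KB pages, address bits 11–0 are untranslated, so a cache whose index plus block offset fits in those 12 bits can be indexed before the TLB finishes. The field widths below (32-byte blocks, 128 sets, i.e., a 4 KB direct-mapped cache) are assumptions for illustration:

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_OFFSET_BITS 5   /* 32-byte blocks (assumed) */
#define INDEX_BITS        7   /* 128 sets -> 4 KB direct-mapped (assumed) */

int main(void)
{
    uint32_t va = 0x12345678;

    /* Both fields lie entirely inside the 12-bit page offset, so they
       are identical in the virtual and physical address: the set can
       be selected before translation completes. */
    uint32_t index = (va >> BLOCK_OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    printf("set index = %u (usable before translation)\n", index);
    return 0;
}
```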

Slide 22: Virtual Memory
Mapping a virtual address space (2^32 or 2^64 bytes) to a physical address space (e.g., 2^28 bytes).
Virtual memory in terms of cache:
– What is the cache block?
– What is a cache miss?
How is virtual memory different from caches?
– What controls replacement?
– Size (transfer unit, mapping mechanisms)
– Lower-level use

Slide 23: Figure 5.36 — The logical program in its contiguous virtual address space is shown on the left; it consists of four pages, A, B, C, and D.
[Diagram: virtual pages A–D at virtual addresses 0, 4K, 8K, 12K; A, B, and C map to scattered 4 KB frames in physical main memory (0–28K), while D resides on disk.]

Slide 24: Figure 5.37 — Typical ranges of parameters for caches and virtual memory. [Table not transcribed.]

Slide 25: Virtual Memory
Four questions for virtual memory (VM):
– Q1: Where can a block be placed in the upper level? Fully associative, set-associative, or direct-mapped?
– Q2: How is a block found if it is in the upper level?
– Q3: Which block should be replaced on a miss? Random or LRU?
– Q4: What happens on a write? Write back or write through?
Other issues: size; pages, segments, or a hybrid.

Slide 26: Figure 5.40 — The mapping of a virtual address to a physical address via a page table.
[Diagram: the virtual address splits into a virtual page number and a page offset; the virtual page number indexes the page table, which yields the physical page frame in main memory.]
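A sketch of the single-level mapping in Figure 5.40, assuming 32-bit virtual addresses and 4 KB pages (so 2^20 page-table entries); the fault handling is deliberately simplified:

```c
#include <stdint.h>

#define PAGE_BITS 12                  /* 4 KB pages (assumed) */
#define VPN_BITS  (32 - PAGE_BITS)

typedef struct {
    uint32_t valid : 1;
    uint32_t pfn   : 20;              /* physical page-frame number */
} pte_t;

static pte_t page_table[1u << VPN_BITS];

/* Split the VA into VPN + offset, index the page table, and rebuild
   the physical address. Returns 0 on a miss (a real system would trap
   to the OS page-fault handler instead). */
uint32_t translate(uint32_t va)
{
    uint32_t vpn    = va >> PAGE_BITS;
    uint32_t offset = va & ((1u << PAGE_BITS) - 1);
    pte_t    pte    = page_table[vpn];

    if (!pte.valid)
        return 0;                     /* page fault */
    return ((uint32_t)pte.pfn << PAGE_BITS) | offset;
}
```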

Slide 27: Fast Translation: Translation Buffer (TLB)
A cache of translated addresses. The data portion usually includes the physical page-frame number, a protection field, a valid bit, a use bit, and a dirty bit.
Alpha 21064 data TLB: 32-entry, fully associative.
[Diagram: the page-frame address is compared against all 32 tags, with V/R/W bits checked; on a hit, a 32:1 mux selects the physical page number (the high-order 21 bits of the 34-bit physical address), which is concatenated with the page offset (the low-order 13 bits).]
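A hedged sketch of a fully associative TLB lookup in the style of the 32-entry Alpha 21064 data TLB above. The field widths are illustrative, and the linear search stands in for hardware that compares all 32 tags at once:

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 32

typedef struct {
    bool     valid;
    uint64_t tag;     /* virtual page number */
    uint64_t pfn;     /* physical page-frame number */
    bool     dirty;   /* plus protection and use bits in a real entry */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Fully associative: any entry may hold any page, so check them all. */
bool tlb_lookup(uint64_t vpn, uint64_t *pfn_out)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].tag == vpn) {
            *pfn_out = tlb[i].pfn;   /* the 32:1 mux in hardware */
            return true;
        }
    }
    return false;                    /* miss: walk the page table */
}
```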

Slide 28: Selecting a Page Size
Reasons for a larger page size:
– The page table size is inversely proportional to the page size, so memory is saved (see the arithmetic sketch below).
– A fast cache hit time is easy when cache ≤ page size (virtually addressed caches); a bigger page keeps this feasible as caches grow.
– Transferring larger pages to or from secondary storage, possibly over a network, is more efficient.
– The number of TLB entries is restricted by clock cycle time, so a larger page size maps more memory, reducing TLB misses.
Reasons for a smaller page size:
– Fragmentation: don't waste storage; data must be contiguous within a page.
– Quicker process startup for small processes.
Hybrid solution: multiple page sizes.
– Alpha: 8 KB, 16 KB, 32 KB, 64 KB pages (43, 47, 51, 55 virtual address bits).
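The quick arithmetic behind "page table size is inversely proportional to page size": entries = 2^(VA bits) / page size. The 32-bit address space and 4-byte PTEs below are assumptions for illustration:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t va_space = 1ull << 32;   /* 32-bit virtual addresses (assumed) */
    int      pte_size = 4;            /* bytes per page-table entry (assumed) */

    /* Doubling the page size halves the number of PTEs and the table size. */
    for (uint64_t page = 4096; page <= 65536; page *= 2) {
        uint64_t entries = va_space / page;
        printf("%6llu-byte pages -> %8llu PTEs = %5llu KB of page table\n",
               (unsigned long long)page,
               (unsigned long long)entries,
               (unsigned long long)(entries * pte_size / 1024));
    }
    return 0;
}
```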

Slide 29: Alpha VM Mapping
The "64-bit" address is divided into three segments:
– seg0 (bit 63 = 0): user code/heap
– seg1 (bit 63 = 1, bit 62 = 1): user stack
– kseg (bit 63 = 1, bit 62 = 0): kernel segment for the OS
Three-level page table, each level one page:
– The Alpha uses only 43 bits of virtual address.
– (Future minimum page sizes up to 64 KB extend this to 55 bits of VA.)
PTE bits: valid, kernel & user, read & write enable (no reference, use, or dirty bit — what do you do?)
[Diagram: the Page Table Base Register anchors the walk; the three 10-bit level fields of the virtual address index the L1, L2, and L3 page-table pages in turn (8-byte PTEs, 32-bit address fields), and the final PTE's physical page-frame number is concatenated with the 13-bit page offset. The seg0/seg1 selector sign-extends the address with 000…0 or 111…1.]
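A hedged sketch of the three-level walk drawn above: three 10-bit level indices plus a 13-bit page offset, matching the Alpha's 8 KB pages and 43-bit VA. The PTE layout and the `table_at` helper are assumptions for illustration, not the Alpha's actual formats:

```c
#include <stdint.h>

#define OFFSET_BITS 13                /* 8 KB pages */
#define LEVEL_BITS  10                /* 1024 PTEs per page-table page */
#define LEVEL_MASK  ((1u << LEVEL_BITS) - 1)

typedef struct { uint64_t valid : 1, pfn : 32; } pte_t;

/* Hypothetical helper: returns the page-table page stored in physical
   frame 'pfn' (e.g., via a direct kernel mapping). */
extern pte_t *table_at(uint64_t pfn);

uint64_t walk(pte_t *l1_base, uint64_t va)
{
    /* 43-bit VA = 10 + 10 + 10 level bits + 13 offset bits */
    uint32_t i1 = (va >> (OFFSET_BITS + 2 * LEVEL_BITS)) & LEVEL_MASK;
    uint32_t i2 = (va >> (OFFSET_BITS + LEVEL_BITS)) & LEVEL_MASK;
    uint32_t i3 = (va >> OFFSET_BITS) & LEVEL_MASK;

    pte_t l1 = l1_base[i1];
    if (!l1.valid) return 0;                      /* page fault */
    pte_t l2 = table_at(l1.pfn)[i2];
    if (!l2.valid) return 0;
    pte_t l3 = table_at(l2.pfn)[i3];
    if (!l3.valid) return 0;

    return ((uint64_t)l3.pfn << OFFSET_BITS) | (va & ((1u << OFFSET_BITS) - 1));
}
```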

Slide 30: Memory Hierarchy

Slide 31: Protection
Prevent separate processes from accessing each other's memory.
– A violation causes a segmentation fault (SIGSEGV).
– Useful for multitasking systems.
– An operating-system issue.
Each process has its own state:
– Page tables
– Heap, text, and stack pages
– Registers, PC
To prevent processes from modifying their own page tables: rings of protection, kernel vs. user mode.
To prevent processes from modifying other processes' memory: page tables point to distinct physical pages.

Slide 32: Protection 2
Each page needs:
– A PID bit
– Read/write/execute bits
Each process needs:
– Stack frame page(s)
– Text or code pages
– Data or heap pages
– State table keeping the PC and other CPU status registers, and the state of all registers.

Slide 33: Alpha 21064
– Separate instruction & data TLBs and caches.
– TLBs are fully associative.
– TLB updates in software ("Privileged Architecture Library" / PALcode).
– Caches: 8 KB, direct-mapped, write-through.
– Critical 8 bytes first.
– Prefetch: instruction stream buffer.
– 2 MB L2 cache, direct-mapped, write-back (off-chip).
– 256-bit path to main memory, 4 × 64-bit modules.
– Victim buffer: gives reads priority over writes.
– 4-entry write buffer between the D-cache and the L2 cache.
[Diagram: instruction and data paths through the stream buffer, write buffer, and victim buffer.]

