Memory: Virtual MemoryCSCE430/830 Virtual Memory Map Physical Memory Disk Individual Pages Virtual Memory: Definitions Key idea: simulate a larger physical memory than is actually available General approach: –Break address space up into pages –Each program accesses a working set of pages –Store pages: »In physical memory as space permits »On disk when no space left in physical memory –Access pages using virtual address
Memory: Virtual MemoryCSCE430/830 Permits a program's memory to be physically noncontiguous so it can be allocated from wherever available. This avoids fragmentation and compaction. HARDWARE An address is determined by: page number ( index into table ) + offset ---> mapping into ---> base address ( from table ) + offset. Frames = physical blocks Pages = logical blocks Size of frames/pages is defined by hardware (power of 2 to ease calculations) Virtual Memory: Paging
Memory: Virtual MemoryCSCE430/830 Paging Example - 32-byte memory with 4-byte pages 0 a 1 b 2 c 3 d 4 e 5 f 6 g 7 h 8 I 9 j 10 k 11 l 12 m 13 n 14 o 15 p 0 5 1 6 2 1 3 2 Page Table Logical Memory 0 4 I j k l 8 m n o p 12 16 20 a b c d 24 e f g h 28 Physical Memory Virtual Memory: Paging 1 1 2 3 4 5 6 7 0 0 2 3
Memory: Virtual MemoryCSCE430/830 Virtual Address Translation What happens during a memory access? –map virtual address into physical address using page table –If the page is in memory: access physical memory –If the page is on disk: page fault »Suspend program »Get operating system to load the page from disk Page table is in memory - this slows down access! Translation lookaside buffer (TLB) special cache of translated addresses (speeds access back up)
Memory: Virtual MemoryCSCE430/830 Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped TLBs are usually small, typically not more than 128 - 256 entries even on high end machines. This permits fully associative lookup on these machines. Most mid-range machines use small n-way set associative organizations. CPU TLB Lookup Cache Main Memory VAPA miss hit data Trans- lation hit miss 20 t t 1/2 t Translation with a TLB Techniques for Fast Address Translation
Memory: Virtual MemoryCSCE430/830 Virtual Address and a Cache CPU Trans- lation Cache Main Memory VAPA miss hit data It takes an extra memory access to translate VA to PA This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible Techniques for Fast Address Translation
Memory: Virtual MemoryCSCE430/830 Virtual Memory Design Page size: 8KB TLB is direct mapped with 256 entries L1 cache is direct-mapped 8KB. L2 cache is direct mapped 4MB. L1 and L2 use 64-byte blocks The virtual address is 64 bits The physical address is 41 bits Please show the overall picture of memory hierarchy.
Memory: Virtual MemoryCSCE430/830 Virtually Indexed, Physically Tagged Cache What motivation? Fast cache hit by parallel TLB access No virtual cache shortcomings How could it be correct? Require cache way size <= page size; now physical index is from page offset Then virtual and physical indices are identical ⇒ works like a physically indexed cache!
Memory: Virtual MemoryCSCE430/830 Virtual Memory and Protection Important function of virtual memory: Protection –Allow sharing of single main memory by multiple processes –Provide each process with its own address space –Protect each process from memory accesses by other processes Basic mechanism: two modes of operation –User mode - allows access only to user address space –Supervisor (kernel) mode - allows access to OS address space System call - allows processor to change mode
Memory: Virtual MemoryCSCE430/830 Virtual Memory Crosscutting Issues: The Design of Memory HierarchiesCrosscutting Issues: The Design of Memory Hierarchies Superscalar CPU and Number of Ports to the Cache Cache must provide sufficient peak bandwidth to benefit from multiple issues. Some processors increase complexity of instruction fetch by allowing instructions to be issued to be found on any boundary instead of, say, multiples of 4 words. Speculative Execution and the Memory System Speculative and conditional instructions generate exceptions (by generating invalid addresses) that would otherwise not occur, which in turn can overwhelm the benefits of speculation with the exception handling overhead. Such CPUs must be matched with non-blocking caches and only speculate on L1 misses (due to the unbearable penalty of L2). Combining Instruction Cache with Instruction Fetch and Decode Mechanisms Increasing demand for ILP and clock rate has led to the merging of the first part of instruction execution with instruction cache, by incorporating trace cache (which combines branch prediction with instruction fetch) and storing the internal RISC operations in the trace cache (e.g., Pentium 4’s NetBurst microarchitecture). A cache hit in the merged cache saves portion of the instruction execution cycles.
Memory: Virtual MemoryCSCE430/830 Virtual Memory Crosscutting Issues: The Design of Memory HierarchiesCrosscutting Issues: The Design of Memory Hierarchies Embedded Computer Caches and Real-Time Performance In real-time applications, variation of performance matters much more than average performance. Thus, caches that offer average performance enhancement have to be used carefully. Instruction caches are often used due to the highly predictability of instructions; whereas data caches are “locked down”, forcing them to act as small scratchpad memory under program control. Embedded Computer Caches and Power It is much more power efficient to access on-chip memory than to access off-chip one (which needs to drive the pins, buses and activate external memory chips, etc). Other techniques, such as way prediction, can be used to save power (by only powering half of the two-way set-associative cache). I/O and Consistency of Cached Data Cache coherence Cache coherence problem must be addressed when I/O devices also share the same cached data.
Memory: Virtual MemoryCSCE430/830 Summary - Virtual Memory Bottom level of memory hierarchy for programs Used in all general-purpose architectures Relies heavily on OS for support