
1 CPEG3231 Virtual Memory

2 CPEG3232 Review: The memory hierarchy
(Figure: Processor → L1$ → L2$ → Main Memory → Secondary Memory; access time and (relative) size of the memory increase with distance from the processor. Transfer units grow down the hierarchy, from 4-8 bytes (word) near the processor, through 8-32 bytes (block) and 1 to 4 blocks, to 1,024+ bytes (disk sector = page) at the bottom.)
 Inclusive – what is in L1$ is a subset of what is in L2$, which is a subset of what is in main memory, which is a subset of what is in secondary memory
 Take advantage of the principle of locality to present the user with as much memory as is available in the cheapest technology, at the speed offered by the fastest technology

3 CPEG3233 Virtual memory
 Use main memory as a “cache” for secondary memory
- Allows efficient and safe sharing of memory among multiple programs
- Provides the ability to easily run programs larger than the size of physical memory
- Automatically manages the memory hierarchy (as “one level”)
 What makes it work? Again, the Principle of Locality
- A program is likely to access a relatively small portion of its address space during any period of time
 Each program is compiled into its own address space – a “virtual” address space
- During run time, each virtual address must be translated to a physical address (an address in main memory)

4 CPEG3234 IBM System/360 Model 67

5 CPEG3235 VM simplifies loading and sharing
 Simplifies loading a program for execution by avoiding code relocation
 Address mapping allows programs to be loaded at any location in physical memory
 Simplifies shared libraries, since all sharing programs can use the same virtual addresses
 Relocation does not need special OS + hardware support as in the past

6 CPEG3236 Virtual memory motivation
“Historically, there were two major motivations for virtual memory: to allow efficient and safe sharing of memory among multiple programs, and to remove the programming burden of a small, limited amount of main memory.” [Patt&Henn]
“…a system has been devised to make the core-drum combination appear to the programmer as a single level store, the requisite transfers taking place automatically.” – Kilburn et al.

7 CPEG3237 Terminology
 Page: fixed-size block of memory, 512-4096 bytes
 Segment: variable-sized, contiguous block of memory
 Page fault: a page is referenced, but is not in memory
 Virtual address: address seen by the program
 Physical address: address seen by the cache or memory
 Memory mapping or address translation: next slide

8 CPEG3238 Memory management unit
(Figure: the Memory Management Unit sits between processor and memory, translating each virtual address from the processor into a physical address sent to memory; a page fault is handled using an elaborate software page-fault handling algorithm.)

9 CPEG3239 Address translation
(Figure: a virtual address (VA) = virtual page number (bits 31 ... 12) + page offset (bits 11 ... 0) is translated into a physical address (PA) = physical page number (bits 29 ... 12) + page offset (bits 11 ... 0); only the page number is translated, the offset passes through unchanged.)
 So each memory request first requires an address translation from the virtual space to the physical space
- A virtual memory miss (i.e., when the page is not in physical memory) is called a page fault
 A virtual address is translated to a physical address by a combination of hardware and software
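
To make the bit-field split above concrete, here is a minimal C sketch of the translation step. The 4 KB page size (12-bit offset) and the placeholder lookup function `translate_vpn` are assumptions for illustration, not part of the slide.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_OFFSET_BITS 12                 /* 4 KB pages: bits 11..0 are the offset */
#define PAGE_OFFSET_MASK ((1u << PAGE_OFFSET_BITS) - 1)

/* Hypothetical lookup: maps a virtual page number to a physical page number.
 * A real system would consult the TLB and/or the page table here. */
static uint32_t translate_vpn(uint32_t vpn) {
    return vpn + 0x100;                     /* placeholder mapping for the example */
}

int main(void) {
    uint32_t va  = 0x0040321C;                        /* example virtual address */
    uint32_t vpn = va >> PAGE_OFFSET_BITS;            /* bits 31..12 */
    uint32_t off = va & PAGE_OFFSET_MASK;             /* bits 11..0  */
    uint32_t ppn = translate_vpn(vpn);                /* TLB / page table lookup */
    uint32_t pa  = (ppn << PAGE_OFFSET_BITS) | off;   /* offset is copied unchanged */
    printf("VA 0x%08x -> VPN 0x%05x, offset 0x%03x -> PA 0x%08x\n",
           (unsigned)va, (unsigned)vpn, (unsigned)off, (unsigned)pa);
    return 0;
}
```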

10 CPEG32310 Mapping virtual to physical space
(Figure: a 64K virtual address space mapped in 4K pages onto a 32K main memory; (a) virtual addresses, (b) main memory addresses.)

11 CPEG32311 A paging system
(Figure: virtual page numbers index a page table whose entries point either into physical memory or to disk storage.)
The page table maps each page in virtual memory to either a page in physical memory or a page stored on disk, which is the next level in the hierarchy.
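
A minimal C sketch of what such a page table holds, assuming one entry per virtual page with a valid bit; the structure name `pte_t`, the field names, and the sizes are illustrative assumptions, not taken from the slide.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One page table entry: if 'valid' is set the page is in physical memory,
 * otherwise 'disk_block' says where the page lives in secondary memory. */
typedef struct {
    bool     valid;
    uint32_t phys_page;   /* physical page number, meaningful when valid */
    uint32_t disk_block;  /* disk location, meaningful when !valid       */
} pte_t;

#define NUM_VPAGES (1u << 20)   /* e.g. a 32-bit VA with 4 KB pages */
static pte_t page_table[NUM_VPAGES];

/* Returns true and fills *ppn when the page is resident in memory;
 * returns false to signal a page fault (the OS must fetch it from disk). */
static bool pt_lookup(uint32_t vpn, uint32_t *ppn) {
    if (page_table[vpn].valid) {
        *ppn = page_table[vpn].phys_page;
        return true;
    }
    return false;
}

int main(void) {
    page_table[5].valid = true;            /* pretend virtual page 5 is resident */
    page_table[5].phys_page = 0x2A;
    uint32_t ppn;
    printf("vpage 5: %s\n", pt_lookup(5, &ppn) ? "in memory" : "page fault");
    printf("vpage 6: %s\n", pt_lookup(6, &ppn) ? "in memory" : "page fault");
    return 0;
}
```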

12 CPEG32312 A virtual address cache (TLB)
(Figure: virtual page number, page table, TLB, physical memory, disk storage.)
The TLB acts as a cache on the page table, holding only the entries that map to physical pages.

13 CPEG32313 Two Programs Sharing Physical Memory
(Figure: the virtual address spaces of Program 1 and Program 2 are both mapped into main memory.)
 A program’s address space is divided into pages (all one fixed size) or segments (variable sizes)
- The starting location of each page (either in main memory or in secondary memory) is contained in the program’s page table

14 CPEG32314 Typical ranges of VM parameters
These figures, contrasted with the values for caches, represent increases of 10 to 100,000 times.

15 CPEG32315 Some virtual memory design parameters

                          Paged VM                       TLBs
  Total size              16,000 to 250,000 words        16 to 512 entries
  Total size (KB)         250,000 to 1,000,000,000       0.25 to 16
  Block size (B)          4000 to 64,000                 4 to 32
  Miss penalty (clocks)   10,000,000 to 100,000,000      10 to 1000
  Miss rates              0.00001% to 0.0001%            0.01% to 2%

16 CPEG32316 Technology

  Technology       Access Time        $ per GB in 2004
  SRAM             0.5 - 5 ns         $4,000 - $10,000
  DRAM             50 - 70 ns         $100 - $200
  Magnetic disk    5 - 20 x 10^6 ns   $0.5 - $2

17 CPEG32317 Address Translation Consideration
 Direct mapping using register sets
 Indirect mapping using tables
 Associative mapping of frequently used pages

18 CPEG32318 Fundamental considerations
The Page Table (PT) must have one entry for each page in virtual memory!
How many pages? How large is the PT?

19 CPEG32319 4 key design issues
 Pages should be large enough to amortize the high access time. Sizes from 4 KB to 16 KB are typical, and some designers are considering sizes as large as 64 KB.
 Organizations that reduce the page fault rate are attractive. The primary technique used here is to allow flexible placement of pages (e.g., fully associative).

20 CPEG32320 4 key design issues (cont.)
 Page faults (misses) in a virtual memory system can be handled in software, because the overhead will be small compared to the access time to disk. Furthermore, the software can afford to use clever algorithms for choosing how to place pages, because even small reductions in the miss rate will pay for the cost of such algorithms.
 Using write-through to manage writes in virtual memory will not work, since writes take too long. Instead, we need a scheme that reduces the number of disk writes.

21 CPEG32321 Page Size Selection Constraints
 Efficiency of the secondary memory device (slotted disk/drum)
 Page table size
 Page fragmentation: last part of the last page
 Program logic structure: logical block size: < 1K ~ 4K
 Table fragmentation: a full PT can occupy a large, sparse space
 Uneven locality: text, globals, stack
 Miss ratio

22 CPEG32322 An Example – Case 1
  VM page size: 512 bytes
  VM address space: 64K
  Total virtual pages = 64K / 512 = 128 pages

23 CPEG32323 An Example (cont.) – Case 2
  VM page size: 512 = 2^9 bytes
  VM address space: 4G = 2^32 bytes
  Total virtual pages = 4G / 512 = 2^23 = 8M pages
  Each PTE has 32 bits, so total PT size = 8M x 4 = 32M bytes
  Note: assuming main memory has a working set of 4M bytes, that is 4M / 512 = 2^22 / 2^9 = 2^13 = 8192 pages
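
As a quick check of the arithmetic in Case 2, a few lines of C reproduce the numbers; the 4-byte PTE and the 4 MB working set are taken directly from the slide, everything else is just the computation.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint64_t vspace    = 1ull << 32;              /* 4 GB virtual address space      */
    uint64_t page_size = 512;                     /* 2^9-byte pages                  */
    uint64_t pte_bytes = 4;                       /* 32-bit page table entries       */

    uint64_t vpages   = vspace / page_size;       /* 2^23 = 8M virtual pages         */
    uint64_t pt_bytes = vpages * pte_bytes;       /* 32 MB of page table             */
    uint64_t resident = (4ull << 20) / page_size; /* 4 MB working set -> 8192 pages  */

    printf("virtual pages: %llu, PT size: %llu MB, resident pages: %llu\n",
           (unsigned long long)vpages,
           (unsigned long long)(pt_bytes >> 20),
           (unsigned long long)resident);
    return 0;
}
```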

24 CPEG32324 An Example (cont.)
How about a VM address space of 2^52 bytes (R-6000) (4 Petabytes) with a page size of 4K bytes?
Total number of virtual pages: 2^52 / 2^12 = 2^40 pages!

25 CPEG32325 Techniques for Reducing PT Size
 Set a lower limit, and permit dynamic growth
 Permit growth from both directions (text, stack)
 Inverted page table (a hash table)
 Multi-level page table (segments and pages) – see the sketch after this list
 The PT itself can be paged: i.e., put the PT itself in the virtual address space (Note: some small portion of the pages should be in main memory and never paged out)
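
The multi-level page table mentioned above can be sketched in C as follows. The 10/10/12 bit split for a 32-bit address space, the on-demand allocation, and the names `walk` and `top_level` are assumptions chosen for illustration, not details from the slide.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Two-level page table for a 32-bit VA with 4 KB pages:
 * 10-bit first-level index, 10-bit second-level index, 12-bit offset.
 * Only the second-level tables that are actually used get allocated,
 * which is how this scheme saves space over one flat table. */
typedef struct {
    uint32_t phys_page;   /* physical page number            */
    unsigned valid : 1;   /* is the page resident in memory? */
} pte_t;

static pte_t *top_level[1024];   /* NULL until a 4 MB region is first touched */

/* Returns the PTE for 'va', allocating the second-level table on demand. */
static pte_t *walk(uint32_t va) {
    uint32_t l1 = (va >> 22) & 0x3FF;    /* top 10 bits  */
    uint32_t l2 = (va >> 12) & 0x3FF;    /* next 10 bits */
    if (top_level[l1] == NULL)
        top_level[l1] = calloc(1024, sizeof(pte_t));
    return &top_level[l1][l2];
}

int main(void) {
    pte_t *e = walk(0x00403000);         /* touch one virtual page */
    e->valid = 1;
    e->phys_page = 0x77;
    printf("only 1 of 1024 second-level tables allocated for this mapping\n");
    return 0;
}
```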

26 CPEG32326 LSI-11/73 Segment Registers

27 CPEG32327 VM implementation issues
 Page fault handling: hardware, software, or both
 Efficient input/output: slotted drum/disk
 Queue management. A process can be linked on:
- CPU ready queue: waiting for the CPU
- Page-in queue: waiting for a page transfer from disk
- Page-out queue: waiting for a page transfer to disk
 Protection issues: read/write/execute
 Management bits: dirty, reference, valid
 Multiple program issues: context switch, timeslice end

28 CPEG32328 Where to place pages
 Placement: OS designers always prefer lower miss rates over a simpler placement algorithm
 So, “full associativity” – VM pages can go anywhere in main memory (compare with a sector cache)
 Question: why not use associative hardware? (The number of PT entries is too big!)

29 CPEG32329 How to handle protection and multiple users
(Figure: virtual-to-real address translation using a page map. A virtual address (pid, page number i, word offset w) is looked up in the TLB and the page map; each page map entry PME(x) holds protection bits RWX, flags M, C, and P, a pid, and the page frame address (PFA) in memory or in secondary memory. Operation validation compares the RWX bits and S/U (s/u = 1 means supervisor mode) against the requested access type; a violation raises an access fault, a missing page raises a page fault and invokes the replacement policy. The physical address is formed from the PFA and the word offset w.
  PME(x).C = 1: the page at PFA has been modified
  PME(x).P = 1: the page is private to the process
  PME(x).pid: process identification number
  PME(x).PFA: page frame address)

30 CPEG32330 Page fault handling
 When a virtual page number is not in the TLB, the PT in memory is accessed (through the PTBR) to find the PTE
 Hopefully, the PTE is in the data cache
 If the PTE indicates that the page is missing, a page fault occurs
 If so, put the disk sector number and page number on the page-in queue and continue with the next process
 If all page frames in main memory are occupied, find a suitable one and put it on the page-out queue
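
The flow above can be illustrated with a tiny, self-contained C simulation. It is only a sketch under strong assumptions: the page-in/page-out queues are replaced by direct actions, FIFO is used where the slide just says "find a suitable" frame, and the sizes and names are invented for the example.

```c
#include <stdbool.h>
#include <stdio.h>

#define NPAGES  16        /* tiny virtual space for the demo */
#define NFRAMES 4         /* tiny physical memory            */

typedef struct { bool valid; int frame; int disk_sector; } pte_t;

static pte_t page_table[NPAGES];
static int   frame_owner[NFRAMES];   /* which vpage occupies each frame, -1 = free */
static int   next_victim = 0;        /* trivial FIFO replacement (an assumption)   */

/* Touch virtual page 'vpn'; on a fault, pick a frame (evicting if necessary),
 * "read" the page from its disk sector, and update the page table. */
static void access_page(int vpn) {
    if (page_table[vpn].valid) {
        printf("vpage %2d: hit in frame %d\n", vpn, page_table[vpn].frame);
        return;
    }
    int f = -1;
    for (int i = 0; i < NFRAMES; i++)          /* look for a free frame */
        if (frame_owner[i] < 0) { f = i; break; }
    if (f < 0) {                               /* none free: page out a victim */
        f = next_victim;
        next_victim = (next_victim + 1) % NFRAMES;
        page_table[frame_owner[f]].valid = false;
        printf("vpage %2d: fault, evicting vpage %d from frame %d\n",
               vpn, frame_owner[f], f);
    } else {
        printf("vpage %2d: fault, loading into free frame %d\n", vpn, f);
    }
    /* the page-in transfer from page_table[vpn].disk_sector would happen here */
    frame_owner[f] = vpn;
    page_table[vpn].valid = true;
    page_table[vpn].frame = f;
}

int main(void) {
    for (int i = 0; i < NFRAMES; i++) frame_owner[i] = -1;
    for (int i = 0; i < NPAGES; i++) page_table[i].disk_sector = 100 + i;
    int trace[] = {0, 1, 2, 0, 3, 4, 1, 5};    /* a small reference string */
    for (int i = 0; i < (int)(sizeof trace / sizeof *trace); i++)
        access_page(trace[i]);
    return 0;
}
```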

31 CPEG32331 Fast address translation
 A PT in memory implies at least two memory accesses for each memory fetch or store
 Improvement:
- Store the PT in fast registers (example: Xerox, 256 registers)
- Implement a VM address cache (TLB)
- Make maximal use of the instruction/data cache
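
To see why those two memory accesses matter, here is a small back-of-the-envelope calculation in C. The 100 ns memory access time, 1 ns TLB lookup, and 98% TLB hit rate are assumed numbers for illustration only, not figures from the slides.

```c
#include <stdio.h>

int main(void) {
    double mem_ns = 100.0;    /* assumed main-memory access time */
    double tlb_ns = 1.0;      /* assumed TLB lookup time         */
    double hit    = 0.98;     /* assumed TLB hit rate            */

    /* Page table in memory, no TLB: translate (1 access) + fetch data (1 access). */
    double no_tlb = 2.0 * mem_ns;

    /* With a TLB: on a hit only the data access remains; on a miss we also
     * read the page table in memory (one extra access in this simple model). */
    double with_tlb = tlb_ns + mem_ns + (1.0 - hit) * mem_ns;

    printf("without TLB: %.0f ns, with TLB: %.0f ns per reference\n",
           no_tlb, with_tlb);
    return 0;
}
```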

32 CPEG32332 Some typical values for a TLB might be: the miss penalty may sometimes be as high as 100 cycles, and the TLB size may be as small as 16 entries.

33 CPEG32333 TLB design issues
 Placement policy:
- Small TLBs: fully associative can be used
- Large TLBs: fully associative may be too slow
 Replacement policy: a random policy is used for speed/simplicity
 TLB miss rate is low (Clark-Emer data [85]: 3~4 times smaller than the usual cache miss rate)
 TLB miss penalty is relatively low; it usually results in a cache fetch

34 CPEG32334 TLB design issues (cont.)
 A TLB miss implies a higher miss rate for the main cache
 TLB translation is process-dependent
- Strategies for context switching:
1. tagging by context
2. flushing – complete purge by context (shared)
- No absolute answer

35 CPEG32335 A Case Study: DECStation 3100
(Figure: the 32-bit virtual address is split into a 20-bit virtual page number (bits 31 ... 12) and a 12-bit page offset (bits 11 ... 0). Each TLB entry holds valid and dirty bits, a tag, and the physical page number; on a TLB hit the 20-bit physical page number is concatenated with the page offset to form the physical address, which is then split into a 16-bit cache tag, 14-bit index, and 2-bit byte offset to access the cache and deliver a 32-bit data word on a cache hit.)

36 CPEG32336 DECStation 3100 TLB and cache
(Flowchart: given a virtual address, access the TLB. A TLB miss raises a TLB miss exception. On a TLB hit, check protection; for a read, try to read the data from the cache (a cache miss causes a cache-miss stall); for a write, write the data into the cache, update the dirty bit, and put the data and the address into the write buffer.)

37 CPEG32337 IBM System/360-67 memory management unit
  CPU cycle time: 200 ns
  Memory cycle time: 750 ns

38 CPEG32338 IBM System/360-67 address translation
(Figure: the 32-bit virtual address is divided into a segment field (12 bits), page field (8 bits), and offset (12 bits). Dynamic Address Translation (DAT) converts the bus-out address from the CPU – page (12 bits) + offset (12 bits) – into the bus-in address sent to memory, also page (12 bits) + offset (12 bits).)
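
Using only the field widths recoverable from the figure (12-bit segment, 8-bit page, 12-bit offset within a 32-bit virtual address, and a 24-bit bus address of page + offset), a hypothetical decomposition in C looks like this. It is a sketch of the field arithmetic, not the actual DAT hardware, and the physical page value is a placeholder.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t va = 0x00123ABC;                 /* example 32-bit virtual address */

    uint32_t offset  = va & 0xFFF;            /* low 12 bits  */
    uint32_t page    = (va >> 12) & 0xFF;     /* next 8 bits  */
    uint32_t segment = (va >> 20) & 0xFFF;    /* top 12 bits  */

    /* DAT would map (segment, page) to a 12-bit physical page; a placeholder
     * value is used here just to show how the 24-bit bus-in address is formed. */
    uint32_t phys_page = 0x055;
    uint32_t bus_in    = (phys_page << 12) | offset;

    printf("seg=0x%03x page=0x%02x off=0x%03x -> bus-in 0x%06x\n",
           (unsigned)segment, (unsigned)page, (unsigned)offset, (unsigned)bus_in);
    return 0;
}
```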

39 CPEG32339 IBM System/360-67 associative registers
(Figure: associative registers translate the bus-out address from the CPU – VM page (12 bits) + offset (12 bits) – into the bus-in address to memory – physical page (12 bits) + offset (12 bits); example register contents shown: 11522, 559, 3188, 4445, 9110, 13041, 777, 1227.)

40 CPEG32340 IBM System/360-67 segment/page mapping
(Figure: the 24-bit virtual address is split into a segment number (4 bits), page number (8 bits), and offset (12 bits). The Segment Table Register (32 bits) plus the segment number selects a segment table entry, which points to a page table of 256 entries (VRW0 ... VRW255); each entry carries a Valid bit (V), Reference bit (R), Write (dirty) bit (W), and a physical page address, from which the physical address is formed.)

41 CPEG32341 Virtual addressing with a cache
(Figure: CPU → translation → cache → main memory; the VA must be translated to a PA before the cache can be accessed.)
 Thus it takes an extra memory access to translate a VA to a PA
 This makes memory (cache) accesses very expensive (if every access were really two accesses)
 The hardware fix is to use a Translation Lookaside Buffer (TLB) – a small cache that keeps track of recently used address mappings to avoid having to do a page table lookup

42 CPEG32342 Making address translation fast
(Figure: a TLB holding valid bits, tags, and physical page base addresses acts as a cache on the page table, which lives in physical memory and maps each virtual page number to a physical page base address in main memory or to a location in disk storage.)

43 CPEG32343 Translation lookaside buffers (TLBs)
 Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped
(TLB entry fields: V | Virtual Page # | Physical Page # | Dirty | Ref | Access)
 TLB access time is typically smaller than cache access time (because TLBs are much smaller than caches)
- TLBs are typically no more than 128 to 256 entries, even on high-end machines
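
A minimal C sketch of a fully associative TLB with the entry fields named above (valid, virtual page #, physical page #, dirty, ref, access). The 64-entry size, the linear search standing in for parallel comparators, and the function names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TLB_ENTRIES 64     /* TLBs are small, typically well under 256 entries */

typedef struct {
    bool     valid;
    uint32_t vpn;          /* virtual page number (the tag)  */
    uint32_t ppn;          /* physical page number           */
    bool     dirty;
    bool     ref;
    uint8_t  access;       /* read/write/execute permissions */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Fully associative lookup: compare the VPN against every entry.
 * Real hardware performs these comparisons in parallel, not in a loop. */
static bool tlb_lookup(uint32_t vpn, uint32_t *ppn) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            tlb[i].ref = true;     /* remember this entry was used */
            *ppn = tlb[i].ppn;
            return true;
        }
    }
    return false;                  /* TLB miss: consult the page table */
}

int main(void) {
    tlb[3].valid = true; tlb[3].vpn = 0x12345; tlb[3].ppn = 0x00ABC;
    uint32_t ppn;
    printf("lookup 0x12345: %s\n", tlb_lookup(0x12345, &ppn) ? "hit" : "miss");
    printf("lookup 0x54321: %s\n", tlb_lookup(0x54321, &ppn) ? "hit" : "miss");
    return 0;
}
```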

44 CPEG32344 A TLB in the memory hierarchy
(Figure: CPU → TLB lookup → cache → main memory; a TLB hit goes straight to the cache with the PA, a TLB miss goes through translation via the page table.)
 A TLB miss – is it a page fault or merely a TLB miss?
- If the page is loaded into main memory, then the TLB miss can be handled (in hardware or software) by loading the translation information from the page table into the TLB
  - Takes 10’s of cycles to find and load the translation info into the TLB
- If the page is not in main memory, then it’s a true page fault
  - Takes 1,000,000’s of cycles to service a page fault
 TLB misses are much more frequent than true page faults

45 CPEG32345 Two Machines’ Cache Parameters

  TLB organization
    Intel P4:
      1 TLB for instructions and 1 TLB for data
      Both 4-way set associative
      Both use ~LRU replacement
      Both have 128 entries
      TLB misses handled in hardware
    AMD Opteron:
      2 TLBs for instructions and 2 TLBs for data
      Both L1 TLBs fully associative with ~LRU replacement
      Both L2 TLBs 4-way set associative with round-robin LRU
      Both L1 TLBs have 40 entries
      Both L2 TLBs have 512 entries
      TLB misses handled in hardware

46 CPEG32346 TLB Event Combinations

  TLB    Page Table   Cache      Possible? Under what circumstances?
  Hit    Hit          Hit
  Hit    Hit          Miss
  Miss   Hit          Hit
  Miss   Hit          Miss
  Miss   Miss         Miss
  Hit    Miss         Miss/Hit
  Miss   Miss         Hit

47 CPEG32347 TLB Event Combinations

  TLB    Page Table   Cache      Possible? Under what circumstances?
  Hit    Hit          Hit        Yes – what we want!
  Hit    Hit          Miss       Yes – although the page table is not checked if the TLB hits
  Miss   Hit          Hit        Yes – TLB miss, PA in page table
  Miss   Hit          Miss       Yes – TLB miss, PA in page table, but data not in cache
  Miss   Miss         Miss       Yes – page fault
  Hit    Miss         Miss/Hit   Impossible – TLB translation not possible if page is not present in memory
  Miss   Miss         Hit        Impossible – data not allowed in cache if page is not in memory

48 CPEG32348 Reducing Translation Time
 Can overlap the cache access with the TLB access
- Works when the high-order bits of the VA are used to access the TLB while the low-order bits are used as the index into the cache
(Figure: a 2-way associative cache is accessed with the index and block offset taken from the untranslated low-order bits, while the TLB supplies the PA tag that is compared against the cache tags to determine the cache hit and select the desired word.)
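
The overlap described above only works if the cache index and block-offset bits fit within the untranslated page offset; a short calculation makes the constraint concrete. The 4 KB page, 32-byte block, 8 KB 2-way cache are assumed parameters chosen so the overlap succeeds, not values from the slides.

```c
#include <stdio.h>

int main(void) {
    int page_size  = 4096;    /* bytes: 12 untranslated page-offset bits */
    int block_size = 32;      /* bytes per cache block (assumed)         */
    int cache_size = 8192;    /* bytes (assumed)                         */
    int assoc      = 2;       /* 2-way set associative, as in the figure */

    int sets = cache_size / (block_size * assoc);
    int index_bits = 0, offset_bits = 0, page_bits = 0;
    for (int s = sets;       s > 1; s >>= 1) index_bits++;   /* log2(sets)       */
    for (int b = block_size; b > 1; b >>= 1) offset_bits++;  /* log2(block size) */
    for (int p = page_size;  p > 1; p >>= 1) page_bits++;    /* log2(page size)  */

    printf("index+offset bits = %d, page offset bits = %d -> overlap %s\n",
           index_bits + offset_bits, page_bits,
           index_bits + offset_bits <= page_bits ? "possible" : "NOT possible");
    return 0;
}
```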

49 CPEG32349 Why Not a Virtually Addressed Cache?
 A virtually addressed cache would only require address translation on cache misses
(Figure: the CPU accesses the cache directly with the VA; translation to a PA happens only on the way to main memory.)
but
- Two different virtual addresses can map to the same physical address (when processes are sharing data), i.e., two different cache entries hold data for the same physical address – synonyms
  - Must update all cache entries with the same physical address, or the memory becomes inconsistent

50 CPEG32350 The Hardware/Software Boundary
 What parts of the virtual-to-physical address translation are done by, or assisted by, the hardware?
- Translation Lookaside Buffer (TLB) that caches the recent translations
  - TLB access time is part of the cache hit time
  - May allot an extra stage in the pipeline for TLB access
- Page table storage, fault detection, and updating
  - Page faults result in interrupts (precise) that are then handled by the OS
  - Hardware must support (i.e., update appropriately) Dirty and Reference bits (e.g., ~LRU) in the Page Tables
- Disk placement
  - Bootstrap (e.g., out of disk sector 0) so the system can service a limited number of page faults before the OS is even loaded

51 CPEG32351
(Figure: the same paging/TLB picture as slide 12 – virtual page number, page table, TLB, physical memory, disk storage. The TLB acts as a cache on the page table for the entries that map to physical pages only; the TLB needs very little hardware with software assist, while the page table itself is managed in software.)

52 CPEG32352 Summary
 The Principle of Locality:
- A program is likely to access a relatively small portion of the address space at any instant of time
  - Temporal Locality: locality in time
  - Spatial Locality: locality in space
 Caches, TLBs, and Virtual Memory are all understood by examining how they deal with the four questions:
1. Where can a block be placed?
2. How is a block found?
3. What block is replaced on a miss?
4. How are writes handled?
 Page tables map virtual addresses to physical addresses
- TLBs are important for fast translation

