Virtual Memory Prof. Sin-Min Lee Department of Computer Science
Fixed (Static) Partitions Attempt at multiprogramming using fixed partitions –one partition for each job –size of partition designated by reconfiguring the system –partitions can’t be too small or too large. Critical to protect job’s memory space. Entire program stored contiguously in memory during entire execution. Internal fragmentation is a problem.
Job List: J1 30K, J2 50K, J3 30K, J4 25K
Original State | After Job Entry
Partition 1 (100K) | Job 1 (30K)
Partition 2 (25K) | Job 4 (25K)
Partition 3 (25K) | (empty)
Partition 4 (50K) | Job 2 (50K)
Table 2.1: Main memory use during fixed partition allocation. Job 3 must wait.
Dynamic Partitions Available memory kept in contiguous blocks and jobs given only as much memory as they request when loaded. Improves memory use over fixed partitions. Performance deteriorates as new jobs enter the system –fragments of free memory are created between blocks of allocated memory (external fragmentation).
Dynamic Partitioning of Main Memory & Fragmentation (Figure 2.2)
Dynamic Partition Allocation Schemes First-fit: Allocate the first partition that is big enough. –Keep free/busy lists organized by memory location (low- order to high-order). –Faster in making the allocation. Best-fit: Allocate the smallest partition that is big enough – Keep free/busy lists ordered by size (smallest to largest). –Produces the smallest leftover partition. –Makes best use of memory.
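The two schemes can be sketched in a few lines of Python. This is a minimal illustration, not the Memory Manager's actual code: the free list is modeled as a plain list of block sizes ordered by memory location, and the sizes are made-up values chosen for the example.

```python
def first_fit(free_blocks, request):
    """Return the index of the first free block that can hold `request`, or None."""
    for i, size in enumerate(free_blocks):
        if size >= request:
            return i
    return None


def best_fit(free_blocks, request):
    """Return the index of the smallest free block that can hold `request`, or None."""
    candidates = [(size, i) for i, size in enumerate(free_blocks) if size >= request]
    return min(candidates)[1] if candidates else None


# Hypothetical free list ordered by memory location; sizes in KB.
free_blocks = [30, 15, 50, 20]
print(first_fit(free_blocks, 10))  # 0: the 30K block, leaving 20K unused
print(best_fit(free_blocks, 10))   # 1: the 15K block, leaving 5K unused
```

First-fit stops at the first block that is large enough, while best-fit has to look at every block before it can decide, which is why it is slower but leaves the smallest leftover.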
First-Fit Allocation Example (Table 2.2)
Job List: J1 10K, J2 20K, J3 30K*, J4 10K
Memory location | Memory block size | Job number | Job size | Status | Internal fragmentation
… | …K | J1 | 10K | Busy | 20K
… | …K | J4 | 10K | Busy | 5K
… | …K | J2 | 20K | Busy | 30K
… | …K | | | Free |
Total Available: 115K; Total Used: 40K
Best-Fit vs. First-Fit First-Fit: –Increases memory use –Memory allocation takes less time –Increases internal fragmentation –Discriminates against large jobs Best-Fit: –More complex algorithm –Searches entire table before allocating memory –Results in a smaller “free” space (sliver)
Release of Memory Space : Deallocation Deallocation for fixed partitions is simple –Memory Manager resets status of memory block to “free”. Deallocation for dynamic partitions tries to combine free areas of memory whenever possible –Is the block adjacent to another free block? –Is the block between 2 free blocks? –Is the block isolated from other free blocks?
Case 1: Joining 2 Free Blocks
Case 2: Joining 3 Free Blocks
Case 3: Deallocating an Isolated Block
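The three deallocation cases can be handled by one routine. The sketch below is an assumption-laden illustration: the free list is a list of (start, size) tuples sorted by address, and the function name `release` is hypothetical.

```python
def release(free_list, start, size):
    """Return a new free list with the released block (start, size) added.

    The released block is merged with any free block that touches it.
    `free_list` holds (start, size) tuples sorted by address (assumed layout).
    """
    merged = []
    new_start, new_end = start, start + size
    for s, sz in free_list:
        e = s + sz
        if e == new_start:      # a free block ends where the released block begins
            new_start = s
        elif s == new_end:      # a free block begins where the released block ends
            new_end = e
        else:                   # not adjacent: keep it unchanged
            merged.append((s, sz))
    merged.append((new_start, new_end - new_start))
    return sorted(merged)


print(release([(0, 10)], 10, 5))             # Case 1: [(0, 15)]
print(release([(0, 10), (15, 5)], 10, 5))    # Case 2: [(0, 20)]
print(release([(0, 10), (40, 10)], 20, 5))   # Case 3: [(0, 10), (20, 5), (40, 10)]
```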
Relocatable Dynamic Partitions Memory Manager relocates programs to gather all empty blocks and compact them to make 1 memory block. Memory compaction (garbage collection, defragmentation) performed by OS to reclaim fragmented sections of memory space. Memory Manager optimizes use of memory & improves throughput by compacting & relocating.
Compaction Steps Relocate every program in memory so they’re contiguous. Adjust every address, and every reference to an address, within each program to account for program’s new location in memory. Must leave alone all other values within the program (e.g., data values).
Memory Before & After Compaction (Figure 2.5)
Contents of relocation register & close-up of Job 4 memory area (a) before relocation & (b) after relocation and compaction (Figure 2.6)
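A rough sketch of compaction follows, under simplifying assumptions that are not in the slides: memory starts at address 0, each job is a (name, start, size) tuple, and the per-job offset stands in for the value that would be loaded into the relocation register. The job layout is invented for the example.

```python
def compact(jobs):
    """Slide every job down so that all jobs are contiguous.

    jobs: list of (name, start, size) sorted by start address (assumed layout).
    Returns (relocated jobs, per-job relocation offsets, start of the single free block).
    """
    next_free = 0
    relocated, offsets = [], {}
    for name, start, size in jobs:
        offsets[name] = next_free - start    # amount added to each of the job's addresses
        relocated.append((name, next_free, size))
        next_free += size
    return relocated, offsets, next_free


# Hypothetical layout with holes between the jobs (units are K).
jobs = [("J1", 0, 10), ("J4", 30, 10), ("J2", 60, 20)]
relocated, offsets, free_start = compact(jobs)
print(relocated)    # [('J1', 0, 10), ('J4', 10, 10), ('J2', 20, 20)]
print(offsets)      # {'J1': 0, 'J4': -20, 'J2': -40}
print(free_start)   # 40
```

Only addresses are adjusted by the offset; data values inside the program are left alone, as the compaction steps require.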
Virtual Memory Virtual Memory (VM) = the ability of the CPU and the operating system software to use the hard disk drive as additional RAM when needed (safety net). Good: no longer get “insufficient memory” errors. Bad: performance is very slow when accessing VM. Solution = more RAM
Motivations for Virtual Memory Use Physical DRAM as a Cache for the Disk –Address space of a process can exceed physical memory size –Sum of address spaces of multiple processes can exceed physical memory Simplify Memory Management –Multiple processes resident in main memory. Each process with its own address space –Only “active” code and data is actually in memory Allocate more memory to process as needed. Provide Protection –One process can’t interfere with another. because they operate in different address spaces. –User process cannot access privileged information different sections of address spaces have different permissions.
Levels in Memory Hierarchy
CPU regs, Cache, Memory, Disk: each level is larger, slower, cheaper
Level | Size | Speed | $/Mbyte | Line size
Register | 32 B | 1 ns | | 8 B
Cache | 32 KB-4MB | 2 ns | $100/MB | 32 B
Memory | 128 MB | 50 ns | $1.00/MB | 4 KB
Disk Memory | 20 GB | 8 ms | $0.006/MB |
Transfer units: 8 B between registers and cache, 32 B between cache and memory (cache), 4 KB between memory and disk (virtual memory)
DRAM vs. SRAM as a “Cache” DRAM vs. disk is more extreme than SRAM vs. DRAM –Access latencies: DRAM ~10X slower than SRAM; disk ~100,000X slower than DRAM –Importance of exploiting spatial locality: first byte is ~100,000X slower than successive bytes on disk, vs. ~4X improvement for page-mode vs. regular accesses to DRAM –Bottom line: design decisions made for DRAM caches are driven by the enormous cost of misses
Locating an Object in a “Cache” (cont.) DRAM Cache –Each allocated page of virtual memory has an entry in the page table –Mapping from virtual pages to physical pages, from uncached form to cached form –Page table entry exists even if the page is not in memory; it then specifies the disk address –OS retrieves information [Figure: page table mapping each object name to a location, either in the cache or on disk]
A System with Physical Memory Only Examples: –most Cray machines, early PCs, nearly all embedded systems, etc. Addresses generated by the CPU point directly to bytes in physical memory [Figure: CPU issues physical addresses to memory locations 0 .. N-1]
A System with Virtual Memory Examples: –workstations, servers, modern PCs, etc. Address Translation: Hardware converts virtual addresses to physical addresses via an OS-managed lookup table (page table) [Figure: CPU issues virtual addresses; the page table (entries 0 .. P-1) maps them to physical addresses in memory (0 .. N-1) or to disk]
Page Faults (Similar to “Cache Misses”) What if an object is on disk rather than in memory? –Page table entry indicates virtual address not in memory –OS exception handler invoked to move data from disk into memory: current process suspends, others can resume; OS has full control over placement, etc. [Figure: CPU, page table, memory and disk before the fault and after the fault]
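Address translation and the page-fault path can be illustrated with a toy page table. Everything here is an assumption made for the sketch: the 4 KB page size, the dictionary layout of a page table entry, and the names `translate` and `PageFault`. Real hardware does this lookup in the MMU, not in software.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


class PageFault(Exception):
    """Raised when the referenced virtual page is not resident in memory."""


def translate(page_table, vaddr):
    """Map a virtual address to a physical address through the page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # virtual page number, offset in page
    entry = page_table.get(vpn)
    if entry is None or not entry["present"]:
        raise PageFault(vpn)                 # the OS handler would load the page from disk
    return entry["frame"] * PAGE_SIZE + offset


# Hypothetical page table: page 0 is in frame 5, page 1 is still on disk.
page_table = {
    0: {"present": True, "frame": 5},
    1: {"present": False, "disk_address": 1234},
}
print(translate(page_table, 100))  # 20580 = 5 * 4096 + 100
```

A reference to page 1 raises `PageFault`, which stands in for the trap to the OS exception handler described above.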
Terminology Cache: a small, fast “buffer” that lies between the CPU and the Main Memory which holds the most recently accessed data. Virtual Memory: Program and data are assigned addresses independent of the amount of physical main memory storage actually available and the location from which the program will actually be executed. Hit ratio: Probability that next memory access is found in the cache. Miss rate: (1.0 – Hit ratio)
Importance of Hit Ratio Given: –h = Hit ratio –T_a = Average effective memory access time by CPU –T_c = Cache access time –T_m = Main memory access time
Effective memory time is: T_a = h T_c + (1 – h) T_m
Speedup due to the cache is: S_c = T_m / T_a
Example: Assume a main memory access time of 100 ns, a cache access time of 10 ns, and a hit ratio of 0.9.
T_a = 0.9(10 ns) + (1 – 0.9)(100 ns) = 19 ns
S_c = 100 ns / 19 ns = 5.26
Same as above, only the hit ratio is now 0.95:
T_a = 0.95(10 ns) + (1 – 0.95)(100 ns) = 14.5 ns
S_c = 100 ns / 14.5 ns = 6.9
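The two formulas are easy to check numerically. The short sketch below reproduces the slide's figures; the function names are chosen for this example only.

```python
def effective_access_time(h, t_cache, t_main):
    """T_a = h * T_c + (1 - h) * T_m"""
    return h * t_cache + (1 - h) * t_main


def speedup(h, t_cache, t_main):
    """S_c = T_m / T_a"""
    return t_main / effective_access_time(h, t_cache, t_main)


for h in (0.9, 0.95):
    t_a = effective_access_time(h, 10, 100)   # times in ns
    s_c = speedup(h, 10, 100)
    print(f"h={h}: T_a={t_a:.1f} ns, S_c={s_c:.2f}")
```

Raising the hit ratio by only five points cuts the average access time from 19 ns to 14.5 ns, which is why small changes in hit ratio matter so much.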
Cache vs Virtual Memory Primary goal of Cache: increase Speed. Primary goal of Virtual Memory: increase Space.
Cache Replacement Algorithms Replacement algorithm determines which block in cache is removed to make room. 2 main policies used today –Least Recently Used (LRU) The block replaced is the one unused for the longest time. –Random The block replaced is completely random – a counter-intuitive approach.
LRU vs Random As the cache size increases there are more blocks to choose from, so the choice is less critical: the probability of replacing the block that is needed next is relatively low. Below is a sample table comparing miss rates for both LRU and Random.
Cache Size | Miss Rate: LRU | Miss Rate: Random
16KB | 4.4% | 5.0%
64KB | 1.4% | 1.5%
256KB | 1.1% |
Virtual Memory Replacement Algorithms 1) Optimal 2) First In First Out (FIFO) 3) Least Recently Used (LRU)
Optimal Replace the page which will not be used for the longest (future) period of time. Faults are shown in boxes; hits are not shown. 7 page faults occur
Optimal A theoretically “best” page replacement algorithm for a given fixed size of VM. Produces the lowest possible page fault rate. Impossible to implement since it requires future knowledge of reference string. Just used to gauge the performance of real algorithms against best theoretical.
FIFO When a page fault occurs, replace the one that was brought in first Faults are shown in boxes; hits are not shown. 9 page faults occur
FIFO Simplest page replacement algorithm. Problem: can exhibit inconsistent behavior known as Belady’s anomaly. –Number of faults can increase if job is given more physical memory –i.e., not predictable
Example of FIFO Inconsistency Same reference string as before only with 4 frames instead of 3. Faults are shown in boxes; hits are not shown. 10 page faults occur
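Belady's anomaly can be reproduced with a small FIFO simulator. The reference string used in the slide's figure did not survive in this transcript, so the sketch below assumes the classic textbook string 1 2 3 4 1 2 5 1 2 3 4 5, which happens to give the same counts (9 faults with 3 frames, 10 with 4).

```python
from collections import deque


def fifo_faults(refs, frames):
    """Count page faults for FIFO replacement with the given number of frames."""
    queue = deque()      # resident pages in arrival order, oldest at the left
    resident = set()
    faults = 0
    for page in refs:
        if page in resident:
            continue                              # hit: FIFO order is not changed
        faults += 1
        if len(queue) == frames:
            resident.discard(queue.popleft())     # evict the page brought in first
        queue.append(page)
        resident.add(page)
    return faults


# Assumed reference string (not taken from the slides).
refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))  # 9
print(fifo_faults(refs, 4))  # 10: more memory, more faults
```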
LRU Replace the page which has not been used for the longest period of time. Faults are shown in boxes; hits only rearrange stack. 9 page faults occur
LRU More expensive to implement than FIFO, but it is more consistent. Does not exhibit Belady’s anomaly. More overhead needed since stack must be updated on each access.
Example of LRU Consistency Same reference string as before only with 4 frames instead of 3. Faults are shown in boxes; hits only rearrange stack. 7 page faults occur
Servicing a Page Fault (1) Initiate Block Read: Processor Signals Controller –Read block of length P starting at disk address X and store starting at memory address Y (2) DMA Transfer: Read Occurs –Direct Memory Access (DMA) –Under control of I/O controller (3) Read Done: I/O Controller Signals Completion –Interrupt processor –OS resumes suspended process [Figure: processor, cache and memory connected over the memory-I/O bus to the I/O controllers and disks]
Handling Page Faults Memory reference causes a fault – called a page fault Page fault can happen at any time and place –Instruction fetch –In the middle of an instruction execution System must save all state Move page from disk to memory Restart the faulting instruction –Restore state –Backup PC – not easy to find out by how much – need HW help
Page Fault If there is ever a reference to a page, the first reference will trap to the OS: a page fault.
1. Hardware traps to kernel
2. General registers saved
3. OS determines which virtual page is needed
4. OS checks validity of address, seeks page frame
5. If selected frame is dirty, write it to disk
6. OS schedules the new page to be brought in from disk
7. Page tables updated
8. Faulting instruction backed up to when it began
9. Faulting process scheduled
10. Registers restored
11. Program continues
What to Page in Demand paging brings in the faulting page –To bring in additional pages, we need to know the future Users don’t really know the future, but some OSs have user-controlled pre-fetching In real systems, –Load the initial page –Start running –Some systems (e.g. WinNT) will bring in additional neighboring pages (clustering)
VM Page Replacement If there is an unused page, use it. If there are no pages available, select one (Policy?) and –If it is dirty (M == 1) write it to disk –Invalidate its PTE and TLB entry –Load in new page from disk –Update the PTE and TLB entry! –Restart the faulting instruction What is cost of replacing a page? How does the OS select the page to be evicted?
Measuring Demand Paging Performance Page Fault Rate (p): 0 ≤ p ≤ 1.0 (from no page faults to every reference being a fault) Page Fault Overhead = fault service overhead + read page + restart process overhead –Dominated by time to read page in Effective Access Time = (1 – p) (memory access) + p (page fault overhead)
Performance Example Memory access time = 100 nanoseconds Page fault overhead = 25 millisec (msec) Page fault rate = 1/1000
EAT = (1 – p) * 100 + p * (25 msec)
= (1 – p) * 100 + p * 25,000,000
= 100 + 24,999,900 * p
= 100 + 24,999,900 * 1/1000 ≈ 25 microseconds!
Want less than 10% degradation:
110 > 100 + 24,999,900 * p
10 > 24,999,900 * p
p < 0.0000004, or 1 fault in 2,500,000 accesses!
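The arithmetic in this example can be checked with a short script. The default parameters mirror the slide's numbers (100 ns memory access, 25 msec fault overhead); the function names are hypothetical.

```python
def effective_access_time_ns(p, mem_ns=100, fault_ns=25_000_000):
    """EAT = (1 - p) * memory access + p * page fault overhead, in nanoseconds."""
    return (1 - p) * mem_ns + p * fault_ns


def max_fault_rate(degradation, mem_ns=100, fault_ns=25_000_000):
    """Largest p that keeps EAT below (1 + degradation) * memory access time.

    Solves (1 - p) * mem + p * fault < (1 + degradation) * mem for p.
    """
    return degradation * mem_ns / (fault_ns - mem_ns)


eat = effective_access_time_ns(1 / 1000)
print(f"EAT = {eat:.1f} ns")                  # about 25,100 ns, roughly 25 microseconds

p_max = max_fault_rate(0.10)
print(f"p < {p_max:.7f}")                     # 0.0000004
print(f"1 fault in {1 / p_max:,.0f} accesses")
```

The exact bound works out to one fault in about 2,499,990 accesses, which the slide rounds to 2,500,000.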
Page Replacement Algorithms Want lowest page-fault rate. Evaluate algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults on that string. Reference string – ordered list of pages accessed as process executes Ex. Reference String is A B C A B D A D B C B
The Best Page to Replace The best page to replace is the one that will never be accessed again Optimal Algorithm - Belady’s Algorithm –Lowest fault rate for any reference string –Basically, replace the page that will not be used for the longest time in the future. –If you know the future, please see me after class!! –Belady’s Algorithm is a yardstick –We want to find close approximations
Page Replacement - FIFO FIFO is simple to implement –When page in, place page id on end of list –Evict page at head of list Might be good? Page to be evicted has been in memory the longest time But? –Maybe it is being used –We just don’t know FIFO suffers from Belady’s Anomaly – fault rate may increase when there is more physical memory!
FIFO vs. Optimal System has 3 page frames. Reference string – ordered list of pages accessed as process executes. Ex. Reference String is A B C A B D A D B C B
OPTIMAL: A B C A B D A D B C B – toss C when D is brought in, toss A or D when C is brought back in – 5 faults
FIFO: A B C A B D A D B C B – toss A when D is brought in, toss ? on the later faults – 7 faults
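The optimal count for this reference string can be confirmed with a direct simulation of Belady's algorithm. This is only a sketch for checking small examples: it looks ahead in the reference string, which a real OS cannot do, and it rescans the future on every fault, so it is deliberately simple rather than fast.

```python
def optimal_faults(refs, frames):
    """Count page faults for Belady's optimal replacement."""
    resident = set()
    faults = 0
    for i, page in enumerate(refs):
        if page in resident:
            continue
        faults += 1
        if len(resident) == frames:
            future = refs[i + 1:]

            def next_use(p):
                # Distance to the next reference; pages never used again sort last.
                return future.index(p) if p in future else float("inf")

            # Evict the resident page whose next use is farthest in the future.
            resident.remove(max(resident, key=next_use))
        resident.add(page)
    return faults


refs = list("ABCABDADBCB")
print(optimal_faults(refs, 3))  # 5
```

On the final fault both A and D are never referenced again, so either may be tossed without changing the count.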
Second Chance Maintain FIFO page list On page fault –Check reference bit If R == 1 then move page to end of list and clear R If R == 0 then evict page
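A minimal sketch of the eviction step, assuming the FIFO list is a deque of [page_id, R] pairs with the oldest page at the left. The representation and the function name are illustrative choices, not part of the slides, and the list is assumed to be non-empty.

```python
from collections import deque


def second_chance_evict(pages):
    """Pick a victim from a FIFO list of [page_id, R] pairs (oldest at the left).

    Pages with R == 1 get a second chance: R is cleared and the page moves
    to the end of the list. The first page found with R == 0 is evicted.
    """
    while True:
        page_id, r = pages.popleft()
        if r:
            pages.append([page_id, 0])   # referenced recently: spare it this time
        else:
            return page_id


pages = deque([["A", 1], ["B", 0], ["C", 1]])
print(second_chance_evict(pages))  # B
print(list(pages))                 # [['C', 1], ['A', 0]]
```

If every page has R == 1, the loop clears all the bits on the first pass and then evicts the oldest page, so the policy degenerates to plain FIFO.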
Clock Replacement Create circular list of PTEs in FIFO order One-handed clock – pointer starts at oldest page –Algorithm – FIFO, but check Reference bit: if R == 1, set R = 0 and advance hand; evict first page with R == 0 –Looks like a clock hand sweeping PTE entries –Fast, but worst case may take a lot of time Two-handed clock – add a 2nd hand that is n PTEs ahead –2nd hand clears Reference bit
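The one-handed clock can be sketched with a fixed list and an index for the hand, so nothing is moved in memory, unlike the list manipulation of second chance. The entry layout (a dict with 'page' and 'R') is an assumption made for the example.

```python
def clock_evict(frames, hand):
    """Sweep the hand over a circular list of entries until one has R == 0.

    frames: list of dicts with keys 'page' and 'R' (assumed PTE layout).
    hand:   index of the entry the hand currently points to.
    Returns (index of the victim, new hand position).
    """
    while True:
        if frames[hand]["R"]:
            frames[hand]["R"] = 0                 # clear the bit and move on
            hand = (hand + 1) % len(frames)
        else:
            return hand, (hand + 1) % len(frames)


frames = [
    {"page": "A", "R": 1},
    {"page": "B", "R": 1},
    {"page": "C", "R": 0},
    {"page": "D", "R": 1},
]
victim, hand = clock_evict(frames, 0)
print(frames[victim]["page"], hand)  # C 3
```

The worst case mentioned above is a full sweep: if every entry has R == 1, the hand clears all of them before it returns to its starting entry and evicts it.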
Not Recently Used Page Replacement Algorithm Each page has Reference bit, Modified bit –bits are set when page is referenced, modified Pages are classified 1. not referenced, not modified 2. not referenced, modified 3. referenced, not modified 4. referenced, modified NRU removes page at random –from lowest numbered non empty class
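The classification step can be written compactly because the class number follows directly from the two bits: class = 2 * R + M + 1. The sketch below assumes pages are kept in a dict that maps a page id to its (R, M) pair; that layout and the function name are illustrative only.

```python
import random


def nru_victim(pages, rng=random):
    """Pick a random page from the lowest-numbered non-empty NRU class.

    pages: dict mapping page id -> (R, M) bits (assumed layout).
    Classes follow the slide: 1 = not referenced, not modified ...
    4 = referenced, modified.
    """
    classes = {}
    for page_id, (r, m) in pages.items():
        classes.setdefault(2 * r + m + 1, []).append(page_id)
    lowest = min(classes)
    return rng.choice(classes[lowest])


pages = {"A": (1, 1), "B": (0, 1), "C": (1, 0), "D": (0, 1)}
print(nru_victim(pages))  # B or D: both are in class 2, the lowest non-empty class
```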
Least Recently Used (LRU) Replace the page that has not been used for the longest time. 3 Page Frames. Reference String - A B C A B D A D B C. LRU – 5 faults
LRU Past experience may indicate future behavior. Perfect LRU requires some form of timestamp to be associated with a PTE on every memory reference! Counter implementation –Every page entry has a counter; every time the page is referenced through this entry, copy the clock into the counter. –When a page needs to be replaced, look at the counters to find the page with the oldest value. Stack implementation – keep a stack of page numbers in a doubly linked list: –Page referenced: move it to the top –No search for replacement: the least recently used page is at the bottom
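The stack implementation maps naturally onto Python's `OrderedDict`, which is backed by a doubly linked list. The sketch below is a simulation for counting faults, not kernel code; it uses the reference string A B C A B D A D B C B from the earlier slides with 3 frames.

```python
from collections import OrderedDict


def lru_faults(refs, frames):
    """Count page faults for LRU using an ordered 'stack' of resident pages."""
    stack = OrderedDict()    # least recently used first, most recently used last
    faults = 0
    for page in refs:
        if page in stack:
            stack.move_to_end(page)        # hit: move the page to the top
            continue
        faults += 1
        if len(stack) == frames:
            stack.popitem(last=False)      # evict from the bottom, no search needed
        stack[page] = True
    return faults


print(lru_faults(list("ABCABDADBCB"), 3))  # 5
```

Every hit reorders the stack, which is the per-access overhead that makes exact LRU more expensive than FIFO.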
LRU Approximations Aging –Keep a counter for each PTE –Periodically – check Reference bit If R == 0 increment counter (page has not been used) If R == 1 clear the counter (page has been used) Set R = 0 –Counter contains # of intervals since last access –Replace page with largest counter value Clock replacement
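The aging scheme described above needs only a counter and the Reference bit per page. In this sketch each PTE is modeled as a dict with 'R' and 'counter' fields, which is an assumption made for illustration; the periodic scan would normally be driven by a timer interrupt.

```python
def age_tick(pages):
    """One periodic scan: update each counter from its Reference bit, then clear R.

    pages: dict mapping page id -> {'R': 0 or 1, 'counter': int} (assumed layout).
    """
    for entry in pages.values():
        if entry["R"]:
            entry["counter"] = 0       # page was used during this interval
        else:
            entry["counter"] += 1      # one more interval without an access
        entry["R"] = 0


def aging_victim(pages):
    """Return the page with the largest counter, i.e. unused for the most intervals."""
    return max(pages, key=lambda p: pages[p]["counter"])


pages = {
    "A": {"R": 1, "counter": 3},
    "B": {"R": 0, "counter": 1},
    "C": {"R": 0, "counter": 0},
}
age_tick(pages)
print({p: e["counter"] for p, e in pages.items()})  # {'A': 0, 'B': 2, 'C': 1}
print(aging_victim(pages))                          # B
```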
Contrast: Macintosh Memory Model MAC OS 1–9 –Does not use traditional virtual memory All program objects accessed through “handles” –Indirect reference through pointer table –Objects stored in shared global address space [Figure: processes P1 and P2 reach objects A to E in the shared address space through handles in their own pointer tables]
Macintosh Memory Management Allocation / Deallocation –Similar to free-list management of malloc/free Compaction –Can move any object and just update the (unique) pointer in pointer table
Mac vs. VM-Based Memory Mgmt Allocating, deallocating, and moving memory: –can be accomplished by both techniques Block sizes: –Mac: variable-sized; may be very small or very large –VM: fixed-size; size is equal to one page (4KB on x86 Linux systems) Allocating contiguous chunks of memory: –Mac: contiguous allocation is required –VM: can map contiguous range of virtual addresses to disjoint ranges of physical addresses Protection –Mac: “wild write” by one process can corrupt another’s data
MAC OS X “Modern” Operating System –Virtual memory with protection –Preemptive multitasking Other versions of MAC OS require processes to voluntarily relinquish control Based on MACH OS –Developed at CMU in late 1980’s
Page Replacement Policy Working Set: –Set of pages used actively & heavily –Kept in memory to reduce Page Faults Set is found/maintained dynamically by OS Replacement: OS tries to predict which page would have least impact on the running program Common Replacement Schemes: Least Recently Used (LRU) First-In-First-Out (FIFO)
Page Replacement Policies Least Recently Used (LRU) –Generally works well –TROUBLE: When the working set is larger than the Main Memory. Example: Working Set = 9 pages, and pages are executed in sequence (0 through 8, then repeat): THRASHING
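The thrashing case is easy to demonstrate. The slide does not give the size of main memory, so the sketch below assumes 8 frames, one fewer than the 9-page working set; under that assumption LRU always evicts exactly the page that is needed next.

```python
def lru_faults(refs, frames):
    """Count LRU page faults; `recent` keeps the most recently used page last."""
    recent = []
    faults = 0
    for page in refs:
        if page in recent:
            recent.remove(page)        # hit: will be re-appended as most recent
        else:
            faults += 1
            if len(recent) == frames:
                recent.pop(0)          # evict the least recently used page
        recent.append(page)
    return faults


refs = list(range(9)) * 3      # pages 0..8 executed in sequence, three passes
print(lru_faults(refs, 8))     # 27: every single reference faults
print(lru_faults(refs, 9))     # 9: only the first pass faults
```

One extra frame is the difference between faulting on every reference and faulting only while the working set is first loaded.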
Page Replacement Policies First-In-First-Out(FIFO) –Removes Least Recently Loaded page –Does not depend on Use –Determined by number of page faults seen by a page
Page Replacement Policies Upon Replacement –Need to know whether to write data back –Add a Dirty-Bit Dirty Bit = 0; Page is clean; No writing Dirty Bit = 1; Page is dirty; Write back