Background Often, only part of a running program actually needs to be in memory. The ability to execute a program only partially in memory would provide many benefits: – A program would not be constrained by the size of the physical memory. – More programs could run at the same time. – Programs could be loaded into memory more quickly. Virtual memory: the separation of logical memory, as perceived by users, from physical memory. Virtual address space: the logical (or virtual) view of how a program is stored in memory.
Virtual Memory Diagram
Demand Paging Demand paging loads pages only as they are needed. When a process requests the contents of a logical memory address: – If the address is on a page in memory (marked as valid in the page table), the contents are retrieved. – Otherwise, the address is on a page not in memory (marked as invalid in the page table), and a page fault occurs. On a page fault: – If the address is in the process’s memory space, the pager is summoned to bring the page into memory. – Otherwise, the address is not in the process’s memory space, and an error condition is raised. Thus, demand paging uses a lazy pager that brings pages into memory only when required.
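The lookup described above can be sketched as a small simulation. This is only an illustration: the names `PageTableEntry` and `access` and the returned strings are hypothetical, and a real system performs the valid-bit check in hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageTableEntry:
    valid: bool = False          # True if the page is currently in memory
    frame: Optional[int] = None  # frame number, meaningful only when valid


def access(page_table, address_space, page):
    """Classify one reference to a logical page (illustrative only)."""
    entry = page_table.get(page)
    if entry is not None and entry.valid:
        return "hit"                    # page is in memory
    # Page fault: the page is not in memory.
    if page in address_space:
        return "page fault: load page"  # legal address, summon the pager
    return "page fault: error"          # outside the process's memory space


# Hypothetical process with pages 0..3, of which only page 0 is resident.
address_space = {0, 1, 2, 3}
page_table = {0: PageTableEntry(True, 5), 1: PageTableEntry()}
```

Here a reference to page 0 is a hit, a reference to page 1 or 2 triggers a load, and a reference to page 9 is an error.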
Example Page Table
Page Fault Handling a page fault requires several steps: 1. Determine whether or not the process is allowed to access the requested memory location. If so, continue to step 2. Otherwise, terminate the process. 2. Find a free frame (e.g. take one from the free-frame list). 3. Schedule a disk operation to read the desired page into the newly allocated frame. 4. Modify the page table to indicate that the page is in memory. 5. Restart the process at the instruction that requested the memory location. Complications: Can an instruction always be restarted? What if a single instruction modifies many different memory locations?
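The five steps can be sketched as a toy handler. Everything here is a simplifying assumption: `Process`, `handle_page_fault`, the dictionary-based `disk` and `memory`, and the synchronous "disk read" are hypothetical stand-ins, and page replacement is deliberately left out.

```python
class Process:
    def __init__(self, legal_pages):
        self.legal_pages = set(legal_pages)  # pages in the address space
        self.page_table = {}                 # page -> frame, resident pages only


def handle_page_fault(proc, page, free_frames, disk, memory):
    # Step 1: check that the access is legal.
    if page not in proc.legal_pages:
        raise MemoryError("illegal access: terminate the process")
    # Step 2: find a free frame (this sketch has no page replacement).
    if not free_frames:
        raise RuntimeError("no free frame: page replacement needed")
    frame = free_frames.pop()
    # Step 3: read the desired page from disk into the frame.
    memory[frame] = disk[page]
    # Step 4: record in the page table that the page is now in memory.
    proc.page_table[page] = frame
    # Step 5: the caller restarts the faulting instruction.
    return frame


proc = Process([0, 1, 2])
disk = {0: "a", 1: "b", 2: "c"}
memory = {}
free_frames = [7, 8]
frame = handle_page_fault(proc, 1, free_frames, disk, memory)
```

A real handler would block the process during the disk read and let the scheduler run something else; the sketch ignores that.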
Performance of Demand Paging Memory access time: 10 to 200 nanoseconds, denoted m. Page fault time: 5 to 10 milliseconds, denoted f. Let p be the probability of a page fault. The effective access time is: effective access time = (1 – p) × m + p × f. Example: suppose m = 150 ns, f = 8,000,000 ns, p = 0.001. Then effective access time = 0.999 × 150 + 0.001 × 8,000,000 = 149.85 + 8000 ≈ 8150 ns. Demand paging has slowed the system by a factor of about 54 (8150 / 150). For good performance, we need a page fault rate of much less than 1 page fault in 1000 memory accesses. Of course, demand paging also saves us from loading into memory pages that are never needed, which improves performance.
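The arithmetic in the example can be checked with a few lines of code. The function names are hypothetical, and the 10% slowdown target used for `p_limit` is an assumption added for illustration, not a figure from the slides.

```python
def effective_access_time(m_ns, f_ns, p):
    """Effective access time in ns: (1 - p) * m + p * f."""
    return (1 - p) * m_ns + p * f_ns


def max_fault_rate(m_ns, f_ns, slowdown):
    """Largest p that keeps the effective access time within slowdown * m."""
    # From (1 - p) * m + p * f <= slowdown * m, solve for p.
    return (slowdown - 1) * m_ns / (f_ns - m_ns)


m, f = 150, 8_000_000
eat = effective_access_time(m, f, 0.001)  # about 8150 ns
factor = eat / m                          # about 54
p_limit = max_fault_rate(m, f, 1.10)      # assumed target: at most 10% slowdown
```

Under these numbers, keeping the slowdown below 10% requires fewer than two page faults per million memory accesses, which shows how far below 1 in 1000 the fault rate needs to be.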
Copy on Write Recall: the fork() system call creates a child process that is a duplicate of its parent. Since the child might not modify its parent’s pages, we can employ the copy-on-write technique: – The child initially shares all pages with the parent. – If either process modifies a page, then a copy of that page is created.
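A toy model of copy-on-write, assuming a per-page sharing count, may make the two rules concrete. The classes `Page` and `AddressSpace` are hypothetical; a real kernel implements this with write-protected page-table entries and a fault handler rather than an explicit `write` method.

```python
class Page:
    def __init__(self, data):
        self.data = bytearray(data)  # private copy of the contents
        self.refs = 1                # number of address spaces sharing the page


class AddressSpace:
    def __init__(self, pages):
        self.pages = pages           # page number -> Page

    def fork(self):
        # The child shares every page with the parent; nothing is copied yet.
        for page in self.pages.values():
            page.refs += 1
        return AddressSpace(dict(self.pages))

    def write(self, number, offset, value):
        page = self.pages[number]
        if page.refs > 1:
            # Shared page: copy it first, then write to the private copy.
            page.refs -= 1
            page = Page(page.data)
            self.pages[number] = page
        page.data[offset] = value


parent = AddressSpace({0: Page(b"aaaa"), 1: Page(b"bbbb")})
child = parent.fork()
child.write(0, 0, ord("z"))  # only page 0 is copied
```

After the write, page 0 exists in two versions while page 1 is still shared by both processes.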
Page Replacement If a process requests a new page and there are no free frames, the operating system must decide which page to replace. The OS uses a page-replacement algorithm to select a victim frame. It must then write the victim page to disk, read the desired page into the freed frame, and update the page tables. This takes two disk transfers (one write, one read), doubling the disk access time of a page fault. To reduce page-replacement overhead, we can use a modify bit (or dirty bit) to indicate whether each page has been modified. If a page has not been modified, there is no need to write it back to disk when it is replaced, since the copy on disk is still current.
Belady’s Anomaly With FIFO page replacement, the number of page faults can increase with more frames of memory! [Figure: expected graph of page faults versus number of frames] [Figure: graph of page faults versus number of frames for the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
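The anomaly can be reproduced with a short FIFO simulation on the reference string above. The function name `fifo_faults` is hypothetical.

```python
from collections import deque


def fifo_faults(refs, num_frames):
    """Count page faults under FIFO replacement."""
    frames = deque()          # oldest loaded page on the left
    faults = 0
    for page in refs:
        if page in frames:
            continue          # hit: FIFO order is unchanged
        faults += 1
        if len(frames) == num_frames:
            frames.popleft()  # evict the page that was loaded first
        frames.append(page)
    return faults


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
faults_3 = fifo_faults(refs, 3)
faults_4 = fifo_faults(refs, 4)
```

For this string, three frames give 9 page faults and four frames give 10, so adding a frame makes FIFO worse.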
Optimal Page Replacement Replace the page that will not be used for the longest period of time. Guarantees the lowest possible page-fault rate. Difficult to implement, since it requires knowledge of the future. Example (three frames), reference string: 0 1 2 3 2 2 5 1 4 3 4 5 1 3 4 0 2 4 1 2 [Diagram: frame contents after each page fault] 9 page faults for this reference string.
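Although the optimal policy cannot be implemented online, it is easy to simulate when the whole reference string is known in advance. The sketch below uses the reference string from the example and assumes three frames, which is consistent with the 9 faults stated above; `opt_faults` is a hypothetical name.

```python
def opt_faults(refs, num_frames):
    """Count page faults under optimal (farthest-future-use) replacement."""
    frames = set()
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == num_frames:
            future = refs[i + 1:]

            def next_use(p):
                # Pages never used again count as infinitely far away.
                return future.index(p) if p in future else float("inf")

            # Evict the resident page whose next use is farthest away.
            frames.remove(max(frames, key=next_use))
        frames.add(page)
    return faults


refs = [int(c) for c in "01232251434513402412"]
optimal = opt_faults(refs, 3)
```

The simulation is useful mainly as a yardstick: other algorithms can be compared against the count it produces.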
LRU Page Replacement LRU: Least Recently Used. Replace the page that has not been used for the longest period of time. Not so easy to implement: must associate a time stamp with each page or maintain a stack. Requires hardware assistance, since extra data must be maintained with every memory reference. Example (three frames), reference string: 0 1 2 3 2 2 5 1 4 3 4 5 1 3 4 0 2 4 1 2 [Diagram: frame contents after each page fault] 15 page faults for this reference string.
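In software, the "stack" variant of LRU can be sketched with an ordered dictionary. The sketch assumes three frames, which is consistent with the 15 faults stated above, and `lru_faults` is a hypothetical name; it says nothing about how hardware would maintain the ordering.

```python
from collections import OrderedDict


def lru_faults(refs, num_frames):
    """Count page faults under LRU replacement."""
    frames = OrderedDict()              # least recently used page first
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)    # mark as most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)  # evict the least recently used page
        frames[page] = True
    return faults


refs = [int(c) for c in "01232251434513402412"]
lru = lru_faults(refs, 3)
```

Note that the bookkeeping in `move_to_end` runs on every reference, not only on faults, which is why true LRU needs hardware help.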
LRU-Approximation Algorithms Many systems don’t provide hardware support for true LRU. Some systems provide a reference bit for each page, which helps determine if the page was recently used. Example: suppose the OS periodically sets all reference bits to 0, and when a page is accessed, its reference bit is set to 1. Second-Chance Algorithm: – Use a FIFO replacement algorithm to select a page. – If the page’s reference bit is 1, give the page a second chance by setting its reference bit to 0 and moving on to the next page in FIFO order. – If the page’s reference bit is 0, replace it. Additional-Reference-Bits Algorithm: – At regular intervals, record and then reset the reference bits. – This establishes a rough ordering of how recently pages were used.
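The second-chance algorithm is often implemented as a circular scan (the "clock" formulation), sketched below. The function name is hypothetical, and the sketch assumes that a page's reference bit is set to 1 when it is loaded, since loading is followed by an access; other conventions exist.

```python
def second_chance(refs, num_frames):
    """Return (fault count, final frame contents) under second chance."""
    frames = [None] * num_frames  # page held in each frame
    ref_bit = [0] * num_frames    # reference bit per frame
    hand = 0                      # next candidate in FIFO order
    faults = 0
    for page in refs:
        if page in frames:
            ref_bit[frames.index(page)] = 1  # accessed: set the bit
            continue
        faults += 1
        # Skip pages whose bit is 1, clearing the bit as we pass.
        while ref_bit[hand] == 1:
            ref_bit[hand] = 0
            hand = (hand + 1) % num_frames
        frames[hand] = page
        ref_bit[hand] = 1         # assumption: a loaded page counts as accessed
        hand = (hand + 1) % num_frames
    return faults, frames


faults, frames = second_chance([1, 2, 3, 4, 2, 5], 3)
```

In this run, page 2 is referenced again just before page 5 arrives, so it survives and page 3 is replaced instead; plain FIFO would have evicted page 2.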
Allocation of Frames How do we allocate the fixed amount of free memory among various processes? Each process needs some minimum number of available frames, depending on how many levels of indirect addressing are allowed. – Example: If a load instruction on page 0 refers to an address on page 1, which is an indirect reference to page 2, then the process must have at least 3 frames. Equal allocation: allocate the same number of frames to each process. – A small process could waste frames while a big process might not have enough. Proportional allocation: allocate frames according to some criterion. – Allocate more frames to a big process? – Allocate more frames to a high-priority process?
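As a sketch of the two schemes, the code below allocates frames equally and in proportion to process size, which is only one possible criterion. The function names and the example sizes (10 and 127 pages sharing 62 frames) are hypothetical, and the sketch does not enforce the per-process minimum.

```python
def equal_allocation(num_processes, total_frames):
    """Give every process the same number of frames."""
    return [total_frames // num_processes] * num_processes


def proportional_allocation(sizes, total_frames):
    """Give each process a share of frames proportional to its size."""
    total_size = sum(sizes)
    # Integer division rounds down, so a few frames may stay unassigned.
    return [size * total_frames // total_size for size in sizes]


sizes = [10, 127]  # hypothetical process sizes in pages
equal = equal_allocation(len(sizes), 62)
proportional = proportional_allocation(sizes, 62)
```

With these numbers, equal allocation gives each process 31 frames, more than the 10-page process can use, while proportional allocation gives 4 and 57 frames and leaves one frame over.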
Allocation of Frames Is the set of frames allocated to a process fixed? – Global replacement: a process can select a replacement frame currently allocated to another process – Local replacement: a process can only select replacement frames from its own set of allocated frames Is all memory accessed at the same speed? – In some multiprocessor systems, a given CPU can access some sections of memory faster than other sections. – Such systems are non-uniform memory access (NUMA) systems – NUMA systems introduce further complications for scheduling and paging
Thrashing A process that spends more time paging than executing is thrashing. Thrashing occurs when a process doesn’t have enough frames: – The process has more frequently used pages than frames. – The process experiences frequent page faults. – With each page fault, the process must replace some frequently used page.
Preventing Thrashing Locality model – A process generally requires a set of pages that are used together, called a locality. – As a process runs, it moves from locality to locality. Working-set model – Use a parameter, ∆, to approximate locality. – Working set: the set of pages referenced in the most recent ∆ memory references. – The OS monitors the working set for each process, ensuring that each process has enough frames for its working set. – This requires an appropriate choice of ∆: too small and the working set misses part of the locality, too large and it spans several localities. – Using the working-set model to prevent thrashing incurs extra OS overhead.
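The working-set definition translates directly into code. The function names, the sample reference string, and the choice ∆ = 4 are hypothetical, and a real OS approximates the window with timers and reference bits rather than storing every reference.

```python
def working_set(refs, t, delta):
    """Pages referenced in the delta most recent references up to index t."""
    start = max(0, t - delta + 1)
    return set(refs[start:t + 1])


def is_overcommitted(working_set_sizes, total_frames):
    """True if the combined working sets need more frames than exist."""
    return sum(working_set_sizes) > total_frames


refs = [1, 2, 1, 3, 4, 4, 4, 3, 4, 4]
early = working_set(refs, 3, 4)  # window covers references 1, 2, 1, 3
late = working_set(refs, 9, 4)   # window covers references 4, 3, 4, 4
```

The process needs three frames early on but only two later, after it has moved to a smaller locality. When `is_overcommitted` is true, the OS would have to suspend a process to avoid thrashing.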
Preventing Thrashing Page-fault frequency – A simple strategy to prevent thrashing is to monitor the page-fault frequency. – If page faults are too frequent, then the process needs more frames. – If a process has very few page faults, then it has too many frames.
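A minimal sketch of such a control rule follows. The function name, the one-frame step, and the threshold values `lower` and `upper` are all assumptions made for illustration; the slides do not specify them.

```python
def adjust_frames(frames, fault_rate, lower=0.01, upper=0.10):
    """Return the new frame count for a process given its fault rate."""
    if fault_rate > upper:
        return frames + 1  # too many faults: the process needs more frames
    if fault_rate < lower and frames > 1:
        return frames - 1  # very few faults: a frame can be reclaimed
    return frames          # fault rate is within the acceptable band
```

Called periodically for each process, a rule like this keeps the fault rate between the two bounds without tracking working sets explicitly.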
Memory-Mapped Files A memory-mapped file allows file I/O to be treated as routine memory access by mapping a disk block to a page in memory. Mechanism: – A file is initially read using demand paging. – A page-sized portion of the file is read from the file system into a physical page. – Subsequent reads from and writes to the file are treated as ordinary memory accesses. – If the file is modified, modified pages are eventually (or periodically) copied back to the disk. This simplifies file access by performing file I/O through memory accesses rather than read() and write() system calls. It also allows several processes to map the same file, allowing the pages in memory to be shared.
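As an illustration, Python's standard `mmap` module exposes this mechanism. In the sketch below, the helper `uppercase_prefix` and the temporary file contents are hypothetical; the file is modified through slice assignment on the mapping rather than through write() calls.

```python
import mmap
import os
import tempfile


def uppercase_prefix(path, n):
    """Uppercase the first n bytes of a file through a memory mapping."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the mapping is writable and shared.
        with mmap.mmap(f.fileno(), 0) as mapped:
            mapped[:n] = mapped[:n].upper()  # ordinary memory access
            mapped.flush()                   # push modified pages to the file


# Create a small temporary file to map.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello, paging")

uppercase_prefix(path, 5)

with open(path, "rb") as f:
    result = f.read()
os.remove(path)
```

The explicit `flush()` corresponds to the "eventually copied back to the disk" step; without it, the OS still writes the modified pages back at some later point.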
Allocating Kernel Memory Memory for the kernel is often allocated from a different pool than that used for user processes. – The kernel requests memory for data structures of varying sizes, which may be much smaller than a page and which might not be paged. – Certain hardware devices interact directly with physical memory, requiring contiguous blocks of memory (rather than pages). Kernel memory could be allocated by a power-of-2 allocator, which rounds memory requests up to the next power of 2. Kernel memory could also be allocated by a slab allocator, which creates a cache in memory for each different type of kernel data structure.
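The rounding performed by a power-of-2 allocator, and the internal fragmentation it causes, can be sketched as follows. The function names and the request sizes are hypothetical, and the sketch covers only the size computation, not the allocator's block management.

```python
def round_up_pow2(n):
    """Smallest power of 2 that is greater than or equal to n (n >= 1)."""
    if n < 1:
        raise ValueError("request size must be positive")
    return 1 << (n - 1).bit_length()


def wasted_bytes(n):
    """Internal fragmentation when a request of n bytes is rounded up."""
    return round_up_pow2(n) - n
```

For example, a 33-byte request is served from a 64-byte block, wasting 31 bytes, while a 32-byte request wastes nothing. A slab allocator avoids this waste by sizing each cache to the exact object type.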
Other Considerations for Paging Prepaging: – Attempts to reduce the large number of page faults that occur when a process starts up. – Brings some pages into memory before they are needed. – Could be wasteful if pages aren’t actually needed. Page size: – The particular choice of page size affects system performance. – Considerations: fragmentation, table size, I/O overhead, locality. TLB reach: the amount of memory accessible from the TLB. – TLB reach is the number of TLB entries times the page size. – Ideally, the TLB would store the working set for a process.
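A quick calculation shows how strongly page size affects TLB reach. The entry count and page sizes below are hypothetical example values, not figures from the slides.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach(entries, page_size):
    """Bytes of memory addressable through the TLB without a miss."""
    return entries * page_size


small_pages = tlb_reach(64, 4 * KIB)  # 64 entries, 4 KiB pages
large_pages = tlb_reach(64, 2 * MIB)  # same TLB, 2 MiB pages
```

With 64 entries, 4 KiB pages give a reach of 256 KiB, while 2 MiB pages give 128 MiB, at the cost of more internal fragmentation per page.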
Other Considerations for Paging Program structure: – A program can be written and compiled in a way that increases locality and reduces page faults I/O interlock: – Suppose we instruct an I/O device to write to a certain page in memory. We don’t want that page to be replaced before the I/O operation is completed. – Some systems allow pages to be locked in memory. – Pages waiting for I/O can be locked to particular frames until the I/O operation completes.