2 BACKGROUND
The memory-management methods discussed so far tend to require the entire process to be in memory before it can execute. In many cases, however, the entire program is not needed:
- Programs often contain code to handle unusual error conditions.
- Arrays, lists, and tables are often allocated more memory than they actually need.
- Certain options and features of a program may be used rarely.
Even when the entire program is needed, it may not all be needed at the same time (as with overlays, for example).
3 Background
The ability to execute a program that is only partially in memory would have many benefits:
- Processes could use an extremely large virtual address space, simplifying the programming task.
- More processes could run at the same time.
- Less I/O would be needed to load or swap each user program into memory.
This is the idea behind virtual memory.
4 Background
Virtual memory management is a technique whereby the computer appears to have much more memory than it actually does. ("Virtual" means being in effect but not in fact.)
An analogy: a train can run only as far as there is track. But you can run a train across the country by pulling up the rails behind it and laying them down again in front of it as it rolls; the train only needs rails directly underneath it. The faster you can relay the track, the better.
5 Background
Virtual Memory That Is Larger Than Physical Memory
6 Background
Virtual memory is the separation of user logical memory from physical memory:
- Only part of the program needs to be in memory for execution.
- The logical address space can therefore be much larger than the physical address space.
- Address spaces can be shared by several processes.
- Process creation becomes more efficient.
7 Background
Virtual memory is not easy to implement, and its implementation needs hardware support. Virtual memory can be implemented via:
- Demand paging
- Demand segmentation
8 DEMAND PAGING
A demand-paging system is similar to a paging system with swapping. When we want to execute a process, we swap it into memory. However, we use a lazy swapper: a lazy swapper never swaps a page into memory unless that page will be needed.
Swapper or pager? A swapper manipulates entire processes, whereas a pager is concerned with the individual pages of a process.
9 Demand Paging: Basic Concept
Transfer of a Paged Memory to Contiguous Disk Space
10 Demand Paging: Basic Concept
Bring a page into memory only when it is needed:
- Less I/O needed
- Less memory needed
- Faster response
- More users
When a page is needed, a reference is made to it:
- invalid reference => abort
- not in memory => bring to memory
11 Demand Paging: Basic Concept
With each page table entry, a valid–invalid bit is associated:
- 1 => legal and in memory
- 0 => illegal, or legal but not in memory
Initially, the valid–invalid bit is set to 0 on all entries. During address translation, if the valid–invalid bit in a page table entry is 0, a page fault occurs. The process will keep page faulting until all the pages it needs are in memory.
Example of a page table snapshot => see next slide.
13 Demand Paging: Basic Concept
Steps in Handling a Page Fault
14 Demand Paging: Basic Concept
Hardware support:
- Page table: this table can mark an entry invalid through a valid–invalid bit or a special value of the protection bits.
- Secondary memory: this memory holds those pages that are not present in main memory; it is usually a high-speed disk.
Software support:
- Page replacement algorithms
- Frame allocation algorithms
15 Demand Paging: Basic Concept
The first reference to a page that is not in memory will trap to the OS: a page fault.
1. The OS looks at another table to decide:
   - invalid reference => abort;
   - valid reference, but the page is just not in memory.
2. Get an empty frame.
3. Swap the page into the frame.
4. Reset the tables; set the validation bit to 1.
5. Restart the instruction.
16 Demand Paging: Basic Concept
A crucial architectural constraint is the need to restart any instruction after a page fault:
- If the page fault occurs on the instruction fetch, we can restart by fetching the instruction again.
- If a page fault occurs while we are fetching an operand, we must fetch and decode the instruction again, and then fetch the operand.
Example: Add A, B, C
1. Fetch and decode the instruction (Add)
2. Fetch A
3. Fetch B
4. Add A and B
5. Store the sum in C
17 Demand Paging: Basic Concept
MVC (IBM System 360/370) moves up to 256 bytes from one location to another (possibly overlapping) location.
- If either the source or the destination straddles a page boundary, a page fault might occur after the move is partially done. In addition, if the source and destination blocks overlap, the source block may have been modified, in which case we cannot simply restart the instruction.
- Solution 1: the microcode computes and attempts to access both ends of both blocks before the move begins.
- Solution 2: use temporary registers to hold the values of overwritten locations.
18 Demand Paging: Basic Concept
Autodecrement and autoincrement modes (PDP-11):
- Autodecrement automatically decrements the register before using its contents as the operand address.
- Autoincrement automatically increments the register after using its contents as the operand address.
Example: MOV (R2)+, -(R3)
What will happen if we get a page fault when trying to store into the location pointed to by register R3? R2 has already been incremented and R3 decremented, so simply re-executing the instruction would use the wrong addresses.
Possible solution: use a status register to record how much of the instruction has been done.
19 Demand Paging: Basic Concept
Page fault frequency:
- In theory, an instruction can cause multiple page faults.
- In practice, programs tend to have locality of reference, which results in reasonable performance from demand paging.
20 Demand Paging: Performance
Demand paging can have a significant effect on the performance of a computer system. To compute the effective access time for a demand-paged memory:
- As long as there is no page fault, the effective access time is equal to the memory access time (about 10 to 200 nanoseconds).
- If there is a page fault, we must first read the relevant page from disk and then access the desired word, so the effective access time can be very large.
21 Demand Paging: Performance
Let p be the probability of a page fault (0 <= p <= 1). Then the effective access time (EAT) is
EAT = (1 - p) x memory access time + p x page fault time
- If p = 0, there are no page faults.
- If p = 1, every reference is a fault.
- Most of the time, p is very close to 0.
22 Demand Paging: Performance
To compute the EAT, we must know how much time is needed to service a page fault. A page fault causes the following sequence to occur:
1. Trap to the OS.
2. Save the user registers and process state.
3. Determine that the interrupt was a page fault.
4. Check that the page reference was legal and determine the location of the page on the disk.
5. Issue a read (from the disk to a free frame):
   - Wait in a queue for this device until the read request is serviced.
   - Wait for the device seek time and latency time.
   - Begin the transfer of the page to a free frame.
23 Demand Paging: Performance
6. While waiting, allocate the CPU to some other user (CPU scheduling, optional).
7. Receive an interrupt from the disk (I/O completed).
8. Save the registers and process state for the other user (if step 6 was executed).
9. Determine that the interrupt was from the disk.
10. Correct the page table and other tables to show that the desired page is now in memory.
11. Wait for the CPU to be allocated to this process again.
12. Restore the user registers, process state, and new page table, then resume the interrupted instruction.
24 Demand Paging: Performance
Three major components of the page-fault service time:
1. Service the page-fault interrupt.
2. Read in the page.
3. Restart the process.
Effective access time (EAT):
EAT = (1 - p) x memory access time + p x (time to service the page-fault interrupt + time to READ IN THE PAGE + time to restart the process)
Reading in the page dominates: seek time (15 ms) + latency time (8 ms) + transfer time (1 ms) => about 25 ms.
25 Demand Paging: Performance
Assume an average page-fault service time of 25 milliseconds and a memory access time of 100 nanoseconds. Then the EAT in nanoseconds is
EAT = (1 - p) x 100 ns + p x (25 ms)
    = (1 - p) x 100 ns + p x (25,000,000 ns)
    = 100 + 24,999,900 x p (ns)
- When p = 0.001 = 1/1,000, EAT = 25,099.9 ns, about 25 microseconds. The computer would be slowed down by a factor of 250 because of paging.
- When p = 1/2,500,000, EAT = 110 ns, and the degradation is less than 10%.
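The arithmetic on this slide can be checked with a small sketch of the EAT formula; the 100 ns memory access and 25 ms page-fault service time are the slide's example values.

```python
def eat_ns(p, mem_ns=100, fault_ms=25):
    """Effective access time in nanoseconds for page-fault probability p."""
    fault_ns = fault_ms * 1_000_000          # 25 ms = 25,000,000 ns
    return (1 - p) * mem_ns + p * fault_ns

print(eat_ns(0.001))          # 25099.9 ns: roughly 250x slower than 100 ns
print(eat_ns(1 / 2_500_000))  # ~110 ns: under 10% degradation
```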
26 Demand Paging: Performance
How to reduce the EAT: reduce the page-fault rate as much as possible.
Other options exploit the fact that disk I/O to swap space is generally faster than I/O through the file system:
- Option 1: copy an entire file image into swap space at process startup, and then perform demand paging from the swap space.
- Option 2: demand pages from the file system initially, but write the pages to swap space as they are replaced.
- Option 3: demand pages of binaries directly from the file system, and use swap space only for pages not associated with a file, such as the process stack and heap.
27 PROCESS CREATION
Virtual memory allows other benefits during process creation:
- Copy-on-Write
- Memory-Mapped Files
28 Process Creation: Copy-on-Write
fork():
- Creates a copy of the parent's address space for the child, duplicating the pages belonging to the parent.
- Copying the parent's address space may be unnecessary, since many child processes invoke the exec() system call immediately after creation.
vfork():
- With vfork(), the parent process is suspended and the child process uses the address space of the parent.
- If the child process changes any pages of the parent's address space, the altered pages will be visible to the parent once it resumes.
- Very efficient but dangerous; to be used only when the child process calls exec() immediately after creation.
29 Process Creation: Copy-on-Write
fork() with COW:
- Copy-on-write (COW) allows both parent and child processes to initially share the same pages in memory.
- These shared pages are marked as copy-on-write pages, meaning that if either process writes to a shared page, a copy of that page is created.
- Only the pages modified by either process are copied; all unmodified pages remain shared by the parent and child.
- Free pages for the copies are allocated from a pool of zeroed-out pages.
- fork() with COW allows more efficient process creation, as only modified pages are copied.
- fork() with COW is safer than vfork().
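A small Unix-only demonstration of the semantics: fork() gives the child a logically separate copy of the parent's data, which modern kernels implement lazily with COW. The variable names here are illustrative, not from the slides.

```python
import os

data = [1, 2, 3]
pid = os.fork()
if pid == 0:                      # child
    data.append(99)               # first write triggers a COW page copy
    os._exit(len(data))           # exits with status 4
else:                             # parent
    _, status = os.waitpid(pid, 0)
    child_len = os.WEXITSTATUS(status)
    # the child's write is invisible here: the parent still sees [1, 2, 3]
```

Contrast with vfork(), where the child's append would corrupt the parent's list.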
30 Process Creation: Memory-Mapped Files
- Standard file I/O operations require direct disk access.
- Memory-mapped file I/O allows file I/O to be treated as routine memory access by mapping a disk block to a page in memory.
- A file is initially read using demand paging: a page-sized portion of the file is read from the file system into a physical page. Subsequent reads and writes to the file are treated as ordinary memory accesses.
- This simplifies file access by treating file I/O through memory rather than through read() and write() system calls.
- The mapped memory acts as a cache for the file.
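A minimal sketch of memory-mapped file I/O using Python's mmap module: after mapping, reads and writes are ordinary memory accesses instead of read()/write() system calls. A temporary file stands in for a real data file.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")
os.close(fd)

with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), 0)   # map the whole file
    first = bytes(mem[:5])           # read through memory
    mem[:5] = b"HELLO"               # write through memory (no write() call)
    mem.flush()                      # push the dirty page back to the file
    mem.close()

with open(path, "rb") as f:
    result = f.read()
os.remove(path)
```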
31 Process Creation: Memory-Mapped Files
OS support for memory-mapped files:
- Provide memory mapping only through a specific system call, and treat all other file I/O using the standard system calls.
- Or treat all file I/O as memory-mapped, allowing all file access to take place in memory.
Memory mapping also allows several processes to map the same file, allowing the pages in memory to be shared.
33 PAGE REPLACEMENT
Does each page fault at most once? No — consider over-allocation.
Suppose we have 40 frames, and each process has 10 pages but actually uses only 5 of them. We could then run 8 processes rather than 4. Increasing the degree of multiprogramming in this way over-allocates memory:
- If we run 6 processes, we get higher CPU utilization and throughput, with 10 frames to spare.
- But suppose the processes suddenly try to use all 10 of their pages: 60 frames are needed, and only 40 are available.
What should the OS do?
34 Page Replacement
I/O buffers:
- Buffers for I/O also consume a significant amount of memory, which increases the strain on memory-placement algorithms.
- Deciding how much memory to allocate to I/O and how much to program pages is a significant challenge.
- Some systems allocate a fixed percentage of memory for I/O buffers, whereas others allow both user processes and the I/O subsystem to compete for all system memory.
Over-allocation can mean that a process cannot find a free frame for one of its missing pages.
36 Page Replacement: Options
When no free frame is available, the OS has several options:
- Terminate the user process.
- Swap out a process, freeing all its frames and reducing the level of multiprogramming.
- Perform page replacement.
37 Page Replacement: Basic Scheme
The page-fault service routine with page replacement:
1. Find the location of the desired page on disk.
2. Find a free frame:
   - If there is a free frame, use it.
   - If there is no free frame, use a page replacement algorithm to select a victim frame. Write the victim page to the disk; change the page and frame tables accordingly.
3. Read the desired page into the (newly) free frame. Update the page and frame tables.
4. Restart the process.
39 Page Replacement: Basic Scheme
With page replacement, the page-fault service time is doubled: one page out, one page in.
We can use a modify bit (or dirty bit) to reduce this overhead:
- Each page or frame has a modify bit associated with it in the hardware.
- The modify bit for a page is set by the hardware whenever any word or byte in the page is written, indicating that the page has been modified.
- If the victim page is modified, we must page it out before paging in the new page.
- If it is clean, we just page in the new page.
This keeps the page-fault service time reasonable, since only modified pages are written to disk.
40 Page Replacement: Basic Scheme
The goal of a page replacement algorithm: the lowest page-fault rate.
How to evaluate a page replacement algorithm:
- Run it on a particular string of memory references (a reference string) and compute the number of page faults.
- Simulation.
- Implementation.
41 Page Replacement: Basic Scheme
How to select a reference string: real recorded data or simulated data.
How to reduce the amount of data:
- Consider only the page number, rather than the entire address.
- If we have a reference to a page p, then any immediately following references to page p will never cause a page fault, so they can be dropped.
An example (page size 100): the address trace 0100, 0432, 0101, 0612, 0102, 0103, 0104, 0611 reduces to the reference string 1, 4, 1, 6, 1, 6.
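The reduction above can be sketched in a few lines, assuming (as in the slide's example) a page size of 100 and decimal addresses.

```python
def reduce_trace(addresses, page_size=100):
    """Map addresses to page numbers and drop immediate repeats."""
    reduced = []
    for addr in addresses:
        page = addr // page_size
        if not reduced or reduced[-1] != page:
            reduced.append(page)
    return reduced

trace = [100, 432, 101, 612, 102, 103, 104, 611]
print(reduce_trace(trace))   # [1, 4, 1, 6, 1, 6]
```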
42 Page Replacement: Basic Scheme
Graph of Page Faults Versus the Number of Frames
43 Page Replacement: FIFO
FIFO page replacement:
- Associate with each page the time when that page was brought into memory.
- Replace the oldest page when a victim is needed.
Implementation:
- Keep a FIFO queue of all pages in memory.
- Replace the page at the head of the queue.
- Append a page at the tail when it is brought into memory.
Features:
- Easy to understand and easy to implement.
- Its performance is not always good; it can even suffer from Belady's anomaly, where adding frames increases the number of faults.
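FIFO replacement can be sketched as a fault-counting simulator; the reference string and frame count below are the classic textbook example.

```python
from collections import deque

def fifo_faults(refs, nframes):
    frames = set()
    queue = deque()          # arrival order: the head is the oldest page
    faults = 0
    for page in refs:
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            frames.discard(queue.popleft())   # evict the oldest page
        frames.add(page)
        queue.append(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3))   # 15
```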
46 Page Replacement: OPT
An optimal page-replacement algorithm has the lowest page-fault rate of all algorithms and will never suffer from Belady's anomaly.
The optimal page-replacement algorithm (OPT): replace the page that will not be used for the longest period of time.
Features:
- Lowest page-fault rate; no Belady's anomaly.
- Difficult to implement, since it requires future knowledge of the reference string.
- Useful as a benchmark for comparing other algorithms.
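A sketch of OPT: evict the resident page whose next use lies farthest in the future. Note that the whole reference string must be known in advance, which is why OPT is only usable offline as a benchmark.

```python
def opt_faults(refs, nframes):
    frames = set()
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            def next_use(p):
                try:
                    return refs.index(p, i + 1)
                except ValueError:
                    return float("inf")        # never used again: ideal victim
            frames.discard(max(frames, key=next_use))
        frames.add(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(opt_faults(refs, 3))   # 9
```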
48 Page Replacement: LRU
FIFO, OPT, LRU:
- FIFO looks at the past.
- OPT looks at the future.
- LRU looks at the past to predict the future.
LRU: least recently used.
- Associate with each page the time of that page's last use.
- Replace the page that has not been used for the longest period of time.
50 Page Replacement: LRU
How to implement LRU:
Counters:
- Add to the CPU a logical clock or counter.
- Associate with each page table entry a time-of-use field.
- Whenever a reference to a page is made, the contents of the clock register are copied to the time-of-use field.
- Replace the page with the smallest time value.
Stack:
- Keep a stack of page numbers.
- Whenever a page is referenced, move it to the top.
- Replace the page at the bottom of the stack, which is the LRU page.
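The stack implementation above maps naturally onto an ordered dict: a referenced page moves to the most-recent end, and the least-recent end is the victim.

```python
from collections import OrderedDict

def lru_faults(refs, nframes):
    frames = OrderedDict()       # least recently used first
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)          # "move to the top of the stack"
            continue
        faults += 1
        if len(frames) == nframes:
            frames.popitem(last=False)        # evict the LRU page
        frames[page] = True
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(lru_faults(refs, 3))   # 12
```

On the classic reference string with 3 frames, LRU's 12 faults sit between FIFO's 15 and OPT's 9.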
52 Page Replacement: LRU Approximation
Many systems provide some help in the form of a reference bit:
- A reference bit is associated with each entry in the page table.
- The reference bit for a page is set by the hardware whenever that page is referenced.
- Replace a page whose reference bit is not set.
53 Page Replacement: LRU Approximation
Additional-reference-bits algorithm:
- Keep an 8-bit byte for each page in a table in memory.
- At regular intervals (e.g., every 100 ms), a timer interrupt transfers control to the OS. The OS shifts the reference bit for each page into the high-order bit of its 8-bit byte, shifting the other bits right by 1 and discarding the low-order bit.
- These 8-bit bytes contain the history of page use for the last eight time periods.
- If we interpret the 8-bit bytes as unsigned integers, the page with the lowest value is the LRU page, and it can be replaced.
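The shift step above ("aging") can be sketched for a single page's history byte:

```python
def age(history, ref_bit):
    """One timer tick: shift the reference bit into the high-order bit."""
    return ((ref_bit << 7) | (history >> 1)) & 0xFF

h = 0
h = age(h, 1)   # 0b10000000 = 128: referenced in the latest period
h = age(h, 0)   # 0b01000000 = 64
h = age(h, 1)   # 0b10100000 = 160
```

Across all pages, the one with the smallest history value is the approximate LRU victim.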
54 Page Replacement: LRU Approximation
Second-chance algorithm:
- When a page has been selected for inspection, check its reference bit.
- If the value is 0, replace the page.
- If the value is 1, give the page a second chance: set the bit to 0 and move on to inspect the next page.
Implementation: a circular queue of pages (hence the name "clock algorithm").
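The second-chance algorithm can be sketched with a circular buffer of (page, reference bit) entries; the hand sweeps past referenced pages, clearing their bits, until it finds a victim.

```python
class Clock:
    def __init__(self, nframes):
        self.nframes = nframes
        self.frames = []       # each entry is [page, ref_bit]
        self.hand = 0

    def access(self, page):
        """Reference a page; return True if it caused a page fault."""
        for entry in self.frames:
            if entry[0] == page:
                entry[1] = 1                  # hit: hardware sets the bit
                return False
        if len(self.frames) < self.nframes:   # still a free frame
            self.frames.append([page, 1])
            return True
        while self.frames[self.hand][1] == 1: # second chance: clear, advance
            self.frames[self.hand][1] = 0
            self.hand = (self.hand + 1) % self.nframes
        self.frames[self.hand] = [page, 1]    # replace the victim
        self.hand = (self.hand + 1) % self.nframes
        return True

clock = Clock(3)
faults = sum(clock.access(p) for p in [1, 2, 3, 4, 2, 5])
```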
56 Page Replacement: LRU Approximation
Enhanced second-chance algorithm: consider the pair (reference bit, modify bit).
- (0,0): neither recently used nor modified — the best page to replace.
- (0,1): not recently used but modified — not quite as good, since the page must be written out before replacement.
- (1,0): recently used but clean — probably will be used again soon.
- (1,1): recently used and modified — the worst page to replace.
Used in the Macintosh virtual memory management scheme.
57 Page Replacement: Counting-Based Page Replacement
Keep a counter of the number of references that have been made to each page.
- LFU (least frequently used) algorithm: replace the page with the smallest count.
- MFU (most frequently used) algorithm: based on the argument that the page with the smallest count was probably just brought in and has yet to be used.
58 Page Replacement: Page-Buffering Algorithm
Keep the number of modified pages as small as possible:
- Maintain a list of modified pages.
- Whenever the paging device is idle, select a modified page, write it to the disk, and reset its modify bit.
This increases the probability that a page will be clean when it is selected for replacement, so it will not need to be written out.
59 Page Replacement: Page-Buffering Algorithm
Keep a pool of free frames:
- When a page fault occurs, a victim frame is chosen as before.
- The desired page is read into a free frame from the pool before the victim is written out.
- This allows the process to restart as soon as possible.
60 Page Replacement: Page-Buffering Algorithm
Keep a pool of free frames, and remember which page was in each frame:
- If a page is needed again, check the pool first.
- If the frame containing the original page has not yet been reused, the page can be recovered from the pool with no disk I/O.
Caching and buffering are very important techniques.
61 Page Replacement: Global vs. Local Allocation
Global replacement:
- A process selects a replacement frame from the set of all frames; one process can take a frame from another.
- The number of frames allocated to a process changes over time.
- A process's performance is affected by external circumstances.
Local replacement:
- Each process selects only from its own set of allocated frames.
- The number of frames allocated to a process does not change.
- A process's performance is not affected by external circumstances.
In general, global replacement results in greater system throughput and is the more common method.
62 FRAME ALLOCATION
For a uniprogramming OS: consider a single-user system with 128 KB of memory and a page size of 1 KB.
- Suppose the OS takes 35 KB; 93 frames are left for the user process.
- Whether or not a free-frame pool is maintained, the user process can be allocated any free frame.
What about a multiprogramming OS?
63 Frame Allocation
For a multiprogramming OS:
- What is the minimum number of frames for a process?
- What is the maximum number of frames for a process?
- How should the free frames be split among the processes?
64 Frame Allocation
Minimum number of frames for a process: defined by the architecture.
- We must have enough frames to hold all the different pages that any single instruction can reference.
- For a machine in which all memory-reference instructions have only one memory address: 2 frames, plus 1 more if one level of indirection is allowed.
- The IBM 370's MVC: the instruction is 6 bytes and may straddle 2 pages; the source and destination blocks may straddle 4 more pages; so 6 frames are needed.
- Some machines allow multiple levels of indirection, which raises the minimum further.
Maximum number of frames for a process: defined by the amount of available physical memory.
65 Frame Allocation: Fixed Allocation
- Equal allocation: e.g., with 100 frames and 5 processes, give each process 20 frames.
- Proportional allocation: allocate according to the size of the process: a_i = (s_i / S) x m, where s_i is the size of process p_i, S is the sum of all s_i, and m is the total number of frames.
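A sketch of proportional allocation, a_i = (s_i / S) x m, rounded down; the largest-remainder rule for handing out the frames lost to truncation is my own choice, since the slide only gives the proportional formula.

```python
def proportional_allocation(sizes, m):
    S = sum(sizes)
    alloc = [s * m // S for s in sizes]           # floor of (s_i / S) * m
    leftovers = m - sum(alloc)
    order = sorted(range(len(sizes)),
                   key=lambda i: sizes[i] * m % S, reverse=True)
    for i in order[:leftovers]:                   # largest remainder first
        alloc[i] += 1
    return alloc

# Two processes of size 10 and 127 pages sharing 62 frames.
print(proportional_allocation([10, 127], 62))   # [5, 57]
```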
66 Frame Allocation: Priority Allocation
Use a proportional allocation scheme based on priorities, or on a combination of size and priority.
If process P_i generates a page fault:
- select for replacement one of its own frames, or
- select for replacement a frame from a process with lower priority.
67 Frame Allocation: Global versus Local Allocation
Global replacement: a process may select a replacement frame from the set of all frames, even if that frame is currently allocated to some other process.
- A process can increase its number of allocated frames.
- A process cannot control its own page-fault rate, since it depends on the behavior of other processes.
- Greater system throughput.
Local replacement: a process may select only from its own set of allocated frames.
- A process cannot increase its number of allocated frames.
- A process's page-fault rate depends only on its own behavior.
68 THRASHING
A process is thrashing if it is spending more time paging than executing. (A student can thrash as well, spending more time running to the library than studying.)
- Cause
- Solutions: working-set model; page-fault frequency
69 Thrashing: Cause
Cause of thrashing:
1. The OS monitors CPU utilization; while it is too low, the OS increases the degree of multiprogramming.
2. A process that needs more frames takes frames away from other processes (by global replacement).
3. Those processes then fault in turn, and more and more processes queue up waiting for the paging device.
4. CPU utilization drops further, the OS increases the degree of multiprogramming again, and thrashing sets in.
71 Thrashing: Locality
The locality model states that, as a process executes, it moves from locality to locality.
- A locality is a set of pages that are actively used together.
- A program is generally composed of several different localities, which may overlap.
- If a process has enough frames to contain its current locality, it runs smoothly.
72 Thrashing: Locality
Locality in a Memory-Reference Pattern
73 Thrashing: Locality
How to prevent thrashing: allocate enough frames to a process to accommodate its current locality.
- The process will fault for the pages in its locality until all of them are in memory; then it will not fault again until it changes localities.
- If we allocate fewer frames than the size of the current locality, the process will thrash, since it cannot keep in memory all the pages it is actively using.
Two approaches:
- Working-set model
- Page-fault frequency
74 Thrashing: Working-Set Model
Working-set window and working set:
- The working-set window Δ is a fixed number of page references.
- The working set is the set of pages referenced in the most recent Δ page references.
- If a page is in active use, it will be in the working set.
- If it is no longer being used, it will drop from the working set Δ time units after its last reference.
The working set is an approximation of the program's locality.
76 Thrashing: Working-Set Model
The accuracy of the working set depends on the selection of Δ:
- If Δ is too small, it will not encompass the entire locality.
- If Δ is too large, it will encompass several localities.
- If Δ = ∞, it will encompass the entire program.
The working set changes dynamically over time.
77 Thrashing: Working-Set Model
How does the working-set model work?
- Compute the working-set size WSS_i for each process in the system.
- Compute the total demand for frames, D = Σ WSS_i.
- If the total demand D is less than the total number of available frames, no thrashing occurs.
- If D is greater than the total number of available frames, thrashing will occur.
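The computation above can be sketched directly: the working set at time t is the set of distinct pages among the last Δ references, and the total demand is the sum of the working-set sizes. The reference strings are illustrative examples, not from the slides.

```python
def working_set(refs, t, delta):
    """Pages referenced in the window of delta references ending at index t."""
    return set(refs[max(0, t - delta + 1): t + 1])

refs_a = [2, 6, 1, 5, 7, 7, 7, 7, 5, 1]      # one process's recent references
refs_b = [3, 4, 4, 4, 3]                     # another process's references
delta = 10

ws_a = working_set(refs_a, len(refs_a) - 1, delta)   # {1, 2, 5, 6, 7}, WSS = 5
ws_b = working_set(refs_b, len(refs_b) - 1, delta)   # {3, 4}, WSS = 2
demand = len(ws_a) + len(ws_b)                        # D = 7 frames
```

If D exceeds the available frames, the OS should swap out a process rather than let all of them thrash.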
78 Thrashing: Working-Set Model
- If one process does not thrash, it has enough frames for its working set.
- If no process thrashes, the OS has enough frames for the total of the working sets.
- If thrashing ever happens, the OS swaps some processes out to reduce the degree of multiprogramming.
The remaining question: how to keep track of the working set?
79 Thrashing: Working-Set Model
Approximate the working set with an interval timer plus a reference bit. Example, with Δ = 10,000:
- The timer interrupts after every 5,000 time units.
- Keep 2 history bits in memory for each page.
- Whenever the timer interrupts, copy each page's reference bit into its history bits and then clear all reference bits to 0.
- If any of a page's bits is 1, the page is considered to be in the working set.
This is not completely accurate, because we cannot tell where within a 5,000-unit interval a reference occurred.
Improvement: 10 history bits per page and an interrupt every 1,000 time units.
80 Thrashing: Page-Fault Frequency Scheme
The working-set model is successful, and knowledge of the working set can be useful for prepaging, but it seems a clumsy way to control thrashing. The page-fault frequency scheme is more direct:
- Establish an "acceptable" page-fault rate, with upper and lower bounds.
- If the actual rate is too low, the process loses a frame.
- If the actual rate is too high, the process gains a frame (and if no free frames are available, a process may have to be suspended).
83 Other Considerations: Prepaging
Pure demand paging causes a large number of page faults when a process is started.
Prepaging tries to prevent this high level of initial paging: bring into memory at one time all the pages that will be needed — for example, bring in the saved working set when a swapped-out process is resumed.
Discussion: is the cost of prepaging less than the cost of servicing the corresponding page faults? It depends on what fraction of the prepaged pages are actually used.
84 Other Considerations: Page Size
Page size selection must balance several factors:
- Page table size (larger pages mean a smaller table)
- Internal fragmentation (smaller pages waste less space)
- I/O overhead (seek time, latency time, transfer time)
- Locality and the page-fault rate
The trend is toward larger and larger page sizes.
85 Other Considerations: TLB Reach
TLB reach: the amount of memory accessible from the TLB.
TLB reach = (TLB size) x (page size)
Ideally, the working set of each process is stored in the TLB; otherwise there is a high degree of page faults.
86 Other Considerations: TLB Reach
How to increase TLB reach:
- Increase the page size. This may lead to an increase in fragmentation, as not all applications require a large page size.
- Provide multiple page sizes. This allows applications that require larger page sizes to use them without an increase in fragmentation.
87 Other Considerations: Program Structure
Consider a 1024 x 1024 array of integers, int A[1024][1024], where each row is stored in one page.
Program 1 (column-major traversal): 1024 x 1024 page faults, assuming the process has fewer than 1024 frames:
for (j = 0; j < 1024; j++)
    for (i = 0; i < 1024; i++)
        A[i][j] = 0;
Program 2 (row-major traversal): 1024 page faults:
for (i = 0; i < 1024; i++)
    for (j = 0; j < 1024; j++)
        A[i][j] = 0;
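The effect can be simulated: each row lives on its own page, so visiting the array column by column touches a different page on every access. The sketch below is scaled down to a 16 x 16 array with 4 frames under FIFO so it runs quickly; the slide's 1024 x 1024 case behaves the same way.

```python
from collections import deque

def count_faults(page_refs, nframes):
    frames, queue, faults = set(), deque(), 0
    for page in page_refs:
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            frames.discard(queue.popleft())
        frames.add(page)
        queue.append(page)
    return faults

N = 16
# The referenced page is the row index i.
col_major = [i for j in range(N) for i in range(N)]   # j outer: new page each step
row_major = [i for i in range(N) for j in range(N)]   # i outer: new page per row
print(count_faults(col_major, 4), count_faults(row_major, 4))   # 256 16
```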
88 Other Considerations: I/O Interlock
I/O interlock: pages must sometimes be locked into memory.
Consider I/O: pages being used to copy a file to or from a device must be locked so that they cannot be selected for eviction by a page replacement algorithm while the I/O is in progress.
89 Other Considerations: I/O Interlock
Reason Why Frames Used for I/O Must Be in Memory