CHAPTER 10: VIRTUAL MEMORY


1 CHAPTER 10: VIRTUAL MEMORY
Background Demand Paging Process Creation Page Replacement Frame Allocation Thrashing Operating System Examples

2 BACKGROUND The memory-management methods discussed so far tend to require the entire process to be in memory before it can execute. In many cases, however, the entire program is not needed: programs often have code to handle unusual error conditions; arrays, lists, and tables are often allocated more memory than they actually need; certain options and features of a program may be used rarely. Even when the entire program is needed, it may not all be needed at the same time (as is the case with overlays, for example).

3 Background The ability to execute a program that is only partially in memory would have many benefits: processes could use an extremely large virtual address space, simplifying the programming task; more processes could run at once; and less I/O would be needed to load or swap each user program into memory. ⇒ Virtual memory.

4 Background Virtual memory management is a technique whereby the computer appears to have much more memory than it actually does. (Virtual means being in effect but not in fact.) An analogy: a train needs rails only directly underneath it. You can run a train across the country by pulling up the rails behind it and laying them down again in front of it as it rolls; the faster you can relay the track, the better.

5 Background Virtual Memory That is Larger Than Physical Memory

6 Background Virtual memory – separation of user logical memory from physical memory. Only part of the program needs to be in memory for execution. Logical address space can therefore be much larger than physical address space. Allows address spaces to be shared by several processes. Allows for more efficient process creation.

7 Background Virtual memory is not easy to implement.
Virtual memory implementation needs hardware support. Virtual memory can be implemented via: Demand paging. Demand segmentation.

8 DEMAND PAGING A demand-paging system is similar to a paging system with swapping. When we want to execute a process, we swap it into memory, but we use a lazy swapper: a lazy swapper never swaps a page into memory unless that page will be needed. Swapper or pager? A swapper manipulates entire processes, whereas a pager is concerned with individual pages, so "pager" is the more accurate term for demand paging.

9 Demand Paging: Basic Concept
Transfer of a Paged Memory to Contiguous Disk Space

10 Demand Paging: Basic Concept
Bring a page into memory only when it is needed: less I/O needed, less memory needed, faster response, more users. When a page is needed, there is a reference to it: invalid reference ⇒ abort; not in memory ⇒ bring to memory.

11 Demand Paging: Basic Concept
With each page-table entry a valid–invalid bit is associated: 1 ⇒ legal and in memory, 0 ⇒ illegal or not in memory. Initially the valid–invalid bit is set to 0 on all entries. During address translation, if the valid–invalid bit in a page-table entry is 0 ⇒ page fault. The process will keep page-faulting until all the pages it needs are in memory. Example of a page-table snapshot ⇒ see next slide.

12 Demand Paging: Basic Concept

13 Demand Paging: Basic Concept
Steps in Handling a Page Fault

14 Demand Paging: Basic Concept
Hardware support: Page table — can mark an entry invalid through a valid–invalid bit or a special value of the protection bits. Secondary memory — holds those pages that are not present in main memory; usually a high-speed disk. Software support: page-replacement algorithms and frame-allocation algorithms.

15 Demand Paging: Basic Concept
The first reference to a page that is not in memory traps to the OS ⇒ page fault. The OS looks at another table to decide: invalid reference ⇒ abort; just not in memory ⇒ get an empty frame, swap the page into the frame, reset the tables (valid bit = 1), and restart the instruction.

16 Demand Paging: Basic Concept
A crucial architectural constraint is the need to restart any instruction after a page fault. If the page fault occurs on the instruction fetch, we can restart by fetching the instruction again. If a page fault occurs while we are fetching an operand, we must fetch and decode the instruction again and then fetch the operand. Example: ADD A, B, C requires: fetch and decode the instruction (ADD); fetch A; fetch B; add A and B; store the sum in C.

17 Demand Paging: Basic Concept
MVC (IBM System 360/370) moves up to 256 bytes from one location to another (possibly overlapping) location. If either the source or the destination straddles a page boundary, a page fault might occur after the move is partially done. In addition, if the source and destination blocks overlap, the source block may have been modified, in which case we cannot simply restart the instruction. Solution 1: the microcode computes and attempts to access both ends of both blocks before the move begins. Solution 2: use temporary registers to hold the values of overwritten locations.

18 Demand Paging: Basic Concept
Autodecrement and autoincrement modes (PDP-11): autodecrement automatically decrements the register before using its contents as the operand address; autoincrement automatically increments the register after using its contents as the operand address. MOV (R2)+, -(R3): what happens if we get a page fault when trying to store into the location pointed to by register R3? By then R2 has already been incremented and R3 decremented. Possible solution: use a status register to record how much of the instruction has been done.

19 Demand Paging: Basic Concept
Page-fault frequency: in theory, a single instruction could cause multiple page faults (one per memory reference); in practice, programs tend to have locality of reference, which results in reasonable performance from demand paging.

20 Demand Paging: Performance
Demand paging can have a significant effect on the performance of a computer system, so we compute the effective access time for demand-paged memory. As long as there are no page faults, the effective access time equals the memory access time (about 10 to 200 nanoseconds). If a page fault occurs, we must first read the relevant page from disk and then access the desired word ⇒ the effective access time could be very large!

21 Demand Paging: Performance
Let p be the probability of a page fault (0 ≤ p ≤ 1). Then the effective access time (EAT) is EAT = (1 − p) × memory access time + p × page-fault time. If p = 0, there are no page faults; if p = 1, every reference is a fault. Most of the time, p is very close to 0.

22 Demand Paging: Performance
To compute the EAT, we must know how much time is needed to service a page fault. A page fault causes the following sequence to occur: Trap to the OS. Save the user registers and process state. Determine that the interrupt was a page fault. Check that the page reference was legal and determine the location of the page on the disk. Issue a read (from the disk to a free frame): Wait in a queue for this device until the read request is serviced. Wait for the device seek time and latency time. Begin the transfer of the page to a free frame.

23 Demand Paging: Performance
While waiting, allocate the CPU to some other user (CPU scheduling, optional). Interrupt from the disk (I/O completed). Save the registers and process state of the other user (if step 6 was executed). Determine that the interrupt was from the disk. Correct the page table and other tables to show that the desired page is now in memory. Wait for the CPU to be allocated to this process again. Restore the user registers, process state, and new page table, then resume the interrupted instruction.

24 Demand Paging: Performance
Three major components of the page-fault service time: service the page-fault interrupt, read in the page, and restart the process. Effective Access Time (EAT): EAT = (1 − p) × memory access time + p × (interrupt service + page read + process restart). Reading in the page dominates: seek time (15 ms) + latency time (8 ms) + transfer time (1 ms) ⇒ about 25 ms in total.

25 Demand Paging: Performance
Assume an average page-fault service time of 25 milliseconds and a memory access time of 100 nanoseconds. Then the EAT in nanoseconds is EAT = (1 − p) × 100 ns + p × (25 ms) = (1 − p) × 100 + p × 25,000,000 = 100 + 24,999,900 × p (ns). When p = 0.001 = 1/1000, EAT ≈ 25,000 ns = 25 microseconds ⇒ the computer is slowed down by a factor of 250 because of demand paging. When p = 0.0000004 = 1/2,500,000, EAT = 110 ns ⇒ the degradation is less than 10%.
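A minimal C sketch of this arithmetic, using the slide's illustrative numbers (100 ns memory access, 25 ms fault service); the function name is ours, not an OS API:

#include <stdio.h>

/* EAT per the slide's formula; all times in nanoseconds. */
double eat_ns(double p, double mem_ns, double fault_ns) {
    return (1.0 - p) * mem_ns + p * fault_ns;
}

int main(void) {
    double mem = 100.0;            /* memory access time: 100 ns */
    double fault = 25e6;           /* page-fault service time: 25 ms */
    printf("p = 1/1000:      EAT = %.1f ns\n", eat_ns(0.001, mem, fault));
    printf("p = 1/2,500,000: EAT = %.1f ns\n", eat_ns(0.0000004, mem, fault));
    return 0;
}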

26 Demand Paging: Performance
How to reduce the EAT: reduce the page-fault rate as much as possible. Other options: Option 1: I/O to swap space is faster than to the file system, so the OS can copy an entire file image into swap space at process startup and then perform paging from the swap space. Option 2: demand pages from the file system initially, but write the pages to swap space as they are replaced. Option 3: demand pages directly from the file system and use swap space only for the process stack and heap.

27 PROCESS CREATION Virtual memory allows other benefits during process creation: - Copy-on-Write - Memory-Mapped Files

28 Process Creation: Copy-on-Write
fork(): creates a copy of the parent's address space for the child, duplicating the pages belonging to the parent. Copying the parent's address space may be unnecessary, since many child processes invoke the exec() system call immediately after creation. vfork(): the parent process is suspended and the child process uses the address space of the parent; if the child changes any pages of the parent's address space, the altered pages will be visible to the parent once it resumes. ⇒ Very efficient but dangerous; to be used only when the child calls exec() immediately after creation.

29 Process Creation: Copy-on-Write
fork() with COW Copy-on-Write (COW) allows both parent and child processes to initially share the same pages in memory. These shared pages are marked as copy-on-write pages, meaning that if either process writes to a shared page, a copy of the shared page is created. Only the pages modified by either process are copied; all non-modified pages may be shared by the parent and child process. Free pages are allocated from a pool of zeroed-out pages. fork() with COW allows more efficient process creation as only modified pages are copied. fork() with COW is safer than vfork().
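A minimal POSIX sketch of the common fork-then-exec pattern that makes COW pay off; the command run ("ls") is arbitrary and error handling is abbreviated:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();            /* with COW, no pages are copied yet */
    if (pid < 0) { perror("fork"); exit(1); }
    if (pid == 0) {
        /* Child: exec() replaces the address space, so the shared
           pages are dropped without ever having been copied. */
        execlp("ls", "ls", "-l", (char *)NULL);
        perror("execlp");          /* reached only if exec fails */
        _exit(1);
    }
    waitpid(pid, NULL, 0);         /* parent waits for the child */
    return 0;
}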

30 Process Creation: Memory-Mapped Files
Standard file I/O requires explicit disk access through system calls. Memory-mapped file I/O allows file I/O to be treated as routine memory access by mapping a disk block to a page in memory. A file is initially read using demand paging: a page-sized portion of the file is read from the file system into a physical page, and subsequent reads and writes to/from the file are treated as ordinary memory accesses. This simplifies file access by treating file I/O through memory rather than through read() and write() system calls. The mapped memory acts as a cache for the file.
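A hedged sketch of memory-mapped file I/O with POSIX mmap(): once the mapping exists, the file is read with ordinary loads and the pager brings pages in on demand (error handling abbreviated):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    struct stat sb;
    fstat(fd, &sb);
    /* Map the whole file; pages are faulted in on first touch. */
    char *data = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return 1; }
    long lines = 0;                /* count newlines with plain loads, no read() */
    for (off_t i = 0; i < sb.st_size; i++)
        if (data[i] == '\n') lines++;
    printf("%ld lines\n", lines);
    munmap(data, sb.st_size);
    close(fd);
    return 0;
}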

31 Process Creation: Memory-Mapped Files
OS support for memory-mapped files To provide memory mapping only through a specific system call and treat all other file I/O using the standard system calls To treat all file I/O as memory-mapped, allowing file access to take place in memory. Also allows several processes to map the same file allowing the pages in memory to be shared.

32 Process Creation: Memory-Mapped Files

33 PAGE REPLACEMENT Does each page fault at most once?
Suppose we have 40 frames, and a process of 10 pages actually uses only 5 of them ⇒ we could run 8 processes rather than 4. If we increase the degree of multiprogramming this way, we are over-allocating memory: running 6 processes gives higher CPU utilization and throughput with 10 frames to spare, but if each process suddenly tries to use all 10 of its pages ⇒ 60 frames are needed. What to do?

34 Page Replacement I/O Buffers
Buffers for I/O consume a significant amount of memory, which increases the strain on memory-placement algorithms. Deciding how much memory to allocate to I/O and how much to program pages is a significant challenge. Some systems allocate a fixed percentage of memory for I/O buffers, whereas others allow both user processes and the I/O subsystem to compete for all system memory. Over-allocation can leave a process unable to find a free frame for its missing pages.

35 Page Replacement: The Need

36 Page Replacement: Options
The options for the OS The OS could terminate the user process The OS could swap out a process, freeing all its frames, and reducing the level of multiprogramming. The OS could perform page replacement.

37 Page Replacement: Basic Scheme
The Page-fault service routine with page replacement Find the location of the desired page on disk. Find a free frame: If there is a free frame, use it. If there is no free frame, use a page replacement algorithm to select a victim frame. Write the victim page to the disk; change the page and frame tables accordingly; Read the desired page into the (newly) free frame. Update the page and frame tables. Restart the process.

38 Page Replacement: Basic Scheme

39 Page Replacement: Basic Scheme
With page replacement, the page-fault service time is doubled (one page out, one page in). Use a modify bit (or dirty bit) to reduce this overhead. Each page or frame has a modify bit associated with it in the hardware; the modify bit for a page is set by the hardware whenever any word or byte in the page is written, indicating that the page has been modified. If the victim is modified, page out then page in; if it is clean, just page in ⇒ the page-fault service time stays reasonable, because only modified pages are written back to disk.

40 Page Replacement: Basic Scheme
The goal for page-replacement algorithms is the lowest page-fault rate. How to evaluate a page-replacement algorithm: run it on a particular string of memory references (a reference string) and compute the number of page faults; by simulation; or by implementation.

41 Page Replacement: Basic Scheme
How to obtain a reference string (real recorded data or simulated data), and how to reduce the amount of data: consider only the page number rather than the entire address, and note that if we have a reference to a page p, any immediately following references to page p will never cause a page fault. An example (page size 100): 0100, 0432, 0101, 0612, 0102, 0103, 0104, 0611 ⇒ 1, 4, 1, 6, 1, 6 (sketched below).
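A small C sketch of this trace reduction, assuming (as the example does) 100 bytes per page; the variable names are illustrative:

#include <stdio.h>

int main(void) {
    int trace[] = {100, 432, 101, 612, 102, 103, 104, 611};
    int n = sizeof trace / sizeof trace[0];
    int page_size = 100;
    int last = -1;
    for (int i = 0; i < n; i++) {
        int page = trace[i] / page_size;   /* keep the page number only */
        if (page != last) {                /* drop immediate repeats */
            printf("%d ", page);           /* prints: 1 4 1 6 1 6 */
            last = page;
        }
    }
    printf("\n");
    return 0;
}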

42 Page Replacement: Basic Scheme
Graph of Page Faults Versus The Number of Frames

43 Page Replacement: FIFO
FIFO page replacement: associate with each page the time when that page was brought into memory, and replace the oldest page when necessary. Implementation: a FIFO queue holds all pages in memory; replace the page at the head of the queue and append a page at the tail when it is brought into memory. Features: easy to understand and easy to implement, but its performance is not always good.
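A minimal FIFO simulation in C, run on a classic reference string that exhibits Belady's anomaly (see the following slides); the helper name is ours:

#include <stdio.h>

int count_fifo_faults(const int *refs, int n, int nframes) {
    int frames[16];                        /* assumes nframes <= 16 */
    int next = 0, used = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int hit = 0;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) { hit = 1; break; }
        if (!hit) {
            faults++;
            if (used < nframes) frames[used++] = refs[i];
            else {                         /* evict the oldest page */
                frames[next] = refs[i];
                next = (next + 1) % nframes;
            }
        }
    }
    return faults;
}

int main(void) {
    int refs[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    int n = sizeof refs / sizeof refs[0];
    printf("3 frames: %d faults\n", count_fifo_faults(refs, n, 3));  /* 9  */
    printf("4 frames: %d faults\n", count_fifo_faults(refs, n, 4));  /* 10 */
    return 0;
}

More frames, yet more faults: that is Belady's anomaly.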

44 Page Replacement: FIFO

45 Page Replacement: FIFO
FIFO Illustrating Belady’s Anomaly

46 Page Replacement: OPT
The optimal page-replacement algorithm has the lowest page-fault rate of all algorithms and never suffers from Belady's anomaly. The optimal algorithm: replace the page that will not be used for the longest period of time. Features: lowest page-fault rate (no Belady's anomaly); difficult to implement, since it requires future knowledge of the reference string; useful as a baseline for comparison.

47 Page Replacement: OPT

48 Page Replacement: LRU
FIFO, OPT, LRU: FIFO looks at the past; OPT looks at the future; LRU looks at the past to predict the future. LRU (least recently used): associate with each page the time of that page's last use, and replace the page that has not been used for the longest period of time.

49 Page Replacement: LRU

50 Page Replacement: LRU
How to implement LRU? Counters: add to the CPU a logical clock or counter and associate with each page-table entry a TimeOfUse field. Whenever a reference to a page is made, the contents of the clock register are copied to its TimeOfUse field; replace the page with the smallest time value (a sketch of this scheme follows). Stack: keep a stack of page numbers; whenever a page is referenced, move it to the top of the stack; replace the page at the bottom of the stack, which is the LRU page.
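A C sketch of the counter (TimeOfUse) scheme; the reference string and frame count are illustrative:

#include <stdio.h>

#define NFRAMES 3

int count_lru_faults(const int *refs, int n) {
    int page[NFRAMES], time_of_use[NFRAMES];
    int used = 0, clock = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        clock++;                           /* logical clock ticks each reference */
        int hit = -1;
        for (int j = 0; j < used; j++)
            if (page[j] == refs[i]) { hit = j; break; }
        if (hit >= 0) { time_of_use[hit] = clock; continue; }
        faults++;
        int victim = 0;
        if (used < NFRAMES) victim = used++;
        else                               /* evict the smallest TimeOfUse */
            for (int j = 1; j < NFRAMES; j++)
                if (time_of_use[j] < time_of_use[victim]) victim = j;
        page[victim] = refs[i];
        time_of_use[victim] = clock;
    }
    return faults;
}

int main(void) {
    int refs[] = {7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2};
    printf("%d faults\n", count_lru_faults(refs, sizeof refs / sizeof refs[0]));  /* 9 */
    return 0;
}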

51 Page Replacement: LRU

52 Page Replacement: LRU Approximation
Many systems provide some help in the form of a reference bit. A reference bit is associated with each entry in the page table and is set by the hardware whenever that page is referenced. To replace, prefer a page whose reference bit is not set.

53 Page Replacement: LRU Approximation
Additional-reference-bits algorithm: keep an 8-bit byte for each page in a table in memory. At regular intervals (say, every 100 ms), a timer interrupt transfers control to the OS. The OS shifts the reference bit for each page into the high-order bit of its 8-bit byte, shifting the other bits right 1 bit and discarding the low-order bit. These 8-bit bytes contain the history of page use for the last eight time periods. If we interpret them as unsigned integers, the page with the lowest number is (approximately) the LRU page and can be replaced.
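A C sketch of this aging mechanism: each tick shifts a page's history byte right and injects the sampled reference bit at the high end; the reference-bit samples below are made up:

#include <stdint.h>
#include <stdio.h>

#define NPAGES 4

/* One timer tick: fold the sampled reference bits into the history bytes. */
void age_tick(uint8_t history[NPAGES], const int referenced[NPAGES]) {
    for (int i = 0; i < NPAGES; i++)
        history[i] = (uint8_t)((referenced[i] << 7) | (history[i] >> 1));
}

int victim(const uint8_t history[NPAGES]) {
    int v = 0;                             /* smallest value = approximate LRU */
    for (int i = 1; i < NPAGES; i++)
        if (history[i] < history[v]) v = i;
    return v;
}

int main(void) {
    uint8_t history[NPAGES] = {0};
    int ticks[3][NPAGES] = {{1, 0, 1, 1}, {0, 0, 1, 1}, {1, 0, 0, 1}};
    for (int t = 0; t < 3; t++)
        age_tick(history, ticks[t]);
    int v = victim(history);
    printf("victim: page %d (history 0x%02X)\n", v, history[v]);  /* page 1 */
    return 0;
}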

54 Page Replacement: LRU Approximation
Second-chance algorithm: when a page is being considered for replacement, check its reference bit. If the value is 0, replace it; if the value is 1, give the page a second chance, set its bit to 0, and move on to the next page. Implementation: a circular queue of pages, as sketched below.
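A C sketch of that circular sweep (the clock algorithm); page numbers and initial reference bits are illustrative:

#include <stdio.h>

#define NFRAMES 4

struct frame { int page; int ref_bit; };

int pick_victim(struct frame frames[NFRAMES], int *hand) {
    for (;;) {
        if (frames[*hand].ref_bit == 0) {  /* no second chance left */
            int v = *hand;
            *hand = (*hand + 1) % NFRAMES;
            return v;
        }
        frames[*hand].ref_bit = 0;         /* spare it, clear the bit */
        *hand = (*hand + 1) % NFRAMES;
    }
}

int main(void) {
    struct frame frames[NFRAMES] = {{10, 1}, {11, 0}, {12, 1}, {13, 1}};
    int hand = 0;
    int v = pick_victim(frames, &hand);
    printf("victim: page %d in frame %d\n", frames[v].page, v);   /* page 11 */
    return 0;
}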

55 Page Replacement: LRU Approximation

56 Page Replacement: LRU Approximation
Enhanced second-chance algorithm: consider the (reference bit, modify bit) pair. (0,0): neither recently used nor modified — best page to replace. (0,1): not recently used but modified — not quite as good, since it must be written out. (1,0): recently used but clean — likely to be used again soon. (1,1): recently used and modified — worst page to replace. Used in the Macintosh virtual-memory scheme.

57 Page Replacement: Counting-Based Page Replacement
Keep a counter of the number of references that have been made to each page. LFU (least frequently used): replace the page with the smallest count. MFU (most frequently used): based on the argument that the page with the smallest count was probably just brought in and has yet to be used.

58 Page Replacement: Page-Buffering Algorithm
To keep as many pages clean as possible: maintain a list of modified pages; whenever the paging device is idle, select a modified page, write it to disk, and reset its modify bit ⇒ this increases the probability that a page will be clean when it is selected for replacement and will not need to be written out.

59 Page Replacement: Page-Buffering Algorithm
To keep a pool of free frames When a page fault occurs, a victim frame is chosen as before. The desired page is read into a free frame from the pool before the victim is written out. To allow the process to restart ASAP.

60 Page Replacement: Page-Buffering Algorithm
Keep a pool of free frames and remember which page was in each frame. If a page is needed again and the frame containing it has not yet been reused, the page can be taken straight from the pool with no disk I/O. Caching and buffering are very important techniques.

61 Page Replacement: Global vs. Local Allocation
Global replacement A process selects a replacement frame from the set of all frames; one process can take a frame from another. The number of frames allocated to a process changes. Affected by external circumstances Local replacement Each process selects from only its own set of allocated frames. The number of frames allocated to a process does not change Not affected by external circumstances In general, global replacement is better.

62 FRAME ALLOCATION For uniprogramming OS
Consider a single-user system with 128 KB of memory and a page size of 1 KB. Suppose the OS takes 35 KB; 93 frames are left for the user process. The OS may keep no free-frame pool, or it may keep a free-frame pool from which the user process is allocated any free frame. What about a multiprogramming OS?

63 Frame Allocation For multiprogramming OS
What is the minimum number of frames for a process? What is the maximum number of frames for a process? How to split the free frames among the processes?

64 Frame Allocation Minimum number of frames for a process
Defined by the architecture: we must have enough frames to hold all the different pages that any single instruction can reference. A machine in which all memory-reference instructions have only one memory address needs 2 frames (instruction plus operand), plus 1 if the operand can be an indirect reference. The IBM 370's MVC instruction: the instruction itself is 6 bytes and can straddle 2 pages; the source and destination blocks can each straddle a page boundary ⇒ 4 more pages. Some machines allow multiple levels of indirection, complicating the bound further. Maximum number of frames for a process? Defined by the amount of available physical memory.

65 Frame Allocation: Fixed Allocation
Equal allocation – e.g., with 100 frames and 5 processes, give each process 20 frames. Proportional allocation – allocate according to the size of the process: a_i = (s_i / S) × m, where s_i is the size of process p_i, S is the sum of all s_i, and m is the number of free frames (sketched below).
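A quick C sketch of the proportional formula with the textbook-style sizes of 10 and 127 pages and m = 62 free frames (illustrative numbers):

#include <stdio.h>

int main(void) {
    int sizes[] = {10, 127};               /* process sizes s_i, in pages */
    int n = sizeof sizes / sizeof sizes[0];
    int m = 62;                            /* free frames to divide up */
    int total = 0;                         /* S = sum of the s_i */
    for (int i = 0; i < n; i++) total += sizes[i];
    for (int i = 0; i < n; i++)            /* a_i = (s_i / S) * m, truncated */
        printf("process %d gets %d frames\n", i, sizes[i] * m / total);
    return 0;
}

This prints 4 and 57 frames, respectively.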

66 Frame Allocation: Priority Allocation
Use a proportional allocation scheme based on a function of priorities as well as sizes. If process Pi generates a page fault, select for replacement one of its own frames, or select a frame from a process with a lower priority number.

67 Frame Allocation: Global versus Local Allocation
Global replacement: allow a process to select a replacement frame from the set of all frames, even if that frame is currently allocated to some other process. A process can increase the number of frames allocated to it, but it cannot control its own page-fault rate; global replacement generally gives greater system throughput. Local replacement: allow a process to select only from its own set of allocated frames. A process cannot increase the number of frames allocated to it, but it can control its own page-fault rate.

68 THRASHING A process is thrashing if it is spending more time paging than executing. (A student can thrash sometimes as well, spending more time going to the library than studying.) Cause Solutions Working-set model Page-fault frequency

69 Thrashing: Cause
Cause of thrashing: while (CPU utilization too low) { increase the degree of multiprogramming }. Processes then take frames away from other processes (via global replacement), those processes wait on the paging device, CPU utilization drops further, and the loop repeats ⇒ thrashing.

70 Thrashing: Cause

71 Thrashing: Locality The locality model states that, as a process executes, it moves from locality to locality. A locality is a set of pages that are actively used together. A program is generally composed of several different localities, which may overlap. If a process has enough frames to contain its current locality, it runs smoothly.

72 Thrashing: Locality Locality In A Memory-Reference Pattern

73 Thrashing: Locality How to prevent thrashing
Allocate enough frames to a process to accommodate its current locality. It will fault for the pages in its locality until all these pages are in memory; then it will not fault again until it changes localities. If we allocate fewer frames than the size of the current locality, the process will thrash, since it cannot keep in memory all the pages that it is actively using. Two approaches: Working-set model Page-fault frequency

74 Thrashing: Working-Set Model
Working-set window, working set: The working-set window Δ is a fixed number of page references. The working set is the set of pages in the most recent Δ page references. If a page is in active use, it will be in the working set; if it is no longer being used, it will drop from the working set Δ time units after its last reference. ⇒ The working set is an approximation of the program's locality (a small computation sketch follows).
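A C sketch that computes the working-set size WSS(t) as the number of distinct pages among the last Δ references; Δ and the reference string are illustrative:

#include <stdio.h>

#define DELTA 5

int working_set_size(const int *refs, int t) {
    int seen[64] = {0};                    /* assumes page numbers < 64 */
    int size = 0;
    int start = (t - DELTA + 1 > 0) ? t - DELTA + 1 : 0;
    for (int i = start; i <= t; i++)       /* scan the window ending at t */
        if (!seen[refs[i]]) { seen[refs[i]] = 1; size++; }
    return size;
}

int main(void) {
    int refs[] = {1, 2, 1, 5, 7, 7, 7, 7, 5, 1};
    int n = sizeof refs / sizeof refs[0];
    for (int t = DELTA - 1; t < n; t++)
        printf("t=%d  WSS=%d\n", t, working_set_size(refs, t));
    return 0;
}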

75 Thrashing: Working-Set Model

76 Thrashing: Working-Set Model
An approximation of the program's locality. The accuracy of the working set depends on the selection of Δ: if Δ is too small, it will not encompass the entire locality; if Δ is too large, it will encompass several localities; if Δ = ∞, it will encompass the entire program. The working set changes dynamically.

77 Thrashing: Working-Set Model
How does the working-set model work? Compute the working-set size for each process in the system, then compute the total demand for frames. If the total demand is less than the total available frames, no thrashing will occur; if the total demand is greater than the total available frames, thrashing will occur.

78 Thrashing: Working-Set Model
For one process not to thrash, it must have enough frames for its working set. For no process to thrash, the OS must have enough frames for the total of all working sets. If thrashing ever occurs, the OS swaps some processes out to reduce the degree of multiprogramming. How do we keep track of the working set?

79 Thrashing: Working-Set Model
Approximate with an interval timer plus a reference bit. Example: Δ = 10,000. The timer interrupts every 5,000 time units. Keep 2 history bits in memory for each page. Whenever the timer interrupts, copy each page's reference bit into its history bits and clear the reference bit to 0. If one of the history bits is 1 ⇒ the page is in the working set. Why is this not completely accurate? We cannot tell where within an interval of 5,000 references the use occurred. Improvement: 10 history bits per page and an interrupt every 1,000 time units.

80 Thrashing: Page-Fault Frequency Scheme
The working-set model is successful, and knowledge of the working set can be useful for prepaging, but it seems a clumsy way to control thrashing. The page-fault frequency scheme is more direct: establish an "acceptable" page-fault rate. If the actual rate is too low, the process loses a frame; if the actual rate is too high, the process gains a frame (and if no free frames are available, a process may be suspended).

81 Thrashing: Page-Fault Frequency Scheme

82 OTHER CONSIDERATIONS Prepaging Page size selection TLB Reach
Program structure I/O Interlock

83 Other Considerations: Prepaging
Pure demand paging ⇒ a large number of page faults when a process is started. Prepaging tries to prevent this high level of initial paging by bringing into memory at one time all the pages that will be needed, for example by bringing in the saved working set. Discussion: is the cost of prepaging less than the cost of servicing the corresponding page faults?

84 Other Considerations: Page size
Page size selection involves: page table size, internal fragmentation, I/O overhead (seek time, latency time, transfer time), locality, and page-fault rate. ⇒ The trend is toward larger and larger page sizes.

85 Other Considerations: TLB reach
TLB reach – the amount of memory accessible from the TLB: TLB reach = (TLB size) × (page size). For example, 64 entries × 4 KB pages give a reach of 256 KB. Ideally, the working set of each process is stored in the TLB; otherwise the process suffers a high degree of TLB misses, resolving references through the page table instead.

86 Other Considerations: TLB Reach
Increase the Page Size. This may lead to an increase in fragmentation as not all applications require a large page size. Provide Multiple Page Sizes. This allows applications that require larger page sizes the opportunity to use them without an increase in fragmentation.

87 Other Considerations: Program structure
int[][] A = new int[1024][1024];
Each row is stored in one page.
Program 1: 1024 × 1024 page faults
for (int j = 0; j < A.length; j++)
    for (int i = 0; i < A.length; i++)
        A[i][j] = 0;
Program 2: 1024 page faults
for (int i = 0; i < A.length; i++)
    for (int j = 0; j < A.length; j++)
        A[i][j] = 0;

88 Other Considerations: I/O Interlock
I/O Interlock – Pages must sometimes be locked into memory. Consider I/O. Pages that are used for copying a file from a device must be locked from being selected for eviction by a page replacement algorithm.

89 Other Considerations: I/O Interlock
Reason Why Frames Used For I/O Must Be In Memory

90 Homework: exercises 1, 6, 9, 10, 11, 16


