Chapter 9 Virtual Memory


1 Chapter 9 Virtual Memory
Background
Demand Paging
Copy-on-Write
Page Replacement
Allocation of Frames
Thrashing
Memory-Mapped Files
Allocating Kernel Memory
Other Considerations
Operating-System Examples*

2 9.1 Background
Previous strategies require that an entire process be in memory before it can execute.
There are cases where the entire program is not needed:
- Code to handle unusual error conditions is almost never executed
- Arrays, lists, and tables are often allocated more memory than they actually need
- Certain options and features of a program may be used rarely
Even when the entire program is needed, not all of it may be needed at the same time

3 9.1 Background
Benefits of the ability to execute a program that is only partially in memory:
- A very large virtual address space
- A higher multiprogramming degree (increased CPU utilization)
- Programs run faster (less I/O is needed to load or swap)

4 9.1 Background
Virtual memory: separation of user logical memory from physical memory
- Only part of the program needs to be in memory for execution
- Logical address space can therefore be much larger than physical address space (Fig. 9.1)
- Allows address spaces to be shared by several processes
- Allows for more efficient process creation
Virtual memory can be implemented via:
- Demand paging
- Demand segmentation

5 9.1 Background: Virtual Memory is Larger Than Physical Memory (Figure 9.1)

6 9.1 Background: Virtual Address Space
- The large blank space (hole) between the heap and the stack is part of the virtual address space but requires actual physical pages only if the heap or stack grows
- Enables sparse address spaces with holes left for growth, dynamically linked libraries, etc.
- Files and system libraries are shared by mapping them into the virtual address space
- Shared memory is implemented by mapping pages read-write into the virtual address space
- Pages can be shared during fork(), speeding process creation

7 9.1 Background: Shared Library Using Virtual Memory

8 9.2 Demand Paging
Demand paging = paging + swapping:
- Processes reside on secondary memory (i.e., disk)
- Bring a page into memory only when it is needed
Benefits: less I/O, less memory needed, faster response, more users
Use a pager rather than a swapper; when a page is needed, a reference is made to it:
- Invalid reference → abort
- Not in memory → bring it into memory
A lazy swapper never swaps a page into memory unless that page will be needed.

9 9.2 Demand Paging: Transfer of a Paged Memory to Contiguous Disk Space

10 9.2 Demand Paging: Valid-Invalid Bits
With each page-table entry a valid-invalid bit is associated (1: in memory, 0: not in memory)
Initially the valid-invalid bit is set to 0 on all entries
During address translation, if the valid-invalid bit in a page-table entry is 0 → page fault
(Figure: a page-table snapshot showing frame numbers and valid-invalid bits)
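As a rough illustration, a page-table entry carrying this bit might be modeled as below; this is a hypothetical layout, since real hardware packs the fields into a single word:

    /* Hypothetical page-table entry with a valid-invalid bit. */
    #include <stdint.h>

    typedef struct {
        uint32_t frame_number;  /* physical frame holding the page   */
        uint8_t  valid;         /* 1: in memory, 0: not in memory    */
    } pte_t;

    /* During address translation, a reference to an entry whose
       valid field is 0 raises a page fault. */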

11 9.2 Demand Paging: Valid-Invalid Bits

12 9.2 Demand Paging: Page Fault
The first reference to a page that is not in memory traps to the OS → page fault
The OS then:
1. Looks at another table to decide whether the reference was a valid or an invalid memory access
2. Gets an empty frame
3. Swaps the page into the frame
4. Resets the tables, setting the validation bit to 1
5. Restarts the instruction
Restarting is tricky for block-move instructions and auto increment/decrement addressing (examples follow)

13 9.2 Demand Paging: Page Fault Handling

14 9.2 Demand Paging: Steps in Handling a Page Fault
1. Trap to the operating system
2. Save the user registers and process state
3. Determine that the interrupt was a page fault
4. Check that the page reference was legal and determine the location of the page on the disk
5. Issue a read from the disk to a free frame:
   - Wait in a queue for this device until the read request is serviced
   - Wait for the device seek and/or latency time
   - Begin the transfer of the page to a free frame
6. While waiting, allocate the CPU to some other user
7. Receive an interrupt from the disk I/O subsystem (I/O completed)
8. Save the registers and process state for the other user
9. Determine that the interrupt was from the disk
10. Correct the page table and other tables to show the page is now in memory
11. Wait for the CPU to be allocated to this process again
12. Restore the user registers, process state, and new page table, and then resume the interrupted instruction
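The control flow can be made concrete with a minimal user-space toy (not from the slides): "disk" and "memory" are plain arrays, the page table holds a frame number or -1 for invalid, and every name is illustrative.

    #include <stdio.h>
    #include <string.h>

    #define NPAGES  8
    #define NFRAMES 4
    #define PAGESZ  16

    static char disk[NPAGES][PAGESZ];     /* backing store                */
    static char memory[NFRAMES][PAGESZ];  /* physical frames              */
    static int  pte[NPAGES];              /* frame number, or -1: invalid */
    static int  next_free = 0;

    static char read_byte(int page, int off) {
        if (pte[page] < 0) {                        /* valid bit 0: fault */
            printf("page fault on page %d\n", page);
            int f = next_free++ % NFRAMES;          /* naive frame choice */
            memcpy(memory[f], disk[page], PAGESZ);  /* swap page in       */
            pte[page] = f;                          /* fix table, restart */
        }
        return memory[pte[page]][off];
    }

    int main(void) {
        memset(pte, -1, sizeof pte);          /* all entries invalid      */
        strcpy(disk[3], "hello");
        printf("%c\n", read_byte(3, 0));      /* faults, then prints 'h'  */
        printf("%c\n", read_byte(3, 1));      /* no fault: prints 'e'     */
        return 0;
    }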

15 9.2 Demand Paging: Restarting Instruction after Page Fault (Worst-Case Example)
C ← A + B
1. Fetch and decode the instruction (ADD)
2. Fetch A
3. Fetch B
4. ADD A and B
5. Store the sum in C (page fault!)
Restart: the instruction must be re-fetched and re-executed from step 1

16 9.2 Demand Paging: Restarting Instruction after Page Fault (Block-Move Example)
Major difficulty: one instruction may modify several different locations
e.g., the IBM MVC instruction moves up to 256 characters from source to target; if either block straddles a page boundary, the fault may occur after part of the move is done
Solutions:
- Access both ends of both blocks before execution
- Use temporary registers to hold the values of overwritten locations
(Figure: source and target blocks straddling pages 3 and 4)

17 9.2 Demand Paging: Restarting Instruction after Page Fault (Auto Increment/Decrement Example)
MOV (R2)+, -(R3)
1. Load (R2) into X
2. R2++
3. R3--
4. Save X to (R3)
If the address that R3 points to causes a page fault in step 4, we must restore the values of R2 and R3 before restarting the instruction

18 9.2 Demand Paging
What happens if there is no free frame?
- Page replacement: find some page in memory that is not really in use and swap it out
- Algorithm?
- Performance: we want an algorithm that results in the minimum number of page faults
Note that the same page may be brought into memory several times

19 9.2 Demand Paging: Performance
Page-fault rate p, 0 ≤ p ≤ 1
- if p = 0: no page faults
- if p = 1: every reference is a fault
Effective access time (EAT) = (1 − p) × memory access time + p × (page-fault overhead + [swap page out] + swap page in + restart overhead)
(Study the example in the textbook.)
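As a hedged illustration with assumed numbers (200 ns memory-access time and 8 ms page-fault service time, the values used in the textbook's example):

    EAT = (1 − p) × 200 + p × 8,000,000 ns = 200 + 7,999,800 × p (ns)

Even one fault in 1,000 accesses (p = 0.001) gives EAT ≈ 8.2 µs, making memory appear roughly 40 times slower.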

20 9.3 Copy-on-Write
Virtual memory allows other benefits during process creation:
- Copy-on-write
- Memory-mapped files (see Section 9.7)

21 9.3 Copy-on-Write
- Allows both parent and child processes to initially share the same pages in memory (e.g., after fork())
- Shared pages are marked as copy-on-write pages: if either process modifies a shared page, a copy of the shared page is created
- Allows more efficient process creation, as only modified pages are copied
- Free pages for the copies are allocated from a pool of zeroed-out pages; the same pool serves COW copies and stack/heap growth
(see the sketch below)

22 9.3 Copy-on-Write
(Figures: physical memory before and after process 1 modifies page C)

23 9.3 Copy-on-Write: vfork() vs. fork() with Copy-on-Write
vfork() (virtual memory fork) is used in UNIX-based systems:
- No copy-on-write: the child borrows the parent's address space while the parent is suspended
- Any alteration made by the child will be visible to the parent when it resumes
- Must be used with caution, ensuring that the child process does not modify the address space of the parent
- Intended for the case where the child process calls exec() immediately after creation, as in the sketch below
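A hedged sketch of the intended vfork() pattern: the child calls exec() immediately and must not touch the parent's data.

    /* vfork(): the child shares the parent's address space; the parent
       is suspended until the child exec()s or _exit()s. */
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = vfork();
        if (pid == 0) {
            /* Safe uses only: exec or _exit. Writing to parent data here
               would be visible to the parent when it resumes. */
            execlp("echo", "echo", "hello from child", (char *)NULL);
            _exit(1);               /* reached only if exec fails */
        }
        waitpid(pid, NULL, 0);
        printf("parent resumes after the child exec()s\n");
        return 0;
    }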

24 9.4 Page Replacement
When there are no free frames on the free-frame list, we could:
- Terminate the user process
- Swap out a process, freeing all its frames and reducing the level of multiprogramming, or
- Find a frame that is not currently being used and free it
To free a frame:
- A read-only page may be discarded when desired
- A modified (dirty) page must be written to swap space
- The page table must be changed to indicate that the page is no longer in memory

25 9.4 Page Replacement
- Prevent over-allocation of memory by modifying the page-fault service routine to include page replacement
- Use a modify (dirty) bit to reduce the overhead of page transfers: only modified pages are written back to disk
- Page replacement completes the separation between logical memory and physical memory: a large virtual memory can be provided on a smaller physical memory

26 9.4 Page Replacement: Need for Page Replacement

27 9.4 Page Replacement: Basic Scheme
1. Find the location of the desired page on disk
2. Find a free frame:
   - If there is a free frame, use it
   - If there is no free frame, use a page-replacement algorithm to select a victim frame
3. Read the desired page into the (newly) free frame; update the page and frame tables
4. Restart the process

28 9.4 Page Replacement: Basic Scheme

29 9.4 Page Replacement: Basic Scheme
We want the lowest page-fault rate
To evaluate an algorithm:
- Run it on a particular string of memory references (a reference string)
- Compute the number of page faults on that string
Reference string:
- A string of memory references
- Consider only the page numbers
- A reference that immediately follows a reference to the same page does not count (it cannot fault)

30 9.4 Page Replacement: Reference String Example
Recorded address sequence: 0100, 0432, 0101, 0612, 0102, 0103, 0104, 0611, 0102, 0103, 0601, 0102, 0104, 0609, 0100, 0105
With page size = 100, this reduces to the reference string 1, 4, 1, 6, 1, 6, 1, 6, 1, 6, 1
We then study page faults versus the number of frames

31 9.4 Page Replacement: FIFO Algorithm
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
- 3 frames (3 pages can be in memory at a time per process): 9 page faults
- 4 frames: 10 page faults
(Figure: frame contents after each reference for the 3-frame and 4-frame cases)

32 9.4 Page Replacement: FIFO Algorithm
FIFO replacement suffers from Belady's anomaly
More frames → fewer page faults?! Not always: the 4-frame run above faults more than the 3-frame run (see the simulation below)
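A small simulation (illustrative, not from the slides) that counts FIFO page faults for the reference string above; running it with 3 and then 4 frames reproduces the anomaly (9 vs. 10 faults).

    #include <stdio.h>
    #include <string.h>

    /* Count FIFO page faults for a reference string (nframes <= 16). */
    static int fifo_faults(const int *refs, int n, int nframes) {
        int frames[16];
        int head = 0, used = 0, faults = 0;
        memset(frames, -1, sizeof frames);
        for (int i = 0; i < n; i++) {
            int hit = 0;
            for (int j = 0; j < used; j++)
                if (frames[j] == refs[i]) { hit = 1; break; }
            if (!hit) {
                faults++;
                if (used < nframes) frames[used++] = refs[i];
                else { frames[head] = refs[i]; head = (head + 1) % nframes; }
            }
        }
        return faults;
    }

    int main(void) {
        int refs[] = {1,2,3,4,1,2,5,1,2,3,4,5};
        int n = sizeof refs / sizeof refs[0];
        printf("3 frames: %d faults\n", fifo_faults(refs, n, 3)); /* 9  */
        printf("4 frames: %d faults\n", fifo_faults(refs, n, 4)); /* 10 */
        return 0;
    }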

33 9.4 Page Replacement: Optimal (OPT) Algorithm
Replace the page that will not be used for the longest period of time
4-frame example with reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: 6 page faults
Problem: how do you know the future references?
Mainly used as a yardstick for measuring how well a practical algorithm performs
(Figure: frame contents after each reference)

34 9.4 Page Replacement: Optimal (OPT) Algorithm

35 9.4 Page Replacement: Least Recently Used (LRU) Algorithm
Replace the page that has not been used for the longest period of time
Reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with 4 frames: 8 page faults
(Figure: frame contents after each reference)

36 9.4 Page Replacement: Least Recently Used (LRU) Algorithm
Counter implementation:
- Associate a time-of-use field with each page-table entry
- Add to the CPU a logical clock or counter
- Whenever a page is referenced, copy the clock value into the time-of-use field of that page
- Replace the page with the smallest time value (requires a search; see the sketch below)
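A minimal sketch of the counter scheme (all names illustrative): a global logical clock stamps each page on reference, and the victim is the resident page with the smallest stamp, found by linear search.

    #include <stdio.h>

    #define NPAGES 8

    static unsigned long clock_now = 0;       /* CPU logical clock          */
    static unsigned long time_of_use[NPAGES]; /* per-page time-of-use field */
    static int           resident[NPAGES];    /* 1 if page is in memory     */

    static void reference(int page) {         /* on every memory reference  */
        time_of_use[page] = ++clock_now;
    }

    static int lru_victim(void) {             /* search for smallest stamp  */
        int victim = -1;
        for (int p = 0; p < NPAGES; p++)
            if (resident[p] &&
                (victim < 0 || time_of_use[p] < time_of_use[victim]))
                victim = p;
        return victim;
    }

    int main(void) {
        resident[1] = resident[4] = resident[6] = 1;
        reference(4); reference(6); reference(4); reference(1);
        printf("LRU victim: page %d\n", lru_victim());  /* page 6 */
        return 0;
    }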

37 9.4 Page Replacement: Least Recently Used (LRU) Algorithm
LRU can also be implemented by keeping a stack of page numbers in a doubly linked list:
- When a page is referenced, move it to the top
- A move requires up to 6 pointers to be changed
- No search is needed for replacement: the LRU page is always at the bottom

38 9.4 Page Replacement: Least Recently Used (LRU) Algorithm
Stack algorithm: a page-replacement algorithm for which the set of pages in memory with n frames is always a subset of the set of pages that would be in memory with n + 1 frames
Stack algorithms never exhibit Belady's anomaly
Both OPT and LRU are stack algorithms

39 9.4 Page Replacement: LRU Approximation Algorithms
Few systems provide sufficient hardware support for true LRU page replacement, but many provide some help in the form of a reference bit:
- Associated with each entry in the page table
- Set by the hardware whenever the page is referenced
- Replace a page whose reference bit is 0 (if any exists)
- We do not know the exact order of use
This leads to page-replacement algorithms that approximate LRU replacement

40 9.4 Page Replacement: LRU Approximation Algorithms (Additional Reference Bits)
Record the reference bits at regular intervals:
- Keep in memory an 8-bit byte for each page in a table
- At each interval, shift each byte right by one and copy the page's reference bit into the high-order bit
- Interpreting the bytes as unsigned integers, the page with the lowest number is the LRU page (the one to discard)
(Figure: 8-bit histories for pages 0-3; see the sketch below)

41 9.4 Page Replacement: LRU Approximation Algorithms (Clock Replacement)
Second chance (can be implemented as a circular queue):
- Needs a reference bit
- If the page to be replaced (in clock order) has reference bit 1, give that page a second chance:
  - Clear its reference bit and reset its arrival time
  - Leave the page in memory; consider the next page (in clock order), subject to the same rules
(Figure: circular queue of pages with reference bits, the current pointer, and the page to be replaced; see the sketch below)
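A minimal sketch of the clock hand over a circular buffer (all names illustrative): the hand advances, clearing reference bits, and evicts the first page whose bit is already 0.

    #include <stdio.h>

    #define NFRAMES 4

    static int page_in[NFRAMES] = {10, 11, 12, 13};  /* resident pages   */
    static int ref_bit[NFRAMES] = { 1,  0,  1,  1};  /* reference bits   */
    static int hand = 0;                             /* clock pointer    */

    static int choose_victim(void) {
        for (;;) {
            if (ref_bit[hand] == 0) {        /* no second chance left   */
                int victim = hand;
                hand = (hand + 1) % NFRAMES;
                return victim;
            }
            ref_bit[hand] = 0;               /* give a second chance    */
            hand = (hand + 1) % NFRAMES;
        }
    }

    int main(void) {
        int v = choose_victim();
        printf("evict page %d in frame %d\n", page_in[v], v);  /* page 11 */
        return 0;
    }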

42 9.4 Page Replacement: LRU Approximation Algorithms (Enhanced 2nd-Chance Algorithm)
Consider the reference bit and the modify bit together; the (reference bit, modify bit) pair defines 4 classes of pages:
- (0, 0): neither recently used nor modified
- (0, 1): not recently used but modified
- (1, 0): recently used but clean
- (1, 1): recently used and modified
First search all pages in the lowest class, then the 2nd lowest class, and so on
Replace the first page encountered in the lowest nonempty class
May need to scan the circular queue several times

43 9.4 Page Replacement: Counting Algorithms
Keep a counter of the number of references that have been made to each page:
- Least frequently used (LFU) algorithm: replace the page with the smallest count
- Most frequently used (MFU) algorithm: based on the argument that the page with the smallest count was probably just brought in and has yet to be used

44 9.4 Page Replacement: Page Buffering Algorithms
Systems commonly keep a pool of free frames:
- The desired page is read into a free frame before the victim is written out, allowing the process to restart as soon as possible
Expansion: maintain a list of modified pages
- Whenever the paging device is idle, a modified page is written to disk (and its modify bit is reset)
Another modification: keep the pool of free frames but remember which page was in each frame
- When a page fault occurs, first check whether the desired page is still in the free-frame pool

45 9.5 Allocation of Frames
Recall: when a page fault occurs before an executing instruction completes, the instruction must be restarted
Each process needs a minimum number of frames, defined by the instruction-set architecture:
- It must have enough frames to hold all the different pages that any single instruction can reference
- If all memory-reference instructions have only ONE memory address → two frames (instruction + operand)
- One-level indirect addressing → three frames

46 9.5 Allocation of Frames
Minimum number of needed frames:
- If the instruction itself may straddle two pages → 2 frames for the instruction alone
- If n-level indirect addressing is allowed → n + 1 frames are needed
(Figure: a 2-level indirect addressing example: an instruction with an OP code, two operands, and a direct/indirect indicator, with operands reaching pages x, y, and z)

47 9.5 Allocation of Frames: Allocation Algorithms
How to split frames among processes:
- Equal allocation: e.g., 100 frames and 5 processes → give each process 20 frames
- Proportional allocation: allocate available memory to each process according to its size (see the worked example below)
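In symbols: with m available frames, si the size (in pages) of process pi, and S = Σ si, proportional allocation gives process pi about ai = (si / S) × m frames. A worked illustration with the textbook's numbers: m = 62 frames and two processes of 10 and 127 pages yield 10/137 × 62 ≈ 4 frames and 127/137 × 62 ≈ 57 frames.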

48 9.5 Allocation of Frames: Allocation Algorithms
Remarks on fixed allocation algorithms:
- With equal or proportional allocation, each process loses some frames as the multiprogramming level increases
Priority allocation:
- Use a proportional allocation scheme based on priorities rather than size
- If process Pi generates a page fault, either:
  - select for replacement one of its own frames, or
  - select for replacement a frame from a process with a lower priority number

49 9.5 Allocation of Frames: Global vs. Local Allocation
Global replacement:
- A process selects a replacement frame from the set of all frames; one process can take a frame from another
- A process cannot control its own page-fault rate!
Local replacement:
- Each process selects only from its own set of allocated frames

50 9.6 Thrashing (a process is busy swapping pages in and out)
If a process does not have "enough" pages, the page-fault rate is very high, leading to low CPU utilization and a vicious cycle:
1. The OS monitors CPU utilization and finds it too low
2. It increases the multiprogramming degree by adding another process to the system
3. The new process needs free frames and replaces other processes' pages (global replacement)
4. The victim processes fault as soon as they run, and many processes queue up for the paging device
5. The ready queue empties, CPU utilization drops further, and the OS adds yet another process...

51 9.6 Thrashing: Causes
Locality model:
- A process migrates from one locality to another
- Localities may overlap
Why does thrashing occur? Σ (locality sizes) > total memory size

52 9.6 Thrashing: Causes (Figure: locality in a memory-reference pattern)

53 9.6 Thrashing
To overcome thrashing:
- When the multiprogramming degree is increased but CPU utilization drops, we should decrease the degree of multiprogramming
- Thrashing can be limited by using a local (or priority) replacement algorithm (processes do not steal frames from one another)
- To prevent thrashing, we must provide a process with as many frames as it needs
- The working-set strategy can be used to determine how many frames a process is actually using

54 9.6 Thrashing: Working-Set Model
Based on the assumption of locality
Δ: the working-set window, a fixed number of page references (e.g., 10,000 instructions)
Working set: the set of pages in the most recent Δ page references

55 9.6 Thrashing: Working-Set Model
WSSi (working-set size of process Pi): the total number of pages referenced in the most recent Δ (varies in time)
Selection of Δ is very important:
- If Δ is too small, it will not encompass the entire locality
- If Δ is too large, it will encompass several localities
- If Δ = ∞, it will encompass the entire program
D = Σ WSSi = total demand for frames
If D > m (the available frames) → thrashing
Policy: if D > m, suspend one of the processes

56 9.6 Thrashing: Working-Set Model
Implementation difficulty: the working set is a moving window and is hard to track exactly
Approximation: a fixed-interval timer interrupt plus a reference bit. E.g., with Δ = 10,000 references, take a timer interrupt every 5,000 references:
- Keep 2 in-memory bits for each page
- Whenever the timer interrupts, copy and then clear the reference bit of each page
- Pages with at least 1 bit on are considered to be in the working set (see the sketch below)
To improve accuracy: e.g., keep 10 bits and interrupt every 1,000 references
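A user-space sketch of this approximation (the "timer interrupt" is just a function call here, and all names are illustrative):

    #include <stdio.h>

    #define NPAGES 6

    static int ref_bit[NPAGES];   /* set by "hardware" on each reference */
    static int hist[NPAGES][2];   /* 2 in-memory bits per page           */

    static void timer_interrupt(void) {      /* every 5,000 references   */
        for (int p = 0; p < NPAGES; p++) {
            hist[p][1] = hist[p][0];         /* age the history          */
            hist[p][0] = ref_bit[p];         /* copy, then clear R-bit   */
            ref_bit[p] = 0;
        }
    }

    static int in_working_set(int p) {       /* any bit on => in set     */
        return ref_bit[p] || hist[p][0] || hist[p][1];
    }

    int main(void) {
        ref_bit[1] = 1; timer_interrupt();   /* page 1 referenced, aged once */
        ref_bit[3] = 1;                      /* page 3 referenced just now   */
        for (int p = 0; p < NPAGES; p++)
            if (in_working_set(p))
                printf("page %d in working set\n", p);  /* pages 1 and 3 */
        return 0;
    }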

57 9.6 Thrashing: Page-Fault Frequency Scheme
The working set can be useful for prepaging, but it is a rather clumsy way to control thrashing
The page-fault frequency (PFF) scheme takes a more direct approach:
- Thrashing means a high page-fault rate, so a process with a high page-fault rate needs more frames, and one with a low page-fault rate may have too many frames
- Establish upper and lower bounds on the desired page-fault rate

58 9.6 Thrashing: Page-Fault Frequency Scheme
Directly measure and control the page-fault rate to prevent thrashing:
- Establish an "acceptable" page-fault rate
- If the actual rate is too low, the process loses a frame
- If the actual rate is too high, the process gains a frame

59 9.7 Memory-Mapped Files
Memory mapping allows a part of the virtual address space to be logically associated with a file:
- File I/O is treated as routine memory access by mapping disk blocks to pages in memory
- The file is initially read using demand paging: a page-sized portion of the file is read from the file system into a physical page
- Subsequent reads and writes of the file are treated as ordinary memory accesses
- Simplifies file access by driving file I/O through memory rather than read()/write() system calls (see the sketch below)
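A minimal POSIX sketch: once mmap() succeeds, reading the file is ordinary memory access. The path "/etc/hostname" is only an assumed example; any readable regular file works.

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/etc/hostname", O_RDONLY);   /* assumed example path */
        if (fd < 0) return 1;
        struct stat st;
        if (fstat(fd, &st) < 0) return 1;
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) return 1;
        fwrite(p, 1, st.st_size, stdout);   /* file bytes read via memory */
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }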

60 9.7 Memory-Mapped Files
Memory mapping also allows several processes to map the same file, allowing the pages in memory to be shared

61 9.8 Allocating Kernel Memory
Not included in the exam scope.

62 9.9 Other Considerations: Prepaging
- With pure demand paging, a large number of page faults occurs when a process starts or is paged back in to be restarted
- Prepaging brings into memory at one time all the pages that will be needed (according to the process's working set)
- Can prevent the high level of faults at initial paging
- The working set must be saved when a process is paged out
- Trade-off to consider: is the cost of prepaging less than the cost of servicing the corresponding page faults?

63 9.9 Other Considerations: Page Size Selection
- A small page size reduces internal fragmentation (but increases page-table size)
- Minimizing I/O time argues for a larger page size: seek and latency times dwarf transfer time, so two separate small I/Os take longer than one large I/O
- A smaller page size allows each page to match program locality more accurately: better resolution, less I/O, less total allocated memory
- A larger page size minimizes the number of page faults

64 9.9 Other Considerations: TLB Reach
TLB reach: the amount of memory accessible from the TLB
TLB reach = (TLB size) × (page size); e.g., 64 entries × 4 KB pages = 256 KB
Ideally, the working set of each process fits in the TLB; otherwise the process suffers a high degree of TLB misses
To increase the TLB reach:
- Increase the page size (this may increase fragmentation, since not all applications require a large page size)
- Provide multiple page sizes, which lets applications that require larger page sizes use them without an increase in fragmentation

65 9.9 Other Considerations: Program Structure
int A[][] = new int[1024][1024];
Each row is stored in one page (assume the process is allocated fewer than 1024 frames)
Program 1 (walks the array column by column):
    for (int j = 0; j < A.length; j++)
        for (int i = 0; i < A.length; i++)
            A[i][j] = 0;
→ 1024 × 1024 page faults
Program 2 (walks the array row by row):
    for (int i = 0; i < A.length; i++)
        for (int j = 0; j < A.length; j++)
            A[i][j] = 0;
→ 1024 page faults

66 9.9 Other Considerations: I/O Interlock
Sometimes we must allow some pages to be locked in memory (e.g., frames used for I/O):
- Pages being used to copy a file from a device must be locked so that a page-replacement algorithm cannot select them for eviction
Methods:
- Never execute I/O to user memory: copy user memory → system memory → I/O device
- Allow pages to be locked into memory: a lock bit is associated with every frame, and a locked frame cannot be selected for replacement

67 9.9 Other Considerations: Why must frames used for I/O be in memory? (figure)

68 9.9 Other Considerations: Another Use of Page Locking
Consider this sequence:
1. A low-priority process faults
2. The paging system reads the necessary page into memory
3. The low-priority process enters the ready queue and waits for the CPU
4. Meanwhile, a high-priority process faults and is selected for execution
5. Looking for a replacement, it picks the low-priority process's freshly loaded pages (they appear unreferenced), and they are replaced
This can be prevented if we lock such pages until the faulting process has used them

69 9.10 Operating-System Examples
Not included in the exam scope.

