Page Replacement Algorithms



Presentation on theme: "Page Replacement Algorithms"— Presentation transcript:

1 Page Replacement Algorithms
Don Porter Portions courtesy Emmett Witchel and Kevin Jeffay

2 Virtual Memory Management: Recap
Key concept: Demand paging — load pages into memory only when a page fault occurs
Issues:
— Placement strategies: pages can be placed anywhere, so no placement policy is required
— Replacement strategies: what to do when the jobs in memory need more pages than will fit
— Load control strategies: determining how many jobs can be in memory at one time
[Figure: memory holding the Operating System and User Programs 1 through n]
Notes: Although we didn’t deal with these issues explicitly, they all apply to the partitioning schemes we looked at previously. Load control comes in two flavors: load control in the small (when to load pages) and load control in the large (when to load processes). The latter is the same as medium/long-term scheduling.

3 Page Replacement Algorithms
Typically i VASi >> Physical Memory With demand paging, physical memory fills quickly When a process faults & memory is full, some page must be swapped out Handling a page fault now requires 2 disk accesses not 1! Which page should be replaced? Local replacement — Replace a page of the faulting process Global replacement — Possibly replace the page of another process Idea: Now we have lots of programs in memory — more than will fully fit into memory in their entirety. When a fault occurs, which page should be replaced? — The fact that swapping out a page increases fault handling time argues for considering only clean pages. (However, what if that page is needed immediately?) — How about applying CPU scheduling policies? FIFO? (Replace oldest page.) SJF? (Except all pages have the same size.) Local replacement implies that processes are allocated a fixed number of page frames. Global replacement implies a variable number of frames are allocated to each process. Digression: How do we decide which of two schemes is better?

4 Page Replacement: Eval. Methodology
Record a trace of the pages accessed by a process
Example: the (virtual page, offset) address trace
(3,0), (1,9), (4,1), (2,1), (5,3), (2,0), (1,9), (2,4), (3,1), (4,8)
generates the page trace
3, 1, 4, 2, 5, 2, 1, 2, 3, 4 (represented as c, a, d, b, e, b, a, b, c, d)
Hardware can tell the OS when a new page is loaded into the TLB
— Set a used bit in the page table entry
— Increment or shift a register
Simulate the behavior of a page replacement algorithm on the trace and record the number of page faults generated; fewer faults mean better performance
Notes: We record only which page is accessed, not which memory address — we don’t care which location within a page is being accessed. This trace is rather bogus, as the page changes with each new address generated. (Normally you’d expect to see a large number of consecutive references to the same page.) To avoid confusion with physical frame numbers, we represent virtual page numbers with letters.
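As a concrete sketch of this methodology (not part of the slides — the function names are illustrative), the reduction from an address trace to a page trace can be written in a few lines; the letter encoding (1 → a, 2 → b, ...) mirrors the slides' notation:

```python
# Reduce a (virtual page, offset) address trace to a page trace.

def page_trace(address_trace):
    """Keep only the page number of each access; offsets are irrelevant."""
    return [page for page, _offset in address_trace]

def as_letters(pages):
    """Map page numbers to letters, matching the slides (1 -> 'a', ...)."""
    return [chr(ord('a') + p - 1) for p in pages]

trace = [(3, 0), (1, 9), (4, 1), (2, 1), (5, 3),
         (2, 0), (1, 9), (2, 4), (3, 1), (4, 8)]
print(as_letters(page_trace(trace)))
# ['c', 'a', 'd', 'b', 'e', 'b', 'a', 'b', 'c', 'd']
```

The page trace c, a, d, b, e, b, a, b, c, d produced here is the one used in all of the worked examples that follow.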

5 Optimal Strategy: Clairvoyant Replacement
Replace the page that won’t be needed for the longest time in the future
Example: trace c, a, d, b, e, b, a, b, c, d with pages a, b, c, d initially resident in frames 0–3. (Blank worksheet version of this slide — the table is filled in on the next slide.)

6 Optimal Strategy: Clairvoyant Replacement
Replace the page that won’t be needed for the longest time in the future
Trace: c, a, d, b, e, b, a, b, c, d with pages a, b, c, d initially resident.
— At time 5 the fault on e evicts d, the resident page needed furthest in the future (next uses: b at 6, a at 7, c at 9, d at 10).
— At time 10 the fault on d can evict any page, since the trace ends.
Notes: For this trace and this initial placement of pages, the optimal page replacement strategy results in 2 page faults. Note that the choice of which page to replace at time 10 is arbitrary (or at least is not supported by this trace).
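The clairvoyant policy can be sketched directly from this description — a minimal simulation (not from the slides; the function name is illustrative) that, on each fault, evicts the resident page whose next use lies furthest ahead:

```python
def opt_faults(trace, frames):
    """Clairvoyant (optimal) replacement: on a fault, evict the resident
    page whose next use is furthest in the future (or never occurs)."""
    frames = list(frames)
    faults = 0
    for i, page in enumerate(trace):
        if page in frames:
            continue
        faults += 1
        future = trace[i + 1:]
        # Pages never used again sort last (distance = infinity).
        victim = max(frames,
                     key=lambda p: future.index(p) if p in future else float('inf'))
        frames[frames.index(victim)] = page
    return faults

print(opt_faults(list("cadbebabcd"), list("abcd")))  # 2
```

Running it on the slide's trace reproduces the 2 faults claimed above.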

7 Local Replacement: FIFO
Simple to implement — a single pointer suffices
Performance with 4 page frames on the trace c, a, d, b, e, b, a, b, c, d, with pages a, b, c, d initially resident. (Blank worksheet version of this slide — the table is filled in on the next slide.)

8 Local Replacement: FIFO
Simple to implement — a single pointer suffices
Performance with 4 page frames on the trace c, a, d, b, e, b, a, b, c, d:
— Time 5: the fault on e evicts a (the oldest page); time 7: a evicts b; time 8: b evicts c; time 9: c evicts d; time 10: d evicts e — 5 faults in all.
Notes: For local replacement we assume processes have a fixed number of page frames allocated to them. We start with CPU-like scheduling policies. Try FIFO (FIFO in terms of when the page was allocated). The OS maintains a list (similar to the page table) of frames allocated to the process; pages are placed in this list when a page fault occurs, and the pointer into the table always points to the oldest page. In the chart, virtual time (measured in “new pages accessed”) goes across the x-axis; the indices in the table are not actual frame addresses but indices into a frame list. As in CPU scheduling, what is an optimal policy? It’s a different problem than scheduling — we’re not trying to determine a good ordering, just to minimize overhead.
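The "single pointer" implementation is literally a queue. A minimal sketch (not from the slides; names are illustrative) using a deque whose left end is the oldest page:

```python
from collections import deque

def fifo_faults(trace, frames):
    """FIFO local replacement: evict the page resident longest.
    `frames` lists the initially resident pages, oldest first."""
    queue = deque(frames)          # left end = oldest page
    faults = 0
    for page in trace:
        if page not in queue:
            faults += 1
            queue.popleft()        # evict the oldest page
            queue.append(page)     # the new page is now youngest
    return queue and faults or faults

print(fifo_faults(list("cadbebabcd"), list("abcd")))  # 5
```

This reproduces the 5 faults shown in the worked example.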

9 Least Recently Used (LRU) Replacement
Use the recent past as a predictor of the near future
Replace the page that hasn’t been referenced for the longest time
Example: trace c, a, d, b, e, b, a, b, c, d with pages a, b, c, d initially resident. (Blank worksheet version of this slide — the table is filled in on the next slide.)

10 Least Recently Used (LRU) Replacement
Use the recent past as a predictor of the near future
Replace the page that hasn’t been referenced for the longest time
— Time 5: the fault on e evicts c (last used at time 1); time 9: c evicts d (last used at time 3); time 10: d evicts e (last used at time 5) — 3 faults in all.
Notes: A heuristic frequently used to approximate clairvoyant resource allocation policies is to use the past to predict the future. You can’t look forward in time, so you look backwards and use the recent past to predict the near future (the “back to the future” heuristic!). This is similar in concept to the SJF computation-time estimation heuristic used in CPU scheduling. Note that this is just a heuristic: we do a bad job of page replacement at time 9, replacing a page that we’re going to reference just 1 (virtual) time unit later.
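A minimal LRU simulation (not from the slides; names are illustrative), keeping the resident pages in a recency-ordered list with the LRU page at the front:

```python
def lru_faults(trace, frames):
    """LRU: evict the resident page unreferenced for the longest time.
    `frames` lists the initially resident pages, least recently used first."""
    recency = list(frames)             # front = LRU, back = MRU
    faults = 0
    for page in trace:
        if page in recency:
            recency.remove(page)       # hit: pull out and re-append as MRU
        else:
            faults += 1
            recency.pop(0)             # fault: evict the LRU page
        recency.append(page)
    return faults

print(lru_faults(list("cadbebabcd"), list("abcd")))  # 3
```

On the slide's trace this gives the 3 faults shown above (evicting c, then d, then e).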

11 How to Implement LRU?
Maintain a “stack” of recently used pages
Example: trace c, a, d, b, e, b, a, b, c, d with pages a, b, c, d initially resident. (Blank worksheet version of this slide — the stack contents are filled in on the next slide.)

12 How to Implement LRU?
Maintain a “stack” of recently used pages: each reference pushes the page onto the top of the stack, so the least recently used page is always at the bottom, and the bottom page is the one to replace on a fault (here: c at time 5, d at time 9, e at time 10).
Notes: Doing pure LRU requires the OS to keep too much state (record too much history). A more practical method is to use a stack to keep a bounded-space (variable-duration) history: we push new page references onto the stack, and the oldest, least recently used page is always at the bottom. More evidence that OS folks have no clue what data structures really are — the “stack” we’re using here is really a “queue” (with funky dequeue operations). This scheme is too hard to implement in practice: it requires specialized and complex hardware (the stack size depends on the number of frames allocated to the process).
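The "stack that is really a queue" can be sketched in software with an ordered dictionary whose insertion order tracks recency — a hypothetical illustration (the class name and method are mine, not the slides'):

```python
from collections import OrderedDict

class LRUStack:
    """The slide's LRU 'stack': most recently used page on top, LRU page
    at the bottom. A hit pulls the page out of the *middle* and re-pushes
    it on top — which is why it is really a queue, not a stack."""

    def __init__(self, pages):
        # Insertion order tracks recency: first entry = LRU, last = MRU.
        self.order = OrderedDict((p, None) for p in pages)

    def reference(self, page):
        """Return True on a hit; on a fault, replace the bottom page."""
        if page in self.order:
            self.order.move_to_end(page)    # re-push on top
            return True
        victim = next(iter(self.order))     # bottom of stack = LRU page
        del self.order[victim]
        self.order[page] = None
        return False

stack = LRUStack("abcd")
faults = [p for p in "cadbebabcd" if not stack.reference(p)]
print(faults)  # ['e', 'c', 'd'] -- faults at times 5, 9, and 10
```

The pages that miss (e, c, d) are exactly the faults in the LRU worked example.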

13 What is the goal of a page replacement algorithm?
A. Make life easier for the OS implementer
B. Reduce the number of page faults
C. Reduce the penalty for page faults when they occur
D. Minimize the CPU time of the algorithm

14 Approximate LRU: The Clock Algorithm
Maintain a circular list of pages resident in memory
Use a clock (or used/referenced) bit to track how often a page is accessed
— The bit is set whenever a page is referenced
The clock hand sweeps over pages looking for one with used bit = 0
— Replace pages that haven’t been referenced for one complete revolution of the clock

func Clock_Replacement
begin
  while (victim page not found) do
    if (used bit for current page = 0) then
      replace current page
    else
      reset used bit
    end if
    advance clock pointer
  end while
end Clock_Replacement

[Figure: page table entries for the resident pages, each showing a resident bit, a used bit, and a frame number.]
Notes: An approximation to an approximation to an optimal algorithm! But it’s a popular approximation — something very close to this is used in the Macintosh VM system. The algorithm is run every time a page fault occurs. Note that the circular linked list is implemented in the page table: it’s a circular list of all resident pages, the figure shows rows in the page table, and the used/clock bit is part of a page’s page table entry. The search can start with the first valid entry in the page table and thereafter continue where it left off last time.
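The pseudocode above translates almost line for line into a simulation — a sketch of my own (not from the slides), which assumes the initially resident pages start with their used bits set:

```python
def clock_faults(trace, frames):
    """Clock replacement: a hand sweeps a circular list of resident pages,
    clearing used bits; the first page found with used bit 0 is replaced."""
    pages = list(frames)
    used = [1] * len(pages)      # assumption: initial pages start used = 1
    hand = 0
    faults = 0
    for page in trace:
        if page in pages:
            used[pages.index(page)] = 1   # reference sets the used bit
            continue
        faults += 1
        while used[hand]:                 # used page gets a pass; clear bit
            used[hand] = 0
            hand = (hand + 1) % len(pages)
        pages[hand], used[hand] = page, 1 # victim found: replace it
        hand = (hand + 1) % len(pages)
    return faults

print(clock_faults(list("cadbebabcd"), list("abcd")))  # 4
```

With the hand starting at frame 0, the first fault (on e at time 5) replaces page a, matching the example on the following slides.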

15 Clock Example
Trace: c, a, d, b, e, b, a, b, c, d with pages a, b, c, d initially resident and their used bits set in the page table entries. (Blank worksheet version of this slide — the table is filled in on the next slide.)

16 Clock Example
Trace: c, a, d, b, e, b, a, b, c, d. Faults occur at times 5 (e replaces a), 7 (a replaces c), 9 (c replaces d), and 10 (d replaces e) — 4 faults in all.
Notes: Here we’re showing a page list similar to the one we used in FIFO page replacement: each entry in the list contains the clock/used bit and the name of the virtual page. In practice the clock algorithm is implemented directly in the page table. Note that at time 5 page a is replaced and yet page c was the LRU page; that page a was replaced is a function of the initial value of the clock pointer. Over time we would expect a closer fit to LRU behavior.

17 Optimization: Second Chance Algorithm
There is a significant cost to replacing “dirty” pages — why? Their contents must be written back to disk before the frame can be freed!
Modify the Clock algorithm to allow dirty pages to always survive one sweep of the clock hand
— Use both the dirty bit and the used bit to drive replacement
[Figure: page table entries before and after a clock sweep, each showing resident, used, and dirty bits plus a frame number; only a page with used = 0 and dirty = 0 is replaced.]
Notes: If we can replace a clean page we don’t have to worry about saving its contents — we assume the contents of the page are already stored in some executable file (e.g., an a.out file) on disk. As before, when a page is referenced, the reference/clock/used bit is set; if the reference is a write, the dirty bit is also set. When the clock hand sweeps over a dirty page, the used bit is reset. When the clock next sweeps over the page, the dirty bit is reset (and we assume the OS somehow remembers that the page is actually dirty). Only pages with used and dirty bits equal to 0 are replaced. Thus it will take at least two sweeps of the clock to remove a dirty page.

18 Second Chance Example
Trace: c, aw, d, bw, e, b, aw, b, c, d (a superscript w marks a write reference), with pages a, b, c, d initially resident, used bits set, and dirty bits clear. (Blank worksheet version of this slide — the table is filled in on the next slide.)

19 Second Chance Example
Trace: c, aw, d, bw, e, b, aw, b, c, d. Faults occur at times 5 (e replaces c), 9 (c replaces d), and 10 (d replaces b).
Notes: Now we have to augment our trace with an indication of which references are reads and which are writes. Each entry in the partial page table shows the clock/used bit and the dirty bit for each resident page. This time, at times 5 and 9 we in fact replace the LRU page — however, that is just dumb luck in this case. Note that at time 10 we replace page b even though page e is a significantly more LRU page.
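The second-chance sweep described in the notes can be simulated directly — a sketch of my own (not from the slides), which assumes the initial pages start with used = 1 and dirty = 0, and returns the victims so the example above can be checked:

```python
def second_chance(trace, frames):
    """Second-chance clock: the hand first clears used bits; a page with
    used = 0 but dirty = 1 gets its dirty bit cleared (a write-back is
    scheduled) instead of being replaced. Only a used = 0, dirty = 0 page
    is evicted. `trace` is a list of (page, is_write) pairs."""
    pages = list(frames)
    used = [1] * len(pages)
    dirty = [0] * len(pages)
    hand, victims = 0, []
    for page, is_write in trace:
        if page in pages:
            i = pages.index(page)
            used[i] = 1
            dirty[i] |= is_write
            continue
        while True:
            if used[hand]:
                used[hand] = 0       # first pass: clear the used bit
            elif dirty[hand]:
                dirty[hand] = 0      # second pass: write back, clear dirty
            else:
                break                # clean and unused: this is the victim
            hand = (hand + 1) % len(pages)
        victims.append(pages[hand])
        pages[hand], used[hand], dirty[hand] = page, 1, 0
        hand = (hand + 1) % len(pages)
    return victims

trace = [('c', 0), ('a', 1), ('d', 0), ('b', 1), ('e', 0),
         ('b', 0), ('a', 1), ('b', 0), ('c', 0), ('d', 0)]
print(second_chance(trace, list("abcd")))  # ['c', 'd', 'b']
```

With the hand starting at frame 0, the victims are c, d, then b — matching the example, including the replacement of b rather than the more-LRU page e at time 10.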

20 Local Replacement and Memory Sensitivity
Trace: a, b, c, d, a, b, c, d, a, b, c, d, a run once against 3 frames and once against 4 frames. (Blank worksheet version of this slide — the tables are filled in on the next slide.)

21 Local Replacement and Memory Sensitivity
Trace: a, b, c, d, a, b, c, d, a, b, c, d, a under FIFO replacement.
— With 3 frames (a, b, c initially resident), every reference from time 4 onward faults: each fault evicts exactly the page that will be referenced next.
— With 4 frames (a, b, c, d initially resident), every reference hits.
Notes: We now turn to the issue of local vs. global page replacement algorithms. We’ve been looking at local replacement algorithms, which assume a fixed number of frames is allocated to each process. But how do we determine how many frames to allocate to a process? What happens if we don’t allocate enough? Things can go very bad! This example assumes FIFO page replacement. One page makes a huge difference. Is this a contrived example — would a real program ever access memory in this way? Absolutely! Consider a program that is sequentially accessing elements of a very large data structure.
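The "one page makes a huge difference" point is easy to reproduce. A self-contained sketch (mine, not the slides'; for simplicity it starts from empty frames, so the warm-up faults are counted too):

```python
from collections import deque

def fifo_cold_faults(trace, nframes):
    """FIFO with a cold start and a fixed allocation of `nframes` frames."""
    queue, faults = deque(), 0
    for page in trace:
        if page in queue:
            continue
        faults += 1
        if len(queue) == nframes:
            queue.popleft()      # evict the oldest page
        queue.append(page)
    return faults

trace = list("abcdabcdabcda")
print(fifo_cold_faults(trace, 3), fifo_cold_faults(trace, 4))  # 13 4
```

With 3 frames every single reference faults (13 of 13); with 4 frames only the compulsory misses remain (4 of 13).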

22 Page Replacement Performance
Local page replacement
— LRU ages pages based on when they were last used
— FIFO ages pages based on when they were brought into memory
Towards global page replacement ... with a variable number of page frames allocated to processes
The principle of locality:
— 90% of the execution of a program is sequential
— Most iterative constructs consist of a relatively small number of instructions
— When processing large data structures, the dominant cost is sequential processing of individual structure elements
— Temporal vs. physical locality
Notes: What we really want is to age pages based on when they will be used next (the clairvoyant policy). We can do better if we allow the number of frames allocated to a process to vary dynamically. The idea isn’t so much to take frames away from other processes but rather to allow a process’s memory allocation to grow (and shrink) over time — programs need different amounts of memory at different times. (And if memory is full this means taking memory away from another process.) The principle of locality argues that a fixed number of frames should work well over short intervals; we can also use this principle to determine what that number of frames is (what we’ll later call the “working set”). Note that we often apply the concept of locality separately to code pages and data pages.

23 Optimal Replacement with a Variable Number of Frames
VMIN — replace a page that is not referenced in the next τ accesses
Example: τ = 4, trace c, c, d, b, c, e, c, e, a, d, with pages a (last referenced at t = 0) and d (t = -1) initially in memory. (Blank worksheet version of this slide — the table is filled in on the next slide.)

24 Optimal Replacement with a Variable Number of Frames
VMIN — replace a page that is not referenced in the next τ accesses
Example: τ = 4, trace c, c, d, b, c, e, c, e, a, d. Faults occur at times 1 (c), 4 (b), 6 (e), 9 (a), and 10 (d) — 5 faults in all.
Notes: How good is the working set policy? What fault rate would an optimal (global replacement) policy have given, and what would such a policy look like? Note that in VMIN the forward-looking window is only used to determine which pages should be removed from the working set; pages are only added to the working set when they are referenced. Thus, for example, page a is removed at time 1 because it won’t be used soon enough in the future, but page b is not added, even though it will be used soon — it is not added because it is not referenced at time 1. Note that we aren’t counting the removal of a page as a page fault even though it may require as much work as a page fault (if the removed page is dirty): removing a page can be done in the background, and in any event it does not require that the owning process stop execution.

25 The Working Set Model
Assume recently referenced pages are likely to be referenced again soon...
... and keep only those recently referenced pages in memory (called the working set)
— Thus pages may be removed even when no page fault occurs
— The number of frames allocated to a process will vary over time
A process is allowed to execute only if its working set fits into memory
— The working set model performs implicit load control
Notes: This policy (1) allocates a variable number of frames to processes and (2) does implicit load control. It doesn’t eliminate the need for load control; it just simplifies the process.

26 Working Set Page Replacement
Keep track of the last τ references (excluding the faulting reference)
— The pages referenced during the last τ memory accesses are the working set
— τ is called the window size
Example: working set computation with τ = 4 references, trace c, c, d, b, c, e, c, e, a, d, with pages a (last referenced at t = 0), d (t = -1), and e (t = -2) initially in memory. (Blank worksheet version of this slide — the table is filled in on the next slide.)

27 Working Set Page Replacement
Keep track of the last τ references
— The pages referenced during the last τ memory accesses are the working set
— τ is called the window size
Example: working set computation, τ = 4 references, trace c, c, d, b, c, e, c, e, a, d. Faults occur at times 1 (c), 4 (b), 6 (e), 9 (a), and 10 (d) — 5 faults in all. Point: the size of τ matters for the page fault frequency.
Notes: We are now using a different notation. Previously we listed the contents of each frame allocated to the process; now we only indicate whether a virtual page is in memory or not (where the page sits in physical memory doesn’t matter). The pages in memory are the process’s working set; the window size defines the working set. The challenge is to efficiently decrease a process’s working set. When should the working set size be decreased? Ideally we’d like to tie this computation to an (already expensive) event like a page fault.
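The working-set computation can be sketched as follows — my own illustration (not from the slides), which assumes we may preload last-reference times for the initially resident pages:

```python
def working_set_faults(trace, tau, last_ref=None):
    """Working-set replacement: after the reference at (virtual) time t,
    only pages referenced in the window [t - tau + 1, t] stay resident.
    `last_ref` optionally preloads last-reference times for resident pages."""
    last_ref = dict(last_ref or {})
    faults = 0
    for t, page in enumerate(trace, start=1):
        if page not in last_ref:
            faults += 1                  # page outside the working set
        last_ref[page] = t
        # Drop pages that fell out of the window (no fault is charged).
        last_ref = {p: lr for p, lr in last_ref.items() if lr > t - tau}
    return faults

# Initial residency from the slide: a (t = 0), d (t = -1), e (t = -2).
print(working_set_faults(list("ccdbcecead"), tau=4,
                         last_ref={'a': 0, 'd': -1, 'e': -2}))  # 5
```

This reproduces the 5 faults of the worked example; note that pages (such as a at time 4) are dropped from memory without any fault occurring.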

28 Page-Fault-Frequency Page Replacement
An alternate approach to computing the working set: explicitly attempt to minimize page faults
— When the page fault frequency is high, increase the working set
— When the page fault frequency is low, decrease the working set
Algorithm: keep track of the rate at which faults occur. When a fault occurs, compute the time since the last page fault, and record the time t_last of the last page fault.
— If the time between page faults is “large”, reduce the working set: if t_current - t_last > τ, remove from memory all pages not referenced in [t_last, t_current]
— If the time between page faults is “small”, increase the working set: if t_current - t_last ≤ τ, add the faulting page to the working set
Notes: The PFF algorithm uses a window size parameter just as the direct WS replacement policy does. Here the only state that must be maintained is the time each process last faulted — where time is a per-process virtual time measured in units of process memory references, so we’re computing the number of memory references made by this process since the last time it faulted. Note that in all cases when a page fault occurs, the faulting page is added to the working set. Again, this algorithm only runs when a page fault occurs.

29 Page Fault Frequency Replacement
Example, window size τ = 2, trace c, c, d, b, c, e, c, e, a, d:
— If t_current - t_last > 2, remove pages not referenced in [t_last, t_current] from the working set
— If t_current - t_last ≤ 2, just add the faulting page to the working set
(Blank worksheet version of this slide — the table is filled in on the next slide.)

30 Page Fault Frequency Replacement
Example, window size τ = 2:
— If t_current - t_last > 2, remove pages not referenced in [t_last, t_current] from the working set
— If t_current - t_last ≤ 2, just add the faulting page to the working set
Faults occur at times 1 (c), 4 (b), 6 (e), 9 (a), and 10 (d); the corresponding values of t_cur - t_last are 1, 3, 2, 3, 1.
Notes: The times shown are the times of faults. Why is this an improvement? It only computes the working set on faults. We still need to set the window size properly — no magic. Note that at times 6 and 10 the time since the last fault is small (the fault frequency is high), and thus no page is removed from memory. Why is this algorithm inherently more desirable than WS? Because it only computes the working set (an expensive operation) when a page fault occurs, i.e., when the OS is already doing an expensive operation.
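The PFF rule can be sketched with only the state the notes mention (the time of the last fault, plus the pages referenced since then) — my own illustration, not from the slides:

```python
def pff_faults(trace, tau, resident):
    """Page-fault-frequency replacement: on a fault at time t, if
    t - t_last > tau, evict every page not referenced in [t_last, t];
    either way the faulting page is brought in."""
    resident = set(resident)
    referenced = set()           # pages touched since the last fault
    t_last, faults = 0, 0
    for t, page in enumerate(trace, start=1):
        if page in resident:
            referenced.add(page)
            continue
        faults += 1
        if t - t_last > tau:
            resident = referenced | {page}   # faults are rare: shrink
        else:
            resident.add(page)               # faults are frequent: grow
        referenced = {page}      # fault time t starts the next interval
        t_last = t
    return faults

print(pff_faults(list("ccdbcecead"), tau=2,
                 resident={'a', 'd', 'e'}))  # 5
```

On the slide's trace (with a, d, e initially resident) this yields the 5 faults shown, with working-set reductions only at the faults where t_cur - t_last = 3 > 2.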

31 Load Control: Fundamental Trade-off
There is a fundamental trade-off between:
— a high multiprogramming level: MPL_max = (number of page frames) / (minimum number of frames required for a process to execute), and
— low paging overhead: MPL_min = 1 process
Issues:
— What criterion should be used to determine when to increase or decrease the MPL?
— Which task should be swapped out if the MPL must be reduced?
Notes: What is the largest the MPL can (reasonably) be? For example, what is the minimum number of pages that must be in memory in order for a process to execute? It depends on the architecture of the machine, in particular the number of memory locations a single instruction can touch: it takes at least one page to hold the instruction, one page for data (likely a different page if text and data are in separate segments), and possibly other pages if indirect addressing modes are supported. Why might the multiprogramming level be reduced? There is a fundamental trade-off between having a high MPL (high utilization) and having low overhead due to page faults.

32 Load Control Done Wrong
i.e., load control based on CPU utilization:
— Assume memory is nearly full
— A chain of page faults occurs, and a queue of processes forms at the paging device
— CPU utilization falls, so the operating system increases the MPL
— New processes fault, taking memory away from existing processes
— CPU utilization goes to 0, so the OS increases the MPL further...
The system is thrashing — spending all of its time paging
[Figure: processes cycling between the CPU and a queue at the paging device]
Notes: CPU utilization falls — why? More processes are doing I/O for paging and hence are spending more time away from the CPU than they otherwise would. The processes aren’t doing useful (disk) I/O; instead they are incurring overhead. What does this policy do when CPU utilization falls? It gives the CPU more work to do by admitting more processes into the system! The problem is that faults are occurring because processes don’t have enough memory relative to their locus of execution.

33 Load Control and Thrashing
Thrashing can be ameliorated by local page replacement
Better criteria for load control — adjust the MPL so that:
— mean time between page faults (MTBF) = page fault service time (PFST), or
— Σ WSi = size of memory
[Figure: CPU utilization vs. multiprogramming level, peaking near N_I/O-BALANCE (where MTBF/PFST = 1.0) and collapsing beyond N_max]
Notes: Why does local replacement help? One process cannot impact another process because it cannot take memory away from that process. Can a single process still thrash? Yes — a process with poor locality can thrash. The idea behind the MTBF/PFST ratio is to balance the paging I/O and CPU load, i.e., to eliminate queuing at the paging device (PFST is assumed to be a constant). Empirical studies suggest it provides close to optimal performance. An alternate policy is to keep the paging device at approximately 50% utilization.

34 Load Control and Thrashing
[Figure: process states — Running, Ready, Waiting, and Suspended — with the ready queue, semaphore/condition queues, and a suspended queue backed by the paging disk; suspending a process releases its physical memory]
When the multiprogramming level must be decreased, which process should be swapped out?
— Lowest priority process? (Base the decision on importance.)
— Smallest process? (Maximize memory utilization by freeing up the minimal amount of space; this also minimizes the swapping cost.)
— Largest process? (Minimize the frequency with which we have to perform the expensive process of swapping.)
— Oldest process? (Get rid of the least interactive process.)
— Faulting process?
Notes: When we decrease the multiprogramming level we swap out an entire process and release all of its memory. Who should be swapped out? No universal rules! It depends on a number of factors such as scheduling, ...

