Presentation on theme: "CMPT 300: Operating Systems I Ch 9: Virtual Memory Dr. Mohamed Hefeeda"— Presentation transcript:
1 CMPT 300: Operating Systems I, Ch 9: Virtual Memory. Dr. Mohamed Hefeeda, School of Computing Science, Simon Fraser University
2 Objectives
Understand the virtual memory system, its benefits, and the mechanisms that make it feasible:
- Demand paging
- Page-replacement algorithms
- Frame allocation
- Locality and working-set models
Understand how kernel memory is allocated and used
3 Background
Virtual memory: separation of user logical memory from physical memory
- Only part of the program needs to be in memory for execution
- The logical address space can therefore be much larger than the physical address space
- Allows address spaces to be shared by several processes
- Allows for more efficient process creation
Virtual memory can be implemented via:
- Demand paging
- Demand segmentation
4 Virtual Memory Can be Much Larger than Physical Memory
5 Demand Paging
The core enabling idea of virtual memory systems: a page is brought into memory only when it is needed
Why?
- Less I/O needed
- Faster response, because we load only the first few pages
- Less memory needed for each process → more processes can be admitted to the system
How?
- The process generates logical (virtual) addresses, which are mapped to physical addresses using a page table
- If the requested page is not in memory, the kernel brings it in from the hard disk
How do we know whether a page is in memory?
6 Valid-Invalid Bit
Each page-table entry has a valid-invalid bit: v = in memory, i = not in memory
Initially, it is set to i for all entries
During address translation, if the valid-invalid bit is i, the reference could be:
- An illegal reference (outside the process' address space) → abort the process
- A legal reference to a page not in memory → page fault (bring the page from disk)
[Figure: page table with frame numbers and valid-invalid bits]
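The valid-bit check can be sketched in C as follows. This is only an illustration: the struct layout, field names, and `translate()` function are hypothetical, and in a real system this check is done by the MMU hardware, which traps to the kernel on an invalid entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page-table entry: frame number plus valid-invalid bit. */
struct pte {
    unsigned frame;   /* physical frame number (meaningful only if valid) */
    bool     valid;   /* true = v (in memory), false = i (not in memory)  */
};

enum access { HIT, PAGE_FAULT };

/* If the entry is invalid, the hardware traps; the kernel then decides
 * "illegal reference" vs. "legal but not in memory" (demand-page it in). */
enum access translate(const struct pte *pt, unsigned page, unsigned *frame)
{
    if (!pt[page].valid)
        return PAGE_FAULT;
    *frame = pt[page].frame;
    return HIT;
}
```

A reference to page 1 below would trap, and the kernel would consult its own tables to decide whether to abort the process or fetch the page from disk.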
7 Handling a Page Fault
The OS looks at another table to decide:
- Invalid reference → abort the process
- Just not in memory → bring the page in:
1. Find a free frame
2. Swap the page from disk into the frame (I/O operation)
3. Update the page table; set the valid bit = v
4. Restart the instruction that caused the page fault
8 Handling a Page Fault (cont'd)
Restarting an instruction, e.g., C ← A + B:
- Assume a page fault when accessing C (after adding A and B)
- Bring the page that holds C into memory (the process may be suspended during the I/O)
- Fetch the ADD instruction (again)
- Fetch A, B (again)
- Do the addition (again), then store in C
Restarting an instruction can be complicated, e.g., the MVC (Move Character) instruction in IBM 360/370 systems:
- Can move up to 256 bytes from one location to another, possibly overlapping
- A page fault may occur in the middle of copying → some data may already be overwritten → simply restarting the instruction is not enough (data has been modified)
- Solution: the hardware attempts to access both ends of both blocks; if any is not in memory, a page fault occurs before the instruction executes
Bottom line: demand paging may raise subtle problems, and they must be addressed
9 Performance of Demand Paging
Page fault rate p: 0 ≤ p ≤ 1.0
- If p = 0, there are no page faults
- If p = 1, every reference is a fault
Effective Access Time (EAT):
EAT = (1 - p) x memory access time + p x page fault time
Page fault time = service the page-fault interrupt (~microseconds)
+ read in the requested page (~milliseconds)
+ restart the process (~microseconds)
Note: reading in the requested page may require writing another page to disk if there is no free frame
10 Demand Paging: Example
Memory access time = 200 nanoseconds
Average page fault time = 8 milliseconds (disk latency, seek, and transfer time)
EAT = (1 - p) x 200 + p x 8,000,000
= 200 + p x 7,999,800 (nanoseconds)
If one out of every 1,000 memory references causes a page fault (p = 0.001), then EAT ≈ 8,200 nanoseconds
This is a slowdown by a factor of about 40!
Bottom line: we should minimize the number of page faults; they are very, very costly
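The EAT formula above is easy to check numerically. A small helper (the function name is ours, not from the slides):

```c
#include <assert.h>

/* Effective access time in nanoseconds, per the slide's formula:
 * EAT = (1 - p) * memory_access_ns + p * page_fault_ns           */
double eat_ns(double p, double mem_ns, double fault_ns)
{
    return (1.0 - p) * mem_ns + p * fault_ns;
}
```

Plugging in the slide's numbers (200 ns access, 8 ms fault time, p = 0.001) gives 199.8 + 8,000 ≈ 8,200 ns, i.e., roughly a 40x slowdown over the 200 ns fault-free case.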
11 Virtual Memory and Process Creation
VM allows faster, more efficient process creation using the Copy-on-Write (COW) technique
COW allows parent and child processes to initially share the same pages in memory (during fork())
Only if either process modifies a shared page is the page copied
[Figure: when P1 tries to modify page C, it gets its own copy of C]
12 Page Replacement: Overview
A page fault occurs → we need to bring the requested page from disk to memory:
1. Find the location of the requested page on disk
2. Find a free frame: if there is a free frame, use it; else use a page replacement algorithm to select a victim page to evict from memory
3. Bring the requested page into the free frame
4. Update the page table and the free-frame list
5. Restart the process
13 Page Replacement: Overview (cont'd)
Note: we can skip swapping the victim out if it was NOT modified → significant savings (one fewer I/O operation)
We associate a dirty (modify) bit with each page to indicate whether the page has been modified
How do we choose the victim page? What should the goal of this selection algorithm be?
14 Page Replacement Algorithms
Objective: minimize the page-fault rate
Algorithm evaluation:
- Take a particular string of memory references, and
- Compute the number of page faults on that string
A reference string looks like: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
Notes:
- We use page numbers, not addresses
- The address sequence could have been 100, 250, 270, 301, 490, ..., assuming a page size of 100 bytes
- References 250 and 270 are in the same page (2); only the first may cause a page fault, which is why 2 appears only once
15 Page Faults vs. Number of Frames
We expect the number of page faults to decrease as the number of physical frames allocated to a process increases
16 Page Replacement: First-In-First-Out (FIFO)
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
3 frames (3 pages can be in memory at any time)
Let us work it out: on every page fault, we show the memory contents
Number of page faults: 9
Pros:
- Easy to understand and implement
Cons:
- Performance may not always be good: it may replace a heavily used page (e.g., one holding a variable that is accessed most of the time)
- It suffers from Belady's anomaly
17 FIFO: Belady's Anomaly
Assume the reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
If we have 3 frames, how many page faults? 9 page faults
If we have 4 frames, how many page faults? 10 page faults
But more frames are supposed to result in fewer page faults!
Belady's anomaly: more frames → more page faults (for some reference strings)
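The FIFO fault counts on these two slides can be reproduced with a small simulator (a sketch; the function name and the 16-frame cap are ours):

```c
#include <assert.h>
#include <stdbool.h>

/* Count FIFO page faults for a reference string with nframes frames.
 * frames[] holds the resident pages; 'next' is the FIFO eviction hand. */
int fifo_faults(const int *refs, int n, int nframes)
{
    int frames[16];                       /* assumes nframes <= 16 */
    int used = 0, next = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        bool hit = false;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) { hit = true; break; }
        if (hit) continue;
        faults++;
        if (used < nframes)
            frames[used++] = refs[i];     /* free frame available */
        else {
            frames[next] = refs[i];       /* evict the oldest resident page */
            next = (next + 1) % nframes;
        }
    }
    return faults;
}
```

Running it on the slide's reference string yields 9 faults with 3 frames and 10 faults with 4 frames, demonstrating Belady's anomaly directly.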
19 Optimal Page Replacement Algorithm
Can you guess the optimal replacement algorithm?
Replace the page that will not be used for the longest period of time
4-frame example: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 → 6 page faults
How can we know the future? We cannot!
The optimal algorithm is used as a baseline for comparing algorithms
20 Least Recently Used (LRU) Algorithm
Try to approximate the optimal policy: look at the past to infer the future
LRU: replace the page that has not been used for the longest period
Rationale: this page may not be needed anymore (e.g., pages of an initialization module)
4-frame example: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 → 8 page faults (compare to: optimal 6, FIFO 10)
LRU and Optimal do not suffer from Belady's anomaly
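The LRU count on this slide can also be checked with a short simulator (a sketch; names are ours). It keeps resident pages ordered by recency, with the LRU page at index 0:

```c
#include <assert.h>

/* Count LRU page faults: frames[] is ordered least- to most-recently used. */
int lru_faults(const int *refs, int n, int nframes)
{
    int frames[16];                        /* assumes nframes <= 16 */
    int used = 0, faults = 0;
    for (int i = 0; i < n; i++) {
        int pos = -1;
        for (int j = 0; j < used; j++)
            if (frames[j] == refs[i]) { pos = j; break; }
        if (pos < 0) {                     /* miss */
            faults++;
            if (used == nframes) {         /* evict LRU page at index 0 */
                for (int j = 1; j < used; j++) frames[j - 1] = frames[j];
                used--;
            }
            frames[used++] = refs[i];
        } else {                           /* hit: move page to the MRU end */
            int p = frames[pos];
            for (int j = pos + 1; j < used; j++) frames[j - 1] = frames[j];
            frames[used - 1] = p;
        }
    }
    return faults;
}
```

On the slide's string with 4 frames this gives 8 faults, matching the comparison against optimal (6) and FIFO (10).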
21 LRU Implementation (1): Counters
Every page-table entry has a time-of-use (counter) field
When the page is referenced, copy the CPU logical clock into this field
The CPU clock is maintained in a register and incremented with every memory access
When we need to replace a page, search for the page with the smallest (oldest) value
Cons: search time, updating the time-of-use fields (writing to memory!), clock overflow
Needs hardware support (increment the clock and update the time-of-use field)
22 LRU Implementation (2): Stack
Keep a stack of page numbers in a doubly linked list
If a page is referenced, move it to the top; the least recently used page sinks to the bottom
Cons: each memory reference is a bit expensive (requires updating 6 pointers in the worst case)
Pros: no search for the replacement victim
Also needs hardware support to update the stack
23 LRU Implementation (cont'd)
Can we implement LRU without hardware support? Say, by using interrupts, i.e., when the hardware needs to update the stack or counters, it issues an interrupt and an ISR does the update?
NO. Too costly: it would slow every memory reference by a factor of at least 10
Even LRU (which merely approximates OPT) is not easy to implement without hardware support!
24 Second-Chance (Clock) Replacement
An approximation of LRU, aka clock replacement
Each page has a reference bit (ref_bit), initially 0
When the page is referenced, ref_bit is set to 1 (by hardware)
Maintain a moving pointer to the next (candidate) victim
When choosing a page to replace, check the candidate's ref_bit:
- If ref_bit == 0, replace it
- Else set ref_bit to 0, leave the page in memory (give it another chance), move the pointer to the next page, and repeat until a victim is found
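The victim-selection loop above can be sketched as follows (struct and function names are ours; real kernels keep the reference bit in the page-table entry, set by hardware):

```c
#include <assert.h>
#include <stdbool.h>

struct frame { int page; bool ref_bit; };

/* Second-chance (clock): advance the hand; a frame with ref_bit == 1 gets
 * its bit cleared and survives, and the first frame found with
 * ref_bit == 0 becomes the victim. Returns the victim's frame index. */
int clock_victim(struct frame *frames, int nframes, int *hand)
{
    for (;;) {
        if (!frames[*hand].ref_bit) {
            int victim = *hand;
            *hand = (*hand + 1) % nframes;   /* park hand past the victim */
            return victim;
        }
        frames[*hand].ref_bit = false;       /* give it a second chance */
        *hand = (*hand + 1) % nframes;
    }
}
```

Note the loop always terminates: after at most one full sweep, every bit has been cleared, so some frame qualifies as the victim.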
26 Counting Replacement Algorithms
Keep a counter of the number of references that have been made to each page
LFU (Least Frequently Used): replace the page with the smallest count
- Argument: a page with a small count is not used often
- Problem: some pages were heavily used earlier but are no longer needed; they will stay in (and waste) memory
MFU (Most Frequently Used): replace the page with the highest count
- Argument: the page with the smallest count was probably just brought in and has yet to be used
- Problem: consider code that uses a module or subroutine heavily; MFU will consider its pages good candidates for eviction!
27 Counting Replacement Algorithms (cont'd)
LFU vs. MFU: consider the following example
A database program reads many pages, then processes them
Which policy (LFU or MFU) would perform better?
MFU: even though the read module accumulated a large frequency count, we need to evict its pages during the processing phase
28 Allocation of Frames
Each process needs a minimum number of frames, defined by the computer architecture (hardware): the instruction width and the number of address indirection levels
Consider an instruction that takes one operand and allows one level of indirection. What is the minimum number of frames needed to execute it?
load [addr]
Answer: 3 (the load instruction is in one page, addr is in another, [addr] is in a third)
Note: the maximum number of frames allocated to a process is determined by the OS
29 Frame Allocation
Equal allocation: all processes get the same number of frames
- m frames, n processes → each process gets m/n frames
Proportional allocation: allocate according to the size of the process
Priority allocation: use proportional allocation based on priorities rather than size
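Proportional allocation gives process i with size s_i, out of a total size S, a share a_i = (s_i / S) x m of the m frames. A minimal sketch (the function name and the example sizes below are ours, chosen for illustration; a real OS would also enforce the architectural minimum per process):

```c
#include <assert.h>

/* Proportional allocation: a_i = (s_i / S) * m, truncated to an integer. */
int prop_alloc(int s_i, int S, int m)
{
    return (int)((long long)s_i * m / S);   /* widen to avoid overflow */
}
```

For example, with m = 62 frames and two processes of sizes 10 and 127 pages (S = 137), the small process gets 4 frames and the large one 57; truncation can leave a frame or two unassigned, which the OS can hand out as it sees fit.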
30 Global vs. Local Frame Replacement
If a page fault occurs and there is no free frame, we need to free one. Two ways:
Global replacement: a process selects a replacement frame from the set of all frames; one process can take a frame from another
- Commonly used in operating systems
- Pros: better throughput (a process can use any available frame)
- Cons: a process cannot control its own page-fault rate
31 Global vs. Local Frame Replacement (cont'd)
Local replacement: each process selects only from its own set of allocated frames
- Pros: each process has its own share of frames; it is not impacted by the paging behavior of others
- Cons: a process may suffer a high page-fault rate even though lightly used frames are allocated to other processes
32 Thrashing
What happens if a process does not have "enough" frames to keep its active set of pages in memory?
The page-fault rate becomes very high. This leads to:
- low CPU utilization, which
- makes the OS think it needs to increase the degree of multiprogramming, so
- the OS admits another process to the system (making things worse!)
Thrashing: a process is busy swapping pages in and out more than executing
34 Thrashing (cont'd)
To prevent thrashing, we should give each process as many frames as it needs
How do we know how many frames a process actually needs?
A program is usually composed of several functions or modules
When executing a function, memory references are made to the instructions and local variables of that function, plus some global variables
So we may need to keep in memory only the pages needed to execute that function
After finishing one function, we execute another; then we bring in the pages needed by the new function
This is called the locality model
35 Locality Model
The locality model states that as a process executes, it moves from locality to locality, where a locality is a set of pages that are actively used together
Notes:
- A locality is not restricted to functions/modules; it is more general. It could be a segment of code within a function, e.g., a loop touching data/instructions in several pages
- Localities may overlap
- Locality is a major reason behind the success of demand paging
How can we know the size of a locality? Using the working-set model
36 Working-Set Model
The set of pages referenced in the most recent Δ memory references is called the working set (WS)
The WS is a moving window: at each reference, a new reference is added at one end and the oldest is dropped off the other
Example with Δ = 10: the size of the WS at time t1 is 5 pages, and at t2 it is 2 pages
37 Working-Set Model (cont'd)
The accuracy of the WS model depends on choosing Δ:
- If Δ is too small, it will not encompass the entire locality
- If Δ is too large, it will encompass several localities
- If Δ = ∞, it will encompass the entire program
Using the WS model:
- The OS monitors the WS of each process
- It allocates to each process a number of frames equal to its WS size
- If more memory frames are still available, another process can be started
38 Keeping Track of the Working Set
Maintaining the entire window is costly
Solution: approximate with an interval timer + a reference bit (ref_bit is set to 1 when a page is referenced)
Example: Δ = 10,000 memory references; the timer interrupts every 5,000 memory references
Keep 2 extra bits in memory for each page
Whenever the timer interrupts, copy ref_bit into the in-memory bits, then reset ref_bit
Upon a page fault, check the three bits (ref_bit + 2 in-memory bits):
- If any of them is 1 → the page was used within the last 10,000 to 15,000 references → put the page in the working set
Why is this not completely accurate? We do not know where, within an interval of 5,000 references, a reference occurred
Improvement: 10 in-memory bits and an interrupt every 1,000 references
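The timer-based approximation above can be sketched as a tiny shift register per page (struct and function names are ours; in a real kernel, ref_bit lives in the page-table entry and is set by hardware):

```c
#include <assert.h>
#include <stdbool.h>

/* Per-page state for the working-set approximation: the hardware ref_bit
 * plus the slide's 2 in-memory history bits, kept in the low bits of 'hist'. */
struct wspage { bool ref_bit; unsigned hist; };

/* On each timer interrupt: shift ref_bit into the history, then reset it. */
void ws_timer_tick(struct wspage *p)
{
    p->hist = ((p->hist << 1) | (p->ref_bit ? 1u : 0u)) & 0x3u; /* keep 2 bits */
    p->ref_bit = false;
}

/* A page is considered part of the working set if any of the three bits
 * (ref_bit + 2 history bits) is set. */
bool in_working_set(const struct wspage *p)
{
    return p->ref_bit || p->hist != 0;
}
```

A page referenced once stays in the approximate working set for two more timer intervals (10,000 to 15,000 references with the slide's parameters) and then falls out, exactly because the 2-bit history has shifted its reference away.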
39 Thrashing Control Using the WS Model
WSSi = the working-set size of process Pi = total number of pages referenced in the most recent Δ references
m = memory size in frames
D = Σ WSSi = total demand in frames
If D > m → thrashing
Policy: if D > m, suspend one of the processes
But maintaining the WS is costly. Is there an easier way to control thrashing?
40 Thrashing Control Using Page-Fault Rate
Monitor the page-fault rate and increase/decrease the allocated frames accordingly
Establish an "acceptable" page-fault rate range (upper and lower bounds):
- If the actual rate is too low, the process loses a frame
- If the actual rate is too high, the process gains a frame
41 Allocating Kernel Memory
Treated differently from user memory. Why?
- The kernel requests memory for structures of varying sizes: process descriptors (PCBs), semaphores, file descriptors, ...; some are less than a page
- Some kernel memory needs to be contiguous: some hardware devices interact directly with physical memory without using virtual memory
- Virtual memory may simply be too expensive for the kernel (it cannot afford a page fault)
Often, a free-memory pool is dedicated to the kernel, from which it allocates the needed memory using:
- the buddy system, or
- slab allocation
42 Buddy System
Allocates memory from a fixed-size segment consisting of physically contiguous pages
Satisfies requests in units sized as powers of 2: each request is rounded up to the next power of 2
Fragmentation: a 17 KB request will be rounded up to 32 KB!
When a smaller allocation is needed than is available, the current chunk is split into two buddies of the next-lower power of 2; this continues until an appropriately sized chunk is available
Adjacent "buddies" are combined (coalesced) to form a larger segment
Used in older Unix/Linux systems
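The rounding rule behind the buddy system's internal fragmentation is simple to state in code (the function name is ours):

```c
#include <assert.h>

/* Round a request size up to the next power of two, as the buddy system
 * does; e.g., a 17 KB request consumes an entire 32 KB chunk. */
unsigned long next_pow2(unsigned long n)
{
    unsigned long p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
```

This is why the slide's 17 KB request wastes nearly 15 KB: the smallest power-of-2 chunk that fits it is 32 KB, while a request that is already a power of two (say 4 KB) wastes nothing.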
44 Slab Allocator
The slab allocator creates caches, each consisting of one or more slabs
A slab is one or more physically contiguous pages
There is a single cache for each unique kernel data structure
Each cache is filled with objects: instantiations of the data structure
Objects are initially marked as free; when structures are stored in them, they are marked as used
Benefits: fast memory allocation, no fragmentation
Used in Solaris and Linux
46 VM and Memory-Mapped Files
VM enables mapping a file into the memory address space of a process
How? A page-sized portion of the file is read from the file system into a physical frame; subsequent reads/writes to/from the file are treated as ordinary memory accesses
Example: mmap() on Unix systems
Why?
- I/O operations (e.g., read(), write()) on files become memory accesses → simpler file-handling code
- More efficient: memory accesses are less costly than I/O system calls
- One way of implementing shared memory for inter-process communication
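A minimal POSIX mmap() sketch of the idea: map a file, then read its contents with an ordinary pointer dereference instead of read(). The function name and error-handling style are ours; a production version would also map the full file length rather than a single byte's worth.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return the first byte of a file via a memory mapping, or -1 on failure.
 * Note the access p[0] is a plain memory load; the kernel demand-pages
 * the file data in on first touch. */
int first_byte_via_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char *p = mmap(NULL, 1, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    int c = (unsigned char)p[0];    /* ordinary memory access, no read() */
    munmap(p, 1);
    return c;
}
```

Writable shared mappings (PROT_WRITE with MAP_SHARED) extend the same mechanism into the shared-memory IPC mentioned on the slide.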
47 Memory-Mapped Files and Shared Memory
Memory-mapped files allow several processes to map the same file, allowing the pages in memory to be shared
Windows XP implements shared memory using this technique
48 VM Issues: Page Size and Pre-paging
Page size selection impacts: fragmentation, page table size, I/O overhead, and locality
Pre-paging: bring into memory some (or all) of the pages a process will need, before they are referenced
Tradeoff: reduces the number of page faults at process startup, but may waste memory and I/O because some of the pre-paged pages may never be used
49 VM Issues: Program Structure
int data[128][128];
Each row is stored in one page; allocated frames < 128
How many page faults does each of the following programs cause?
Program 1:
for (j = 0; j < 128; j++)
    for (i = 0; i < 128; i++)
        data[i][j] = 0;
Number of page faults: 128 x 128 = 16,384 (column-by-column access touches a different page on every reference)
Program 2:
for (i = 0; i < 128; i++)
    for (j = 0; j < 128; j++)
        data[i][j] = 0;
Number of page faults: 128 (row-by-row access stays within one page per row)
50 VM Issues: I/O Interlock
Example scenario:
1. A process allocates a buffer for an I/O request (in its own address space)
2. The process issues the I/O request and waits (blocks) for it
3. Meanwhile, the CPU is given to another process, which causes a page fault
4. The (global) replacement algorithm chooses the page that contains the buffer as the victim!
5. Later, the I/O device sends an interrupt signaling that the request is ready, BUT the frame that contained the buffer is now used by a different process!
Solutions:
- Lock the buffer page in memory (I/O interlock)
- Do the I/O into kernel memory (not user memory): data is first transferred to kernel buffers, then copied to user space
Note: page locking can be used in other situations as well; e.g., kernel pages are locked in memory
51 OS Example: Windows XP
Uses demand paging with clustering; clustering brings in the pages surrounding the faulting page
Processes are assigned:
- a working-set minimum: guaranteed number of pages in memory
- a working-set maximum: maximum number of pages in memory
Working-set trimming: if free memory in the system falls below a threshold, pages are removed from processes that have more than their working-set minimum
52 Summary
Virtual memory: a technique to map a large logical address space onto a smaller physical address space
Uses demand paging: bring pages into memory only when needed
Allows running large programs in a small physical memory, page sharing, and efficient process creation, and it simplifies programming
Page fault: occurs when a referenced page is not in memory
Page replacement algorithms: FIFO, OPT, LRU, second-chance, ...
Frame allocation: proportional allocation; global vs. local page replacement
Thrashing: a process does not have sufficient frames → too many page faults → poor CPU utilization
Locality and working-set models
Kernel memory: buddy system, slab allocator
Many issues and tradeoffs: page size, pre-paging, I/O interlock, ...