5Fixed (Static) Partitions Attempt at multiprogramming using fixed partitionsone partition for each jobsize of partition designated by reconfiguring the systempartitions can’t be too small or too large.Critical to protect job’s memory space.Entire program stored contiguously in memory during entire execution.Internal fragmentation is a problem.
7Table 2.1 : Main memory use during fixed partition allocation of Table 2.1. Job 3 must wait. Job List :J1 30KJ2 50KJ3 30KJ4 25KOriginal StateAfter Job Entry100KJob 1 (30K)Partition 1Partition 1Partition 225KJob 4 (25K)Partition 2Partition 325KPartition 350KJob 2 (50K)Partition 4Partition 4
8Dynamic PartitionsAvailable memory kept in contiguous blocks and jobs given only as much memory as they request when loaded.Improves memory use over fixed partitions.Performance deteriorates as new jobs enter the systemfragments of free memory are created between blocks of allocated memory (external fragmentation).
9Dynamic Partitioning of Main Memory & Fragmentation (Figure 2.2)
10Dynamic Partition Allocation Schemes First-fit: Allocate the first partition that is big enough.Keep free/busy lists organized by memory location (low-order to high-order).Faster in making the allocation.Best-fit: Allocate the smallest partition that is big enoughKeep free/busy lists ordered by size (smallest to largest).Produces the smallest leftover partition.Makes best use of memory.
11First-Fit Allocation Example (Table 2.2) J1 10KJ2 20KJ3 30K*J4 10KMemory Memory Job Job Internallocation block size number size Status fragmentationK J K Busy 20KK J K Busy 5KK J K Busy 30KK FreeTotal Available: 115K Total Used: KJob List
12Best-Fit Allocation Example (Table 2.3) J1 10KJ2 20KJ3 30KJ4 10KMemory Memory Job Job Internallocation block size number size Status fragmentationK J K Busy 5KK J K Busy NoneK J K Busy NoneK J K Busy 40KTotal Available: 115K Total Used: KJob List
13First-Fit Memory Request Assume that a job of size 200 bytes is waiting to be loaded into memory.
15Best-Fit vs. First-Fit Best-Fit More complex algorithm Increases memory useMemory allocation takes less timeIncreases internal fragmentationDiscriminates against large jobsBest-FitMore complex algorithmSearches entire table before allocating memoryResults in a smaller “free” space (sliver)
16Release of Memory Space : Deallocation Deallocation for fixed partitions is simpleMemory Manager resets status of memory block to “free”.Deallocation for dynamic partitions tries to combine free areas of memory whenever possibleIs the block adjacent to another free block?Is the block between 2 free blocks?Is the block isolated from other free blocks?
18Case 2: Joining 3 Free Blocks This slide has an error: The job finishing is at 7580 and has a size of 20 bytes, not 7600 and a size of Assume that the job at 7600 is already free and has a size of 205.The after chart is correct as stated assuming these changes to the before deallocation.
20Relocatable Dynamic Partitions Memory Manager relocates programs to gather all empty blocks and compact them to make 1 memory block.Memory compaction (garbage collection, defragmentation) performed by OS to reclaim fragmented sections of memory space.Memory Manager optimizes use of memory & improves throughput by compacting & relocating.
21Compaction StepsRelocate every program in memory so they’re contiguous.Adjust every address, and every reference to an address, within each program to account for program’s new location in memory.Must leave alone all other values within the program (e.g., data values).
23Contents of relocation register & close-up of Job 4 memory area (a) before relocation & (b) after relocation and compaction (Figure 2.6)Note each job will have its own relocation register value.
24Virtual MemoryVirtual Memory (VM) = the ability of the CPU and the operating system software to use the hard disk drive as additional RAM when needed (safety net)Good – no longer get “insufficient memory” errorBad - performance is very slow when accessing VMSolution = more RAM
25Motivations for Virtual Memory Use Physical DRAM as a Cache for the DiskAddress space of a process can exceed physical memory sizeSum of address spaces of multiple processes can exceed physical memorySimplify Memory ManagementMultiple processes resident in main memory.Each process with its own address spaceOnly “active” code and data is actually in memoryAllocate more memory to process as needed.Provide ProtectionOne process can’t interfere with another.because they operate in different address spaces.User process cannot access privileged informationdifferent sections of address spaces have different permissions.
28DRAM vs. SRAM as a “Cache” DRAM vs. disk is more extreme than SRAM vs. DRAMAccess latencies:DRAM ~10X slower than SRAMDisk ~100,000X slower than DRAMImportance of exploiting spatial locality:First byte is ~100,000X slower than successive bytes on diskvs. ~4X improvement for page-mode vs. regular accesses to DRAMBottom line:Design decisions made for DRAM caches driven by enormous cost of missesSRAMDRAMDisk
29Locating an Object in a “Cache” (cont.) DRAM CacheEach allocate page of virtual memory has entry in page tableMapping from virtual pages to physical pagesFrom uncached form to cached formPage table entry even if page not in memorySpecifies disk addressOS retrieves informationPage Table“Cache”LocationData24317105•0:1:N-1:XObject NameD:J:X:On Disk•1
30A System with Physical Memory Only Examples:most Cray machines, early PCs, nearly all embedded systems, etc.MemoryPhysicalAddresses0:1:CPUN-1:Addresses generated by the CPU point directly to bytes in physical memory
31A System with Virtual Memory Examples:workstations, servers, modern PCs, etc.Memory0:1:N-1:Page TableVirtualAddressesPhysicalAddresses0:1:CPUP-1:DiskAddress Translation: Hardware converts virtual addresses to physical addresses via an OS-managed lookup table (page table)
32Page Faults (Similar to “Cache Misses”) What if an object is on disk rather than in memory?Page table entry indicates virtual address not in memoryOS exception handler invoked to move data from disk into memorycurrent process suspends, others can resumeOS has full control over placement, etc.Before faultAfter faultMemoryMemoryPage TablePage TableVirtualAddressesPhysicalAddressesVirtualAddressesPhysicalAddressesCPUCPUDiskDisk
334TerminologyCache: a small, fast “buffer” that lies between the CPU and the Main Memory which holds the most recently accessed data.Virtual Memory: Program and data are assigned addresses independent of the amount of physical main memory storage actually available and the location from which the program will actually be executed.Hit ratio: Probability that next memory access is found in the cache.Miss rate: (1.0 – Hit rate)
34Importance of Hit Ratio 5Importance of Hit RatioGiven:h = Hit ratioTa = Average effective memory access time by CPUTc = Cache access timeTm = Main memory access timeEffective memory time is:Ta = hTc + (1 – h)TmSpeedup due to the cache is:Sc = Tm / TaExample:Assume main memory access time of 100ns and cache access time of 10ns and there is a hit ratio of .9.Ta = .9(10ns) + (1 - .9)(100ns) = 19nsSc = 100ns / 19ns = 5.26Same as above only hit ratio is now .95 instead:Ta = .95(10ns) + ( )(100ns) = 14.5nsSc = 100ns / 14.5ns = 6.9
35Cache vs Virtual Memory 6Cache vs Virtual MemoryPrimary goal of Cache:increase Speed.Primary goal of Virtual Memory: increase Space.
36Cache Replacement Algorithms 15Cache Replacement AlgorithmsReplacement algorithm determines which block in cache is removed to make room.2 main policies used todayLeast Recently Used (LRU)The block replaced is the one unused for the longest time.RandomThe block replaced is completely random – a counter-intuitive approach.
3716LRU vs RandomBelow is a sample table comparing miss rates for both LRU and Random.CacheSizeMiss Rate:LRURandom16KB4.4%5.0%64KB1.4%1.5%256KB1.1%As the cache size increases there are more blocks to choose from, therefore the choice is less critical probability of replacing the block that’s needed next is relatively low.
38Virtual Memory Replacement Algorithms 17Virtual Memory Replacement Algorithms1) Optimal2) First In First Out (FIFO)3) Least Recently Used (LRU)
3918OptimalReplace the page which will not be used for the longest (future) period of time.Faults are shown in boxes; hits are not shown.7 page faults occur
4019OptimalA theoretically “best” page replacement algorithm for a given fixed size of VM.Produces the lowest possible page fault rate.Impossible to implement since it requires future knowledge of reference string.Just used to gauge the performance of real algorithms against best theoretical.
41FIFO 20 Faults are shown in boxes; hits are not shown. When a page fault occurs, replace the one that was brought in first.Faults are shown in boxes; hits are not shown.9 page faults occur
42FIFO Simplest page replacement algorithm. 21FIFOSimplest page replacement algorithm.Problem: can exhibit inconsistent behavior known as Belady’s anomaly.Number of faults can increase if job is given more physical memoryi.e., not predictable
43Example of FIFO Inconsistency 22Example of FIFO InconsistencySame reference string as before only with 4 frames instead of 3.Faults are shown in boxes; hits are not shown.10 page faults occur
4423LRUReplace the page which has not been used for the longest period of time.Faults are shown in boxes; hits only rearrange stack1255122519 page faults occur
45LRU More expensive to implement than FIFO, but it is more consistent. 24LRUMore expensive to implement than FIFO, but it is more consistent.Does not exhibit Belady’s anomalyMore overhead needed since stack must be updated on each access.
46Example of LRU Consistency 25Example of LRU ConsistencySame reference string as before only with 4 frames instead of 3.Faults are shown in boxes; hits only rearrange stack121254151234251234447 page faults occur
47Servicing a Page Fault Processor Signals Controller (1) Initiate Block ReadProcessor Signals ControllerRead block of length P starting at disk address X and store starting at memory address YRead OccursDirect Memory Access (DMA)Under control of I/O controllerI / O Controller Signals CompletionInterrupt processorOS resumes suspended processProcessorReg(3) Read DoneCacheMemory-I/O bus(2) DMA TransferI/OcontrollerMemorydiskDiskDiskdisk
48Handling Page FaultsMemory reference causes a fault – called a page faultPage fault can happen at any time and placeInstruction fetchIn the middle of an instruction executionSystem must save all stateMove page from disk to memoryRestart the faulting instructionRestore stateBackup PC – not easy to find out by how much – need HW help
49Page FaultIf there is ever a reference to a page, first reference will trap to OS page faultHardware traps to kernelGeneral registers savedOS determines which virtual page neededOS checks validity of address, seeks page frameIf selected frame is dirty, write it to diskOS brings schedules new page in from diskPage tables updatedFaulting instruction backed up to when it beganFaulting process scheduledRegisters restoredProgram continues
50What to Page in Demand paging brings in the faulting page To bring in additional pages, we need to know the futureUsers don’t really know the future, but some OSs have user-controlled pre-fetchingIn real systems,load the initial pageStart runningSome systems (e.g. WinNT will bring in additional neighboring pages (clustering))Demand paging – start with nothing – all PTE’s I=0 Then as execute, get faults as code and data needed ( demanded) by process
51VM Page Replacement If there is an unused page, use it. If there are no pages available, select one (Policy?) andIf it is dirty (M == 1)write it to diskInvalidate its PTE and TLB entryLoad in new page from diskUpdate the PTE and TLB entry!Restart the faulting instructionWhat is cost of replacing a page?How does the OS select the page to be evicted?
52Measuring Demand Paging Performance Page Fault Rate (p)0 < p < (no page faults to every ref is a fault)Page Fault Overhead= fault service overhead + read page + restart process overheadDominated by time to read page inEffective Access Time= (1-p) (memory access) + p (page fault overhead)
53Performance Example Memory access time = 100 nanoseconds Page fault overhead = 25 millisec (msec)Page fault rate = 1/1000EAT = (1-p) * p * (25 msec)= (1-p) * p * 25,000,000= ,999,900 * p= ,999,900 * 1/1000 = 25 microseconds!Want less than 10% degradation110 > ,999,900 * p10 > 24,999,900 * pp < or 1 fault in 2,500,000 accesses!
54Page Replacement Algorithms Want lowest page-fault rate.Evaluate algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults on that string.Reference string – ordered list of pages accessed as process executesEx. Reference String is A B C A B D A D B C B
55The Best Page to Replace The best page to replace is the one that will never be accessed againOptimal Algorithm - Belady’s AlgorithmLowest fault rate for any reference stringBasically, replace the page that will not be used for the longest time in the future.If you know the future, please see me after class!!Belady’s Algorithm is a yardstickWe want to find close approximations
56Page Replacement - FIFO FIFO is simple to implementWhen page in, place page id on end of listEvict page at head of listMight be good? Page to be evicted has been in memory the longest timeBut?Maybe it is being usedWe just don’t knowFIFO suffers from Belady’s Anomaly – fault rate may increase when there is more physical memory!
57FIFO vs. OptimalReference string – ordered list of pages accessed as process executesEx. Reference String is A B C A B D A D B C BOPTIMALA B C A B D A D B C BSystem has 3 page frames5 Faultstoss Ctoss A or DABCDFIFOA B C A B D A D B C Btoss Atoss ?7 faults
58Second Chance Maintain FIFO page list On page fault Check reference bitIf R == 1 then move page to end of list and clear RIf R == 0 then evict page
59Clock Replacement Create circular list of PTEs in FIFO Order One-handed Clock – pointer starts at oldest pageAlgorithm – FIFO, but check Reference bitIf R == 1, set R = 0 and advance handevict first page with R == 0Looks like a clock hand sweeping PTE entriesFast, but worst case may take a lot of timeTwo-handed clock – add a 2nd hand that is n PTEs ahead2nd hand clears Reference bit
60Not Recently Used Page Replacement Algorithm Each page has Reference bit, Modified bitbits are set when page is referenced, modifiedPages are classifiednot referenced, not modifiednot referenced, modifiedreferenced, not modifiedreferenced, modifiedNRU removes page at randomfrom lowest numbered non empty class
61Least Recently Used (LRU) Replace the page that has not been used for the longest time3 Page Frames Reference String - A B C A B D A D B CLRU – 5 faultsA B C A B D A D B C
62LRU Past experience may indicate future behavior Perfect LRU requires some form of timestamp to be associated with a PTE on every memory reference !!!Counter implementationEvery page entry has a counter; every time page is referenced through this entry, copy the clock into the counter.When a page needs to be changed, look at the counters to determine which are to changeStack implementation – keep a stack of page numbers in a double link form:Page referenced: move it to the topNo search for replacement
63LRU Approximations Aging Clock replacement Keep a counter for each PTE Periodically – check Reference bitIf R == 0 increment counter (page has not been used)If R == 1 clear the counter (page has been used)Set R = 0Counter contains # of intervals since last accessReplace page with largest counter valueClock replacement
64Contrast: Macintosh Memory Model MAC OS 1–9Does not use traditional virtual memoryAll program objects accessed through “handles”Indirect reference through pointer tableObjects stored in shared global address spaceP1 Pointer TableP2 Pointer TableProcess P1Process P2Shared Address SpaceABCDE“Handles”
65Macintosh Memory Management Allocation / DeallocationSimilar to free-list management of malloc/freeCompactionCan move any object and just update the (unique) pointer in pointer tableP1 Pointer TableP2 Pointer TableProcess P1Process P2Shared Address SpaceABCDE“Handles”
66Mac vs. VM-Based Memory Mgmt Allocating, deallocating, and moving memory:can be accomplished by both techniquesBlock sizes:Mac: variable-sizedmay be very small or very largeVM: fixed-sizesize is equal to one page (4KB on x86 Linux systems)Allocating contiguous chunks of memory:Mac: contiguous allocation is requiredVM: can map contiguous range of virtual addresses to disjoint ranges of physical addressesProtectionMac: “wild write” by one process can corrupt another’s data
67MAC OS X “Modern” Operating System Based on MACH OS Virtual memory with protectionPreemptive multitaskingOther versions of MAC OS require processes to voluntarily relinquish controlBased on MACH OSDeveloped at CMU in late 1980’s
75Page Replacement Policy Working Set:Set of pages used actively & heavilyKept in memory to reduce Page FaultsSet is found/maintained dynamically by OSReplacement: OS tries to predict which page would have least impact on the running programCommon Replacement Schemes:Least Recently Used (LRU)First-In-First-Out (FIFO)
76Page Replacement Policies Least Recently Used (LRU)Generally works wellTROUBLE:When the working set is larger than the Main MemoryWorking Set = 9 pagesPages are executed in sequence(08 (repeat))THRASHING
77Page Replacement Policies First-In-First-Out(FIFO)Removes Least Recently Loaded pageDoes not depend on UseDetermined by number of page faults seen by a page
78Page Replacement Policies Upon ReplacementNeed to know whether to write data backAdd a Dirty-BitDirty Bit = 0; Page is clean; No writingDirty Bit = 1; Page is dirty; Write back