Presentation on theme: "Operating Systems Prof. Navneet Goyal" — Presentation transcript:
1 Operating Systems Prof. Navneet Goyal Department of Computer Science & Information Systems, BITS Pilani
2 Topics for Today
Concept of Paging
Logical vs. Physical Addresses
Address Translation
Page Tables
Page Table Implementation
Hierarchical Paging
Hashed Page Tables
Inverted Page Tables
Translation Lookaside Buffers (TLBs)
Associative Memory
ASID
3 Paging
Memory is divided into fixed-size chunks: FRAMES
Process is divided into fixed-size chunks: PAGES
A frame can hold one page of data
Physical address space of a process need not be contiguous
Very limited internal fragmentation
No external fragmentation
6 Page Table
Non-contiguous allocation of memory for process D
Will a base address register suffice? No: a page table is needed
Within the program, each logical address consists of a page number and an offset within the page
Logical-to-physical address translation is done by processor hardware:
(page no, offset) → (frame no, offset)
7 Address Translation Scheme
Address generated by the CPU is divided into:
Page number (p) – used as an index into a page table, which contains the base address of each page in physical memory
Page offset (d) – combined with the base address to define the physical memory address that is sent to the memory unit
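The split above can be sketched in a few lines of Python. This is a minimal illustration, assuming a 4 KB page and a toy page table held in a dict; the names are illustrative, not from any real MMU interface.

```python
# Sketch: splitting a logical address into page number (p) and offset (d),
# assuming a 4 KB (2^12-byte) page size.
PAGE_SIZE = 4096          # 2^12 bytes
OFFSET_BITS = 12

def translate(logical_addr, page_table):
    page_no = logical_addr >> OFFSET_BITS      # p: index into the page table
    offset = logical_addr & (PAGE_SIZE - 1)    # d: unchanged by translation
    frame_no = page_table[page_no]             # base frame from the page table
    return (frame_no << OFFSET_BITS) | offset  # physical address

# Toy page table: page 0 -> frame 5, page 1 -> frame 2
page_table = {0: 5, 1: 2}
print(hex(translate(0x1ABC, page_table)))   # page 1, offset 0xABC -> 0x2abc
```

Note that the offset bits pass through untouched; only the high-order page-number bits are rewritten.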
13 Page Table Implementation
Small page tables (up to 256 entries) can be stored in registers
Example: DEC PDP-11 (16-bit address and 8 KB page size)
Big page tables (e.g., 1M entries) are stored in main memory
Page Table Base Register (PTBR) points to the page table
Two-memory-access problem
Hardware solution: Translation Look-aside Buffers (TLBs)
14 Page Table Structure
Modern systems have a large logical address space (2^32 to 2^64)
The page table becomes very large; one page table per process
For a 2^32 logical address space and 4 KB page size:
Page table consists of 2^20 (1 million) entries
Each entry is 4 bytes
Memory requirement of the page table = 4 MB
15 Page Table Implementation
Page table is kept in main memory
Page-table base register (PTBR) points to the page table
Page-table length register (PTLR) indicates the size of the page table
In this scheme every data/instruction access requires two memory accesses: one for the page table and one for the data/instruction
17 Page Table Implementation
Each virtual memory reference can cause two physical memory accesses:
one to fetch the page table entry
one to fetch the data
To overcome this problem, a high-speed cache is set up for page table entries, called the TLB (Translation Lookaside Buffer)
18 TLB
Fast-lookup hardware cache called associative memory or translation look-aside buffer (TLB)
Some TLBs store address-space identifiers (ASIDs) in each TLB entry – an ASID uniquely identifies each process, to provide address-space protection for that process
What if ASIDs are not supported by the TLB?
19 TLB & ASID
Address-space protection: while resolving a virtual page number, the TLB ensures that the ASID of the currently running process matches the ASID associated with the virtual page
What if the ASID does not match? The attempt is treated as a TLB miss
20 TLB & ASID
An ASID allows entries for different processes to coexist in the TLB
If ASIDs are not supported? On each context switch, the TLB must be flushed
21 TLB
Contains the page table entries that have been most recently used
Functions the same way as a memory cache
The TLB is a CPU cache used by memory-management hardware to speed up virtual-to-physical address translation
In the TLB, the virtual address is the search key and the search result is a physical address
Typically implemented as content-addressable memory
22 Associative Memory/Mapping
Also referred to as content-addressable memory (CAM)
A special type of memory used in high-speed searching applications
CAM is often used in computer networking devices. For example, when a network switch receives a data frame on one of its ports, it updates an internal table with the frame's source MAC address and the port it was received on. It then looks up the destination MAC address in the table to determine which port the frame needs to be forwarded to, and sends it out that port. The MAC address table is usually implemented with a binary CAM so the destination port can be found very quickly, reducing the switch's latency
The processor is equipped with hardware that allows it to interrogate a number of TLB entries simultaneously, to find whether a search key exists
23 Associative Memory/Mapping
Unlike standard computer memory (RAM), in which the user supplies a memory address and the RAM returns the data word stored at that address,
a CAM is designed such that the user supplies a data word and the CAM searches its entire memory to see if that data word is stored anywhere in it
If the data word is found, the CAM returns a list of one or more storage addresses where the word was found (and in some architectures it also returns the data word, or other associated pieces of data)
Thus, a CAM is the hardware embodiment of what in software terms would be called an associative array
24 TLB
Associative high-speed memory, typically between 64 and 1,024 entries
Each entry maps page # → frame #
TLB hit & TLB miss
Address translation (A′, A′′):
If A′ is in an associative register, get the frame # out
Otherwise, get the frame # from the page table in memory
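The hit/miss logic above can be modelled with a small dict standing in for the associative memory. This is a sketch only; a real TLB searches all entries in parallel in hardware, and the function names here are invented for illustration.

```python
# Sketch of a TLB lookup: the page number is the search key; a hit returns
# the cached frame number, a miss falls back to the page table in memory
# and caches the translation for next time.
def lookup(page_no, tlb, page_table):
    if page_no in tlb:                 # TLB hit: one memory access saved
        return tlb[page_no], "hit"
    frame_no = page_table[page_no]     # TLB miss: extra access to the page table
    tlb[page_no] = frame_no            # add page # and frame # to the TLB
    return frame_no, "miss"

tlb = {}
page_table = {0: 7, 1: 3}
print(lookup(1, tlb, page_table))  # (3, 'miss')
print(lookup(1, tlb, page_table))  # (3, 'hit')
```

A real TLB would also have to handle a full cache (LRU or random replacement, as the later slide notes), which this sketch omits.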
27 TLB
TLB miss: reference the page table, then add the page # and frame # to the TLB
If the TLB is full, select one entry for replacement: least recently used (LRU) or random
Wired-down entries: TLB entries for kernel code are wired down (never replaced)
28 Effective Access Time and Hit Ratio
Search time for TLB = 20 ns; time for a memory access = 100 ns
Total time (in case of a hit) = 120 ns
In case of a miss, total time = 20 + 100 + 100 = 220 ns
Effective memory access time (80% hit ratio) = 0.8 × 120 + 0.2 × 220 = 140 ns
40% slowdown in memory access time
For a 98% hit ratio, the slowdown is 22%
29 Effective Access Time
Associative lookup = ε time units; assume the memory cycle time is 1 microsecond
Hit ratio (α): percentage of times that a page number is found in the associative registers
Effective Access Time (EAT):
EAT = (1 + ε)α + (2 + ε)(1 − α) = 2 + ε − α
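The EAT formula is just a weighted average of the hit and miss costs, which a few lines of Python make concrete. The numbers below are taken from the previous slide (20 ns TLB search, 100 ns memory access); the function name is illustrative.

```python
# Effective access time as a weighted average, following
# EAT = hit_ratio * (hit cost) + (1 - hit_ratio) * (miss cost)
def eat(tlb_time, mem_time, hit_ratio):
    hit = tlb_time + mem_time            # one memory access on a hit
    miss = tlb_time + 2 * mem_time       # page table access + data access on a miss
    return hit_ratio * hit + (1 - hit_ratio) * miss

# Numbers from the previous slide: 20 ns TLB search, 100 ns memory access
print(eat(20, 100, 0.80))   # ~140 ns: a 40% slowdown over 100 ns
print(eat(20, 100, 0.98))   # ~122 ns: a 22% slowdown
```

Raising the hit ratio from 80% to 98% cuts the slowdown nearly in half, which is why TLB reach matters so much in practice.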
31 Hierarchical Paging
32-bit address with 4 KB page size
Logical address: 20-bit page number and 12-bit page offset
Divide the page table into smaller pieces: the page number is further divided into a 10-bit page number (p1, an index into the outer page table) and a 10-bit page offset (p2, the displacement within a page of the inner page table)
The page table is itself "paged": a "forward-mapped" page table
Used in the x86 processor family
Address layout: | p1 (10) | p2 (10) | d (12) |
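The 10/10/12 decomposition can be sketched directly with shifts and masks. A minimal illustration, assuming the 32-bit layout just described; no real page tables are consulted here.

```python
# Sketch: decomposing a 32-bit logical address for two-level paging into a
# 10-bit outer index (p1), a 10-bit inner index (p2), and a 12-bit offset (d).
def split(addr):
    p1 = (addr >> 22) & 0x3FF   # top 10 bits: index into the outer page table
    p2 = (addr >> 12) & 0x3FF   # next 10 bits: index into the inner page table
    d = addr & 0xFFF            # low 12 bits: offset within the 4 KB page
    return p1, p2, d

print(split(0xDEADBEEF))
```

Translation then takes two table lookups, outer[p1] to find the inner table and inner[p2] to find the frame, which is exactly why a TLB matters even more for multi-level schemes.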
34 VAX Architecture for Paging
A variation of 2-level paging
32-bit addressing with 512-byte pages
The logical address space is divided into 4 sections of 2^30 bytes each
2 high-order bits select the section
Next 21 bits give the page # within that section
9 bits give the offset in the desired page
35 Address-Translation Scheme Address-translation scheme for a two-level 32-bit paging architecture
36 Limitations of 2-Level Paging
Consider a 64-bit logical address space with 4 KB page size
Entries in the page table = 2^52
2-level paging (42, 10, 12): the outer page table contains 2^42 entries
3-level paging (32, 10, 10, 12): the outer page table still contains 2^32 entries
SPARC with 32-bit addressing: 3-level paging
Motorola 68030 with 32-bit addressing: 4-level paging??
37 Page Size
An important hardware design decision; several factors are considered
Internal fragmentation: the smaller the page size, the less internal fragmentation
But a smaller page size means more pages required per process
More pages per process means larger page tables
Also affects page faults and thrashing!
39 Page Replacement
Demand paging
Page faults
Memory is full
40 Demand Paging
Bring a page into memory only when it is needed:
Less I/O needed
Less memory needed
Faster response
More users
Page is needed ⇒ reference to it:
invalid reference ⇒ abort
not in memory ⇒ bring to memory
41 Demand Paging: Virtual Memory Larger than Physical Memory
42 Demand Paging: Transfer of a Paged Memory to Contiguous Disk Space
44 Page Fault
The first reference to a page not in memory will trap to the OS: a page fault
The OS looks at an internal table (in the PCB of the process) to decide:
Invalid reference ⇒ abort
Just not in memory ⇒ handle the fault:
Get an empty frame
Swap the page into the frame
Reset tables, set the validation bit = 1
Restart the instruction that was interrupted
45 No Free Frames?
Page replacement: find some page in memory that is not really in use, and swap it out
algorithm?
performance: we want an algorithm that results in the minimum number of page faults
The same page may be brought into memory several times
46 Page Replacement
Prevent over-allocation of memory by modifying the page-fault service routine to include page replacement
Use the modify (dirty) bit to reduce the overhead of page transfers: only modified pages are written back to disk
Page replacement completes the separation between logical memory and physical memory: a large virtual memory can be provided on a smaller physical memory
47 Basic Page Replacement
1. Find the location of the desired page on disk
2. Find a free frame:
If there is a free frame, use it
If there is no free frame, use a page replacement algorithm to select a victim frame
3. Read the desired page into the (newly) free frame; update the page and frame tables
4. Restart the process
49 Page Replacement Algorithms Graph of Page Faults Versus The Number of Frames
50 Page Replacement Algorithms
Want the lowest page-fault rate
Evaluate an algorithm by running it on a particular string of memory references (a reference string) and computing the number of page faults on that string
In all our examples, the reference string is 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
51 FIFO Algorithm
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
3 frames (3 pages can be in memory at a time per process): 9 page faults
4 frames: 10 page faults
More frames yet more faults on this string: Belady's anomaly
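The FIFO counts above can be checked with a short simulation. A minimal sketch: pages live in a set for fast membership tests and a queue records arrival order.

```python
from collections import deque

# FIFO page replacement on the slide's reference string:
# on a fault, evict the page that has been resident longest.
def fifo_faults(refs, nframes):
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:          # memory full: evict oldest page
                frames.discard(queue.popleft())
            frames.add(page)
            queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9 page faults
print(fifo_faults(refs, 4))   # 10 page faults: Belady's anomaly
```

Giving FIFO an extra frame makes this particular string perform worse, which is the anomaly the slide's figure illustrated.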
54 Optimal Algorithm
Lowest page-fault rate of all algorithms; never suffers from Belady's anomaly
Replace the page that will not be used for the longest period of time
4-frames example, reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: 6 page faults
How do you know the future references?
Used for measuring how well your algorithm performs
Difficult to implement, as it requires prior knowledge of the reference string (like SJF in CPU scheduling)
Mainly used for comparison studies
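The optimal (Belady) policy is easy to simulate offline, precisely because the whole reference string is known in advance. A sketch:

```python
# Optimal replacement: on a fault with memory full, evict the resident page
# whose next use lies farthest in the future (or that is never used again).
def opt_faults(refs, nframes):
    frames, faults = set(), 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) == nframes:
            def next_use(p):
                # Distance to the next reference; never-again pages sort last
                rest = refs[i + 1:]
                return rest.index(p) if p in rest else len(refs)
            frames.discard(max(frames, key=next_use))
        frames.add(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(opt_faults(refs, 4))   # 6 page faults
```

A real OS cannot run this, since `next_use` peeks at future references; the simulation only serves as the yardstick the slide describes.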
56 LRU Algorithm
The optimal algorithm is not feasible; LRU, an approximation to optimal, is feasible
FIFO looks back in time; optimal looks forward in time
LRU replaces the page that has not been used for the longest period of time!!!
Reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with 4 frames: 8 page faults
Scheme / # page faults: FIFO 10, LRU 8, OPT 6
The recent past as an approximation of the near future!!!
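LRU can be simulated with an ordered dict used as a recency list: hits move a page to the back, so the front is always the least recently used page. A minimal sketch:

```python
from collections import OrderedDict

# LRU replacement: evict the page that has gone unused the longest.
def lru_faults(refs, nframes):
    frames, faults = OrderedDict(), 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)             # refresh recency on a hit
        else:
            faults += 1
            if len(frames) == nframes:
                frames.popitem(last=False)       # evict least recently used
            frames[page] = True
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(refs, 4))   # 8 page faults (FIFO gives 10, OPT gives 6)
```

This reproduces the slide's comparison table: LRU lands between FIFO and the optimal algorithm on this string.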
58 Some Interesting Results
Let S be a reference string and S^R the reverse string of S:
PF_OPT(S) = PF_OPT(S^R)
PF_LRU(S) = PF_LRU(S^R)
59 LRU Algorithm: Counter Implementation
Every process has a logical counter
The counter is incremented every time a page is referenced
The counter value is copied into the PTE for that page, so the "time" of last reference of each page is recorded
Search the page table for the LRU page (the one with the smallest time value)
A write to memory for each memory access
Any problems? Counter overflow
60 LRU Algorithm: Stack Implementation
Keep a stack of page numbers in doubly linked form
Page referenced: move it to the top; requires 6 pointers to be changed (worst case)
No search for replacement
62 LRU Algorithm
Considerable hardware support is needed, beyond TLB registers
Updating of the time fields or the stack must be done for every memory reference
Use interrupts for every memory reference?
Few systems provide hardware support for true LRU
63 Frame Locking
Some of the frames in main memory may be locked; a locked frame cannot be replaced
The kernel is held in locked frames, as are key control structures
I/O buffers and other time-critical areas are also locked
A lock bit is associated with each frame, in the frame table as well as in the current page table
64 Replacement Algorithms
FIFO: circular buffer, round-robin style (past)
OPTIMAL: time to next reference is longest (future)
LRU: page that has not been referenced for the longest time
CLOCK
65 Swapper vs. Pager
Demand paging is similar to a paging system with swapping, where processes reside on disk
To execute a process, we swap it into memory
Rather than swapping the entire process into memory, we use a LAZY SWAPPER
A lazy swapper never swaps a page into memory unless that page will be needed
Using the term swapper is incorrect with paging: a swapper manipulates entire processes, whereas a pager is concerned with individual pages
With demand paging, we use the term PAGER rather than swapper
66 Memory Protection
Memory protection is implemented by associating a protection bit with each frame
A valid–invalid bit is attached to each entry in the page table:
"valid" indicates that the associated page is in the process's logical address space, and is thus a legal page
"invalid" indicates that the page is not in the process's logical address space
68 Comparison
FIFO: simple to implement, but performs relatively poorly
LRU: does nearly as well as optimal, but is difficult to implement and imposes significant overheads
Solution? Approximate LRU algorithms!
They approximate the performance of LRU with less overhead
Variants of a scheme called the CLOCK POLICY
69 Basic Replacement Algorithms: Clock Policy
An additional bit, called a use bit, is kept per frame
When a page is first loaded into memory, its use bit is set to 1
When the page is referenced, the use bit is set to 1
When it is time to replace a page, the first frame encountered with the use bit set to 0 is replaced
During the search for a replacement, each use bit set to 1 is changed to 0
Used in the Multics OS
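The clock policy above can be sketched as a circular buffer with a sweeping hand. This is a minimal model, assuming a fixed set of frames and modelling the hardware-set use bit in software; class and method names are illustrative.

```python
# Clock policy sketch: frames form a circular buffer, each with a use bit.
# On a fault, the hand sweeps forward, clearing use bits that are 1, and
# replaces the first frame whose use bit is already 0.
class Clock:
    def __init__(self, nframes):
        self.pages = [None] * nframes
        self.use = [0] * nframes
        self.hand = 0

    def access(self, page):
        if page in self.pages:                  # hit: set the use bit
            self.use[self.pages.index(page)] = 1
            return False
        while self.use[self.hand] == 1:         # give used pages a second chance
            self.use[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.pages)
        self.pages[self.hand] = page            # replace the victim
        self.use[self.hand] = 1                 # newly loaded page: use bit = 1
        self.hand = (self.hand + 1) % len(self.pages)
        return True                             # page fault

clock = Clock(3)
faults = sum(clock.access(p) for p in [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5])
print(faults)   # 9 page faults on this string with 3 frames
```

Pages referenced since the last sweep survive one pass of the hand, which is exactly the "second chance" behaviour the next slide names.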
73 Second-Chance Algorithm
The page replacement algorithm just described is called the second-chance algorithm
WHY? A page whose use bit is 1 is not replaced immediately: the bit is cleared and the page is passed over, and it is replaced only if it is still unreferenced when the hand comes around again
Enhanced second-chance algorithm
74 Enhanced Second-Chance Algorithm
The second-chance algorithm can be made more powerful by increasing the number of bits it employs
If the number of bits employed is 0?
In all processors that support paging, a modify or dirty bit is associated with every frame in memory
75 Enhanced Second-Chance Algorithm
Each frame falls into one of 4 categories (use bit, modify bit):
(0, 0): not recently used, not modified
(0, 1): not recently used, modified
(1, 0): recently used, not modified
(1, 1): recently used, modified
76 Enhanced Second-Chance Algorithm
1. Beginning at the current position of the pointer, scan the buffer, making no changes to the use bits. The first frame encountered with (0, 0) is replaced
2. If step 1 fails, scan again, looking for a frame with (0, 1). Replace the first such frame. During this scan, set the use bit of each frame passed over to 0
3. If step 2 fails, the pointer has returned to its starting position and all frames have their use bit set to 0. Repeat step 1 and, if necessary, step 2; a frame will now be found
Used in the Macintosh VM scheme
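The victim-selection steps above can be sketched as two scans over the frame list. A minimal model: frames are dicts with software-maintained use and modify bits (in hardware these are set on access and write), and the function name is invented for illustration.

```python
# Enhanced second-chance victim selection, following the steps above:
# pass 1 looks for (use=0, modify=0) without touching bits; pass 2 looks
# for (use=0, modify=1) while clearing use bits as it scans.
def select_victim(frames):
    # frames: list of dicts like {"page": 3, "use": 1, "modify": 0}
    for f in frames:                    # pass 1: (0, 0), use bits untouched
        if f["use"] == 0 and f["modify"] == 0:
            return f
    for f in frames:                    # pass 2: (0, 1), clearing use bits
        if f["use"] == 0 and f["modify"] == 1:
            return f
        f["use"] = 0
    # All use bits are now 0: repeating the scans must find a victim
    return select_victim(frames)

frames = [{"page": 1, "use": 1, "modify": 0},
          {"page": 2, "use": 0, "modify": 1},
          {"page": 3, "use": 1, "modify": 1}]
print(select_victim(frames)["page"])   # 2: the first (0, 1) frame found
```

Preferring (0, 0) over (0, 1) victims means clean pages go first, avoiding the disk write needed to evict a dirty page.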