Operating Systems Prof. Navneet Goyal Department of Computer Science & Information Systems BITS, Pilani
Topics for Today Concept of Paging Logical vs. Physical Addresses Address Translation Page Tables Page Table Implementation Hierarchical Paging Hashed Page Tables Inverted Page Tables Translation Look aside Buffers (TLBs) Associative Memory ASID
Paging Memory is divided into fixed size chunks; FRAMES Process is divided into fixed size chunks;PAGES A frame can hold one page of data Physical address space of a process need not be contiguous Very limited internal fragmentation No external fragmentation
Paging 1 2 3 4 5 6 7 8 9 10 11 12 13 14 A0 A1 A2 A3 A0 A1 A2 A3 B0 B1 B2
Paging 1 2 3 4 5 6 7 8 9 10 11 12 13 14 A0 A1 A2 A3 B0 B1 B2 C0 C1 C2 C3 A0 A1 A2 A3 C0 C1 C2 C3 A0 A1 A2 A3 D0 D1 D2 C0 C1 C2 C3 D3 D4
Page Table Non-contiguous allocation of memory for D Will base address register suffice? Page Table Within program Each logical address consists of a page number and an offset within the page Logical to Physical Address translation is done by processor hardware (Page no, offset) (frame no, offset)
Address Translation Scheme Address generated by CPU is divided into: Page number (p) – used as an index into a page table which contains base address of each page in physical memory. Page offset (d) – combined with base address to define the physical memory address that is sent to the memory unit.
Address Translation Architecture
Page Tables
Paging Example Page 0 Page 2 Page 1 Page 3 1 2 3 4 5 6 7 Page 0 Page 1 1 2 3 4 5 6 7 Page 0 Page 1 Page 2 Page 3 1 2 3 1 4 3 7 Logical Memory Page table Physical Memory
32-byte Memory with 4 byte pages Paging Example 32-byte Memory with 4 byte pages
Free Frames Before allocation After allocation
Page Table Implementation Small PTs (up to 256 entries) Can be stored in Registers Example DEC PDP-11 (16 bit address & 8KB page size) Big PTs (1M entries) are stored in MM Page Table Base Register (PRTR) points to PT 2 memory access problem Hardware Solution – Translation Look-Aside Buffers (TLBs)
Page Table Structure Modern systems have large logical address space (232 – 264) Page table becomes very large One page table per process 232 logical-address space & page-size 4KB Page table consists of 1 mega entries Each entry 4 bytes Memory requirements of page table = 4MB
Page Table Implementation Page table is kept in main memory Page-table base register (PTBR) points to the page table Page-table length register (PRLR) indicates size of the page table In this scheme every data/instruction access requires two memory accesses. One for the page table and one for the data/instruction
Page Table Implementation Each virtual memory reference can cause two physical memory accesses one to fetch the page table one to fetch the data To overcome this problem a high-speed cache is set up for page table entries called the TLB - Translation Lookaside Buffer
TLB Fast-lookup hardware cache called associative memory or translation look- aside buffers (TLBs) Some TLBs store address-space identifiers (ASIDs) in each TLB entry – uniquely identifies each process to provide address- space protection for that process If ASIDs not supported by TLB?
TLB & ASID Address-space protection While resolving virtual page no., it ensures that the ASID for the currently running process matches the ASID associated with the virtual page What if ASID does not match? Attempt is treated as a TLB miss
TLB & ASID ASID allows entries for different processes to exist in TLB If ASID not supported? With each context switch, TLB must be flushed
TLB Contains page table entries that have been most recently used Functions same way as a memory cache TLB is a CPU cache used by memory management hardware to speed up virtual to logical address translation In TLB, virtual address is the search key and the search result is a physical address Typically a content addressable memory
Associative Memory/Mapping Also referred to as Content-addressable memory (CAM) Special type of memory used in special type of high speed searching applications Content-addressable memory is often used in computer networking devices. For example, when a network switch receives a Data Frame from one of its ports, it updates an internal table with the frame's source MAC address and the port it was received on. It then looks up the destination MAC address in the table to determine what port the frame needs to be forwarded to, and sends it out that port. The MAC address table is usually implemented with a binary CAM so the destination port can be found very quickly, reducing the switch's latency Processor is equipped with HW that allows it to interrogate simultaneously a number of TLB entries to find if a search key exists
Associative Memory/Mapping Unlike standard computer memory (RAM) in which the user supplies a memory address and the RAM returns the data word stored at that address CAM is designed such that the user supplies a data word and the CAM searches its entire memory to see if that data word is stored anywhere in it If the data word is found, the CAM returns a list of one or more storage addresses where the word was found (and in some architectures, it also returns the data word, or other associated pieces of data) Thus, a CAM is the hardware embodiment of what in software terms would be called an associative array
TLB Associative High-speed memory Between 64-1024 entries TLB Hit & TLB Miss Page # Frame # Address translation (A´, A´´) -If A´ is in associative register, get frame # out. -Otherwise get frame # from page table in memory
TLB
TLB Perform page replacement
TLB TLB Miss – Reference PT Add page # and frame # to TLB If TLB full – select one for replacement Least Recently Used (LRU) or random Wired down entries TLB entries for kernel code are wired down
Effective Access Time Hit-Ratio Search time for TLB = 20 ns Time for memory access = 100 ns Total Time (in case of hit) = 120 ns In case of miss, Total time= 20+100+100=220 Effective memory access time (80% hit ratio) = .8*120 + 0.4*220 = 140 ns 40% slowdown in memory access time For 98% hit-ratio – slowdown = 22%
Effective Access Time Associative Lookup = time unit Assume memory cycle time is 1 microsecond Hit ratio – percentage of times that a page number is found in the associative registers Hit ratio = Effective Access Time (EAT) EAT = (1 + ) + (2 + )(1 – ) = 2 + –
Page Table Structure Hierarchical Paging Hashed Page Tables Inverted Page Tables
Hierarchical Paging 32-bit address line with 4Kb page size Logical address 20-bit page no. + 12-bit page offset Divide the page table into smaller pieces Page no. is divided into 10-bit page no. & 10- bit page offset Page table is itself “paged” “Forward-Mapped” page table Used in x86 processor family 10 10 12 page number page offset pi p2 d
Two-Level Page-Table Scheme
Two-Level Scheme for 32-bit Address
VAX Architecture for Paging Variation of 2-level paging 32-bit addressing with 512 bytes page LAS is divided into 4 Sections of 230 bytes each 2 high order bits for segment Next 21 bits for page # of that section 9 bits for offset in the desired page
Address-Translation Scheme Address-translation scheme for a two-level 32-bit paging architecture
Limitations of 2-Level paging Consider 64-bit LAS with page size 4KB Entries in PT = 252 2-level paging (42,10,12) Outer PT contains 242 entries 3-level paging (32,10,10,12) Outer PT still contains 232 entries SPARC with 32-bit addressing 4-level Paging?? Motorola 68030 with 32-bit addressing
Page Size Important HW design decision Several factors considered Internal fragmentation Smaller page size, less amount of internal fragmentation Smaller page size, more pages required per process More pages per process means larger page tables Page faults and Thrashing!
Example Page Sizes
Page Replacement Demand Paging Page faults Memory is full
Demand Paging Bring a page into memory only when it is needed. Less I/O needed Less memory needed Faster response More users Page is needed reference to it invalid reference abort not-in-memory bring to memory
Demand Paging Virtual Memory Larger than Physical Memory
Demand Paging Transfer of a Paged Memory to Contiguous Disk Space
Valid-Invalid Bit
Page Fault If there is ever a reference to a page, first reference will trap to OS page fault OS looks at an internal table (in PCB of process) to decide: Invalid reference abort. Just not in memory. Get empty frame. Swap page into frame. Reset tables, validation bit = 1. Restart instruction that was interrupted
No Free Frames? Page replacement – find some page in memory, but not really in use, swap it out. algorithm performance – want an algorithm which will result in minimum number of page faults. Same page may be brought into memory several times
Page Replacement Prevent over-allocation of memory by modifying page-fault service routine to include page replacement. Use modify (dirty) bit to reduce overhead of page transfers – only modified pages are written to disk. Page replacement completes separation between logical memory and physical memory – large virtual memory can be provided on a smaller physical memory.
Basic Page Replacement Find the location of the desired page on disk. Find a free frame: - If there is a free frame, use it. - If there is no free frame, use a page replacement algorithm to select a victim frame. Read the desired page into the (newly) free frame. Update the page and frame tables. Restart the process.
Basic Page Replacement
Page Replacement Algorithms Graph of Page Faults Versus The Number of Frames
Page Replacement Algorithms Want lowest page-fault rate. Evaluate algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults on that string. In all our examples, the reference string is 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5.
FIFO Algorithm Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 3 Frames (3 pages can be in memory at a time per process) 4 Frames 1 1 4 5 2 2 1 3 9 page faults 3 3 2 4 1 1 5 4 2 2 1 5 10 page faults 3 3 2 4 4 3
FIFO Algorithm 15 page faults
FIFO: Belady’s Anomaly
Optimal Algorithm Lowest page-fault rate of all algorithms Never suffer from Belady’s anomaly Replace page that will not be used for longest period of time. 4 frames example 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 How do you know this? Used for measuring how well your algorithm performs. Difficult to implement as it requires prior knowledge of reference string (like SJF in CPU Scheduling) Mainly used for comparison studies 1 4 2 6 page faults 3 4 5
Optimal Page Replacement 09 page faults
LRU Algorithm Optimal algorithm not feasible LRU, an approx. to Optimal is feasible FIFO – looks back in time Optimal – looks forward in time Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 1 5 Replaces the page that has not been used for the longest period of time!!! 2 8 page faults 3 5 4 4 3 Scheme PF# FIFO 10 LRU 08 OPT 06 Recent past as an approx. of the near future!!!
LRU Page Replacement Scheme PF# FIFO 15 LRU 12 OPT 09 12 page faults
Some Interesting results Reference string S SR is the reverse string of S PFOPT (S) = PFOPT (SR) PFLRU (S) = PFLRU (SR)
LRU Algorithm Every process has a logical counter Counter Implementation Every process has a logical counter Counter is incremented every time a page is referenced Counter value is copied into PTE for that page “Time” of last reference of each page is recorded Search the PT for LRU page (with smallest time value) A write to memory for each memory access Any Problems? Counter Overflow
LRU Algorithm Stack implementation keep a stack of page numbers in a double link form: Page referenced: move it to the top requires 6 pointers to be changed (worst) No search for replacement
LRU: Stack Implementation
LRU Algorithm Considerable HW support is needed Beyond TLB registers Updating of the time fields or stack must be done for every memory reference Use interrupts for every memory reference? Few systems provide HW support for true LRU
Frame Locking Some of the frames in MM may be locked Locked frame can not be replaced Kernel is held in locked frames Key control structures I/O buffers and other time critical areas are also locked Lock bit is associated with each frame In frame table as well in the current page table
Replacement Algorithms FIFO circular buffer, round robin style (past) OPTIMAL time to next reference is longest (future) LRU page that has not been referenced for the longest time CLOCK
Swapper vs. Pager Demand paging is similar to paging system with swapping where processes reside on disk To execute, we swap in into memory Rather than swapping the entire process into memory, we use a LAZY SWAPPER Lazy swapper never swaps a page into memory unless that page will be needed Using swapper is incorrect with Paging Swapper manipulates entire process, whereas pager is concerned with individual pages With demand paging, we use the term PAGER rather than swapper.
Memory Protection Memory protection implemented by associating protection bit with each frame Valid-invalid bit attached to each entry in the page table: “valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page “invalid” indicates that the page is not in the process’ logical address space
Valid (v) or Invalid (i) Bit In A Page Table
Comparison FIFO Simple to implement Performs relatively poorly LRU does nearly as well as optimal Difficult to implement Imposes significant overheads Solution? Approximate LRU algorithms! approx. the performance of LRU with less overheads Variants of scheme called CLOCK POLICY
Basic Replacement Algorithms Clock Policy Additional bit called a use bit When a page is first loaded in memory, the use bit is set to 1 When the page is referenced, the use bit is set to 1 When it is time to replace a page, the first frame encountered with the use bit set to 0 is replaced. During the search for replacement, each use bit set to 1 is changed to 0 Used in Multics OS
Dr. Navneet Goyal, BITS, Pilani
Dr. Navneet Goyal, BITS, Pilani
Example 2 3 2 1 5 2 4 5 3 2 5 2 3 frames
Second chance Algorithm The page replacement algorithm just described is called the second-chance algorithm WHY? Enhanced second-chance algorithm
Enhanced Second chance Algorithm Second-chance algorithm can be made more powerful by increasing the number of bits it employs If no. of bits employed is 0? In all processors that support paging, a modify or dirty bit is associated with every frame in memory
Enhanced Second chance Algorithm Each frame falls into one of the 4 categories: 0 0 1 0 (not modified) 0 1(modified) 1 1
Enhanced Second chance Algorithm Beginning at the current position of the pointer, scan the buffer. Make no changes to use bit. Frame first encountered with 0 0 is replaced If 1 fails, scan again, looking for frame with 0 1. Replace first such frame. During scan, set the use bit to 0 If 2 fails, pointer returns to the starting position, & all the frames will have use bit set to 0. Repeat step 1 and, if necessary, step 2 Used in Macintosh VM scheme