Operating Systems Prof. Navneet Goyal

Department of Computer Science & Information Systems BITS, Pilani

Topics for Today
Concept of Paging
Logical vs. Physical Addresses
Address Translation
Page Tables
Page Table Implementation
Hierarchical Paging
Hashed Page Tables
Inverted Page Tables
Translation Lookaside Buffers (TLBs)
Associative Memory
ASID

Paging
Memory is divided into fixed-size chunks called FRAMES
A process is divided into chunks of the same fixed size called PAGES
A frame can hold one page of data
The physical address space of a process need not be contiguous
Very limited internal fragmentation (only in the last page of a process)
No external fragmentation

Paging (figure): main-memory frames with process A (pages A0-A3) loaded, then process B (pages B0-B2) loaded after it

Paging (figure): process C (pages C0-C3) is loaded after A and B; B is then swapped out; process D (pages D0-D4) is loaded into the three frames freed by B plus two frames after C

Page Table
Non-contiguous allocation of memory for D
Will a base address register suffice? No, a page table is needed
Within the program, each logical address consists of a page number and an offset within the page
Logical-to-physical address translation is done by the processor hardware
(page no., offset) → (frame no., offset)

Address Translation Scheme
The address generated by the CPU is divided into:
Page number (p): used as an index into a page table, which contains the base address of each page in physical memory
Page offset (d): combined with the base address to define the physical memory address that is sent to the memory unit
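The split into page number and offset can be sketched in a few lines. This is only an illustration: the 4-byte page size and the contents of `PAGE_TABLE` are assumed values, and `translate` is a hypothetical helper, not something defined in these slides.

```python
PAGE_SIZE = 4                              # bytes per page (assumed; must be a power of two)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 2 offset bits for 4-byte pages

# Hypothetical page table: index = page number, value = frame number.
PAGE_TABLE = [5, 6, 1, 2]


def translate(logical_address: int) -> int:
    """Map a logical address to a physical address via the page table."""
    p = logical_address >> OFFSET_BITS       # page number: high-order bits
    d = logical_address & (PAGE_SIZE - 1)    # page offset: low-order bits
    if p >= len(PAGE_TABLE):
        raise ValueError(f"page {p} is outside the logical address space")
    frame = PAGE_TABLE[p]                    # page table lookup
    return (frame << OFFSET_BITS) | d        # frame base address + offset


print(translate(13))   # page 3, offset 1 -> frame 2 -> physical address 9
```

The offset passes through unchanged; only the page number is replaced by a frame number.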

Address Translation Architecture

Page Tables

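Paging Example (figure)
Logical memory: Page 0, Page 1, Page 2, Page 3
Page table: page 0 → frame 1, page 1 → frame 4, page 2 → frame 3, page 3 → frame 7
Physical memory (frames 0-7): frame 1 holds Page 0, frame 3 holds Page 2, frame 4 holds Page 1, frame 7 holds Page 3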

Paging Example (figure): 32-byte memory with 4-byte pages

Free Frames (figure): before allocation and after allocation

Page Table Implementation
Small PTs (up to 256 entries) can be stored in registers
Example: DEC PDP-11 (16-bit address and 8 KB page size, so only 8 entries)
Big PTs (1M entries) are stored in main memory (MM)
The Page Table Base Register (PTBR) points to the PT
Problem: every reference now needs 2 memory accesses
Hardware solution: Translation Look-Aside Buffers (TLBs)

Page Table Structure
Modern systems have a large logical address space (2^32 to 2^64)
The page table becomes very large
One page table per process
Example: 2^32 logical address space and 4 KB (2^12) page size
The page table consists of 2^32 / 2^12 = 2^20 (1 mega) entries
Each entry is 4 bytes
Memory requirement of the page table = 4 MB
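The arithmetic on this slide can be checked with a small helper. `page_table_size` is a hypothetical name introduced here for illustration; it assumes a single-level page table and a power-of-two page size.

```python
def page_table_size(address_bits: int, page_size: int, entry_size: int):
    """Return (number of entries, bytes of memory) for a single-level page table."""
    offset_bits = page_size.bit_length() - 1       # 4 KB page -> 12 offset bits
    entries = 1 << (address_bits - offset_bits)    # one entry per logical page
    return entries, entries * entry_size


entries, size = page_table_size(address_bits=32, page_size=4096, entry_size=4)
print(entries, size // (1024 * 1024), "MB")        # 1048576 entries, 4 MB
```

Because there is one such table per process, the cost is multiplied by the number of processes.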

Page table is kept in main memory
Page-table base register (PTBR) points to the page table
Page-table length register (PTLR) indicates the size of the page table
In this scheme every data/instruction access requires two memory accesses: one for the page table and one for the data/instruction

Each virtual memory reference can cause two physical memory accesses:
one to fetch the page table entry
one to fetch the data
To overcome this problem, a high-speed cache for page table entries is set up, called the TLB (Translation Lookaside Buffer)

TLB
Fast-lookup hardware cache called associative memory or translation lookaside buffer (TLB)
Some TLBs store address-space identifiers (ASIDs) in each TLB entry; an ASID uniquely identifies a process and provides address-space protection for that process
If ASIDs are not supported by the TLB?

TLB & ASID
Address-space protection
While resolving a virtual page no., the TLB ensures that the ASID of the currently running process matches the ASID associated with the virtual page
What if the ASID does not match? The attempt is treated as a TLB miss

TLB & ASID
ASIDs allow entries for different processes to exist in the TLB at the same time
If ASIDs are not supported? The TLB must be flushed on each context switch

TLB
Contains the page table entries that have been most recently used
Functions the same way as a memory cache
The TLB is a CPU cache used by memory management hardware to speed up virtual-to-physical address translation
In the TLB, the virtual address is the search key and the search result is a physical address
Typically a content-addressable memory

Associative Memory/Mapping
Also referred to as content-addressable memory (CAM)
A special type of memory used in certain very-high-speed searching applications
CAM is often used in computer networking devices. For example, when a network switch receives a data frame from one of its ports, it updates an internal table with the frame's source MAC address and the port it was received on. It then looks up the destination MAC address in the table to determine which port the frame needs to be forwarded to, and sends it out of that port. The MAC address table is usually implemented with a binary CAM so the destination port can be found very quickly, reducing the switch's latency.
The processor is equipped with hardware that allows it to interrogate a number of TLB entries simultaneously to find whether a search key exists

Associative Memory/Mapping
With standard computer memory (RAM), the user supplies a memory address and the RAM returns the data word stored at that address
A CAM is designed such that the user supplies a data word and the CAM searches its entire memory to see if that data word is stored anywhere in it
If the data word is found, the CAM returns a list of one or more storage addresses where the word was found (in some architectures it also returns the data word or other associated pieces of data)
Thus, a CAM is the hardware embodiment of what in software terms would be called an associative array

TLB
Associative high-speed memory
Between 64 and 1024 entries
TLB hit and TLB miss
Each entry holds a (page #, frame #) pair
Address translation (A´, A´´)
- If A´ is in an associative register, get the frame # out
- Otherwise, get the frame # from the page table in memory

TLB Perform page replacement

TLB
TLB miss: reference the PT
Add the page # and frame # to the TLB
If the TLB is full, select an entry for replacement: Least Recently Used (LRU) or random
Wired-down entries cannot be replaced; TLB entries for kernel code are wired down
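A software model can make the hit/miss path and the role of the ASID concrete. The sketch below is an assumption-laden illustration: the `TLB` class, its method names, and the choice of LRU replacement are introduced here, a real TLB does this in parallel hardware, and wired-down entries are left out for brevity.

```python
from collections import OrderedDict


class TLB:
    """Toy TLB keyed by (ASID, page #), with LRU replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()   # (asid, page) -> frame, least recent first
        self.hits = 0
        self.misses = 0

    def lookup(self, asid: int, page: int, page_table: dict) -> int:
        key = (asid, page)
        if key in self.entries:                 # TLB hit: ASID and page # both match
            self.hits += 1
            self.entries.move_to_end(key)       # mark as most recently used
            return self.entries[key]
        self.misses += 1                        # TLB miss: reference the page table
        frame = page_table[page]
        if len(self.entries) >= self.capacity:  # TLB full: evict the LRU entry
            self.entries.popitem(last=False)
        self.entries[key] = frame               # add page # and frame # to the TLB
        return frame


tlb = TLB(capacity=2)
table_a = {0: 5, 1: 6, 2: 1}                    # hypothetical page table of process 1
for page in [0, 0, 1, 2, 0]:
    tlb.lookup(asid=1, page=page, page_table=table_a)
print(tlb.hits, tlb.misses)                     # 1 hit, 4 misses
```

Because the ASID is part of the key, an entry loaded by one process can never satisfy a lookup by another, which is why no flush is needed on a context switch in this model.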

Effective Access Time and Hit Ratio
Search time for TLB = 20 ns
Time for memory access = 100 ns
Total time in case of a hit = 20 + 100 = 120 ns
Total time in case of a miss = 20 + 100 + 100 = 220 ns
Effective memory access time (80% hit ratio) = 0.8 × 120 + 0.2 × 220 = 140 ns
40% slowdown in memory access time
For a 98% hit ratio: 0.98 × 120 + 0.02 × 220 = 122 ns, a slowdown of 22%

Effective Access Time
Associative lookup = $\varepsilon$ time units
Assume memory cycle time is 1 microsecond
Hit ratio: percentage of times that a page number is found in the associative registers
Hit ratio = $\alpha$
Effective Access Time (EAT):
$$\mathrm{EAT} = (1 + \varepsilon)\,\alpha + (2 + \varepsilon)(1 - \alpha) = 2 + \varepsilon - \alpha$$
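The effective access time is a weighted average of the hit cost and the miss cost. A minimal sketch, with `effective_access_time` as a hypothetical helper name and the 20 ns / 100 ns figures taken as example inputs:

```python
def effective_access_time(hit_ratio: float, tlb_time: float, memory_time: float) -> float:
    """Weighted average of the TLB-hit cost and the TLB-miss cost."""
    hit_cost = tlb_time + memory_time         # TLB search + one memory access
    miss_cost = tlb_time + 2 * memory_time    # TLB search + page table + data
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost


for ratio in (0.80, 0.98):
    eat = effective_access_time(ratio, tlb_time=20, memory_time=100)
    slowdown = (eat - 100) / 100 * 100        # percent over a bare 100 ns access
    print(f"hit ratio {ratio:.2f}: EAT = {eat:.0f} ns, slowdown = {slowdown:.0f}%")
```

With `tlb_time` set to ε and `memory_time` set to 1, the same function evaluates to 2 + ε − α.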

Page Table Structure
Hierarchical Paging
Hashed Page Tables
Inverted Page Tables

Hierarchical Paging
32-bit logical address with 4 KB page size
Logical address: 20-bit page no. and 12-bit page offset
Divide the page table into smaller pieces
The page no. is further divided into a 10-bit outer page no. (p1) and a 10-bit offset within the page of the page table (p2)
The page table is itself “paged”
“Forward-mapped” page table
Used in the x86 processor family
Address layout: page number = p1 (10 bits), p2 (10 bits); page offset = d (12 bits)
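The (10, 10, 12) split and the two-step lookup can be sketched as follows. The nested-dictionary representation of the outer and inner tables, and the names `split` and `translate`, are assumptions made for illustration only.

```python
P1_BITS, P2_BITS, OFFSET_BITS = 10, 10, 12


def split(address: int):
    """Split a 32-bit logical address into (p1, p2, d)."""
    d = address & ((1 << OFFSET_BITS) - 1)
    p2 = (address >> OFFSET_BITS) & ((1 << P2_BITS) - 1)
    p1 = address >> (OFFSET_BITS + P2_BITS)
    return p1, p2, d


def translate(address: int, outer_table: dict) -> int:
    """Walk the outer table, then the inner table, then add the offset."""
    p1, p2, d = split(address)
    inner_table = outer_table[p1]      # p1 indexes the outer page table
    frame = inner_table[p2]            # p2 indexes one page of the page table
    return (frame << OFFSET_BITS) | d


# Hypothetical mapping: outer entry 3 -> inner table whose entry 5 -> frame 42.
outer = {3: {5: 42}}
addr = (3 << 22) | (5 << 12) | 0xABC
print(split(addr))                     # (3, 5, 2748)
print(hex(translate(addr, outer)))     # 0x2aabc
```

Only inner tables that are actually used need to exist, which is the memory saving that motivates paging the page table.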

Two-Level Page-Table Scheme

Two-Level Scheme for 32-bit Address

VAX Architecture for Paging
Variation of 2-level paging
32-bit addressing with 512-byte pages
The logical address space (LAS) is divided into 4 sections of 2^30 bytes each
2 high-order bits select the section
Next 21 bits give the page # within that section
9 bits give the offset in the desired page

Address-Translation Scheme
Address-translation scheme for a two-level 32-bit paging architecture

Limitations of 2-Level Paging
Consider a 64-bit LAS with page size 4 KB
Entries in PT = 2^52
2-level paging (42, 10, 12): outer PT contains 2^42 entries
3-level paging (32, 10, 10, 12): outer PT still contains 2^32 entries
3-level paging is used by SPARC with 32-bit addressing
4-level paging?? Used by Motorola with 32-bit addressing

Page Size
Important HW design decision
Several factors are considered:
Internal fragmentation: the smaller the page size, the less internal fragmentation
Smaller page size means more pages required per process
More pages per process means larger page tables
Page faults and thrashing!

Example Page Sizes

Page Replacement
Demand Paging
Page faults
Memory is full

Demand Paging
Bring a page into memory only when it is needed
Less I/O needed
Less memory needed
Faster response
More users
Page is needed ⇒ reference to it
invalid reference ⇒ abort
not-in-memory ⇒ bring to memory

Demand Paging Virtual Memory Larger than Physical Memory

Demand Paging Transfer of a Paged Memory to Contiguous Disk Space

Valid-Invalid Bit

Page Fault
If there is ever a reference to a page that is not in memory, the first reference will trap to the OS ⇒ page fault
The OS looks at an internal table (in the PCB of the process) to decide:
- Invalid reference ⇒ abort
- Just not in memory ⇒ bring the page in:
Get an empty frame
Swap the page into the frame
Reset tables, validation bit = 1
Restart the instruction that was interrupted

No Free Frames?
Page replacement: find some page in memory that is not really in use and swap it out
Algorithm performance: want an algorithm that results in the minimum number of page faults
The same page may be brought into memory several times

Page Replacement
Prevent over-allocation of memory by modifying the page-fault service routine to include page replacement
Use a modify (dirty) bit to reduce the overhead of page transfers: only modified pages are written to disk
Page replacement completes the separation between logical memory and physical memory: a large virtual memory can be provided on a smaller physical memory

Basic Page Replacement
1. Find the location of the desired page on disk
2. Find a free frame:
- If there is a free frame, use it
- If there is no free frame, use a page replacement algorithm to select a victim frame
3. Read the desired page into the (newly) free frame; update the page and frame tables
4. Restart the process
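The steps above, together with the dirty-bit optimisation, can be sketched as a small simulation. Everything named here is hypothetical: `DemandPager` is an illustrative class, disk I/O is reduced to counters, and FIFO is assumed as the victim-selection policy only to keep the example short.

```python
from collections import deque


class DemandPager:
    """Toy page-fault service routine with a modify (dirty) bit."""

    def __init__(self, num_frames: int):
        self.free_frames = list(range(num_frames))
        self.page_table = {}       # resident page -> frame
        self.dirty = set()         # resident pages modified since they were loaded
        self.load_order = deque()  # FIFO victim selection (an assumption)
        self.faults = 0
        self.writebacks = 0

    def access(self, page: int, write: bool = False) -> int:
        if page not in self.page_table:            # page fault
            self.faults += 1
            if self.free_frames:                   # a free frame exists: use it
                frame = self.free_frames.pop(0)
            else:                                  # otherwise select a victim
                victim = self.load_order.popleft()
                frame = self.page_table.pop(victim)
                if victim in self.dirty:           # only modified pages go to disk
                    self.dirty.discard(victim)
                    self.writebacks += 1
            self.page_table[page] = frame          # read page in, update tables
            self.load_order.append(page)
        if write:
            self.dirty.add(page)
        return self.page_table[page]               # the access is then restarted


pager = DemandPager(num_frames=2)
pager.access(1, write=True)
pager.access(2)
pager.access(3)    # evicts page 1, which is dirty: one write-back
pager.access(4)    # evicts page 2, which is clean: no write-back
print(pager.faults, pager.writebacks)   # 4 faults, 1 write-back
```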

Basic Page Replacement

Page Replacement Algorithms
Graph of Page Faults Versus The Number of Frames

Page Replacement Algorithms
Want the lowest page-fault rate
Evaluate an algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults on that string
In all our examples, the reference string is 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5

FIFO Algorithm
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5
3 frames (3 pages can be in memory at a time per process): 9 page faults
4 frames: 10 page faults
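The two fault counts can be reproduced with a short simulation. `fifo_faults` is a hypothetical helper written for this illustration; it counts every first load of a page as a fault.

```python
from collections import deque


def fifo_faults(reference_string, num_frames: int) -> int:
    """Count page faults under FIFO replacement."""
    queue = deque()    # resident pages, oldest arrival first
    resident = set()
    faults = 0
    for page in reference_string:
        if page in resident:
            continue                               # hit: FIFO order is unchanged
        faults += 1
        if len(queue) == num_frames:
            resident.discard(queue.popleft())      # evict the oldest page
        queue.append(page)
        resident.add(page)
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(REFS, 3))   # 9
print(fifo_faults(REFS, 4))   # 10
```

Adding a frame increases the number of faults for this string, which is an instance of Belady's anomaly.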

FIFO Algorithm 15 page faults

FIFO: Belady’s Anomaly

Optimal Algorithm
Lowest page-fault rate of all algorithms
Never suffers from Belady’s anomaly
Replace the page that will not be used for the longest period of time
4-frame example with reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: 6 page faults
How do you know this? Difficult to implement, as it requires prior knowledge of the reference string (like SJF in CPU scheduling)
Used for measuring how well your algorithm performs; mainly used for comparison studies
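Since the whole reference string is known in a simulation, the optimal policy can be sketched directly. `opt_faults` is a hypothetical helper; it rescans the remaining string on every fault, which is fine for a demonstration but not efficient.

```python
def opt_faults(reference_string, num_frames: int) -> int:
    """Count page faults under the optimal (farthest-future-use) policy."""
    frames = []
    faults = 0
    for i, page in enumerate(reference_string):
        if page in frames:
            continue
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)
            continue
        future = reference_string[i + 1:]

        def next_use(resident_page):
            # Distance to the next reference; infinite if never used again.
            if resident_page in future:
                return future.index(resident_page)
            return float("inf")

        victim = max(frames, key=next_use)         # used farthest in the future
        frames[frames.index(victim)] = page
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(opt_faults(REFS, 4))   # 6
```

A real system cannot do this because it cannot see future references, so the result serves only as a lower bound for comparison.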

Optimal Page Replacement (figure)
9 page faults

LRU Algorithm
Optimal algorithm is not feasible
LRU, an approximation to Optimal, is feasible
FIFO looks back in time
Optimal looks forward in time
LRU replaces the page that has not been used for the longest period of time!!!
Recent past as an approximation of the near future!!!
Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with 4 frames: 8 page faults
Scheme   PF#
FIFO     10
LRU      08
OPT      06
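The LRU count for 4 frames can be reproduced with a recency-ordered list. `lru_faults` is a hypothetical helper; the list operations are linear-time, which is acceptable for a sketch but not how hardware or a kernel would track recency.

```python
def lru_faults(reference_string, num_frames: int) -> int:
    """Count page faults under LRU replacement."""
    recency = []   # resident pages, least recently used first
    faults = 0
    for page in reference_string:
        if page in recency:
            recency.remove(page)        # hit: will be re-added as most recent
        else:
            faults += 1
            if len(recency) == num_frames:
                recency.pop(0)          # evict the least recently used page
        recency.append(page)            # page is now the most recently used
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(REFS, 4))   # 8
```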

LRU Page Replacement (figure)
12 page faults
Scheme   PF#
FIFO     15
LRU      12
OPT      09

Some Interesting Results
Reference string $S$; $S^R$ is the reverse string of $S$
$PF_{OPT}(S) = PF_{OPT}(S^R)$
$PF_{LRU}(S) = PF_{LRU}(S^R)$

LRU Algorithm
Counter implementation
Every process has a logical counter
The counter is incremented every time a page is referenced
The counter value is copied into the PTE for that page
The “time” of last reference of each page is recorded
Search the PT for the LRU page (with the smallest time value)
Requires a write to memory for each memory access
Any problems? Counter overflow
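The counter scheme can be sketched as below. `CounterLRU` and its fields are hypothetical names; a dictionary stands in for the time field of each PTE, and Python integers never overflow, so the overflow problem noted on the slide is not modelled.

```python
class CounterLRU:
    """Toy LRU using a per-process logical counter copied into each PTE."""

    def __init__(self, num_frames: int):
        self.num_frames = num_frames
        self.counter = 0        # logical clock of the process
        self.last_used = {}     # resident page -> counter value at last reference
        self.faults = 0

    def reference(self, page: int) -> None:
        self.counter += 1                           # incremented on every reference
        if page not in self.last_used:
            self.faults += 1
            if len(self.last_used) == self.num_frames:
                # Search the table for the smallest time value: the LRU page.
                victim = min(self.last_used, key=self.last_used.get)
                del self.last_used[victim]
        self.last_used[page] = self.counter         # copy the counter into the PTE


lru = CounterLRU(num_frames=4)
for page in [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]:
    lru.reference(page)
print(lru.faults)   # 8
```

The update on every reference and the linear search on every fault are the two costs this scheme pays.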

LRU Algorithm
Stack implementation
Keep a stack of page numbers in doubly linked form
Page referenced: move it to the top; requires 6 pointers to be changed (worst case)
No search for replacement: the LRU page is at the bottom of the stack

LRU: Stack Implementation

LRU Algorithm
Considerable HW support is needed, beyond TLB registers
Updating of the time fields or stack must be done for every memory reference
Use interrupts for every memory reference?
Few systems provide HW support for true LRU

Frame Locking
Some of the frames in MM may be locked
A locked frame cannot be replaced
The kernel is held in locked frames
Key control structures, I/O buffers and other time-critical areas are also locked
A lock bit is associated with each frame
The lock bit is kept in the frame table as well as in the current page table

Replacement Algorithms
FIFO: circular buffer, round-robin style (past)
OPTIMAL: time to next reference is longest (future)
LRU: page that has not been referenced for the longest time
CLOCK

Swapper vs. Pager
Demand paging is similar to a paging system with swapping, where processes reside on disk
To execute a process, we swap it into memory
Rather than swapping the entire process into memory, we use a LAZY SWAPPER
A lazy swapper never swaps a page into memory unless that page will be needed
Using the term swapper is incorrect with paging
A swapper manipulates entire processes, whereas a pager is concerned with individual pages
With demand paging, we use the term PAGER rather than swapper

Memory Protection
Memory protection is implemented by associating a protection bit with each frame
A valid-invalid bit is attached to each entry in the page table:
“valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page
“invalid” indicates that the page is not in the process’ logical address space

Valid (v) or Invalid (i) Bit In A Page Table

Comparison
FIFO: simple to implement; performs relatively poorly
LRU: does nearly as well as optimal; difficult to implement; imposes significant overheads
Solution? Approximate LRU algorithms!
They approximate the performance of LRU with less overhead
Variants of a scheme called CLOCK POLICY

Basic Replacement Algorithms
Clock Policy
Additional bit called a use bit
When a page is first loaded into memory, the use bit is set to 1
When the page is referenced, the use bit is set to 1
When it is time to replace a page, the first frame encountered with the use bit set to 0 is replaced
During the search for replacement, each use bit set to 1 is changed to 0
Used in Multics OS
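The clock policy can be simulated with a circular array of frames and a pointer (the "hand"). `clock_faults` is a hypothetical helper; advancing the hand to the frame after the one just filled is an assumption about the pointer rule, which the slide does not spell out.

```python
def clock_faults(reference_string, num_frames: int) -> int:
    """Count page faults under the clock (use-bit) policy."""
    frames = [None] * num_frames   # circular buffer of resident pages
    use = [0] * num_frames         # one use bit per frame
    hand = 0                       # next frame to examine
    faults = 0
    for page in reference_string:
        if page in frames:
            use[frames.index(page)] = 1            # referenced: set the use bit
            continue
        faults += 1
        while use[hand] == 1:                      # bypassed frame: clear its bit
            use[hand] = 0
            hand = (hand + 1) % num_frames
        frames[hand] = page                        # first frame with use bit 0
        use[hand] = 1                              # newly loaded: use bit set to 1
        hand = (hand + 1) % num_frames
    return faults


REFS = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(clock_faults(REFS, 3))   # 9
```

Empty frames start with a use bit of 0, so they are filled in order before any resident page is replaced.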


Example 3 frames

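Second-Chance Algorithm
The page replacement algorithm just described is called the second-chance algorithm
WHY? A page whose use bit is 1 is passed over once, with its bit cleared, and so gets a second chance before it can be replaced
Enhanced second-chance algorithm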

Enhanced Second-Chance Algorithm
The second-chance algorithm can be made more powerful by increasing the number of bits it employs
If the no. of bits employed is 0? The algorithm degenerates to FIFO
In all processors that support paging, a modify (dirty) bit is associated with every frame in memory

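Each frame falls into one of 4 categories (use bit, modify bit):
0 0: not accessed recently, not modified
1 0: accessed recently, not modified
0 1: not accessed recently, modified
1 1: accessed recently, modified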

1. Beginning at the current position of the pointer, scan the frame buffer. Make no changes to the use bit. The first frame encountered with (use, modify) = (0, 0) is replaced
2. If step 1 fails, scan again, looking for a frame with (0, 1). Replace the first such frame. During this scan, set the use bit to 0 on each frame that is bypassed
3. If step 2 fails, the pointer has returned to its starting position and all the frames have their use bit set to 0. Repeat step 1 and, if necessary, step 2
Used in the Macintosh VM scheme
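The three steps translate almost directly into a victim-selection routine. This is a sketch only: `select_victim` is a hypothetical function, the use and modify bits are plain lists indexed by frame, and updating the pointer after replacement is left to the caller.

```python
def select_victim(use, modify, hand: int) -> int:
    """Return the index of the frame to replace; may clear use bits in place."""
    n = len(use)
    while True:
        # Step 1: look for (use=0, modify=0) without changing any use bit.
        for i in range(n):
            idx = (hand + i) % n
            if use[idx] == 0 and modify[idx] == 0:
                return idx
        # Step 2: look for (use=0, modify=1), clearing the use bit of each
        # frame that is bypassed.
        for i in range(n):
            idx = (hand + i) % n
            if use[idx] == 0 and modify[idx] == 1:
                return idx
            use[idx] = 0
        # Step 3: every use bit is now 0, so repeating steps 1 and 2 must succeed.


use_bits = [1, 0, 1]
modify_bits = [1, 1, 0]
print(select_victim(use_bits, modify_bits, hand=0))   # 1
print(use_bits)                                       # [0, 0, 1]
```

Preferring unmodified frames avoids a write to disk at replacement time, at the price of up to four passes over the buffer.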
