
1 Background Information To execute –Processes must be in main memory –The CPU can only directly access main memory and registers Speed –Register access requires a single CPU cycle –Accessing main memory can take multiple cycles –Accessing disk can take milliseconds –Cache sits between main memory and CPU registers Memory mapping: always depends on hardware assists Depending on the hardware, a process's contiguous logical memory may be –Contiguous in physical memory –Scattered through physical memory Memory protection: processes have a limited memory view

2 Memory Management Issues 1.How and when are memory references bound to absolute physical addresses? 2.How can processes maximize memory use? How many processes can be in memory? Can processes move while they execute? Can programs exceed the size of physical memory? Do entire programs need to be in memory to run? Can memory be shared among processes? 3.How are processes protected from each other? 4.What are the system limitations? Memory limits? CPU processing speed? Disk speed? Hardware assistance? Goal: Effective allocation of memory among processes

3 Logical vs. Physical Address Space Definitions –Memory Management Unit (MMU): Device mapping logical (virtual) addresses to physical addresses –Logical address – process view of memory –Physical address –MMU view of memory Memory references –Logical and physical addresses are the same when binding occurs during compile or load time –Logical and physical addresses are different when binding occurs dynamically during execution

4 When are Processes Bound to Memory Compile time: Compiler generates absolute references Load time: Compiler generates relocatable code. The Link Editor merges separately compiled modules and the loader generates absolute code Execution time: Binding delayed until run time. Processes can move during execution. Hardware support required.

5 A Simple Memory Mapping Scheme A pair of base and limit registers defines the logical address space The MMU adds the content of the relocation (base) register to each memory reference The limit register disallows references that are out of bounds
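The add-and-check the MMU performs can be sketched in a few lines of Python. The base and limit values below are illustrative examples, not from the slides:

```python
def translate(logical_addr, base, limit):
    """Map a logical address to a physical one, as a base/limit MMU would.

    Raises a 'trap' (modeled as an exception) for out-of-bounds references.
    """
    if logical_addr >= limit:          # limit register check
        raise MemoryError("trap: address out of range -> terminate process")
    return base + logical_addr         # relocation (base) register add

# A process loaded at physical address 14000 with a 3000-byte logical space:
print(translate(346, base=14000, limit=3000))   # 14346
```

A reference to logical address 3500 in the same process would raise the trap instead of returning an address.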

6 Hardware to Support Many Processes in Memory

7 MMU Relocation Register Protection A program accesses a memory location Trap –Raised when accessing a location that is out of range –Action: terminate the process

8 Improving Memory Utilization Overlays –Parts of a process load into an overlay area –Implemented by user programs using an overlay-aware loader Swapping (with OS support) –Backing store: a fast disk partition large enough to accommodate direct-access copies of all memory images –Swap operation: temporarily roll out a lower-priority process and roll in another process from the swap queue –Issues: seek time and transfer time –Modified versions of swapping are found on many systems (e.g., UNIX, Linux, and Windows)

9 Dynamic Library Loading Definitions: –Library functions: those which are common to many applications –Dynamic loading: the process of loading library functions at run time Advantages –Unused functions are never loaded –Minimizes memory use if large functions handle infrequent events –Operating system support is not required Disadvantages: –Library functions are not shared among processes –Applications may have to issue explicit load requests

10 Dynamic Linking Assumption: A run-time (shared) library exists –Set of functions shared by many processes –Linked at execution time Stub –A piece of code that locates the memory-resident library function –The stub replaces itself with the address of the library function and executes it Operating System Support –Return the address of the function if it is in memory –Load the function if it is not in memory

11 Contiguous Memory Allocation Memory is partitioned into two areas –The kernel and interrupt vector are usually in low memory –User processes are in high memory Single-partition allocation –MMU relocation (base) and limit registers enforce memory protection –The size of the operating system doesn't impact user programs Multiple-partition allocation –Processes are allocated into spare 'holes' (available areas of memory) –The operating system maintains lists of allocated and free memory [Figure: a sequence of memory layouts in which processes 8, 9, and 10 are allocated into holes between the OS and processes 2 and 5] Each process is stored in one contiguous block

12 Algorithms for Contiguous Allocations Issues: How to maintain the free list; what is the search algorithm complexity? Algorithms (note: worst-fit generally performs worst) –First-fit: Allocate the first hole that is big enough –Best-fit: Allocate the smallest hole that is big enough; leaves a small leftover hole –Worst-fit: Allocate the largest hole; leaves large leftover holes Fragmentation –External: memory holes limit possible allocations –Internal: allocated memory is larger than needed –50-percent rule: with first fit, for every N allocated blocks about 0.5N more are lost to fragmentation, so roughly one third of memory may be unusable Compaction Algorithm –Shuffle memory contents to place all free memory together –Issues: memory binding must be dynamic; time consuming; physical I/O must be handled during the remapping

13 Paged Memory Addressing The MMU splits every memory reference address into a: –Page number (p) – index into a page table array containing the base address of every frame in physical memory –Page offset (d) – offset into a physical frame –Logical addresses contain m bits, n of which are the displacement. There are 2^(m-n) pages of size 2^n –Advantage: No external fragmentation Address layout: page number p (m - n bits) | page offset d (n bits)
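The p/d split is plain bit manipulation. A sketch, assuming illustrative sizes m = 16 and n = 10 (16-bit addresses, 1 KB pages); the values are examples, not from the slides:

```python
M, N = 16, 10                      # m-bit addresses, n-bit page offset

def split(addr):
    p = addr >> N                  # page number: the high m - n bits
    d = addr & ((1 << N) - 1)      # page offset: the low n bits
    return p, d

p, d = split(0x1A7F)
print(p, hex(d))                   # page 6, offset 0x27f
```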

14 Paging Operating System responsibilities –Maintain the page table –Allocate sufficient pages from free frames to execute a program Benefit: Logical address space of a process can be noncontiguous and allocated as needed Issue: Internal fragmentation Definition: A page is a fixed-sized block of logical memory, generally a power of 2 in length between 512 and 8,192 bytes Definition: A frame is a fixed-sized block of physical memory. Each frame corresponds to a single page Definition: A Page table is an array that translates from pages to frames

15 Paged Memory Allocation 1.p indexes the page table, which refers to physical frames 2.d is the offset into a physical frame 3.Each process has an OS-maintained page table [Figure: a four-location-per-page example showing a process page table mapping to physical frames] Note: The number of instruction address bits defines the bounds of the logical address space

16 Page Table Examples [Figures: the free-frame list and memory layout before and after allocation]

17 Page Table Implementation Hardware Assist –Page-table base register (PTBR) addresses the page table –Page-table length register (PTLR) holds the page table size Issue: –Every memory access requires two trips to memory, which could cut processor speed in half –(1) read the page table entry; (2) make the actual memory reference Solution: A translation look-aside buffer (TLB), an associative memory described on the next slide

18 Translation look-aside buffers The TLB is a two-column table (page number, frame number) –Return the frame if the page is found in the TLB –Otherwise fall back to the page table Associative memory (parallel search) avoids the double memory access Timing: –Assume: 20 ns TLB access, 100 ns main memory access, 80% hit ratio –Expected access time (EAT): 0.8 × (20 + 100) + 0.2 × (20 + 100 + 100) = 140 ns Note: The TLB is flushed on context switches
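The EAT arithmetic can be checked directly; a small sketch using the slide's numbers:

```python
def eat(tlb_ns, mem_ns, hit_ratio):
    hit  = tlb_ns + mem_ns            # hit: TLB lookup + one memory access
    miss = tlb_ns + 2 * mem_ns        # miss: lookup + page-table read + access
    return hit_ratio * hit + (1 - hit_ratio) * miss

print(eat(20, 100, 0.80))             # 140.0 ns
```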

19 Extra Page Table Bits Valid-invalid bits –“valid”: page belongs to process; it is legal –“invalid”: illegal page that is not accessible Expanded uses –Virtual memory: page trap triggers a disk load –Read only page –Address-space identifier (ASID) to identify the process owning the page Note –The entire last partial page is marked as valid –Processes can access those locations incorrectly

20 Processes Sharing Data (or Not) Shared –One copy of read-only code shared among processes –Mapped to the same logical address in all processes Private –Each process keeps a separate copy –Private code and data can be located anywhere in memory

21 Hierarchical Page Tables Notes: –Tree structure –Multiple memory accesses are required to find the actual physical location –Parts of the page table can be on disk Single level: page (20 bits) | offset (12 bits) Two level: outer page (10 bits) | inner page (10 bits) | offset (12 bits)
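The two-level split is again bit manipulation; a sketch using the 10/10/12 division of a 32-bit address shown above:

```python
OUTER, INNER, OFFSET = 10, 10, 12          # 32-bit address: 10 + 10 + 12

def split2(addr):
    d  = addr & ((1 << OFFSET) - 1)                # low 12 bits
    p2 = (addr >> OFFSET) & ((1 << INNER) - 1)     # next 10 bits: inner index
    p1 = addr >> (OFFSET + INNER)                  # high 10 bits: outer index
    return p1, p2, d

addr = (3 << 22) | (5 << 12) | 7
print(split2(addr))                        # (3, 5, 7)
```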

22 Three-level Paging Scheme

23 Hashed Page Tables The virtual page number is hashed into a table; each entry chains (page, frame) pairs Collisions are resolved using separate chaining (linked lists) Common on address spaces > 32 bits Hashing complexity is close to O(1), but ineffective if collisions are frequent
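A minimal model of a hashed page table with separate chaining. Python lists stand in for the linked lists; the class and method names are invented for illustration:

```python
class HashedPageTable:
    def __init__(self, buckets=8):
        # one chain per bucket; each chain holds (virtual page, frame) pairs
        self.table = [[] for _ in range(buckets)]

    def insert(self, vpn, frame):
        self.table[hash(vpn) % len(self.table)].append((vpn, frame))

    def lookup(self, vpn):
        for page, frame in self.table[hash(vpn) % len(self.table)]:
            if page == vpn:
                return frame
        return None            # not mapped: a real MMU would page-fault

pt = HashedPageTable()
pt.insert(0x12345, 7)
pt.insert(0x12345 + 8, 9)      # lands in the same bucket: exercises chaining
print(pt.lookup(0x12345))      # 7
```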

24 Inverted Page Table One global page table –Advantage: Eliminates the per-process page tables –Disadvantage: Slower memory access because of searching Implementation –Hash with key = (pid, page number) –TLB hits eliminate the search most of the time Example: UltraSPARC Goal: Reduce page table memory requirements

25 Segmentation Segment table registers –Segment base register (SBR) = segment table's location –Segment length register (SLR) = number of segments in a program Segments –Are variable size; allocated via first-fit/best-fit algorithms –Can be shared among processes and relocated at the segment level –Carry protection bits: valid bit, read/write privileges –Suffer from external fragmentation Supports a process view of memory Example program segments: subroutines (1), library methods (2), stack (3), main program (4), symbol table (5)

26 Segmentation Examples Hardware

27 Segmentation with Paging The MULTICS system pages its segments: segment table entries address a segment page table, which points to the correct page table [Figures: MULTICS and Intel 386 address translation]

28 Pentium Address Translation Supports both segmentation and segmentation with paging Translation Scheme –Segmentation unit produces a linear address –The paging unit produces the physical address Segmentation Only Segmentation with Paging

29 Pentium Paging Architecture

30 Three-level Paging in Linux

31 Virtual Memory Concepts –Programs access logical memory –Operating system memory management and hardware coordinate to establish a logical to physical mapping Advantages –The whole program doesn't need to be in memory –The system can execute programs larger than memory –Processes can share blocks of memory –Resident library routines –Improved memory utilization: more processes running concurrently –Memory mapped files –Copy on write algorithms Disadvantages –Extra disk I/O and thrashing Separate logical and physical memory spaces

32 Logical Memory Examples

33 Copy on Write Processes initially share the same pages Operating System Support –Maintains a list of free zeroed-out pages –Each process gets its own copy only after it modifies a page [Figures: shared pages before and after modification]

34 Demand Paging The Lazy Swapper (pager) The lazy swapper loads pages only when they are needed. This minimizes I/O and memory requirements and allows for more users Definition: Pages are loaded into memory "on demand"

35 Hardware Support Page table entries contain valid bits and dirty bits (entry layout: frame #, valid, dirty) –Valid-invalid bit: set to 0 (invalid) when the page is not in memory –Dirty bit: set when a page is modified; avoids unnecessary writes during swaps Advantages: less I/O, less memory, faster response, more users

36 Page Faults Note: Some pages can be loaded and swapped out multiple times Note: Unused bits for invalid entries contain the page’s disk address

37 Processing Page Faults A user program references a location that is not resident; a page fault occurs and the OS handles it:
IF invalid reference: abort the program
ELSE IF no empty frame: choose a victim and write it to the backing store
ELSE: choose an empty frame
Find the needed page on disk
Read the page into the frame
Update the page table
Set the page table valid bit
Re-execute the instruction that caused the page fault

38 Performance of Demand Paging Page Fault Rate 0 ≤ p ≤ 1.0 –p = 0 means no page faults –p = 1 means every reference triggers a page fault Effective Access Time (EAT) –EAT = (1 – p) × memory access + p × (page fault overhead + swap page out + swap page in + restart overhead) Example –p = 0.01 –Memory access time = 200 nanoseconds –Average page-fault service time = 8 milliseconds –Restart overhead is insignificant –EAT = (1 – 0.01) × 200 + 0.01 × 8,000,000 = 198 + 80,000 ≈ 80,200 ns ≈ 80 μs Question: Is the flexibility worth the extra overhead?

39 Page Replacement Algorithms Replacement occurs when all frames are occupied: swap a victim out and bring the needed page in Technique: Assign a number of frames to each process (x axis) Goal: minimize page faults (y axis) Algorithm Evaluation: Count faults using a predefined reference string Belady's Anomaly: When allocating more frames causes more faults Copy out: Only write frames to the backing store that are "dirty"

40 First-In-First-Out (FIFO) Algorithm Memory Reference String: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 Case 1: The process has 3 frames at a time – 9 page faults Case 2: The process has 4 frames at a time – 10 page faults More frames, yet more faults: an illustration of Belady's Anomaly
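A short simulation (a sketch, not code from the slides) reproduces both counts and hence the anomaly:

```python
from collections import deque

def fifo_faults(refs, nframes):
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:          # no free frame: evict oldest
                frames.discard(queue.popleft())
            frames.add(page)
            queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3), fifo_faults(refs, 4))   # 9 10
```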

41 Optimal Page Replacement Replace the page that will not be used for the longest period of time Reference String: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 With 4 frames: 6 page faults Advantage: It is optimal Disadvantage: We don't know the future Use: A good benchmark algorithm
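A sketch of the optimal policy, looking ahead in the reference string and evicting the resident page used farthest in the future, reproduces the 6-fault count:

```python
def optimal_faults(refs, nframes):
    frames, faults = [], 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) < nframes:
            frames.append(page)
            continue
        future = refs[i + 1:]
        # evict the page whose next use is farthest away (or that is never used again)
        victim = max(frames,
                     key=lambda f: future.index(f) if f in future else len(future))
        frames[frames.index(victim)] = page
    return faults

print(optimal_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 4))   # 6
```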

42 LRU Page Replacement Replace the page that has not been used for the longest period of time Assumption: A process can have four frames in memory at a time Reference String: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 – 8 page faults
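A sketch of LRU using a recency-ordered list yields 8 faults on this reference string with 4 frames:

```python
def lru_faults(refs, nframes):
    frames, faults = [], 0          # ordered least- to most-recently used
    for page in refs:
        if page in frames:
            frames.remove(page)                 # hit: refresh recency
        else:
            faults += 1
            if len(frames) == nframes:
                frames.pop(0)                   # evict the least recently used
        frames.append(page)                     # page is now most recently used
    return faults

print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 4))   # 8
```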

43 Naïve Stack Implementation A stack of page numbers gives O(1) victim frame selection, but requires a search and update on every memory reference

44 Approximate LRU with Hardware Support Reference bit –Each page has a reference bit, initially 0 –Hardware sets the bit to 1 when the page is referenced –The OS replaces the first page found with a 0 bit Second-chance (clock) Algorithm –A clock hand loops through the pages in circular order –If the hand's page has reference bit 1: set the bit to 0, leave the page in memory, and advance to the next page –If the hand's page has reference bit 0: replace it (a page is replaced only when its bit is found to be 0 twice in a row without an intervening reference)
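The clock loop can be sketched as follows (illustrative Python, not production code):

```python
def clock_faults(refs, nframes):
    frames = [None] * nframes       # circular buffer of resident pages
    refbit = [0] * nframes
    hand, faults = 0, 0
    for page in refs:
        if page in frames:
            refbit[frames.index(page)] = 1    # hardware sets the bit on use
            continue
        faults += 1
        while refbit[hand]:                   # bit set: give a second chance
            refbit[hand] = 0                  # clear it and advance the hand
            hand = (hand + 1) % nframes
        frames[hand] = page                   # bit clear: replace this page
        refbit[hand] = 1
        hand = (hand + 1) % nframes
    return faults

print(clock_faults([1, 2, 3, 1, 2, 3], 3))   # 3: all pages fit, no replacements
```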

45 Frame Allocation How are frames allocated among executing processes? Allocation can be Global or Local –Global: select a replacement frame from the set of all frames –Local: each process selects from its own set of allocated frames Each process needs a minimum number of pages –Ex: IBM 370 – a MOVE instruction could require 6 pages: –The instruction is 6 bytes long and could span 2 pages –2 pages for the from address, 2 pages for the to address Each process should be held below a maximum number of pages –Excessive allocation to one process can degrade system performance Examples of frame allocation algorithms –Fixed: each process gets an equal number of frames –Priority: higher-priority processes get more frames –Proportional: frames in proportion to each process's size relative to the others

46 Other Replacement Algorithms Least Frequently Used (LFU) –Replace the page with the lowest usage count. In case of a tie, replace the oldest page in memory –Disadvantage: a heavily used page remains in memory after it is no longer needed Most Frequently Used (MFU) –Replace the page with the largest usage count. In case of a tie, replace the oldest page in memory –Idea: the page with the smallest count was probably just loaded and has yet to be used Usage counts: updated at regular intervals using each page table entry's reference bit

47 Thrashing Considerations Thrashing: Excessive system resources dedicated to swapping pages Insufficient frames leads to –Low CPU utilization –A short ready queue –The OS responding by adding more processes, which leads to more thrashing Paging works because of locality –Processes perform most of their work referencing narrow ranges of memory Thrashing occurs when the total size of the process localities > total memory size [Figure: performance log – memory access over time]

48 Working Set Model Goal: achieve an "acceptable" page-fault rate –If the actual rate is too low, the process loses frames –If the actual rate is too high, the process gains frames Adjust allocated frames to match the references made in a window of time

49 Working-Set Model Δ ≡ working-set window ≡ a fixed number of page references (example: 10,000 instructions) WSS_i (working set of process P_i) = total number of pages referenced in the most recent Δ (varies in time) –If Δ is too small, it will not encompass the entire locality –If Δ is too large, it will encompass several localities –If Δ = ∞, it will encompass the entire program D = Σ WSS_i ≡ total demand frames If D > memory pages ⇒ thrashing Policy: if D > memory pages, then suspend one of the processes

50 Working-Set Model Working set: The pages referenced during a working-set window Working-set window (Δ): A fixed number of instruction references (ex: 10,000) Processes are given the frames in their working set Considerations –Small Δ: processes lose frames –Large Δ: processes gain frames –Δ = ∞ includes the entire program –Thrashing results if the sum of all working sets (D) exceeds memory (m) Implementation –Suspend processes if D > memory pages –A timer interrupts every Δ/2 time units. Pages referenced since the last interrupt are kept in the working set; others are discarded. Reference bits are reset
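Computing a working set over a sliding window is straightforward; a sketch with an invented reference string (the window size and string are illustrative, not from the slides):

```python
def working_set(refs, t, delta):
    """Pages referenced in the window of the last `delta` references ending at time t."""
    return set(refs[max(0, t - delta + 1): t + 1])

refs = [1, 2, 1, 3, 2, 4, 4, 4, 3, 4]
print(working_set(refs, t=9, delta=4))   # {3, 4}
```

Summing `len(working_set(...))` over all processes gives D, the total frame demand compared against memory size.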

51 Pre-paging Purpose: Reduce the page faults occurring at process startup Pre-page all or some of the pages before they are referenced Note: If pre-paged pages go unused, I/O and memory are wasted Assume s pages are pre-paged and a fraction α of them is used –Page-fault cost avoided: s × α –Unnecessary page loads: s × (1 – α) –If α is near zero, pre-paging loses
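The trade-off reduces to comparing the s·α savings against the s·(1 − α) wasted loads; a sketch (the cost parameters are illustrative assumptions):

```python
def prepaging_net_benefit(s, alpha, fault_cost, load_cost):
    """Positive when pre-paging s pages (fraction alpha actually used) wins."""
    saved  = s * alpha * fault_cost        # page faults avoided
    wasted = s * (1 - alpha) * load_cost   # pages loaded for nothing
    return saved - wasted
```

With equal per-page costs, pre-paging wins whenever α > 0.5, for example `prepaging_net_benefit(100, 0.9, 1, 1)` is positive while `prepaging_net_benefit(100, 0.05, 1, 1)` is negative.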

52 Additional Considerations I/O Interlock – pages involved in data transfer must be locked into memory TLB size impacts the reachable working set –TLB Reach = (TLB Size) × (Page Size) –If the working set fits in the TLB, there will be fewer page faults Techniques to reduce page faults –Increase the page size: leads to increased fragmentation –Provide variable page sizes based on application specifications Poor program design can increase page faults –Example: a 1024×1024 array stored one row per page –Program 1 (1024×1024 page faults) – indexes by columns, touching a new page on every reference:
for (j = 0; j < 1024; j++)
    for (i = 0; i < 1024; i++)
        A[i][j] = 0;
–Program 2 (1024 page faults) – indexes by rows, touching each page once per row:
for (i = 0; i < 1024; i++)
    for (j = 0; j < 1024; j++)
        A[i][j] = 0;

53 Memory Mapped Files Disk blocks are mapped to memory pages; page-sized portions of files are read into physical pages Reads and writes to the file become simple memory accesses –No read() and write() system calls per access Shared memory: several processes can map the same file
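Python's standard-library mmap module (standing in here for the Java example on a later slide) shows the idea: file bytes are modified through plain slice assignment, with no explicit write() call per access:

```python
import mmap
import os
import tempfile

# Create a small file, map it, and modify it through memory access alone.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello memory-mapped world")
with mmap.mmap(fd, 0) as mm:       # length 0 maps the whole file, read/write
    mm[0:5] = b"HELLO"             # plain slice assignment updates the file
    data = bytes(mm[:])
os.close(fd)
os.remove(path)
print(data)                        # b'HELLO memory-mapped world'
```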

54 Memory-Mapped Files in Java

55 Memory-Mapped Shared Memory in Windows

56 Examples Windows NT –Demand paging with clustering; clustering loads the surrounding pages –Process parameters: working set minimum and working set maximum –Automatic working-set trimming occurs if free memory is too low Solaris 2 –Maintains a list of free pages –The pageout function selects victims using LRU scans; it runs more frequently when free memory is low –The lotsfree threshold controls when paging starts –Scanrate controls the page scan rate, varying from slowscan to fastscan

57 Allocating Kernel Memory Differently from user allocation –Deals with physical memory –Kernel requests memory for structures of varying sizes –Some kernel memory must be contiguous Approaches –Buddy System Allocation –Slab Memory Allocation

58 Buddy System Allocation Allocates from a fixed-size segment of contiguous pages Memory is allocated in power-of-2 blocks –Allocation requests are rounded up to the next power of 2 –If the kernel needs a smaller allocation than the blocks available, a larger block is repeatedly split into two buddies of the next-lower power of 2 until a correctly sized block is produced Search time is O(tree depth), i.e., logarithmic in the segment size
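A toy sketch of the splitting step (the free-list layout and function name are invented for illustration; a real buddy allocator also coalesces freed buddies):

```python
def buddy_allocate(free_lists, request):
    """Allocate from power-of-2 free lists by splitting larger blocks.

    free_lists maps block size -> list of free block start addresses.
    Returns (start, block_size), or None if nothing large enough is free.
    """
    size = 1
    while size < request:              # round request up to a power of 2
        size *= 2
    big = size
    while big not in free_lists or not free_lists[big]:
        big *= 2                       # look for the next larger free block
        if big > max(free_lists, default=0):
            return None
    start = free_lists[big].pop()
    while big > size:                  # split into buddies until the size fits
        big //= 2
        free_lists.setdefault(big, []).append(start + big)  # free the upper buddy
    return start, size

free = {256: [0]}                      # one 256-byte segment at address 0
print(buddy_allocate(free, 33))        # (0, 64); buddies of 128 and 64 left free
```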

59 Slab Memory Allocation Slab: One or more physically contiguous pages Slab cache –Consists of one or more slabs –A separate cache exists for each unique kernel data structure A cache initially contains a group of instantiated data-structure objects –The cache is initialized with objects marked as free –Allocated objects are marked as used –A new slab is added to the cache when no free objects remain Benefits –No fragmentation –Fast allocation

60 Slab Allocation

