
1 Background Information To execute –Processes must be in main memory –The CPU can only directly access main memory and registers Speed –Register access requires a single CPU cycle –Accessing main memory can take multiple cycles –Accessing disk can take milliseconds –Cache sits between main memory and CPU registers Memory mapping: always depends on hardware assists Depending on the hardware, a process's contiguous logical memory may be –Contiguous in physical memory –Scattered through physical memory Memory protection: processes have a limited memory view

2 Memory Management Issues 1.How and when are memory references bound to absolute physical addresses? 2.How can processes maximize memory use? How many processes can be in memory? Can processes move while they execute? Can programs exceed the size of physical memory? Do entire programs need to be in memory to run? Can memory be shared among processes? 3.How are processes protected from each other? 4.What are the system limitations? Memory limits? CPU processing speed? Disk speed? Hardware assistance? Goal: Effective allocation of memory among processes

3 Logical vs. Physical Address Space Definitions –Memory Management Unit (MMU): Device mapping logical (virtual) addresses to physical addresses –Logical address – process view of memory –Physical address –MMU view of memory Memory references –Logical and physical addresses are the same when binding occurs during compile or load time –Logical and physical addresses are different when binding occurs dynamically during execution

4 When are Processes Bound to Memory Compile time: Compiler generates absolute references Load time: Compiler generates relocatable code. The Link Editor merges separately compiled modules and the loader generates absolute code Execution time: Binding delayed until run time. Processes can move during execution. Hardware support required.

5 A Simple Memory Mapping Scheme A pair of base and limit registers defines the logical address space The MMU adds the content of the relocation (base) register to each memory reference The limit register disallows references that are out of bounds
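The add-and-check the MMU performs can be sketched in a few lines of Python. The base and limit values below are illustrative examples, not from the slides:

```python
def translate(logical_addr, base, limit):
    """Map a logical address to a physical one, as a base/limit MMU would.

    Raises a 'trap' (modeled as an exception) for out-of-bounds references.
    """
    if logical_addr >= limit:          # limit register check
        raise MemoryError("trap: address out of range -> terminate process")
    return base + logical_addr         # relocation (base) register add

# A process loaded at physical address 14000 with a 3000-byte logical space:
print(translate(346, base=14000, limit=3000))   # 14346
```

A reference to logical address 3500 in the same process would raise the trap instead of returning an address.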

6 Hardware to Support Many Processes in Memory

7 MMU Relocation Register Protection A program accesses a memory location Trap –Raised when accessing a location that is out of range –Action: terminate the process

8 Improving Memory Utilization Overlays –Parts of a process load into an overlay area –Implemented by user programs using an overlay-aware loader Swapping (with OS support) –Backing store: a fast disk partition large enough to accommodate direct-access copies of all memory images –Swap operation: temporarily roll out a lower-priority process and roll in another process from the swap queue –Issues: seek time and transfer time –Modified versions of swapping are found on many systems (e.g., UNIX, Linux, and Windows)

9 Dynamic Library Loading Definitions: –Library functions: those which are common to many applications –Dynamic loading: the process of loading library functions at run time Advantages –Unused functions are never loaded –Minimizes memory use if large functions handle infrequent events –Operating system support is not required Disadvantages: –Library functions are not shared among processes –Applications may have to issue explicit load requests

10 Dynamic Linking Assumption: A run-time (shared) library exists –Set of functions shared by many processes –Linked at execution time Stub –A piece of code that locates the memory-resident library function –The stub replaces itself with the address of the library function and executes it Operating System Support –Return the address of the function if it is in memory –Load the function if it is not in memory

11 Contiguous Memory Allocation Memory is partitioned into two areas –The kernel and interrupt vector are usually in low memory –User processes are in high memory Single-partition allocation –MMU relocation (base) and limit registers enforce memory protection –The size of the operating system doesn't impact user programs Multiple-partition allocation –Processes are allocated into spare 'holes' (available areas of memory) –The operating system maintains lists of allocated and free memory [Figure: a sequence of memory layouts in which processes 8, 9, and 10 are allocated into holes between the OS and processes 2 and 5] Each process is stored in one contiguous block

12 Algorithms for Contiguous Allocations Issues: How to maintain the free list; what is the search algorithm complexity? Algorithms (note: worst-fit generally performs worst) –First-fit: Allocate the first hole that is big enough –Best-fit: Allocate the smallest hole that is big enough; leaves a small leftover hole –Worst-fit: Allocate the largest hole; leaves large leftover holes Fragmentation –External: memory holes limit possible allocations –Internal: allocated memory is larger than needed –50-percent rule: with first fit, for every N allocated blocks about 0.5N more are lost to fragmentation, so roughly one third of memory may be unusable Compaction Algorithm –Shuffle memory contents to place all free memory together –Issues: memory binding must be dynamic; time consuming; physical I/O must be handled during the remapping

13 Paged Memory Addressing The MMU splits every memory reference address into a: –Page number (p) – index into a page table array containing the base address of every frame in physical memory –Page offset (d) – offset into a physical frame –Logical addresses contain m bits, n of which are the displacement. There are 2^(m-n) pages of size 2^n –Advantage: No external fragmentation Address layout: page number p (m - n bits) | page offset d (n bits)
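The p/d split is plain bit manipulation. A sketch, assuming illustrative sizes m = 16 and n = 10 (16-bit addresses, 1 KB pages); the values are examples, not from the slides:

```python
M, N = 16, 10                      # m-bit addresses, n-bit page offset

def split(addr):
    p = addr >> N                  # page number: the high m - n bits
    d = addr & ((1 << N) - 1)      # page offset: the low n bits
    return p, d

p, d = split(0x1A7F)
print(p, hex(d))                   # page 6, offset 0x27f
```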

14 Paging Operating System responsibilities –Maintain the page table –Allocate sufficient pages from free frames to execute a program Benefit: Logical address space of a process can be noncontiguous and allocated as needed Issue: Internal fragmentation Definition: A page is a fixed-sized block of logical memory, generally a power of 2 in length between 512 and 8,192 bytes Definition: A frame is a fixed-sized block of physical memory. Each frame corresponds to a single page Definition: A Page table is an array that translates from pages to frames

15 Paged Memory Allocation 1.p indexes the page table, which refers to physical frames 2.d is the offset into a physical frame 3.Each process has an OS-maintained page table [Figure: a four-location-per-page example showing a process page table mapping to physical frames] Note: The number of instruction address bits defines the bounds of the logical address space

16 Page Table Examples [Figures: the free-frame list and memory layout before and after allocation]

17 Page Table Implementation Hardware Assist –Page-table base register (PTBR) addresses the page table –Page-table length register (PTLR) holds the page table size Issue: –Every memory access requires two trips to memory, which could cut processor speed in half –(1) read the page table entry; (2) make the actual memory reference Solution: A translation look-aside buffer (TLB), an associative memory described on the next slide

18 Translation look-aside buffers The TLB is a two-column table (page number, frame number) –Return the frame if the page is found in the TLB –Otherwise fall back to the page table Associative memory (parallel search) avoids the double memory access Timing: –Assume: 20 ns TLB access, 100 ns main memory access, 80% hit ratio –Expected access time (EAT): 0.8 × (20 + 100) + 0.2 × (20 + 100 + 100) = 140 ns Note: The TLB is flushed on context switches
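The EAT arithmetic can be checked directly; a small sketch using the slide's numbers:

```python
def eat(tlb_ns, mem_ns, hit_ratio):
    hit  = tlb_ns + mem_ns            # hit: TLB lookup + one memory access
    miss = tlb_ns + 2 * mem_ns        # miss: lookup + page-table read + access
    return hit_ratio * hit + (1 - hit_ratio) * miss

print(eat(20, 100, 0.80))             # 140.0 ns
```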

19 Extra Page Table Bits Valid-invalid bits –“valid”: page belongs to process; it is legal –“invalid”: illegal page that is not accessible Expanded uses –Virtual memory: page trap triggers a disk load –Read only page –Address-space identifier (ASID) to identify the process owning the page Note –The entire last partial page is marked as valid –Processes can access those locations incorrectly

20 Processes Sharing Data (or Not) Shared –One copy of read-only code shared among processes –Mapped to the same logical address in all processes Private –Each process keeps a separate copy –Private code and data can be located anywhere in memory

21 Hierarchical Page Tables Notes: –Tree structure –Multiple memory accesses are required to find the actual physical location –Parts of the page table can be on disk Single level: page (20 bits) | offset (12 bits) Two level: outer page (10 bits) | inner page (10 bits) | offset (12 bits)
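The two-level split is again bit manipulation; a sketch using the 10/10/12 division of a 32-bit address shown above:

```python
OUTER, INNER, OFFSET = 10, 10, 12          # 32-bit address: 10 + 10 + 12

def split2(addr):
    d  = addr & ((1 << OFFSET) - 1)                # low 12 bits
    p2 = (addr >> OFFSET) & ((1 << INNER) - 1)     # next 10 bits: inner index
    p1 = addr >> (OFFSET + INNER)                  # high 10 bits: outer index
    return p1, p2, d

addr = (3 << 22) | (5 << 12) | 7
print(split2(addr))                        # (3, 5, 7)
```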

22 Three-level Paging Scheme

23 Hashed Page Tables The virtual page number is hashed into a table; each entry chains (page, frame) pairs Collisions are resolved using separate chaining (linked lists) Common on address spaces > 32 bits Hashing complexity is close to O(1), but ineffective if collisions are frequent
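A minimal model of a hashed page table with separate chaining. Python lists stand in for the linked lists; the class and method names are invented for illustration:

```python
class HashedPageTable:
    def __init__(self, buckets=8):
        # one chain per bucket; each chain holds (virtual page, frame) pairs
        self.table = [[] for _ in range(buckets)]

    def insert(self, vpn, frame):
        self.table[hash(vpn) % len(self.table)].append((vpn, frame))

    def lookup(self, vpn):
        for page, frame in self.table[hash(vpn) % len(self.table)]:
            if page == vpn:
                return frame
        return None            # not mapped: a real MMU would page-fault

pt = HashedPageTable()
pt.insert(0x12345, 7)
pt.insert(0x12345 + 8, 9)      # lands in the same bucket: exercises chaining
print(pt.lookup(0x12345))      # 7
```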

24 Inverted Page Table One global page table –Advantage: Eliminates the per-process page tables –Disadvantage: Slower memory access because of searching Implementation –Hash with key = (pid, page number) –TLB hits eliminate the search most of the time Example: UltraSPARC Goal: Reduce page table memory requirements

25 Segmentation Segment table registers –Segment base register (SBR) = segment table's location –Segment length register (SLR) = number of segments in a program Segments –Are variable size; allocated via first-fit/best-fit algorithms –Can be shared among processes and relocated at the segment level –Carry protection bits: valid bit, read/write privileges –Suffer from external fragmentation Supports a process view of memory Example program segments: subroutines (1), library methods (2), stack (3), main program (4), symbol table (5)

26 Segmentation Examples Hardware

27 Segmentation with Paging The MULTICS system pages its segments: segment table entries address a segment page table, which points to the correct page table [Figures: MULTICS and Intel 386 address translation]

28 Pentium Address Translation Supports both segmentation and segmentation with paging Translation Scheme –Segmentation unit produces a linear address –The paging unit produces the physical address Segmentation Only Segmentation with Paging

29 Pentium Paging Architecture

30 Three-level Paging in Linux

31 Virtual Memory Concepts –Programs access logical memory –Operating system memory management and hardware coordinate to establish a logical to physical mapping Advantages –The whole program doesn't need to be in memory –The system can execute programs larger than memory –Processes can share blocks of memory –Resident library routines –Improved memory utilization: more processes running concurrently –Memory mapped files –Copy on write algorithms Disadvantages –Extra disk I/O and thrashing Separate logical and physical memory spaces

32 Logical Memory Examples

33 Copy on Write Processes initially share the same pages Operating System Support –Maintains a list of free zeroed-out pages –Each process gets its own copy only after it modifies a page [Figures: shared pages before and after modification]

34 Demand Paging The Lazy Swapper (pager) The lazy swapper loads pages only when they are needed. This minimizes I/O and memory requirements and allows for more users Definition: Pages are loaded into memory "on demand"

35 Hardware Support Page table entries contain valid bits and dirty bits (entry layout: frame #, valid, dirty) –Valid-invalid bit: set to 0 (invalid) when the page is not in memory –Dirty bit: set when a page is modified; avoids unnecessary writes during swaps Advantages: less I/O, less memory, faster response, more users

36 Page Faults Note: Some pages can be loaded and swapped out multiple times Note: Unused bits for invalid entries contain the page’s disk address

37 Processing Page Faults A user program references a location that is not resident; a page fault occurs and the OS handles it:
IF invalid reference: abort the program
ELSE IF no empty frame: choose a victim and write it to the backing store
ELSE: choose an empty frame
Find the needed page on disk
Read the page into the frame
Update the page table
Set the page table valid bit
Re-execute the instruction that caused the page fault

38 Performance of Demand Paging Page Fault Rate 0 ≤ p ≤ 1.0 –p = 0 means no page faults –p = 1 means every reference triggers a page fault Effective Access Time (EAT) –EAT = (1 – p) × memory access + p × (page fault overhead + swap page out + swap page in + restart overhead) Example –p = 0.01 –Memory access time = 200 nanoseconds –Average page-fault service time = 8 milliseconds –Restart overhead is insignificant –EAT = (1 – 0.01) × 200 + 0.01 × 8,000,000 = 198 + 80,000 ≈ 80,200 ns ≈ 80 μs Question: Is the flexibility worth the extra overhead?

39 Page Replacement Algorithms Replacement occurs when all frames are occupied: swap a victim out and bring the needed page in Technique: Assign a number of frames to each process (x axis) Goal: minimize page faults (y axis) Algorithm Evaluation: Count faults using a predefined reference string Belady's Anomaly: When allocating more frames causes more faults Copy out: Only write frames to the backing store that are "dirty"

40 First-In-First-Out (FIFO) Algorithm Memory Reference String: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 Case 1: The process has 3 frames at a time – 9 page faults Case 2: The process has 4 frames at a time – 10 page faults More frames, yet more faults: an illustration of Belady's Anomaly
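A short simulation (a sketch, not code from the slides) reproduces both counts and hence the anomaly:

```python
from collections import deque

def fifo_faults(refs, nframes):
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == nframes:          # no free frame: evict oldest
                frames.discard(queue.popleft())
            frames.add(page)
            queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3), fifo_faults(refs, 4))   # 9 10
```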

41 Optimal Page Replacement Replace the page that will not be used for the longest period of time Reference String: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 With 4 frames: 6 page faults Advantage: It is optimal Disadvantage: We don't know the future Use: A good benchmark algorithm
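A sketch of the optimal policy, looking ahead in the reference string and evicting the resident page used farthest in the future, reproduces the 6-fault count:

```python
def optimal_faults(refs, nframes):
    frames, faults = [], 0
    for i, page in enumerate(refs):
        if page in frames:
            continue
        faults += 1
        if len(frames) < nframes:
            frames.append(page)
            continue
        future = refs[i + 1:]
        # evict the page whose next use is farthest away (or that is never used again)
        victim = max(frames,
                     key=lambda f: future.index(f) if f in future else len(future))
        frames[frames.index(victim)] = page
    return faults

print(optimal_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 4))   # 6
```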

42 LRU Page Replacement Replace the page that has not been used for the longest period of time Assumption: A process can have four frames in memory at a time Reference String: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 – 8 page faults
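A sketch of LRU using a recency-ordered list yields 8 faults on this reference string with 4 frames:

```python
def lru_faults(refs, nframes):
    frames, faults = [], 0          # ordered least- to most-recently used
    for page in refs:
        if page in frames:
            frames.remove(page)                 # hit: refresh recency
        else:
            faults += 1
            if len(frames) == nframes:
                frames.pop(0)                   # evict the least recently used
        frames.append(page)                     # page is now most recently used
    return faults

print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 4))   # 8
```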

43 Naïve Stack Implementation A stack of page numbers gives O(1) victim frame selection, but requires a search and update on every memory reference

44 Approximate LRU with Hardware Support Reference bit –Each page has a reference bit, initially 0 –Hardware sets the bit to 1 when the page is referenced –The OS replaces the first page found with a 0 bit Second-chance (clock) Algorithm –A clock hand loops through the pages in circular order –If the hand's page has reference bit 1: set the bit to 0, leave the page in memory, and advance to the next page –If the hand's page has reference bit 0: replace it (a page is replaced only when its bit is found to be 0 twice in a row without an intervening reference)
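The clock loop can be sketched as follows (illustrative Python, not production code):

```python
def clock_faults(refs, nframes):
    frames = [None] * nframes       # circular buffer of resident pages
    refbit = [0] * nframes
    hand, faults = 0, 0
    for page in refs:
        if page in frames:
            refbit[frames.index(page)] = 1    # hardware sets the bit on use
            continue
        faults += 1
        while refbit[hand]:                   # bit set: give a second chance
            refbit[hand] = 0                  # clear it and advance the hand
            hand = (hand + 1) % nframes
        frames[hand] = page                   # bit clear: replace this page
        refbit[hand] = 1
        hand = (hand + 1) % nframes
    return faults

print(clock_faults([1, 2, 3, 1, 2, 3], 3))   # 3: all pages fit, no replacements
```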

45 Frame Allocation How are frames allocated among executing processes? Allocation can be Global or Local –Global: select a replacement frame from the set of all frames –Local: each process selects from its own set of allocated frames Each process needs a minimum number of pages –Ex: IBM 370 – a MOVE instruction could require 6 pages: –The instruction is 6 bytes long and could span 2 pages –2 pages for the from address, 2 pages for the to address Each process should be held below a maximum number of pages –Excessive allocation to one process can degrade system performance Examples of frame allocation algorithms –Fixed: each process gets an equal number of frames –Priority: higher-priority processes get more frames –Proportional: frames in proportion to each process's size relative to the others

46 Other Replacement Algorithms Least Frequently Used (LFU) –Replace the page with the lowest usage count. In case of a tie, replace the oldest page in memory –Disadvantage: a heavily used page remains in memory after it is no longer needed Most Frequently Used (MFU) –Replace the page with the largest usage count. In case of a tie, replace the oldest page in memory –Idea: the page with the smallest count was probably just loaded and has yet to be used Usage counts: updated at regular intervals using each page table entry's reference bit

47 Thrashing Considerations Thrashing: Excessive system resources dedicated to swapping pages Insufficient frames leads to –Low CPU utilization –A short ready queue –The OS responding by adding more processes, which leads to more thrashing Paging works because of locality –Processes perform most of their work referencing narrow ranges of memory Thrashing occurs when the total size of the process localities > total memory size [Figure: performance log – memory access over time]

48 Working Set Model Goal: achieve an "acceptable" page-fault rate –If the actual rate is too low, the process loses frames –If the actual rate is too high, the process gains frames Adjust allocated frames to match the references made in a window of time

49 Working-Set Model Δ ≡ working-set window ≡ a fixed number of page references (example: 10,000 instructions) WSS_i (working set of process P_i) = total number of pages referenced in the most recent Δ (varies in time) –If Δ is too small, it will not encompass the entire locality –If Δ is too large, it will encompass several localities –If Δ = ∞, it will encompass the entire program D = Σ WSS_i ≡ total demand frames If D > memory pages ⇒ thrashing Policy: if D > memory pages, then suspend one of the processes

50 Working-Set Model Working set: The pages referenced during a working-set window Working-set window (Δ): A fixed number of instruction references (ex: 10,000) Processes are given the frames in their working set Considerations –Small Δ: processes lose frames –Large Δ: processes gain frames –Δ = ∞ includes the entire program –Thrashing results if the sum of all working sets (D) exceeds memory (m) Implementation –Suspend processes if D > memory pages –A timer interrupts every Δ/2 time units. Pages referenced since the last interrupt are kept in the working set; others are discarded. Reference bits are reset
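Computing a working set over a sliding window is straightforward; a sketch with an invented reference string (the window size and string are illustrative, not from the slides):

```python
def working_set(refs, t, delta):
    """Pages referenced in the window of the last `delta` references ending at time t."""
    return set(refs[max(0, t - delta + 1): t + 1])

refs = [1, 2, 1, 3, 2, 4, 4, 4, 3, 4]
print(working_set(refs, t=9, delta=4))   # {3, 4}
```

Summing `len(working_set(...))` over all processes gives D, the total frame demand compared against memory size.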

51 Pre-paging Purpose: Reduce the page faults occurring at process startup Pre-page all or some of the pages before they are referenced Note: If pre-paged pages go unused, I/O and memory are wasted Assume s pages are pre-paged and a fraction α of them is used –Page-fault cost avoided: s × α –Unnecessary page loads: s × (1 – α) –If α is near zero, pre-paging loses
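The trade-off reduces to comparing the s·α savings against the s·(1 − α) wasted loads; a sketch (the cost parameters are illustrative assumptions):

```python
def prepaging_net_benefit(s, alpha, fault_cost, load_cost):
    """Positive when pre-paging s pages (fraction alpha actually used) wins."""
    saved  = s * alpha * fault_cost        # page faults avoided
    wasted = s * (1 - alpha) * load_cost   # pages loaded for nothing
    return saved - wasted
```

With equal per-page costs, pre-paging wins whenever α > 0.5, for example `prepaging_net_benefit(100, 0.9, 1, 1)` is positive while `prepaging_net_benefit(100, 0.05, 1, 1)` is negative.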

52 Additional Considerations I/O Interlock – pages involved in data transfer must be locked into memory TLB size impacts the reachable working set –TLB Reach = (TLB Size) × (Page Size) –If the working set fits in the TLB, there will be fewer page faults Techniques to reduce page faults –Increase the page size: leads to increased fragmentation –Provide variable page sizes based on application specifications Poor program design can increase page faults –Example: a 1024×1024 array stored one row per page –Program 1 (1024×1024 page faults) – indexes by columns, touching a new page on every reference:
for (j = 0; j < 1024; j++)
    for (i = 0; i < 1024; i++)
        A[i][j] = 0;
–Program 2 (1024 page faults) – indexes by rows, touching each page once per row:
for (i = 0; i < 1024; i++)
    for (j = 0; j < 1024; j++)
        A[i][j] = 0;

53 Memory Mapped Files Disk blocks are mapped to memory pages; page-sized portions of files are read into physical pages Reads and writes to the file become simple memory accesses –No read() and write() system calls per access Shared memory: several processes can map the same file
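Python's standard-library mmap module (standing in here for the Java example on a later slide) shows the idea: file bytes are modified through plain slice assignment, with no explicit write() call per access:

```python
import mmap
import os
import tempfile

# Create a small file, map it, and modify it through memory access alone.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello memory-mapped world")
with mmap.mmap(fd, 0) as mm:       # length 0 maps the whole file, read/write
    mm[0:5] = b"HELLO"             # plain slice assignment updates the file
    data = bytes(mm[:])
os.close(fd)
os.remove(path)
print(data)                        # b'HELLO memory-mapped world'
```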

54 Memory-Mapped Files in Java

55 Memory-Mapped Shared Memory in Windows

56 Examples Windows NT –Demand paging with clustering; clustering loads the surrounding pages –Process parameters: working set minimum and working set maximum –Automatic working-set trimming occurs if free memory is too low Solaris 2 –Maintains a list of free pages –The pageout function selects victims using LRU scans; it runs more frequently when free memory is low –The lotsfree threshold controls when paging starts –Scanrate controls the page scan rate, varying from slowscan to fastscan

57 Allocating Kernel Memory Differently from user allocation –Deals with physical memory –Kernel requests memory for structures of varying sizes –Some kernel memory must be contiguous Approaches –Buddy System Allocation –Slab Memory Allocation

58 Buddy System Allocation Allocates from a fixed-size segment of contiguous pages Memory is allocated in power-of-2 blocks –Allocation requests are rounded up to the next power of 2 –If the kernel needs a smaller allocation than the blocks available, a larger block is repeatedly split into two buddies of the next-lower power of 2 until a correctly sized block is produced Search time is O(tree depth), i.e., logarithmic in the segment size
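A toy sketch of the splitting step (the free-list layout and function name are invented for illustration; a real buddy allocator also coalesces freed buddies):

```python
def buddy_allocate(free_lists, request):
    """Allocate from power-of-2 free lists by splitting larger blocks.

    free_lists maps block size -> list of free block start addresses.
    Returns (start, block_size), or None if nothing large enough is free.
    """
    size = 1
    while size < request:              # round request up to a power of 2
        size *= 2
    big = size
    while big not in free_lists or not free_lists[big]:
        big *= 2                       # look for the next larger free block
        if big > max(free_lists, default=0):
            return None
    start = free_lists[big].pop()
    while big > size:                  # split into buddies until the size fits
        big //= 2
        free_lists.setdefault(big, []).append(start + big)  # free the upper buddy
    return start, size

free = {256: [0]}                      # one 256-byte segment at address 0
print(buddy_allocate(free, 33))        # (0, 64); buddies of 128 and 64 left free
```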

59 Slab Memory Allocation Slab: One or more physically contiguous pages Slab cache –Consists of one or more slabs –A separate cache exists for each unique kernel data structure A cache initially contains a group of instantiated data-structure objects –The cache is initialized with objects marked as free –Allocated objects are marked as used –A new slab is added to the cache when no free objects remain Benefits –No fragmentation –Fast allocation

60 Slab Allocation

