Chapter 9: Main Memory
Chapter 9: Memory Management
Background
Contiguous Memory Allocation
Paging
Structure of the Page Table
Swapping
Example: The Intel 32 and 64-bit Architectures
Example: ARMv8 Architecture
Objectives
To provide a detailed description of various ways of organizing memory hardware
To discuss various memory-management techniques
To provide a detailed description of the Intel Pentium, which supports both pure segmentation and segmentation with paging
Background
The main purpose of a computer system is to execute programs
These programs (code + data) must be at least partially in main memory (RAM) during execution
Modern computer systems keep several processes in memory during system execution
Memory consists of a large array of bytes, each with its own address
The CPU fetches instructions from memory according to the value of the program counter (PC)
CPU scheduling lets us improve both the utilization of the CPU and the speed of the computer's response to its users; to realize this increase in performance, however, we must keep many processes in memory, that is, we must share memory
Many memory-management schemes exist, reflecting various approaches, and the effectiveness of each algorithm varies with the situation
Selection of a memory-management scheme depends on many factors
Most algorithms require some form of hardware support
Background
A program must be brought (from disk) into memory (RAM) and placed within a process for it to be run
The CPU can directly access only main memory (RAM) and registers
The memory unit sees only a stream of either addresses plus read requests, or addresses plus data and write requests
Register access is done in one CPU clock (or less)
A main-memory access can take many cycles, causing a stall :(
A cache sits between main memory and the CPU registers
Protection of memory is required to ensure correct operation
Protection
Need to ensure that a process can access only those addresses in its address space
We can provide this protection by using a pair of base and limit registers that define the logical address space of a process
Hardware Address Protection
The CPU must check every memory access generated in user mode to be sure it is between the base and limit for that user
The instructions that load the base and limit registers are privileged (what does this mean? They can be executed only by the OS kernel)
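A minimal sketch of that check, modeled in C with the privileged registers shown as globals (the real check is done in hardware on every access; the register values are arbitrary examples):

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Privileged base/limit registers, modeled as globals (example values). */
static uint32_t base  = 300040;
static uint32_t limit = 120900;

/* Trap to the kernel: treated here as a fatal addressing error. */
static void trap(uint32_t addr) {
    fprintf(stderr, "addressing error at %u\n", addr);
    exit(EXIT_FAILURE);
}

/* A user-mode access is legal iff base <= addr < base + limit. */
static void check_access(uint32_t addr) {
    if (addr < base || addr >= base + limit)
        trap(addr);
    /* ...otherwise the access proceeds to memory unimpeded... */
}
```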
Address Binding
Programs on disk, ready to be brought into memory to execute, form an input queue
Without support, a program must be loaded into address 0000
It is inconvenient to have the first user process's physical address always at 0000 (how can it not be??)
Addresses are represented in different ways at different stages of a program's life
Source code addresses: usually symbolic
Compiled code addresses: bound to relocatable addresses, i.e., "14 bytes from the beginning of this module"
Linker or loader: binds relocatable addresses to absolute addresses
Each binding maps one address space to another
Binding of Instructions and Data to Memory
Address binding of instructions and data to memory addresses can happen at three different stages
Compile time: if the memory location is known a priori, absolute code can be generated. Cons: must recompile the code if the starting location changes
Load time: if the memory location is not known at compile time, relocatable code must be generated
Execution time: binding is delayed until run time if the process can be moved during its execution from one memory segment to another. Needs hardware support for address maps (e.g., base and limit registers)
Multistep Processing of a User Program
Logical vs. Physical Address Space
The concept of a logical address space that is bound to a separate physical address space is central to proper memory management
Logical address – generated by the CPU; also referred to as a virtual address
Physical address – the address seen by the memory unit
Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in the execution-time address-binding scheme
The logical address space is the set of all logical addresses generated by a program
The physical address space is the set of all physical addresses corresponding to these logical addresses
Memory-Management Unit (MMU)
A hardware device that maps virtual addresses to physical addresses at run time
Many methods are possible, covered in the rest of this chapter
The memory unit sees only a stream of memory addresses; it does not know how they are generated (by the instruction counter, indexing, indirection, literal addresses, and so on) or what they are for (instructions or data)
Memory-Management Unit (MMU) (Cont.)
Consider a simple scheme, which is a generalization of the base-register scheme
The base register is now called the relocation register
The value in the relocation register is added to every address generated by a user process at the time it is sent to memory
The user program deals with logical addresses; it never sees the real physical addresses
Execution-time binding occurs when a reference is made to a location in memory; a logical address is bound to a physical address
Dynamic Loading
The entire program does not need to be in memory to execute
A routine (a function or procedure) is not loaded until it is called
Better memory-space utilization; an unused routine is never loaded
All routines are kept on disk in relocatable load format
Useful when large amounts of code are needed to handle infrequently occurring cases
No special support from the operating system is required; implemented through program design
The OS can help by providing libraries that implement dynamic loading
Dynamic Linking
Static linking – system libraries and program code are combined by the loader into the binary program image (one binary file with all the libraries)
Dynamic linking – linking is postponed until execution time
A small piece of code, the stub, is used to locate the appropriate memory-resident library routine
The stub replaces itself with the address of the routine and executes the routine
The operating system checks whether the routine is in the process's memory address space; if not, it is added to the address space
Dynamic linking is particularly useful for libraries, known as shared libraries (*.so or *.dll)
Consider the applicability to patching system libraries; versioning may be needed
Contiguous Allocation
Main memory must support both the OS and user processes
Memory is a limited resource and must be allocated efficiently; contiguous allocation is one early method
Main memory is usually divided into two partitions: one for the resident operating system and one for the user processes
The operating system can be placed in either low or high memory addresses; the decision depends on many factors, such as the location of the interrupt vector
Many operating systems (including Linux and Windows) place the OS in high memory
Each process is contained in a single contiguous section of memory
Several user processes reside in memory at the same time
Contiguous Allocation (Cont.)
We can prevent a process from accessing memory that it does not own by using a relocation register together with a limit register
Relocation registers protect user processes from each other, and from changing OS code and data
The relocation (base) register contains the value of the smallest physical address
The limit register contains the range of logical addresses – each logical address must be less than the limit register
The MMU maps logical addresses dynamically
This can then allow actions such as kernel code being transient and the kernel changing size
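A minimal sketch of the relocation + limit mapping described above (register values are arbitrary examples; names are illustrative):

```c
#include <stdint.h>
#include <stdio.h>

/* Privileged MMU registers, modeled as globals (example values). */
static uint32_t relocation = 100040;  /* smallest physical address */
static uint32_t limit      = 74600;   /* size of the logical address space */

/* Map a logical address; returns 0 and fills *phys, or -1 on a violation. */
static int mmu_map(uint32_t logical, uint32_t *phys) {
    if (logical >= limit)
        return -1;                    /* trap: outside the process's space */
    *phys = logical + relocation;
    return 0;
}

int main(void) {
    uint32_t phys;
    if (mmu_map(2048, &phys) == 0)
        printf("logical 2048 -> physical %u\n", phys);  /* prints 102088 */
    return 0;
}
```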
Hardware Support for Relocation and Limit Registers
Memory Allocation: Variable Partition
Multiple-partition allocation keeps a table indicating which parts of memory are available and which are occupied
Degree of multiprogramming is limited by the number of partitions
Variable partition sizes for efficiency (sized to a given process's needs)
Hole – block of available memory; holes of various sizes are scattered throughout memory
When a process arrives, it is allocated memory from a hole large enough to accommodate it
When a process exits, it frees its partition; adjacent free partitions are combined
The OS maintains information about: a) allocated partitions, b) free partitions (holes)
Dynamic Storage-Allocation Problem
How to satisfy a request of size n from a list of free holes?
First fit: allocate the first hole that is big enough
Best fit: allocate the smallest hole that is big enough; must search the entire list, unless it is ordered by size. Produces the smallest leftover hole
Worst fit: allocate the largest hole; must also search the entire list. Produces the largest leftover hole
First fit and best fit are better than worst fit in terms of speed and storage utilization; a first-fit sketch follows
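A minimal free-list sketch of first fit (the node layout and helper names are illustrative, not from the slides; a real allocator would also unlink empty holes and coalesce neighbors on free):

```c
#include <stddef.h>

/* A free hole, kept in a singly linked list. */
struct hole {
    size_t start;            /* starting address of the hole */
    size_t size;             /* size of the hole in bytes */
    struct hole *next;
};

/* First fit: take the first hole big enough and shrink it in place.
 * Returns the allocated start address, or (size_t)-1 on failure.
 * Best fit would instead scan the whole list for the smallest
 * adequate hole; worst fit for the largest. */
size_t first_fit(struct hole *free_list, size_t n) {
    for (struct hole *h = free_list; h != NULL; h = h->next) {
        if (h->size >= n) {
            size_t addr = h->start;
            h->start += n;   /* the leftover becomes a smaller hole */
            h->size  -= n;
            return addr;
        }
    }
    return (size_t)-1;       /* no hole is large enough */
}
```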
Fragmentation
Memory fragmentation can be internal as well as external
External fragmentation – total memory space exists to satisfy a request, but it is not contiguous (storage is fragmented into a large number of small holes). What should we do? If all these small pieces of memory were in one big free block instead, we might be able to run several more processes
Internal fragmentation – allocated memory may be slightly larger than requested memory; this size difference is memory internal to a partition that is not being used
Both the first-fit and best-fit strategies for memory allocation suffer from external fragmentation
Statistical analysis of first fit, for instance, reveals that, even with some optimization, given N allocated blocks, another 0.5 N blocks will be lost to fragmentation. That is, one-third of memory may be unusable! This property is known as the 50-percent rule
Fragmentation (Cont.)
Reduce external fragmentation by compaction: shuffle the memory contents to place all free memory together in one large block (compaction is what happens when something is crushed or compressed)
Compaction is possible only if relocation is dynamic and is done at execution time; we must also determine its cost!
I/O problem: a job involved in I/O cannot be moved, so either keep it in memory while the I/O is in progress, or do I/O only into OS buffers
Now consider that the backing store (disk) has the same fragmentation problems
Paging
A memory-management scheme that permits a process's physical address space to be noncontiguous
The most common memory-management technique, used in computer systems from large servers through mobile devices
Avoids external fragmentation and the associated need for compaction, as well as the problem of varying-sized memory chunks
A process is allocated physical memory wherever such memory is available
Implemented through cooperation between the operating system and the computer hardware
How Paging Is Implemented
Divide physical memory into fixed-sized blocks called frames
The size is a power of 2, between 512 bytes and 16 MB
Divide logical memory into blocks of the same size called pages
There is a need to keep track of all free frames
When a process is to be executed, its pages are loaded into any available memory frames from their source (a file system or the backing store, i.e., disk)
To run a program of size N pages, we need to find N free frames and load the program (into any available frames)
The logical address space is now totally separate from the physical address space: a process can have a logical 64-bit address space even though the system has less than 2^64 bytes of physical memory
Set up a page table to translate logical to physical addresses
The backing store is likewise split into pages
Is it still possible to have internal fragmentation???
Address Translation Scheme
Every address generated by the CPU is divided into two parts:
Page number (p) – used as an index into a per-process page table, which contains the base address of each frame in physical memory
Page offset (d) – combined with the base address to define the physical memory address that is sent to the memory unit
The page size (like the frame size) is defined by the hardware; it is a power of 2, typically varying between 4 KB and 1 GB per page
If the size of the logical address space is 2^m and the page size is 2^n bytes, then the high-order m - n bits of a logical address specify the page number, and the n low-order bits specify the page offset
For a given logical address space 2^m and page size 2^n, p is an index into the page table and d is the displacement within the page
Paging: Address Translation Scheme
The logical address space is now totally separate from the physical address space, so a process can have a logical 64-bit address space even though the system has less than 2^64 bytes of physical memory
The page number is used as an index into a per-process page table; the page table contains the base address of each frame in physical memory, and the offset is the location within the frame being referenced
Thus, the base address of the frame is combined with the page offset to define the physical memory address
Paging Hardware
Steps taken by the MMU to translate a logical address generated by the CPU to a physical address:
1. Extract the page number p and use it as an index into the page table
2. Extract the corresponding frame number f from the page table
3. Replace the page number p in the logical address with the frame number f
As the offset d does not change, it is not replaced; the frame number and the offset now comprise the physical address
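The same three steps as a minimal C sketch, assuming a flat one-level page table and 4 KB pages on a 32-bit address (the table name and the split are illustrative):

```c
#include <stdint.h>

#define OFFSET_BITS 12                       /* n = 12: 4 KB pages */
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)

/* Illustrative one-level page table: page number -> frame number. */
extern uint32_t page_table[1u << (32 - OFFSET_BITS)];

uint32_t mmu_translate(uint32_t logical) {
    uint32_t p = logical >> OFFSET_BITS;     /* step 1: extract page number */
    uint32_t d = logical & OFFSET_MASK;      /* the offset is unchanged */
    uint32_t f = page_table[p];              /* step 2: look up frame number */
    return (f << OFFSET_BITS) | d;           /* step 3: frame number + offset */
}
```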
Example: Paging Model of Logical and Physical Memory
You may have noticed that paging itself is a form of dynamic relocation: every logical address is bound by the paging hardware to some physical address
With a paging scheme there is no external fragmentation, but there may be some internal fragmentation
In the worst case, a process would need n pages plus 1 byte; it would be allocated n + 1 frames, resulting in internal fragmentation of almost an entire frame
We expect internal fragmentation to average one-half page per process. Thus small page sizes are desirable? But disk I/O is more efficient when the data being transferred is larger?? Trade-off :(
Today, pages are typically either 4 KB or 8 KB in size, and some systems support even larger page sizes; try out getconf PAGESIZE
Minuscule Paging Example
Page size of 4 bytes and a logical address with m = 4 and n = 2; in other words, p is 2 bits and d is 2 bits
Physical memory of 32 bytes (8 frames)
Indexing into the page table, we find that page 0 is in frame 5. Thus, logical address 0 maps to physical address 20 [= (5 × 4) + 0]
Logical address 3 (page 0, offset 3) maps to physical address 23 [= (5 × 4) + 3]
Logical address 4 (page 1, offset 0): according to the page table, page 1 is mapped to frame 6. Thus, logical address 4 maps to physical address 24 [= (6 × 4) + 0]
Logical address 13 maps to physical address 9
(Figure: logical memory pages 0-3 mapped into physical frames 0-7)
Calculating Internal Fragmentation
When we use a paging scheme, we have no external fragmentation: any free frame can be allocated to a process that needs it. However, we may have some internal fragmentation
Notice that frames are allocated as units. If the memory requirements of a process do not happen to coincide with page boundaries, the last frame allocated may not be completely full
Example of internal fragmentation: page size = 2,048 bytes, process size = 72,766 bytes = 35 pages + 1,086 bytes
Internal fragmentation = 2,048 - 1,086 = 962 bytes
Worst-case fragmentation = 1 frame - 1 byte
On average, fragmentation = 1/2 frame size
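The same arithmetic as a quick sketch, using the page and process sizes from the example above:

```c
#include <stdio.h>

int main(void) {
    unsigned page_size = 2048, proc_size = 72766;
    unsigned full_pages = proc_size / page_size;           /* 35 */
    unsigned remainder  = proc_size % page_size;           /* 1086 */
    unsigned frames     = full_pages + (remainder ? 1 : 0);
    unsigned internal   = remainder ? page_size - remainder : 0;
    printf("%u frames, %u bytes of internal fragmentation\n",
           frames, internal);                              /* 36 frames, 962 bytes */
    return 0;
}
```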
Paging
Today, pages are typically either 4 KB or 8 KB in size; however, some systems support even larger page sizes, and some CPUs and operating systems even support multiple page sizes
For instance, on x86-64 systems, Windows 10 supports page sizes of 4 KB and 2 MB. Linux also supports two page sizes: a default page size (typically 4 KB) and an architecture-dependent larger page size called huge pages (try: getconf PAGESIZE)
If process size is independent of page size, we expect internal fragmentation to average one-half page per process. So small frame sizes are desirable? But each page-table entry takes memory to track??
Page sizes have been growing over time; e.g., Solaris supports two page sizes – 8 KB and 4 MB
Example
Frequently, on a 32-bit CPU, each page-table entry is 4 bytes (32 bits) long
A 32-bit entry can point to one of 2^32 physical frames
If the frame size is 4 KB (2^12), then a system with 4-byte entries can address 2^44 bytes (16 TB = 2^32 × 2^12) of physical memory
Note: the size of physical memory is typically different from the maximum logical size of a process
Other information also needs to be kept in the page-table entries; that information reduces the number of bits available to address page frames, so a system with 32-bit page-table entries may address less physical memory than the maximum possible
When a process arrives in the system to be executed, its size is expressed in pages, and each page of the process needs one frame. If the process requires n pages, then at least n frames must be available in memory!!!
In fact, the user program is scattered throughout physical memory, which also holds other programs
The logical addresses are translated into physical addresses; this mapping is hidden from the programmer and is controlled by the OS
The user process is by definition unable to access memory it does not own: it has no way of addressing memory outside of its page table
Free Frames
The OS manages physical memory: it must be aware of the allocation details of physical memory, that is, which frames are allocated, which frames are available, how many total frames there are, etc.
This information is generally kept in a single, system-wide data structure called a frame table
The frame table has one entry for each physical frame, indicating whether the frame is free or allocated and, if it is allocated, to which page of which process (or processes)
The OS maintains a copy of the page table for each process; it is used to translate logical addresses to physical addresses
(Figure: free-frame list before and after allocation)
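A minimal sketch of a frame table and a free-frame search (field names are illustrative; real kernels store considerably more per frame):

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_FRAMES 4096          /* illustrative machine size */

struct frame_entry {
    bool     allocated;          /* free or in use? */
    int32_t  owner_pid;          /* owning process (-1 if free) */
    uint32_t page_number;        /* which page of that process it holds */
};

/* One entry per physical frame, system-wide. */
struct frame_entry frame_table[NUM_FRAMES];

/* Claim any free frame; returns its index, or -1 if memory is full. */
int alloc_frame(int32_t pid, uint32_t page) {
    for (int f = 0; f < NUM_FRAMES; f++) {
        if (!frame_table[f].allocated) {
            frame_table[f] = (struct frame_entry){ true, pid, page };
            return f;
        }
    }
    return -1;
}
```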
Implementation of Page Table (Hardware Support)
Implementation of Page Table
As page tables are per-process data structures, a pointer to the page table is stored with the other register values in the PCB of each process; paging therefore adds to the context-switch work (the page table must be switched too)
The page table is kept in main memory
The page-table base register (PTBR) points to the page table, and the page-table length register (PTLR) indicates the size of the page table
Changing page tables then requires changing only this one register, substantially reducing context-switch time
In this scheme every data/instruction access requires two memory accesses: one for the page-table entry (located via the PTBR), after which the obtained frame number is used to access the actual data/instruction
Memory access is slowed by a factor of 2. The two-memory-access problem can be solved with a special fast-lookup hardware cache called the translation look-aside buffer (TLB), also called associative memory
Each entry in the TLB consists of two parts: a key (or tag) and a value. The search compares all keys simultaneously, so the search is fast and adds no performance penalty
Hardware
TLBs are a hardware feature, typically small (64 to 1,024 entries); the TLB contains only a few of the page-table entries
When a logical address is generated by the CPU, the MMU first checks whether its page number is present in the TLB. If it is found, the frame number is immediately available and used
On a TLB miss, the value is loaded into the TLB for faster access next time
Replacement policies must be considered (e.g., LRU entry replacement)
Some entries can be wired down for permanent fast access (e.g., key kernel code)
Associative memory – parallel search
Address translation (p, d): if p is in an associative register, get the frame # out; otherwise get the frame # from the page table in memory
Translation Look-aside Buffer (TLB)
Some TLBs store address-space identifiers (ASIDs) in each TLB entry; an ASID uniquely identifies each process and provides address-space protection for that process
The hardware ensures that the ASID of the currently running process matches the ASID associated with the virtual page; if the ASIDs do not match, the attempt is treated as a TLB miss
ASIDs allow the TLB to contain entries for several different processes simultaneously
If the TLB does not support separate ASIDs, then every time a new page table is selected (for instance, with each context switch), the TLB must be flushed (erased); in other words, a flush is needed at every context switch, which is otherwise not the case
For example, the Intel Core i7 CPU has a 128-entry L1 instruction TLB and a 64-entry L1 data TLB. In the case of a miss at L1, it takes the CPU six cycles to check for the entry in the 512-entry L2 TLB. A miss in L2 means that the CPU must either (1) walk through the page-table entries in memory to find the associated frame address, which can take hundreds of cycles, or (2) interrupt to the operating system to have it do the work
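A minimal sketch of a TLB lookup with ASID matching (a linear scan stands in for the parallel hardware search; all names are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 64

struct tlb_entry {
    bool     valid;
    uint16_t asid;     /* address-space identifier of the owning process */
    uint32_t vpn;      /* virtual page number (the key/tag) */
    uint32_t frame;    /* frame number (the value) */
};

struct tlb_entry tlb[TLB_ENTRIES];

/* Returns true on a hit; real hardware checks all entries in parallel,
 * so this loop models the logic, not the cost. */
bool tlb_lookup(uint16_t cur_asid, uint32_t vpn, uint32_t *frame) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].asid == cur_asid && tlb[i].vpn == vpn) {
            *frame = tlb[i].frame;
            return true;       /* TLB hit */
        }
    }
    return false;              /* TLB miss: walk the page table */
}
```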
Paging Hardware With TLB
Effective Access Time (EAT)
The percentage of times that the page number of interest is found in the TLB is called the hit ratio
An 80% hit ratio means that we find the desired page number in the TLB 80% of the time
Example: suppose it takes 10 ns (nanoseconds) to access mapped memory. If we find the desired page in the TLB, a mapped-memory access takes 10 ns; otherwise we need two memory accesses, so it takes 20 ns
Effective access time: EAT = 0.80 × 10 + 0.20 × 20 = 12 ns, implying a 20% slowdown in access time (from 10 to 12 nanoseconds)
Consider a more realistic hit ratio of 99%: EAT = 0.99 × 10 + 0.01 × 20 = 10.1 ns, with this increased hit rate implying only a 1% slowdown in access time
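The same computation as a quick sketch:

```c
#include <stdio.h>

/* Effective access time given a TLB hit ratio and memory access time. */
double eat(double hit_ratio, double mem_ns) {
    return hit_ratio * mem_ns + (1.0 - hit_ratio) * 2.0 * mem_ns;
}

int main(void) {
    printf("80%% hits: %.1f ns\n", eat(0.80, 10.0));  /* 12.0 ns */
    printf("99%% hits: %.1f ns\n", eat(0.99, 10.0));  /* 10.1 ns */
    return 0;
}
```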
Memory Protection
Memory protection is implemented by associating a protection bit with each frame, indicating whether read-only or read-write access is allowed (kept in the page table)
Every reference to memory goes through the page table to find the correct frame number, so we can verify that no writes are being made to a read-only page
Can be extended with more bits to indicate execute-only access, and so on
A valid-invalid bit is attached to each entry in the page table:
"valid" indicates that the associated page is in the process's logical address space, and is thus a legal (valid) page
"invalid" indicates that the page is not in the process's logical address space (an illegal, or invalid, page)
Alternatively, use the page-table length register (PTLR)
Any violation results in a trap to the kernel
Valid (v) or Invalid (i) Bit In A Page Table
Shared Pages
An advantage of paging is the possibility of sharing common code, a consideration that is particularly important in an environment with multiple processes
Example: the standard C library (libc), which provides a portion of the system-call interface for many versions of UNIX and Linux
If a system has 40 user processes and the libc library is 2 MB, allowing each process to load its own copy of libc into its address space would require 80 MB of memory :-)
Shared code: one copy of read-only (reentrant) code shared among processes
To be sharable, the code must be reentrant: non-self-modifying code that never changes during execution (e.g., text editors, compilers, window systems)
Similar to multiple threads sharing the same process space
Also useful for interprocess communication if sharing of read-write pages is allowed
Private code and data: each process keeps a separate copy of the code and data; the pages for the private code and data can appear anywhere in the logical address space
Shared Pages Example
Although the figure shows the libc library occupying four pages, in reality it would occupy more
The read-only nature of shared code should not be left to the correctness of the code; the OS should enforce this property
Structure of the Page Table
Hierarchical Paging
Hashed Page Tables
Inverted Page Tables
Structure of the Page Table
Most modern computer systems support a large logical address space (2^32 to 2^64). In such an environment, the page table itself becomes excessively large
Memory structures for paging can get huge using straightforward methods. For example, consider a 32-bit logical address space as on modern computers:
With a page size of 4 KB (2^12), the page table would have 1 million entries (2^20 = 2^32 / 2^12)
Assuming that each entry consists of 4 bytes, each process may need up to 4 MB of physical address space for the page table alone (part of RAM, i.e., main memory)
We don't want to allocate that contiguously in main memory!!!
One simple solution is to divide the page table into smaller units:
Hierarchical paging (i.e., two-level and multi-level paging algorithms)
Hashed page tables
Inverted page tables
1. Two-Level Paging Example
One way is to use a two-level paging algorithm, in which the page table itself is also paged. For example, a logical address (on a 32-bit machine with a 4 KB page size) is divided into:
a page number consisting of 20 bits
a page offset consisting of 12 bits (4 KB = 2^12)
Since the page table itself is paged, the page number is further divided into:
a 10-bit page number (index into the outer page table)
a 10-bit page offset within the outer page table's page (index into the inner page table)
Thus, in the logical address, p1 is an index into the outer page table and p2 is the displacement within the page of the inner page table
Known as a forward-mapped page table (mapped from the outer page table inward)
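A minimal sketch of the 10/10/12 split and the two lookups it implies (table names are illustrative):

```c
#include <stdint.h>

/* 32-bit logical address: | p1 (10 bits) | p2 (10 bits) | d (12 bits) | */
#define P1(va)  (((va) >> 22) & 0x3FF)      /* index into outer page table */
#define P2(va)  (((va) >> 12) & 0x3FF)      /* index into inner page table */
#define OFF(va) ((va) & 0xFFF)              /* offset within the page */

/* Outer table: each entry points to an inner table of 1024 entries. */
extern uint32_t *outer_table[1024];

uint32_t translate2(uint32_t va) {
    uint32_t *inner = outer_table[P1(va)];  /* first memory access */
    uint32_t frame  = inner[P2(va)];        /* second memory access */
    return (frame << 12) | OFF(va);         /* frame number + offset */
}
```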
Hierarchical Page Tables
Break up the logical address space into multiple page tables
A simple technique is a two-level page table: we then page the page table itself
(Figure: address-translation scheme)
64-bit Logical Address Space
With a 64-bit logical address space, even the two-level paging scheme is no longer appropriate
If the page size is 4 KB (2^12), then the page table has 2^52 entries
With a two-level scheme, the inner page tables could be 2^10 entries each
The outer page table would then have 2^42 entries, or 2^44 bytes (with 4-byte entries)
One solution is to add a second outer page table
But in the following example, the second outer page table is still 2^34 bytes in size, and possibly 4 memory accesses are needed to get to one physical memory location
Three-level Paging Scheme
The obvious way to avoid such a large table is to divide the outer page table into smaller pieces
For example, we can page the outer page table, giving us a three-level paging scheme
However, the outer page table is still 2^34 bytes (16 GB) in size
The 64-bit UltraSPARC would require 7 levels of paging, a prohibitive number of memory accesses to translate each logical address
You can see from this example why, for 64-bit architectures, hierarchical page tables are generally considered inappropriate
Hashed Page Tables
Common in address spaces larger than 32 bits
The virtual page number is hashed into a page table, the hash key being the virtual page number
The page table contains a chain (linked list) of elements that hash to the same location (to handle collisions)
Each element consists of three fields: (1) the virtual page number, (2) the value of the mapped page frame, and (3) a pointer to the next element in the linked list
Algorithm (how it works):
Virtual page numbers are compared along this chain, searching for a match
If a match is found, the corresponding page frame (field 2) is used to form the desired physical address
If there is no match, subsequent entries in the linked list are searched
A variation for 64-bit addresses is clustered page tables: similar to hashed page tables, but each entry refers to several pages (such as 16) rather than one
Especially useful for sparse address spaces (where memory references are noncontiguous and scattered)
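A minimal sketch of the chained lookup (a simple modulus stands in for the hash function; all names are illustrative):

```c
#include <stdint.h>
#include <stddef.h>

#define HASH_BUCKETS 1024

/* One element in a bucket's collision chain. */
struct hpt_entry {
    uint64_t vpn;              /* (1) virtual page number */
    uint64_t frame;            /* (2) mapped page frame */
    struct hpt_entry *next;    /* (3) next element in the chain */
};

struct hpt_entry *hash_table[HASH_BUCKETS];

/* Walk the chain at hash(vpn) looking for a matching virtual page.
 * Returns 0 and fills *frame on success, -1 if the page is unmapped. */
int hpt_lookup(uint64_t vpn, uint64_t *frame) {
    for (struct hpt_entry *e = hash_table[vpn % HASH_BUCKETS];
         e != NULL; e = e->next) {
        if (e->vpn == vpn) {
            *frame = e->frame;
            return 0;
        }
    }
    return -1;
}
```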
2. Hashed Page Table
3. Inverted Page Table
Usually, each process has an associated page table, with one entry for each page the process is using (one slot for each virtual address, regardless of its validity)
The OS must then translate each virtual address into a physical memory address
Cons: a page table may consist of millions of entries, and we may end up consuming a large amount of physical memory just to keep track of how the other physical memory is being used
Inverted Page Table
Rather than each process having a page table and keeping track of all possible logical pages, track all physical pages instead
One entry for each real page (or frame) of memory
Each entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page
Thus, only one page table is in the system, and it has only one entry for each page of physical memory
Because the inverted page table is sorted by physical address, but lookups occur on virtual addresses, the whole table might need to be searched before a match is found
Decreases the memory needed to store each page table, but increases the time needed to search the table when a page reference occurs
Use a hash table to limit the search to one, or at most a few, page-table entries; one virtual memory reference then requires at least two real memory reads (one for the hash-table entry, one for the page table)
A TLB can accelerate access
But how do we implement shared memory if there is only one mapping of a virtual address to the shared physical address?
Inverted Page Tables – Issues
One interesting issue with inverted page tables involves shared memory
With standard paging, each process has its own page table, which allows multiple virtual addresses to be mapped to the same physical address
This method cannot be used with inverted page tables: because there is only one virtual page entry for every physical page, one physical page cannot have two (or more) shared virtual addresses
Therefore, with inverted page tables, only one mapping of a virtual address to the shared physical address may occur at any given time
Solution: inverted page tables often require that an address-space identifier (ASID) be stored in each entry of the page table
A reference by another process sharing the memory will result in a page fault and will replace the mapping with a different virtual address
Inverted Page Table Architecture
Here, each virtual address in the system consists of a triple: <process-id, page-number, offset>
Each inverted page-table entry is a pair <process-id, page-number>
When a match is found at entry i, the physical address <i, offset> is generated
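A minimal sketch of that search, written as the linear scan described above (real systems hash instead, as noted earlier; names are illustrative):

```c
#include <stdint.h>

#define NUM_FRAMES 4096

/* One entry per physical frame: which process/page it currently holds. */
struct ipt_entry {
    int32_t  pid;              /* process-id owning this frame */
    uint32_t page;             /* that process's virtual page number */
};

struct ipt_entry ipt[NUM_FRAMES];

/* Search the table for <pid, page>; the matching index i is the frame
 * number, so the physical address is <i, offset>.
 * Returns -1 if no entry matches (page fault). */
int ipt_search(int32_t pid, uint32_t page) {
    for (int i = 0; i < NUM_FRAMES; i++) {
        if (ipt[i].pid == pid && ipt[i].page == page)
            return i;
    }
    return -1;
}
```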
4. Oracle SPARC Solaris: Two Hash Tables
Examples of systems using inverted page tables include the 64-bit UltraSPARC and the PowerPC (older macOS machines)
Consider this modern, 64-bit operating-system example with tightly integrated hardware; the goals are efficiency and low overhead
Based on hashing, but more complex
Two hash tables: one for the kernel and one for all user processes
Each maps memory addresses from virtual to physical memory
Each entry represents a contiguous area of mapped virtual memory, which is more efficient than having a separate hash-table entry for each page
Each entry has a base address and a span (indicating the number of pages the entry represents)
Oracle SPARC Solaris (Cont.)
Virtual-to-physical translation would take too long if each address required searching through a hash table, so the CPU implements a TLB
The TLB holds translation table entries (TTEs) for fast hardware lookups
A cache of TTEs resides in a translation storage buffer (TSB), which includes an entry per recently accessed page
When a virtual address reference occurs, the hardware searches the TLB for a translation
If none is found in the TLB, the hardware walks through the in-memory TSB looking for the TTE that corresponds to the virtual address that caused the lookup. This TLB-walk functionality is found on many modern CPUs
If a match is found in the TSB, the CPU copies the TSB entry into the TLB, and the memory translation completes
If no match is found in the TSB, the kernel is interrupted to search the hash table; the kernel then creates a TTE from the appropriate hash table and stores it in the TSB and then into the TLB
Swapping
Standard Swapping
Swapping with Paging
Swapping on Mobile Systems
Swapping
Process instructions and the data they operate on must be in memory to be executed. However, a process, or a portion of a process, can be swapped temporarily out of memory to a backing store and then brought back into memory for continued execution
Swapping makes it possible for the total physical address space of all processes to exceed the real physical memory of the system, thus increasing the degree of multiprogramming
The system can accommodate more processes than there is actual physical memory to store them
Swapping (Cont.)
Backing store – a fast disk large enough to accommodate whatever parts of processes need to be stored and retrieved; it must provide direct access to these memory images (commonly fast secondary storage)
Roll out, roll in – a swapping variant used for priority-based scheduling algorithms: a lower-priority process is swapped out so a higher-priority process can be loaded and executed
Idle or mostly idle processes are good candidates for swapping
The major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped
The system maintains a ready queue of ready-to-run processes that have memory images on disk
The OS must maintain metadata for processes that have been swapped out, so they can be restored when they are swapped back into memory
Swapping Issues (Cont.)
Does the swapped-out process need to swap back into the same physical addresses? That depends on the address-binding method
Plus, consider pending I/O to/from the process's memory space
Modified versions of swapping are found on many systems (i.e., UNIX, Linux, and Windows), in which pages of a process, rather than an entire process, can be swapped
Swapping is normally disabled; it is started if more than a threshold amount of memory is allocated, and disabled again once memory demand is reduced below the threshold
Context Switch Time including Swapping
If the next process to be put on the CPU is not in memory, we need to swap out a process and swap in the target process
The context-switch time can then be very high
Example: a 100 MB process swapping to a hard disk with a transfer rate of 50 MB/sec gives a swap-out time of 2 seconds (2,000 ms)
Plus the swap-in of a same-sized process
Total context-switch swapping component time of 4,000 ms (4 seconds)
The time can be reduced if we reduce the size of the memory to be swapped, by knowing how much memory is really being used
System calls to inform the OS of memory use: request_memory() and release_memory()
Context Switch Time and Swapping (Cont.)
There are other constraints on swapping as well
Pending I/O – can't swap out, as the I/O would occur to the wrong process
Or always transfer I/O to kernel space, then to the I/O device; known as double buffering, this adds overhead
NOTE: standard swapping is not used in modern operating systems, but a modified version is common: swap only when free memory is extremely low
Swapping on Mobile Systems
Swapping is not typically supported
Mobile systems are flash-memory based: a small amount of space, a limited number of write cycles, and poor throughput between flash memory and CPU on the mobile platform
Instead, they use other methods to free memory if it runs low
iOS asks apps to voluntarily relinquish (give up) allocated memory; read-only data is thrown out and reloaded from flash if needed, and failure to free memory can result in termination
Android terminates apps if free memory is low, but first writes application state to flash for fast restart
Both OSes support paging as discussed below
Swapping with Paging
End of Chapter 9
Examples: Intel 32 and 64-bit Architectures
Example: The Intel 32 and 64-bit Architectures
Dominant industry chips
Pentium CPUs are 32-bit and called the IA-32 architecture
Current Intel CPUs are 64-bit and called the x86-64 architecture
There are many variations in the chips; we cover the main ideas here
Example: The Intel IA-32 Architecture
Supports both segmentation and segmentation with paging
Each segment can be 4 GB
Up to 16 K segments per process, divided into two partitions:
First partition: up to 8 K segments that are private to the process (kept in the local descriptor table, LDT)
Second partition: up to 8 K segments shared among all processes (kept in the global descriptor table, GDT)
Example: The Intel IA-32 Architecture (Cont.)
The CPU generates a logical address, whose selector portion is given to the segmentation unit, which produces a linear address
In the selector, s designates the segment number, g indicates whether the segment is in the GDT or LDT, and p deals with protection. The offset is a 32-bit number specifying the location of the byte within the segment in question
The linear address is given to the paging unit, which generates the physical address in main memory
The segmentation and paging units together form the equivalent of the MMU
Page sizes can be 4 KB or 4 MB
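A minimal sketch of decoding those selector fields, assuming the classic IA-32 16-bit selector layout (13-bit segment number s, 1-bit table indicator g, 2-bit protection field p):

```c
#include <stdint.h>

/* IA-32 selector: | s: segment number (13) | g: GDT/LDT (1) | p: protection (2) | */
struct selector_fields {
    uint16_t s;   /* segment number */
    uint8_t  g;   /* 0 = GDT (shared), 1 = LDT (private to the process) */
    uint8_t  p;   /* protection bits */
};

struct selector_fields decode_selector(uint16_t sel) {
    struct selector_fields f;
    f.s = sel >> 3;           /* top 13 bits */
    f.g = (sel >> 2) & 0x1;   /* table-indicator bit */
    f.p = sel & 0x3;          /* low 2 protection bits */
    return f;
}
```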
Logical to Physical Address Translation in IA-32
Intel IA-32 Paging Architecture
Intel IA-32 Page Address Extensions
32-bit address limits led Intel to create the page address extension (PAE), allowing 32-bit apps access to more than 4 GB of memory space
Paging went to a 3-level scheme, with the top two bits referring to a page directory pointer table
Page-directory and page-table entries moved to 64 bits in size
The net effect is increasing the address space to 36 bits – 64 GB of physical memory
Intel x86-64
Current generation of the Intel x86 architecture
64 bits is ginormous (> 16 exabytes)
In practice, only 48-bit addressing is implemented
Page sizes of 4 KB, 2 MB, 1 GB
Four levels of paging hierarchy
Can also use PAE, so virtual addresses are 48 bits and physical addresses are 52 bits
Example: ARM Architecture
Dominant mobile platform chip (Apple iOS and Google Android devices, for example)
Modern, energy-efficient, 32-bit CPU
4 KB and 16 KB pages
1 MB and 16 MB pages (termed sections)
One-level paging for sections, two-level for smaller pages
Two levels of TLBs
The outer level has two micro TLBs (one data, one instruction)
The inner level is a single main TLB
The micro TLBs are checked first; on a miss, the main TLB is checked, and on a further miss, a page-table walk is performed by the CPU