SE 292: High Performance Computing Memory Organization and Process Management Sathish Vadhiyar.


1 SE 292: High Performance Computing Memory Organization and Process Management
Sathish Vadhiyar

2 Memory Image of a Process – Structure of a process in main memory
A process consists of different sections in main memory (shown in the figure): text (code, from a.out), data (initialized and uninitialized), heap, a memory-mapped region for shared libraries (e.g. libc.so), and stack. A process also includes its current activity: the program counter and the process registers.

3 Memory Image The data section contains global variables (initialized & uninitialized). The memory image also contains the shared libraries used by the program (more later)

4 Memory Image Stack Stack contains temporary data
Holds function parameters, return addresses, and local variables. Expands and contracts with function calls and returns, respectively. Addresses are calculated and used by the compiler, relative to the top of the stack or some other base register associated with the stack. Growth of the stack area is thus managed by the program, as generated by the compiler

5 Memory Image Heap Heap is dynamically allocated
Expands and contracts via malloc and free, respectively. Managed by a memory allocation library

6 Memory Abstraction Virtual memory
The above scheme requires the entire process to be in memory. What if the process space is larger than physical memory? Virtual memory provides an abstraction of physical memory, allows execution of processes that are not completely in memory, and offers many other features

7 Virtual Memory Concepts
Instructions must be in physical memory to be executed, but not all of the program needs to be in physical memory. Each program can take less memory, so more programs can reside simultaneously in physical memory. There is also a need for memory protection: one program should not be able to access the variables of another. Typically done through address translation (more later)

8 Virtual Memory Concepts – Large size
Virtual memory can be extremely large when compared to a smaller physical memory. [Figure: virtual pages 0, 1, 2, …, v mapped through a memory map into physical memory.]

9 Virtual Memory Concepts - Sharing
Virtual memory allows programs to share pages and libraries. [Figure: two processes' virtual address spaces (code, data, heap, shared library, stack) whose shared-library pages map to the same physical pages.]

10 Address Translation Terminology – Virtual and Physical Addresses
Virtual memory addresses are translated to physical memory addresses. The Memory Management Unit (MMU) provides the mapping; the MMU is dedicated hardware in the CPU chip. To translate a virtual address to the corresponding physical address, it uses a lookup table of translation information (see below)

11 Address Translation Minimize the size of the table by managing translations not on a per-byte basis but at a larger granularity. Page: a fixed-size unit of memory (contiguous memory locations) for which a single piece of translation information is maintained. The resulting table is called the page table; its contents are managed by the OS. Thus (address translation logic in the MMU + OS + page table) together perform address translation

12 Address Translation
[Figure: a program/process virtual address space (text, data, heap, stack) divided into 256-byte virtual pages, mapped to physical pages in main memory; virtual addresses run from 0x00000000 to 0xFFFFFFFF.]

13 Address Translation
[Figure: the virtual address is split into a virtual page number (VPN) and an offset; the page table translates the VPN to a physical page number (PPN), and the offset is carried over unchanged into the physical address.]

14 Size of Page Table
Big problem: page table size. Example: 32-bit virtual address, 16KB page size => 256K virtual pages => on the order of 1 MB of page table per process (assuming 4-byte entries), which has to be stored in memory. Solution – multi-level page table (more later)

15 Speeding up address translation
Translation Lookaside Buffer (TLB): memory in the MMU that contains some page table entries that are likely to be needed soon. TLB miss: the required entry is not available in the TLB

16 Address Translation
[Figure (Bryant): an n-bit virtual address is split at bit p into a virtual page number (VPN) and virtual page offset (VPO). The Page Table Base Register (PTBR) points to the page table; the VPN indexes a page table entry holding a valid bit and a physical page number (PPN). The PPN concatenated with the physical page offset (PPO = VPO) forms the m-bit physical address.]

17 What's happening…
[Figure: per-process page tables for processes P1, P2, …, Pn map each virtual page either to a frame in main memory or to its contents on disk; unmapped entries are marked '-'.]

18 Demand Paging When a program references a page, the page table is looked up to obtain the corresponding physical frame (page). If the physical frame is not present, a page fault occurs, and the page is then brought in from disk. Thus pages are swapped in from disk to physical memory only when needed – demand paging. To support this, the page table has valid and invalid bits: valid – present in memory; invalid – on disk

19 Page Fault A situation where the virtual address generated by the processor is not available in main memory. Detected on an attempt to translate the address: the page table entry is invalid. Must be 'handled' by the operating system: identify a slot in main memory to be used; get the page contents from secondary memory (part of the disk can be used for this purpose); update the page table entry. The data can now be provided to the processor

20 Demand Paging (Valid-Invalid Bit)
Fig. 9.5 (Silberschatz)

21 Page Fault Handling Steps
Fig. 9.6 (Silberschatz)

22 Page Fault Handling – a different perspective
[Figure (Bryant): (1) the processor sends the virtual address (VA) to the MMU; (2)-(3) the MMU fetches the PTE from cache/memory; (4) an invalid PTE raises a page fault exception, transferring control to the handler; (5) the handler selects a victim page and writes it back to disk if needed; (6) the new page is read in from disk and the page table updated; (7) the faulting instruction is restarted.]

23 Page Fault Handling Steps
Check page table to see if reference is valid or invalid If invalid, a trap into the operating system Find free frame, locate the page on the disk Swap in the page from the disk Replace an existing page (page replacement) Modify page table Restart instruction

24 Performance Impact due to Page Faults
p – probability of page fault; ma – memory access time. Effective access time (EAT) = (1-p) x ma + p x page fault time. Typically, EAT = (1-p) x (200 nanoseconds) + p x (8 milliseconds) = 200 + 7,999,800 x p nanoseconds. Dominated by the page fault time

25 So, What happens during page fault?
Trap to the OS; save registers and process state; determine the location of the page on disk. Issue a read from the disk – the major time-consuming step: 3 milliseconds latency, 5 milliseconds seek. Receive the interrupt from the disk subsystem; save the registers of the other program; restore the registers and process state of this program. To keep the EAT increase due to page faults very small (< 10%), only 1 in a few hundred thousand memory accesses should page fault, i.e., most memory references should be to pages already in memory. But how do we manage this? Luckily, locality – and smart page replacement policies that can make use of it

26 Page Replacement Policies
Question: how does the page fault handler decide which main memory page to replace when there is a page fault? Principle of Locality of Reference – a commonly seen program property: if memory address A is referenced at time t, then it (temporal locality) and its neighbouring memory locations (spatial locality) are likely to be referenced in the near future. This suggests that a Least Recently Used (LRU) replacement policy would be advantageous

27 Locality of Reference Based on your experience, why do you expect that programs will display locality of reference? For program code: loops (temporal – the same instructions repeat) and sequential code within a function (spatial – neighbouring instructions follow). For data: a loop index or local variables (temporal) and stepping through an array (spatial)

28 Page Replacement Policies
FIFO – performance depends on whether the initial pages are actively used. Optimal page replacement – replace the page that will not be used for the longest time; difficult to implement (some ideas?). LRU – Least Recently Used is most commonly used; can be implemented using counters, but exact LRU might be too expensive to implement

29 Page Replacement Algorithms - Alternatives
LRU approximation – the Second-Chance or Clock algorithm: similar to FIFO, but when a page's reference bit is 1, the page is given a second chance. Implemented using a circular queue. Counting-based page replacement: LFU, MFU. The performance of page replacement depends on the application (Section 9.4.8)

30 Managing size of page table – TLB, multi-level tables
Translation Lookaside Buffer – a small and fast memory for storing page table entries, so the MMU need not fetch a PTE from memory on every translation. The TLB is a virtually addressed cache. [Figure: (1) the processor sends the virtual address (VA) to the MMU; (2) the MMU sends the VPN to the TLB; (3) on a hit the TLB returns the PTE; (4) the MMU forms the physical address (PA) and accesses cache/memory. The VPN is split into a TLB tag (TLBT) and a TLB index (TLBI); the offset (VPO) is unchanged.]

31 Multi Level Page Tables
A page table can occupy a significant amount of memory. Multi-level page tables reduce page table size: if a PTE in the level-1 page table is null, the corresponding level-2 page table does not exist. Only the level-1 table needs to be in memory; the other tables can be swapped in on demand. [Figure: a level-1 page table whose null entries have no level-2 tables behind them.]

32 Multi Level Page Tables (in general)
[Figure (Bryant): the virtual address is split into fields VPN 1, VPN 2, …, VPN k and the offset VPO; the physical address is the PPN concatenated with the offset PPO.] Each VPN i is an index into a page table at level i. Each PTE in a level-j page table points to the base of some page table at level j+1

33 Virtual Memory and Fork
During fork, a parent process creates a child process. Initially the pages of the parent process are shared with the child process, but the pages are marked copy-on-write: when the parent or child process writes to a page, a copy of that page is created.


36 Dynamic Memory Allocation
Allocator – manages the heap region of the VM. Maintains a list of free blocks in memory and allocates these blocks when a process requests them. Can be explicit (malloc & free) or implicit (garbage collection). The allocator's twin goals: maximizing throughput and maximizing utilization. Contradictory goals – why?

37 Components of Memory Allocation
The heap is organized as a free list When an allocation request is made, the list is searched for a free block Search method or placement policy First fit, best fit, worst fit Coalescing – merging adjacent free blocks Can be immediate or deferred Garbage collection – to automatically free allocated blocks no longer needed by the program

38 Fragmentation Cause of poor heap utilization
Unused memory is not available to satisfy allocate requests. Two types: internal – when an allocated block is larger than the required payload; external – when there is enough free memory in aggregate to satisfy an allocate request, but no single free block is large enough

39 Thrashing When CPU utilization decreases, the OS increases the multiprogramming level by adding more processes. Beyond a certain multiprogramming level, processes compete for pages, leading to page faults. Page faults cause disk reads, leading to lower CPU utilization; the OS adds more processes, causing more page faults and even lower CPU utilization – a cumulative effect


41 Process Management

42 Computer Organization: Software
Hardware resources of a computer system are shared by programs in execution. Operating System: a special program that manages this sharing; it provides ease of use, acts as a resource allocator, and controls devices. Process: a program in execution; ps tells you the current status of processes. Shell: a command interpreter through which you interact with the computer system (csh, bash, …)

43 Operating System, Processes, Hardware
[Figure: user processes invoke the OS kernel through system calls; the kernel manages the hardware.]

44 Operating System Software that manages the resources of a computer system: CPU time, main memory, I/O devices. OS functionalities: process management, memory management, storage management

45 Process Lifetime Two modes, implemented using mode bits
User mode – when executing on behalf of a user application. Kernel mode – when the user application requests some OS service, or for privileged instructions. (Silberschatz, Figure 1.10)

46 Modes One can find out the total CPU time used by a process, as well as its CPU time in user mode and its CPU time in system mode

47 Shell - What does a Shell do?
while (true) { Prompt the user to type in a command; Read in the command; Understand what the command is asking for; Get the command executed }
Shell – command interpreter. The shell interacts with the user and invokes system calls; its function is to obtain and execute the next user command. Most of the commands deal with file operations – copy, list, execute, delete, etc. It loads the command into memory and executes it.
Q: What system calls are involved? write (for the prompt), read (for the command), fork and exec (to execute it)

48 System Calls How a process gets the operating system to do something for it; an interface or API for interacting with the operating system. Examples: file manipulation – open, close, read, write, …; process management – fork, exec, exit, …; memory management – sbrk, …; device manipulation – ioctl, read, write; information maintenance – date, getpid; communications – pipe, shmget, mmap; protection – chmod, chown. When a process is executing in a system call, it is actually executing operating system code. System calls allow transition between modes

49 Mechanics of System Calls
A process must be allowed to do sensitive operations while it is executing a system call, which requires hardware support. Processor hardware is designed to operate in at least two modes of execution: ordinary (user) mode and privileged (system) mode. A system call is entered using a special machine instruction (e.g. MIPS syscall) that switches the processor to system mode before transferring control. System calls are used all the time: accepting the user's input from the keyboard, printing to the console, opening files, reading from and writing to files

50 System Call Implementation
Implemented as a trap to a specific location in the interrupt vector (the trapping instruction identifies the requested service; additional information is passed in registers). The trap is executed by the syscall instruction, and control passes to a specific service routine. System calls are usually not called directly – there is a mapping between an API function and a system call. The system call interface intercepts calls in the API, looks up a table of system call numbers, and invokes the system call

51 Figure 2.9 (Silberschatz)

52 Traditional UNIX System Structure

53 System Boot Bootstrap loader – a program that locates the kernel, loads it into memory, and starts its execution. When the CPU is booted, it begins executing the bootstrap program from a pre-defined memory location; the bootstrap resides in ROM (firmware). The bootstrap initializes various devices and starts the OS from the boot block on disk. In practice: the BIOS – boot firmware located in ROM – loads the bootstrap program from the Master Boot Record (MBR) on the hard disk; the MBR contains GRUB, and GRUB loads the OS. The OS then runs init and waits for events

54 Process Management What is a Process? Program in execution
But some programs run as multiple processes, and the same program can be running multiple times simultaneously

55 Process vs Program Program: static, passive, dead
Process: dynamic, active, living. A process changes state with time. Possible states a process could be in: Running (executing on the CPU), Ready (to execute on the CPU), Waiting (for something to happen)

56 Process State Transition Diagram
[Figure: Ready -> Running when scheduled; Running -> Ready when preempted or when the process yields the CPU; Running -> Waiting when waiting for an event to happen; Waiting -> Ready when the awaited event happens.]

57 Process States Ready – waiting to be assigned to a processor
Waiting – waiting for an event Figure 3.2 (Silberschatz)

58 CPU and I/O Bursts Processes alternate between two states: CPU bursts and I/O bursts. There are a large number of short CPU bursts and a small number of long CPU bursts

59 Process Control Block A process is represented by a Process Control Block (PCB), which contains: process state; memory sections – text, data, stack, heap; hardware state – PC value, CPU registers; and other information maintained by the OS: identification (process id, parent id, user id), CPU scheduling information (priority), memory-management information (page tables etc.), accounting information (CPU time spent), and I/O status information. A process can be viewed as a data structure with operations like fork, exit, etc. acting on the above data

60 PCB and Context Switch Fig. 3.4 (Silberschatz)

61 Process Management What should the OS do when a process does something that will take a long time, e.g., a file read/write operation or a page fault? Objective: maximize utilization of the CPU. Change the status of the process to 'Waiting' and make another process 'Running'. Which process? Objectives: minimize average program execution time; fairness

62 Process Scheduling Selecting a process from many available ready processes for execution A process residing in memory and waiting for execution is placed in a ready queue Can be implemented as a linked list Other devices (e.g. disk) can have their own queues Queue diagram Fig. 3.7 (Silberschatz)

63 Scheduling Criteria CPU utilization Throughput Turnaround time
Waiting time Response time Fairness

64 Scheduling Policies Preemptive vs Non-preemptive
Preemptive policy: one where the OS 'preempts' the running process from the CPU even though it is not waiting for anything. Idea: give a process some maximum amount of CPU time before preempting it, for the benefit of the other processes. CPU time slice: the amount of CPU time allotted. In a non-preemptive process scheduling policy, a process yields the CPU either because it must wait for something or voluntarily

65 Process Scheduling Policies
Non-preemptive: First Come First Served (FCFS), Shortest Process Next. Preemptive: round robin, preemptive Shortest Process Next (shortest remaining time first), priority based. A process that has not run for more time could get higher priority; some processes may even have larger time slices

66 Example: Multilevel Feedback
Used in some kinds of UNIX. Multilevel: priority based (preemptive); the OS maintains one ready queue per priority level and schedules from the front of the highest-priority non-empty queue. Feedback: priorities are not fixed; a process is moved to a lower or higher priority queue for fairness

67 Context Switch When the OS changes which process is currently running on the CPU
Takes some time, as it involves replacing the hardware state of the previously running process with that of the newly scheduled process: saving the HW state of the previously running process and restoring the HW state of the scheduled process. The amount of time taken would help in deciding what a reasonable CPU timeslice value would be

68 Time: Process virtual and Elapsed
[Figure: along wall-clock (elapsed) time, processes P1, P2, P3 take turns on the CPU; process P1's virtual time is only the time P1 itself runs, split between running in user mode and running in system mode.]

69 How is a Running Process Preempted?
OS preemption code must run on the CPU. How does the OS get control of the CPU from the running process to run its preemption code? Hardware timer interrupt: a hardware-generated periodic event. When it occurs, the hardware automatically transfers control to OS code (the timer interrupt handler). An interrupt is an example of a more general phenomenon called an exception

70 Exceptions Certain exceptional events during program execution are handled by processor hardware. Two kinds of exceptions: Traps – synchronous, software generated (page fault, divide by zero, system call); Interrupts – asynchronous, hardware generated (timer, keyboard, disk)

71 What Happens on an Exception
Hardware: saves processor state; transfers control to the corresponding piece of OS code, called the exception handler. Software (exception handler): takes care of the situation as appropriate; ends with a return-from-exception instruction. Hardware (execution of the RFE instruction): restores the saved processor state; transfers control back to the saved PC value

72 Re-look at Process Lifetime
Which process has the exception handling time accounted against it? The process running at the time of the exception: all interrupt handling time while a process is in the running state is accounted against it, as part of 'running in system mode'


