1 Chapter 1: Introduction 9/4/03 - nmg
What is an Operating System? Mainframe Systems Desktop Systems Multiprocessor Systems Distributed Systems Clustered Systems Real-Time Systems Handheld Systems Computing Environments NOTE: Instructor annotations are in BLUE Operating System Concepts

2 What is an Operating System?
A program that acts as an intermediary between a user of a computer and the computer hardware. Operating system goals: Execute user programs and make solving user problems easier. Make the computer system convenient to use. Use the computer hardware in an efficient manner. … OS problems parallel many problems in everyday life: queues, semaphores (think Navy!), buffers, traffic control, overlapping tasks, coordinating multiple related tasks (one at a time in the restroom!), deadlock vs. gridlock in traffic, scheduling problems … Operating System Concepts

3 Computer System Components
Hardware – provides basic computing resources (CPU, memory, I/O devices). 1A. Also firmware (micro-code). 2. Operating system – controls and coordinates the use of the hardware among the various application programs for the various users. 3. Application programs – define the ways in which the system resources are used to solve the computing problems of the users (compilers, database systems, video games, business programs). 4. Users (people, machines, other computers). Operating System Concepts

4 Operating Systems Views
Fundamental goal of a computer system is to execute user programs & make problem solving easier. … maybe the goal should be extended for modern systems: communication, access to remote data, … In general, no completely adequate definition of OS – a dynamic, evolving entity. Three basic components of an OS: - Process control/management - Memory management - I/O and file system control. System/administrator vs. user view sometimes in conflict - efficiency vs. ease of use Operating System Concepts

5 Abstract View of System Components
Operating System Concepts

6 Operating System Definitions
Resource allocator – manages and allocates resources – deals (hopefully!) with deadlock. Control program – controls the execution of user programs and operations of I/O devices. Kernel – the one program running at all times (all else being application programs). Operating System Concepts

7 Mainframe Systems Historical View
1960’s: Batch jobs with similar needs, leave the job with an operator, pick up results (hard copy listing) the next day (or later) – hope no syntax errors! (“thin” listing) – a thick listing is a runtime error (hex memory dump) - non-interactive. Reduce setup time by batching similar jobs. Automatic job sequencing – automatically transfers control from one job to another. First rudimentary operating system. Resident monitor: initial control in monitor; control transfers to job; when job completes, control transfers back to monitor. Comment on use of “JCL” and how applications jobs were processed in early “batch days” Operating System Concepts

8 Memory Layout for a Simple Batch System
Operating System Concepts

9 Multiprogrammed Batch Systems
Several jobs are kept in main memory at the same time, and the CPU is multiplexed among them. Operating System Concepts

10 OS Features Needed for Multiprogramming
I/O services provided by the system. I/O done in “kernel mode” via device drivers. May run concurrently with application - if application must wait for I/O, then OS will allow another job to run - maximize CPU and I/O utilization - lots on this later. Memory management – the system must allocate memory among several jobs. CPU scheduling and dispatching – the system must choose among several jobs ready to run - must minimize context switching time - a performance bottleneck. Allocation of devices - to “resident” jobs - danger of deadlock. Operating System Concepts

11 Time-Sharing Systems–Interactive Computing
The CPU is time multiplexed among several jobs that are kept in memory and on disk (the CPU is allocated to a job only if the job is in memory). A “job” could now be a user ==> many users on a single machine. A job is swapped in and out of memory to disk - to free up memory for a higher priority job, or if it has a long wait for I/O to complete - more on this later. Comment: A program in “some state of execution” (having been already loaded in memory, but not completed) is known as a process. On-line communication between the user and the system is provided; when the operating system finishes the execution of one command, it seeks the next “control statement” from the user’s keyboard. On-line system must be available for users to access data and code - presents an interactive user interface - not just batch. Operating System Concepts

12 Desktop Systems Personal computers – computer system dedicated to a single user. I/O devices – keyboards, mice, display screens, small printers. User convenience and responsiveness. Can adopt technology developed for larger operating systems; often individuals have sole use of the computer and do not need advanced CPU utilization or protection features. May run several different types of operating systems (Windows, MacOS, UNIX, Linux). Forerunner of the RISC workstation: ex: SUN or IBM RISC/6000 workstations - the PC would revolutionize computing - some companies (guess who!) were blind to this - others (SUN) had the foresight to recognize this. Operating System Concepts

13 Parallel (Tightly Coupled) Systems
Multiprocessor systems with more than one CPU in close communication. Tightly coupled system – processors share memory and a clock; communication usually takes place through the shared memory – via a bus. Advantages of parallel (tightly coupled) systems: Increased throughput Increased computing power (speed-up factor) Economical Increased reliability graceful degradation fail-soft systems OS functions for multiprocessor systems are significantly more complex Operating System Concepts

14 Parallel (Tightly Coupled) Systems (Cont.)
Shared memory – tightly coupled schemes: SMP and Asymmetric. Symmetric multiprocessing (SMP) The common kernel in shared memory could operate on any processor – process/thread parallelism in kernel execution possible – processors are peers – no master/slave. Many processes can run at once without performance deterioration - true parallelism vs. pseudo parallelism of a multitasking system on a uniprocessor. Most modern operating systems support SMP. Asymmetric multiprocessing Each processor is assigned a specific task; master processor schedules and allocates work to slave processors. More common in extremely large systems. Problem with parallelism: how do you distribute a problem across multiple processors to capture the max potential of the system? Can all algorithms be “parallelized”? Are there theoretical limits to parallelizing? - see also loosely coupled. Example: logic simulators - a natural for parallelism. Operating System Concepts

15 Flynn’s Classification
SISD – Single Instruction, Single data stream Basic uniprocessor – single program counter. SIMD - Single Instruction, Multiple data stream A logically single stream of instructions operating on different units of data in parallel – ex. a vector processor. Example of an implementation: a single stream of SIMD instructions from a single program counter in a special SISD host processor is broadcast to many parallel SIMD processors, each with its own registers and cache memory. Each of the SIMD processors then executes the same instruction on a different unit of data in parallel, in lock-step synchronism. Example: the CM-2 “Super Computer” with 65,536 processors, each having a 1-bit ALU (32-way bit slicing?). MISD - Multiple Instruction, Single data stream – a sequence of data broadcast to different parallel processors, each executing a different instruction sequence. Never implemented. MIMD - Multiple Instruction, Multiple data stream – many parallel processors executing different instruction streams on different data items. Commonly implemented with “loosely coupled” clusters of general purpose computers on a network (see later) and also tightly coupled SMP. Operating System Concepts

16 Parallel (Tightly Coupled) Systems (Cont.)
From Stallings, “Operating Systems”, 4th ed. Operating System Concepts

17 Symmetric Multiprocessing Architecture
From Stallings, “Operating Systems”, 4th ed (replaced original diagram) Operating System Concepts

18 Distributed Systems (Loosely Coupled)
Distribute the computation among several physical processors. Loosely coupled system (clusters?) – each processor has its own local memory; processors communicate with one another through various communication lines, such as high-speed buses, cross-bar switches, LANs, or telephone lines. Could be a heterogeneous mixture of independent computers having different characteristics, etc., all connected by some network fabric. Advantages of distributed systems: Resource sharing Computation speed-up – load sharing Reliability Communications Disadvantages: control and OS functions complicated, and distributing an algorithm over the nodes is difficult. Operating System Concepts

19 Distributed Systems (cont)
Requires networking infrastructure. Local area networks (LAN) or Wide area networks (WAN) May be either client-server or peer-to-peer systems. Clients generate requests to be satisfied by the server – server performs computation with results sent to client Peer-to-peer example: Internet or a master and many slaves on a network or switch. Operating System Concepts

20 General Structure of Client-Server
Operating System Concepts

21 Clustered Systems Alternative to SMP
Goal is high reliability, availability, and performance. A group of interconnected, “whole” computers working together as a unified computing resource that can create the illusion of being a single machine. Clustering allows two or more systems to share (secondary?) storage – example: RAID disks. Asymmetric clustering: multiple servers run the application while one server stands by as a monitor. Symmetric clustering: all N hosts run the application with mutual monitoring - no single monitor. Reference: Stallings, 4th ed., section 13.4 Operating System Concepts

22 Real-Time Systems Often used as a control device in a dedicated application such as controlling scientific experiments, medical imaging systems, industrial control systems, and some display systems. Well-defined fixed-time constraints. Real-Time systems may be either hard or soft real-time. Operating System Concepts

23 Real-Time Systems (Cont.)
Hard real-time: Secondary storage limited or absent, data stored in short term memory, or read-only memory (ROM) Conflicts with time-sharing systems, (delays unpredictable), thus not supported by general-purpose operating systems. Uses deadline scheduling of tasks #1 thing you don't want to happen: system shows the “hour glass” icon at 1000 meters over the moon when landing! Soft real-time Limited utility in industrial control of robotics Useful in applications (multimedia, virtual reality) requiring advanced operating-system features. Cannot guarantee deadlines, but can guarantee highest priority for Soft RT tasks over ordinary tasks. Operating System Concepts

24 Handheld Systems Personal Digital Assistants (PDAs)
Cellular telephones Issues: Limited memory Slow processors Small display screens. Operating System Concepts

25 Migration of Operating-System Concepts and Features
What does this mean? Operating System Concepts

26 Computing Environments
Traditional computing Advancing technologies and falling prices are rapidly changing what used to be “traditional computing”. Not only are “enterprise” level functions being pushed down to the PC level, but new functions are being directly implemented in PC/micro-computers. Web-Based Computing Applications reside on Web servers, rather than on end-users' workstations. These workstations, or appliances, are connected to secure servers in order to use applications via web browsers. Embedded Computing Most prevalent form of computers in existence: in automobiles, VCRs, microwave ovens, … They do specific tasks, and associated systems are primitive. Operating System Concepts

27 Chapter 2: Computer-System Structures 1/31/03
Computer System Operation I/O Structure Storage Structure Storage Hierarchy Hardware Protection General System Architecture NOTE: Instructor annotations in BLUE Operating System Concepts

28 Computer-System Architecture
Operating System Concepts

29 Computer-System Operation
I/O devices and the CPU can execute concurrently – due to multi-programming & interrupt schemes and use of controllers - not a new idea: IBM’s I/O channels. Each device controller is in charge of a particular device type. Each device controller has a local buffer. I/O -> buffer is slow, but buffer -> CPU is fast. CPU moves data from/to main memory to/from local buffers – in the absence of DMA, I/O is from the device to the local buffer of the controller. Device controller informs the CPU that it has finished its operation by causing an interrupt. Device controller & CPU execute concurrently: controller fills buffer, CPU empties buffer. Operating System Concepts

30 Common Functions of Interrupts
Interrupt transfers control to the interrupt service routine, generally through the interrupt vector, which contains the addresses of all the service routines. Interrupt architecture must save the address of the interrupted instruction. Incoming interrupts are disabled while another interrupt is being processed to prevent a lost interrupt (vs. “nested interrupts”). A trap is a software-generated interrupt caused either by an error or a user request - ex: a system call is software generated (in the source code) – example: the int instruction in DOS assembly language. An operating system is interrupt driven. See also: Notes by Guydosh on interrupt schemes and DMA (on Website) Operating System Concepts
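
A minimal sketch of vectored dispatch, assuming a hypothetical table of ISR function pointers (real vector tables are defined by the hardware; all names here are illustrative):

    /* Hypothetical vectored interrupt dispatch, sketched in C. */
    #include <stdio.h>

    #define NUM_VECTORS 256

    typedef void (*isr_t)(void);                 /* interrupt service routine */
    static isr_t interrupt_vector[NUM_VECTORS];  /* the interrupt vector table */

    static void keyboard_isr(void) { puts("keyboard ISR ran"); }

    void raise_interrupt(int vector) {
        /* Hardware would save the interrupted instruction's address and
           switch to monitor mode before jumping through the vector table;
           here we only model the table lookup and the jump. */
        if (vector >= 0 && vector < NUM_VECTORS && interrupt_vector[vector])
            interrupt_vector[vector]();
    }

    int main(void) {
        interrupt_vector[0x21] = keyboard_isr;   /* install a handler */
        raise_interrupt(0x21);                   /* simulate the interrupt */
        return 0;
    }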

31 Interrupt Handling
The operating system preserves the state of the CPU by storing registers and the program counter. Determines which type of interrupt has occurred: Polling – actually done in a “non-interrupt” situation; vectored interrupt system – goes directly to the ISR vs. a common jump location that analyzes the interrupt before processing it. Separate segments of code determine what action should be taken for each type of interrupt. The interrupt vector addresses a particular entry in the IVT, which points to an interrupt handling routine (ISR). Uses the IRQ lines in the control bus, and the Programmable Interrupt Controller (PIC). Operating System Concepts

32 Interrupt Time Line For a Single Process Doing Output
(Figure: interrupt time line – the user process runs, an I/O interrupt occurs, and the ISR moves data from the device buffer to the user process; the data transfer completes and the user process resumes.) Operating System Concepts

33 I/O Structure
Synchronous: After I/O starts, control returns to user program only upon I/O completion … no CPU action on the user’s behalf until I/O is complete… Two approaches: Interrupts are part of the architecture: User executes a “wait” instruction which blocks the user until an interrupt is issued to indicate that the I/O is complete. The user goes into a wait loop until an interrupt is issued to indicate that the I/O is complete. Both cases are an inefficient use of interrupts. No interrupts in the architecture: When the user makes the I/O request, the device driver will poll an “I/O complete” bit in the port until it indicates that I/O is complete, at which time control is returned to the user. This can be done even if interrupts are part of the architecture, if the I/O wait time is anticipated to be very short, and context switching will be avoided. Typically, one I/O request is outstanding at a time, no simultaneous I/O processing … disable interrupts? Operating System Concepts
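
A minimal sketch of the polling approach, assuming a hypothetical device status register (a real driver would read an actual port or memory-mapped register):

    /* Hypothetical busy-wait poll of an "I/O complete" bit. */
    #include <stdint.h>

    #define IO_COMPLETE_BIT 0x01

    volatile uint8_t device_status;   /* stand-in for a real status register */

    void wait_for_io(void) {
        /* Spin until the controller sets the completion bit. Reasonable only
           when the expected wait is very short, since the CPU does no other
           work here and no context switch occurs. */
        while ((device_status & IO_COMPLETE_BIT) == 0)
            ;  /* do nothing */
    }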

34 I/O Structure (continued)
Asynchronous (2 modes: current user proc goes on vs. some other process goes on): After I/O starts, control returns to user program without waiting for I/O completion – if user cannot continue w/o results, OS can switch to another user process. This action is generally accomplished using a system call – a good design would allow another user process to run if the requesting process could not run without the results of the I/O request (a task switch). Device-status table contains an entry for each I/O device indicating its type, address, and state – queue up processes waiting for the device. Operating system indexes into the I/O device table to determine device status and to modify the table entry to include the interrupt. Operating System Concepts

35 Two I/O Methods
Synchronous Asynchronous Operating System Concepts

36 Device-Status Table Return to the process making a request when the request is completed, via an interrupt. Operating System Concepts

37 Direct Memory Access Structure
Used for high-speed I/O devices able to transmit information at close to memory speeds. Device controller transfers blocks of data from buffer storage directly to main memory without CPU intervention. Only one interrupt is generated per block, rather than one interrupt per byte ==> in a pure interrupt scheme, granularity of data xfr is typically on a byte or word basis – OK for a slow serial port, where overhead is a small percent; but for high speed xfr, bytes are coming too fast and the percent overhead is significant – leaving not much time for data xfr. Solution is DMA. DMA controller xfr’s data from device buffer to main memory (via bus) in parallel with CPU operations … interrupt at end of DMA action, which covers a relatively large block. Interrupts now infrequent - overhead of interrupts now minimal. Problem: “cycle stealing” - when there is bus/memory contention because the CPU is accessing a memory word during a DMA xfr, DMA wins out and the CPU pauses instruction execution for a memory cycle (the cycle was “stolen”). Operating System Concepts

38 Storage Structure
Main memory – the only large storage medium that the CPU can access directly. Secondary storage – extension of main memory that provides large nonvolatile storage capacity. Magnetic disks – rigid metal or glass platters covered with magnetic recording material. Disk surface is logically divided into tracks, which are subdivided into sectors. The disk controller determines the logical interaction between the device and the computer. Von Neumann architecture: I-cycle/E-cycle – uses memory heavily. Modified to be more efficient in RISC architecture – minimal use of memory + pipelining. Operating System Concepts

39 Main Memory
Main memory & registers are accessed directly via instructions. Disk storage is indirect access: must first move data to memory before the CPU can access it. Memory-mapped I/O – write to “special” memory locations using memory words (example – video buffer). Port I/O is similar to memory mapping – registers have addresses. Memory mapping allows direct access of outside storage elements (I/O) - done in hardware. Operating System Concepts
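
A minimal bare-metal style sketch of memory-mapped I/O; the address is illustrative (reminiscent of the classic PC text-mode video buffer) and would fault in an ordinary user process:

    #include <stdint.h>

    /* Assumed device address: stores to this "memory" reach the device. */
    #define VIDEO_BUF ((volatile uint16_t *)0xB8000)

    void put_cell(int pos, uint16_t cell) {
        /* An ordinary store instruction; the hardware routes it to the
           device. volatile stops the compiler from caching or reordering
           the access. */
        VIDEO_BUF[pos] = cell;
    }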

40 Moving-Head Disk Mechanism
Operating System Concepts

41 Storage Hierarchy
Storage systems organized in hierarchy: Speed Cost Volatility Caching – copying information into a faster storage system; main memory can be viewed as the last cache for secondary storage. The “principle of locality” makes it work – see later. Operating System Concepts

42 Storage-Device Hierarchy
(Figure: the hierarchy runs from fast, expensive, small storage at the top to slow, cheap, large storage at the bottom.) Operating System Concepts

43 Caching & Virtual memory
Cache: Use of high-speed memory to hold recently-accessed data. Requires a cache management policy. Caching introduces another level in the storage hierarchy. This requires data that is simultaneously stored in more than one level to be consistent. Caching implemented in hardware – integrated with the CPU. Virtual Memory: In the storage hierarchy, main memory is a cache for the disk. This is how virtual memory is implemented using disk paging. For the most part this function is implemented in software (the OS memory management function) … lots more on this later. Operating System Concepts

44 Migration of Data “A” From Disk to Register
Operating System Concepts

45 Hardware Protection Summary
Dual-Mode Operation – supervisor vs. user mode – privileged operations. I/O Protection – all I/O instructions privileged. Keep user from getting control of computer in “user mode”. Memory Protection – keep processes from accessing memory outside of their process space … base reg & limit reg checked in addressing instructions. CPU Protection – control the amount of time a user process is using the CPU. Use a timer … time slicing … prevents infinite loop hangs. Operating System Concepts

46 Dual-Mode Operation
Sharing system resources requires the operating system to ensure that an incorrect program cannot cause other programs to execute incorrectly. Provide hardware support to differentiate between at least two modes of operation - a “mode bit”. 1. User mode – execution done on behalf of a user. 2. Monitor mode (also kernel mode or system mode) – execution done on behalf of the operating system. Operating System Concepts

47 Dual-Mode Operation (Cont.)
Mode bit added to computer hardware to indicate the current mode: monitor (0) or user (1). When an interrupt or fault occurs, hardware switches to monitor mode. (Figure: an interrupt/fault moves from user to monitor mode; a “set user mode” instruction returns to user mode.) Privileged instructions can be issued only in monitor mode. Operating System Concepts

48 I/O Protection
All I/O instructions are privileged instructions. Must ensure that a user program could never gain control of the computer in monitor mode (e.g., a user program that, as part of its execution, stores a new address in the interrupt vector … overrides dual-mode (I/O) protection - need memory protection to protect the IVT - see below). Operating System Concepts

49 Use of A System Call to Perform I/O
Operating System Concepts

50 Memory Protection
Must provide memory protection at least for the interrupt vector and the interrupt service routines. In order to have memory protection, add two registers that determine the range of legal addresses a program may access: Base register – holds the smallest legal physical memory address. Limit register – contains the size of the range. Memory outside the defined range is protected. Operating System Concepts
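
A minimal sketch of the check the hardware performs on every user-mode address; the base and limit values are illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    /* Loaded by the OS via privileged instructions (values assumed). */
    static uint32_t base_reg  = 300040;   /* smallest legal address  */
    static uint32_t limit_reg = 120900;   /* size of the legal range */

    /* Conceptually done by hardware on every memory reference in user
       mode; a failed check traps to the OS as an addressing error. */
    bool address_legal(uint32_t addr) {
        return addr >= base_reg && addr < base_reg + limit_reg;
    }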

51 Use of A Base and Limit Register
Operating System Concepts

52 Hardware Address Protection
Operating System Concepts

53 Hardware Protection
When executing in monitor mode, the operating system has unrestricted access to both monitor and user’s memory. The load instructions for the base and limit registers are privileged instructions … an example of hardware protection. Operating System Concepts

54 CPU Protection
Prevent user programs from hogging the CPU. Timer – interrupts computer after a specified period to ensure the operating system maintains control. Timer is decremented every clock tick. When timer reaches the value 0, an interrupt occurs. Timer commonly used to implement time sharing. Timer also used to compute the current time. Load-timer is a privileged instruction … HW protection. Operating System Concepts
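
A minimal sketch of the timer mechanism described above; the function names and the scheduler hook are hypothetical:

    /* Conceptual model of the timer hardware plus the OS response. */
    static volatile int timer_ticks;

    void load_timer(int ticks) {      /* privileged instruction */
        timer_ticks = ticks;
    }

    void on_clock_tick(void) {        /* runs on every hardware clock tick */
        if (--timer_ticks == 0) {
            /* Timer interrupt: the OS regains control here, so it can end
               a time slice or stop a runaway (infinite-loop) program. */
            /* schedule_next_process();  -- hypothetical OS hook */
        }
    }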

55 Network Structure
Local Area Networks (LAN) Wide Area Networks (WAN) Operating System Concepts

56 Local Area Network Structure
Operating System Concepts

57 Wide Area Network Structure
Operating System Concepts

58 Chapter 3: Operating-System Structures 9/15/03
System Components Operating System Services System Calls System Programs System Structure Virtual Machines System Design and Implementation System Generation NOTE: Instructor annotations in BLUE Operating System Concepts

59 Common System Components Key functions of an operating system
Process Management Main Memory Management File Management I/O System Management Secondary Storage Management (disk) Networking Protection System Command-Interpreter System Operating System Concepts

60 Process Management A process is a program in some “state” of execution. A process needs certain resources, including CPU time, memory, files, and I/O devices, to accomplish its task. A process is a unit of work in a system - see discussion on threads later. The operating system is responsible for the following activities in connection with process management. Process creation and deletion. UNIX: processes should have the ability to dynamically (in real time) spawn off or create other processes process suspension (process is in I/O wait queue, or “swapped” out to disk, …) and resumption (move to ready queue or execution) – manage the state of the process. Provision of mechanisms for: process synchronization - concurrent processing is supported thus the need for synchronization of processes or threads. process communication Deadlock handling Operating System Concepts

61 Main-Memory Management
Memory is a large array of words or bytes, each with its own address. It is a repository of quickly accessible data shared by the CPU and I/O devices. Main memory is a volatile storage device. It loses its contents in the case of system failure. The operating system is responsible for the following activities in connection with memory management: Keep track of which parts of memory are currently being used and by whom. Decide which processes to load when memory space becomes available - long term or medium term scheduler. Mapping addresses in a process to absolute memory addresses - at load time or run time. Allocate and deallocate memory space as needed. Memory partitioning, allocation, paging (VM), address translation, defrag, … Memory protection Operating System Concepts

62 File Management The OS abstracts the data stored on a physical device to a logical unit: the file. A file is a collection of related information defined by its creator. Commonly, files represent programs (both source and object forms) and data - identifiable by name and location. The operating system is responsible for the following activities in connection with file management: File creation and deletion - system calls or commands. Directory creation and deletion - system calls or commands. Support of primitives for manipulating files and directories in an efficient manner - system calls or commands. Mapping files onto secondary storage. File backup on stable (nonvolatile) storage media. EX: File Allocation Table (FAT) for Windows/PC systems Operating System Concepts

63 I/O System Management Hide the peculiarities of a specific HW device from the user - device drivers The I/O system consists of: A buffer-caching system A general device-driver interface – part of OS Drivers for specific hardware devices – OS must provide for all devices. Provide system call API for I/O - I/O is a privileged operation Operating System Concepts

64 Secondary-Storage Management
Since main memory (primary storage) is volatile and too small to accommodate all data and programs permanently, the computer system must provide secondary storage (disk) to back up main memory – basis for virtual memory. Most modern computer systems use disks as the principal on-line storage medium, for both programs and data. The operating system is responsible for the following activities in connection with disk management: Free space management Storage allocation Disk scheduling – minimize seeks (arm movement … very slow operation) Disk as the media for mapping virtual memory space Disk caching for performance Disk utilities: defrag, recovery of lost clusters, etc. Accessing secondary storage is very frequent, thus disk performance can be a performance bottleneck for the entire system. Minimize head seeks (cylinder transitions) Operating System Concepts

65 Networking (Distributed Systems)
Low level support for pure connectivity - message passing, FTP, file sharing, … Higher level functional support: clustering, parallel processing, … A distributed system is a collection of heterogeneous processors that do not share memory or a clock. Each processor has its own local memory. The processors in the system are connected through a communication network, and communication takes place using a protocol. A distributed system provides user access to various system resources. Cooperative vs. independent processing. Access to a shared resource allows: Computation speed-up Increased data availability Enhanced reliability Operating System Concepts

66 Protection System Keep processes from interfering with each other
Protection refers to a mechanism for controlling access by programs, processes, or users to both system and user resources. The protection mechanism must: distinguish between authorized and unauthorized usage. specify the controls to be imposed. provide a means of enforcement. Hardware assists in screening addresses for illegal references. Operating System Concepts

67 Command-Interpreter System
Many commands are given to the operating system by control statements which deal with: process creation and management I/O handling secondary-storage management main-memory management file-system access protection networking Commands may have counterparts for use in programming. ASCII command line vs. graphic interface, Windows Explorer, “add-ins” like Norton Utilities Operating System Concepts

68 Command-Interpreter System (Cont.)
The program that reads and interprets control statements is called variously: command-line interpreter (Control card interpreter in the “old batch days”) shell (in UNIX) Command.com for external commands in DOS Its function is to get and execute the next command statement. Example: DOS window, UNIX command line, Windows “run” window Operating System Concepts

69 Operating System Services
OS as a service provider via system calls & commands (typically for the programmer). Program execution – system capability to load a program into memory and to run it - address mapping and translation a key issue. I/O operations – since user programs cannot execute I/O operations directly, the operating system must provide some means to perform I/O - system calls and API. File-system manipulation – program capability to read, write, create, and delete files. Communications – exchange of information between processes executing either on the same computer or on different systems tied together by a network. Implemented via shared memory or message passing. Error detection – ensure correct computing by detecting errors in the CPU and memory hardware, in I/O devices, or in user programs – ex: parity errors, arithmetic “errors”, out of memory, out of disk space, program not found, … Memory management Operating System Concepts

70 Additional Operating System Functions
Additional functions exist not for helping the user, but rather for ensuring efficient system operations. Resource allocation – allocating resources to multiple users or multiple jobs running at the same time – avoiding Deadlock. Accounting – keep track of and record which users use how much and what kinds of computer resources for account billing or for accumulating usage statistics. Protection – ensuring that all access to system resources is controlled, ex: firewalls, passwords, file permissions, etc. Operating System Concepts

71 System Calls System calls provide the interface between a running program and the operating system – like invoking a command from inside a program. Generally available as assembly-language instructions. Languages defined to replace assembly language for systems programming allow system calls to be made directly (e.g., C, C++) - C language: open, close, read, write, … Three general methods are used to pass parameters between a running program and the operating system: Pass parameters in registers. Store the parameters in a table in memory, and the table address is passed as a parameter in a register – if parms are complicated. Push (store) the parameters onto the stack by the program, and pop off the stack by the operating system – usually in assembly language. Operating System Concepts
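
A minimal sketch using the classic UNIX call interface named above (open, read, write, close); the file name is illustrative and error handling is abbreviated:

    #include <fcntl.h>     /* open */
    #include <unistd.h>    /* read, write, close */

    int main(void) {
        char buf[512];
        int fd = open("input.txt", O_RDONLY);        /* system call: open  */
        if (fd < 0)
            return 1;
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)  /* system call: read  */
            write(STDOUT_FILENO, buf, (size_t)n);    /* system call: write */
        close(fd);                                   /* system call: close */
        return 0;
    }

Each call crosses into the kernel; here the parameters fit in registers, the first of the three passing methods listed above.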

72 Passing of Parameters As A Table
Operating System Concepts

73 Types of System Calls Process control File management
Device management Information maintenance Communications See Fig. 3.2, page 66 for details Operating System Concepts

74 MS-DOS Execution At System Start-up Running a Program
Command.com gets overlaid by the process invoked on the command line and is reloaded when this process exits - a throwback to the days when memory was expensive. Operating System Concepts

75 UNIX Running Multiple Programs
Operating System Concepts

76 Communication Models Communication may take place using either message passing or shared memory. Msg passing: setup, send, receive model; msg packets pass thru the OS and are unrecoverable to the sender. Shared memory: OS waives memory protection, communication is direct - with need for synch. - like a bulletin board. Operating System Concepts

77 System Programs System programs use system calls to provide convenient access to system services … system calls vs. application programs. System programs provide a convenient environment for program development and execution. They can be divided into: File manipulation Status information File modification Programming language support Program loading and execution Communications – ex: msg passing Application programs – could use system calls, example: concurrent programs using shared memory and semaphores. Most users’ view of the operating system is defined by system programs, not the actual system calls. Operating System Concepts

78 Rationale for OS System Structure Concern (from the “School of Hard Knocks” at IBM)
The design process itself requires a highly structured, modularized (even object oriented) approach Need the ability to allow hundreds of people to work on the system at one time in an orderly way System must be testable and verifiable Need the ability to maintain the system: Fix bugs Add new features Tune/customize the system Need to get the system out on time with “no bugs” … remember Windows 95! System must be bullet-proof Layered/modularized approach would support the above requirements. Operating System Concepts

79 MS-DOS System Structure
MS-DOS – written to provide the most functionality in the least space not divided into modules Although MS-DOS has some structure, its interfaces and levels of functionality are not well separated Operating System Concepts

80 MS-DOS Layer Structure
Operating System Concepts

81 UNIX System Structure UNIX – limited by hardware functionality, the original UNIX operating system had limited structuring. The UNIX OS consists of two separable parts. Systems programs The kernel Consists of everything below the system-call interface and above the physical hardware Provides the file system, CPU scheduling, memory management, and other operating-system functions; a large number of functions for one level. Operating System Concepts

82 UNIX System Structure (Figure: layered structure – applications at top, then system programs, then the kernel, over the hardware.)
Operating System Concepts

83 Layered Approach The operating system is divided into a number of layers (levels), each built on top of lower layers. The bottom layer (layer 0), is the hardware; the highest (layer N) is the user interface. With modularity, layers are selected such that each uses functions (operations) and services of only lower-level layers. Aids in maintainability, and ease and speed of development. Allows reusable code. Operating System Concepts

84 An Operating System Layer
Encapsulates data, uses “access methods”. (Figure: a layer exposes an API for layer M above and uses the API of layer M-1 below.) Operating System Concepts

85 Microkernel System Structure
Strip down the kernel: minimal process management, memory management, and communication facilities. Move as much as possible from the kernel into “user” space. Communication between services running in user space takes place between user modules using message passing. Messages go thru the microkernel. All new services to the OS are added in user space and do not require modification of the kernel. Benefits: - easier to extend a microkernel - easier to port the operating system to new architectures - more reliable (less code is running in kernel mode) - more secure. Ex. Mach: maps UNIX system calls into appropriate user level services. Operating System Concepts

86 Virtual Machines A virtual machine takes the layered approach to its logical conclusion. It treats hardware and the operating system kernel as though they were all hardware. A virtual machine provides an interface identical to the underlying bare hardware. The operating system creates the illusion of multiple processes, each executing on its own processor with its own (virtual) memory. Operating System Concepts

87 Virtual Machines (Cont.)
The resources of the physical computer are shared to create the virtual machines. CPU scheduling can create the appearance that users have their own processor - you get an “operators” console. Spooling and a file system can provide virtual card readers and virtual line printers. A normal user time-sharing terminal serves as the virtual machine operator’s console. Operating System Concepts

88 System Models Example of Virtual Machine: CMS Non-virtual Machine
Shared resources: Mem, devices, disk, cpu, …. Operating System Concepts

89 Advantages/Disadvantages of Virtual Machines
The virtual-machine concept provides complete protection of system resources since each virtual machine is isolated from all other virtual machines. This isolation, however, permits no direct sharing of resources – shared only thru the emulator. A virtual-machine system is a perfect vehicle for operating-systems research and development. System development is done on the virtual machine, instead of on a physical machine, and so does not disrupt normal system operation. The virtual machine concept is difficult to implement due to the effort required to provide an exact duplicate of the underlying machine. Operating System Concepts

90 Java Virtual Machine Compiled Java programs are platform-neutral bytecodes executed by a Java Virtual Machine (JVM). JVM consists of - class loader - class verifier - runtime interpreter Operating System Concepts

91 Java Virtual Machine Diagram taken from Silberschatz’s slides based on “Applied Operating Systems Concepts”, 1999 (the “alternate book”) … replaced original Operating System Concepts

92 The Java Platform Taken from Silberschatz’s slides based on “Applied Operating Systems Concepts”, 1999 (the “alternate book”) Operating System Concepts

93 Java .class File on Cross Platforms
Taken from Silberschatz’s slides based on “Applied Operating Systems Concepts”, 1999 (the “alternate book”). Java is portable, since Java programs are isolated from the hardware & software using “byte” code in the Java Virtual Machine (JVM). Operating System Concepts

94 Java Development Environment
Taken from Silberschatz’s slides based on “Applied Operating Systems Concepts”, 1999 (the “alternate” book) … replaced original Operating System Concepts

95 System Design Goals User goals – operating system should be convenient to use, easy to learn, reliable, safe, and fast. System goals – operating system should be easy to design, implement, and maintain, as well as flexible, reliable, error-free, and efficient. Above two goals involve many tradeoffs and compromises. Operating System Concepts

96 Mechanisms and Policies
Mechanisms determine how to do something; policies decide what will be done. Policy involves architecture - what functions are to be implemented. Mechanism is the implementation of the architecture. The separation of policy from mechanism is a very important principle; it allows maximum flexibility if policy decisions are to be changed later. Operating System Concepts

97 System Implementation
Traditionally written in assembly language, operating systems can now be written in higher-level languages. Code written in a high-level language: can be written faster. is more compact. is easier to understand and debug. IBM AS/400 code was written mostly in C. An operating system is far easier to port (move to some other hardware) if it is written in a high-level language. Operating System Concepts

98 System Generation (SYSGEN)
Operating systems are designed to run on any of a class of machines; the system must be configured for each specific computer site. SYSGEN program obtains information concerning the specific configuration of the hardware system - similar to customizing Windows during installation. Booting – starting a computer by loading the kernel. Bootstrap program – code stored in ROM that is able to locate the kernel, load it into memory, and start its execution. Operating System Concepts

99 Chapter 4: Processes 9/16/03 Process Concept Process Scheduling
Operations on Processes Cooperating Processes Interprocess Communication Communication in Client-Server Systems NOTE: Instructor annotations in BLUE Operating System Concepts

100 Process Concept An operating system executes a variety of programs:
Batch system – jobs Time-shared systems – user programs or tasks Textbook uses the terms job and process almost interchangeably. Process – a program in some state of execution; process execution must progress in sequential fashion. A process includes: program counter stack data section Text section (code) Operating System Concepts

101 Process State As a process executes, it changes state
new: The process is being created. running: Instructions are being executed. waiting: The process is waiting for some event to occur. ready: The process is waiting to be assigned to a processor – not waiting for any event or resource. terminated: The process has finished execution. Operating System Concepts

102 Diagram of Process State
Operating System Concepts

103 Queuing representation of Process States
A queue for each event is faster to search ===> From Stallings, “Operating Systems”, 4th ed. fig 3.8b, p. 121 Operating System Concepts

104 The Suspended State Suspended state: a process which has been completely removed from memory and now resides on disk as a process image which can be reloaded into memory – “swapping”. Non-virtual memory systems: This state is always needed in a system which does not employ virtual memory: uses process swapping. Process is either fully in memory or fully swapped out to disk. Needed if most processes in memory are blocked (say for I/O). By swapping out to the suspended state, memory is freed for new or runnable processes. CPU utilization maximized. Virtual memory systems: a process is only partially in memory. May still be a need for swapping and the suspended state. If too many processes are resident in memory (even partially), performance is impacted – thrashing – low CPU utilization. Thus swapping out a process to the disk (suspended state) will improve performance. Operating System Concepts

105 Suspended States Most general state diagram
From Stallings, “Operating Systems”, 4th ed. fig 3.8b, p. 121 Operating System Concepts

106 Transitions involving suspended states
Blocked -> Blocked/Suspend: all processes waiting, no “ready” processes; a blocked process is swapped out to make room for unblocked processes. Blocked/Suspend -> Ready/Suspend: event waited for occurs, but the process still remains in the suspended state … maybe memory space unavailable. Ready/Suspend -> Ready: all processes waiting, swap in a ready/suspended process. Ready -> Ready/Suspend: rare transition, if a running process is extremely large … to free up memory, or due to priorities. New -> Ready/Suspend: maintain a pool of non-blocked processes on disk – overhead of building the process done in advance. Blocked/Suspend -> Blocked: system anticipates that the waited-for event will occur soon – move blocked process into memory. Running -> Ready/Suspend: preempt a running process due to a high priority blocked process becoming unblocked. Operating System Concepts

107 Process Control Block (PCB)
Information associated with each process - uniquely identifies and characterizes a particular process. Some information in the PCB: Process state – running, waiting, ready, etc. Program counter CPU register values CPU scheduling information - used by short term scheduler - see later Memory-management information – base, limit registers, page table info, etc. Accounting information I/O status & file information – devices allocated to proc., open files, etc. Operating System Concepts
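
A minimal sketch of a PCB as a C struct, loosely following the fields listed above; field names and sizes are illustrative, not any real kernel’s:

    #include <stdint.h>

    typedef enum { NEW, READY, RUNNING, WAITING, TERMINATED } proc_state_t;

    /* Illustrative PCB; real kernels hold far more state than this. */
    struct pcb {
        int           pid;             /* unique process identifier        */
        proc_state_t  state;           /* running, waiting, ready, ...     */
        uint64_t      program_counter;
        uint64_t      registers[32];   /* saved CPU register values        */
        int           priority;        /* CPU scheduling information       */
        uint64_t      base, limit;     /* memory-management registers      */
        uint64_t      cpu_time_used;   /* accounting information           */
        int           open_files[16];  /* I/O status & file information    */
        struct pcb   *next;            /* link for ready/device queues     */
    };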

108 Process Control Block (PCB)
Operating System Concepts

109 CPU Switch From Process to Process
(context switch) Process state saved in PCB Operating System Concepts

110 Process Scheduling Queues
Job queue – set of all processes in the system. Ready queue – set of all processes residing in main memory, ready and waiting to execute. Device queues – set of processes waiting for an I/O device. Process migration between the various queues. Elements entered in these queues are PCB’s – linked list Operating System Concepts

111 Ready Queue And Various I/O Device Queues
Operating System Concepts

112 Representation of Process Scheduling
Operating System Concepts

113 Schedulers Long-term scheduler (or job scheduler) – selects which processes should be brought into the ready queue. Schedules new jobs to become processes (from disk). Short-term scheduler (or CPU scheduler) – selects which process should be executed next and allocates the CPU. Medium-term scheduler - removes processes, especially suspended ones, from memory and moves them to disk; later brings the process back into memory. Swapping - controls degree of multiprogramming, and memory utilization Operating System Concepts

114 Addition of Medium Term Scheduling
Operating System Concepts

115 Schedulers (Cont.) Short-term scheduler is invoked very frequently (milliseconds) => must be fast – a potential performance bottleneck. Long-term scheduler is invoked very infrequently (seconds, minutes) => may be slow. The long-term & medium-term schedulers control the degree of multiprogramming – how many programs to have in memory at one time. Has implications on controlling “thrashing” in a virtual memory system. Processes can be described as either: I/O-bound process – spends more time doing I/O than computations, many short CPU bursts. CPU-bound process – spends more time doing computations; few very long CPU bursts. Operating System Concepts

116 Context Switch When the CPU switches to another process, the system must save the state of the old process and load the saved state for the new process. Context-switch time is overhead; the system does no useful work while switching – an area needing optimization. Time dependent on hardware support. Operating System Concepts

117 Process Creation Parent process creates children processes, which, in turn, create other processes, forming a tree of processes. Example: UNIX fork/spawning mechanism. Resource sharing Parent and children share all resources. Children share subset of parent’s resources. Parent and child share no resources. Execution Parent and children execute concurrently. Parent waits until children terminate - wait is a system call. Operating System Concepts

118 Process Creation (Cont.)
Address space Child duplicate of parent. Child has a program loaded into it. UNIX examples fork system call creates new process exec system call used after a fork to replace the process’ memory space with a new program. fork and execve are both system calls. Except for 3 “special” processes that are part of the kernel: scheduler, init, and pagedaemon (PIDs 0, 1, 2), all processes are created from a fork call. All other processes have “fork ancestries” which start with the init system process. A typical user program run from the command line would be forked from the shell process. If the parent quits before waiting for a child to quit, the “orphan” child is adopted by init and becomes its child. If the child quits before the parent “waits” for it to quit, the child then becomes a “zombie” – the zombie is required for the parent to successfully do a wait for the child (and get certain status). The zombie goes away when the wait occurs. Since init never ends, it avoids clogging up the system with zombies by automatically calling wait when a child quits. Operating System Concepts
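
A minimal sketch of the fork/exec/wait pattern described above; the command run by the child (“ls -l”) is illustrative:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();              /* create a child: duplicate of parent */
        if (pid < 0) {
            perror("fork");
            exit(1);
        } else if (pid == 0) {
            /* child: replace its memory space with a new program */
            execlp("ls", "ls", "-l", (char *)NULL);
            perror("execlp");            /* reached only if exec fails */
            exit(1);
        } else {
            /* parent: wait for the child; collecting status reaps the zombie */
            int status;
            wait(&status);
            printf("child %d finished\n", (int)pid);
        }
        return 0;
    }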

119 Processes Tree on a UNIX System
swapper is the scheduler … the family tree Operating System Concepts

120 Process Termination Process executes last statement and asks the operating system to delete it (exit). Output data from child to parent (via wait). Process’ resources are deallocated by the operating system. Parent may terminate execution of children processes (abort): Child has exceeded allocated resources. Task assigned to child is no longer required. Parent is exiting. Some operating systems do not allow a child to continue if its parent terminates – cascading termination; in UNIX the child is adopted by init. In UNIX a terminated process may become a “zombie” before going away - these may haunt the system for a while! (see earlier slide). Operating System Concepts

121 Cooperating Processes
Independent process cannot affect or be affected by the execution of another process. Cooperating process can affect or be affected by the execution of another process – synchronization required. Advantages of process cooperation: Information sharing Computation speed-up - only when distributed across multiple processors Modularity - same rationale as used for structuring a program along function/subroutine or OO class lines. Convenience Operating System Concepts

122 Producer-Consumer Problem
Example of Cooperating Processes Paradigm for cooperating processes, producer process produces information that is consumed by a consumer process. unbounded-buffer places no practical limit on the size of the buffer. bounded-buffer assumes that there is a fixed buffer size. In both cases the buffer is shared memory – shared by both processes Operating System Concepts

123 Bounded-Buffer – Shared-Memory Solution
Shared data:

    #define BUFFER_SIZE 10

    typedef struct {
        . . .
    } item;

    item buffer[BUFFER_SIZE];
    int in = 0;
    int out = 0;
    int counter = 0;

Operating System Concepts

124 Bounded-Buffer – Producer Process
    item nextProduced;

    while (1) {
        while (counter == BUFFER_SIZE)
            ;  /* do nothing */
        buffer[in] = nextProduced;
        in = (in + 1) % BUFFER_SIZE;
        counter++;
    }

What if the Consumer does not complete its update of the shared variable counter? It can take 3 machine instructions to update counter, and a task switch could occur before all three instructions are complete. Operating System Concepts

125 Bounded-Buffer – Consumer Process
    item nextConsumed;

    while (1) {
        while (counter == 0)
            ;  /* do nothing */
        nextConsumed = buffer[out];
        out = (out + 1) % BUFFER_SIZE;
        counter--;
    }

What if the Producer does not complete its update of the shared variable counter? It can take 3 machine instructions to update counter, and a task switch could occur before all three instructions are complete. Operating System Concepts
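
To make the danger in the two questions above concrete: counter++ and counter-- each typically compile to a load/modify/store sequence, so an ill-timed task switch can interleave them. One lost-update interleaving, sketched with illustrative register names, starting from counter == 5:

    /* counter++ (producer)          counter-- (consumer)
       register1 = counter           register2 = counter
       register1 = register1 + 1     register2 = register2 - 1
       counter = register1           counter = register2     */

    producer: register1 = counter       (register1 = 5)
    consumer: register2 = counter       (register2 = 5)
    producer: counter = register1 + 1   (counter = 6)
    consumer: counter = register2 - 1   (counter = 4, but 5 was expected)

Depending on who stores last, counter ends up 4 or 6 when the correct value is 5; this race condition is what the synchronization mechanisms of later chapters eliminate.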

126 Interprocess Communication (IPC)
Mechanism for processes to communicate and to synchronize their actions - handshaking, null message scheme (see later). Message system – processes communicate with each other without resorting to shared variables or common (shared) memory/buffer. IPC facility provides two operations (p. 111): send(message) – message size fixed or variable receive(message) If P and Q wish to communicate, they need to: establish a communication link between them msgqid = msgget(…) in UNIX project exchange messages via send/receive Implementation of communication link physical (e.g., shared memory, hardware bus, network) Use of shared memory simulates message passing - retains the interface - covered in chapter 15 logical (e.g., logical properties or interface/API as seen by application) - only logical implementation covered here. Operating System Concepts

127 Implementation Questions
How are links established? - explicitly vs. implicitly. Can a link be associated with more than two processes? - ex multiple processes communicating via a network. How many links can there be between every pair of communicating processes? Ex: two processes may communicate via two message queues. What is the capacity of a link? (0, finite, infinite) Is the size of a message that the link can accommodate fixed or variable? Is a link unidirectional or bi-directional? - ex: UNIX pipes can be either Blocking send/receive vs nonblocking send/receive - has synchronization implications. Operating System Concepts

128 Direct Communication Processes must name each other explicitly:
send(P, message) – send a message to process P receive(Q, message) – receive a message from process Q Name the “other process” in the call. P, Q are pids. In send, “message” can be a pointer or the message itself; in receive, “message” is usually a pointer to where to put the received message. Properties of communication link Links are established automatically - only need to know the other’s identity. A link is associated with exactly one pair of communicating processes: P1 <=L1=> P2 <=L2=> P3 Between each pair there exists exactly one link. The link may be unidirectional, but is usually bi-directional. … also asymmetric: receive from anyone, id of sender passed to receiver: receive(&id, &message) Operating System Concepts

129 Indirect Communication
Messages are directed and received from mailboxes (also referred to as ports). Each mailbox has a unique “message queue” id. Processes can communicate only if they share a mailbox. Properties of communication link Link established only if processes share a common mailbox A link may be associated with many processes. Each pair of processes may share several communication links - P1 and P2 can communicate with each other via multiple message queues. Link may be unidirectional or bi-directional. Operating System Concepts

130 Indirect Communication - continued
Operations create a new mailbox - msgget in UNIX send and receive messages through the mailbox destroy a mailbox - msgctl(…) in UNIX Primitives are defined as: send(A, message) – send a message to mailbox A receive(A, message) – receive a message from mailbox A In UNIX, mailbox A is the message queue id returned by the msgget call Operating System Concepts
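
A minimal sketch of these primitives using the System V calls the slide names; the key, message class, and text are illustrative:

    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    struct mymsg {
        long mtype;       /* message class id (> 0), usable for selective receive */
        char mtext[64];
    };

    int main(void) {
        /* "create a new mailbox": msgget returns the message queue id (mailbox A) */
        int msgqid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

        struct mymsg msg = { .mtype = 1 };
        strcpy(msg.mtext, "hello");

        msgsnd(msgqid, &msg, sizeof msg.mtext, 0);     /* send(A, message)    */
        msgrcv(msgqid, &msg, sizeof msg.mtext, 1, 0);  /* receive(A, message) */
        printf("received: %s\n", msg.mtext);

        msgctl(msgqid, IPC_RMID, NULL);                /* destroy the mailbox */
        return 0;
    }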

131 Indirect Communication - continued
Mailbox sharing P1, P2, and P3 share mailbox A. P1 sends; P2 and P3 receive. Who gets the message? - may be a “race condition”. Solutions… this is your problem! Allow a link to be associated with at most two processes. Allow only one process at a time to execute a receive operation. Allow the system to select the receiver arbitrarily. Sender is notified who the receiver was. Don’t do anything - maybe it does not matter. The “hungry dog” approach. In UNIX, the sender can attach a “message class id” & the receiver only receives messages in a certain class. Operating System Concepts

132 Synchronization - sect 4.5.3
Message passing may be either blocking or non-blocking. Blocking is considered synchronous Non-blocking is considered asynchronous send and receive primitives may be either blocking or non-blocking. Operating System Concepts

133 Buffering Queue of messages attached to the link; implemented in one of three ways. 1. Zero capacity – 0 messages Sender must wait for receiver (rendezvous). 2. Bounded capacity – finite length of n messages Sender must wait if link full. 3. Unbounded capacity – infinite length Sender never waits. See notes by instructor: “Message Passing Concepts and Synchronization” for additional and supplementary material on message passing. Read pp. 1-4 for now. Operating System Concepts

134 Client-Server Communication
EX: A user (client) sends request for some information from a server which may also require some computation by the server Some approaches: Sockets Remote Procedure Calls Remote Method Invocation (Java) Operating System Concepts

135 Sockets A socket is defined as an endpoint for communication.
Identified as a concatenation of an IP address and a port – together they name a host and a port on that host. A socket address of the form host:1625 refers to port 1625 on that host. Communication takes place between a pair of sockets. Operating System Concepts
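
A minimal sketch of a TCP client creating a socket and connecting; the host 127.0.0.1 and port 1625 are illustrative:

    #include <arpa/inet.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);  /* endpoint for communication */

        struct sockaddr_in srv;
        memset(&srv, 0, sizeof srv);
        srv.sin_family = AF_INET;
        srv.sin_port   = htons(1625);                    /* illustrative port */
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);  /* illustrative host */

        /* The pair (client's ephemeral port, server's port 1625) names this
           connection; the client's port is assigned by its own host. */
        if (connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0) {
            perror("connect");
            return 1;
        }
        write(fd, "hello\n", 6);
        close(fd);
        return 0;
    }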

136 Socket Communication
Client is assigned a port by the local host computer (>= 1024). Server port is “well known” (< 1024). Operating System Concepts

137 Remote Procedure Calls
Remote procedure call (RPC) abstracts procedure calls between processes on networked systems. Message-based communication is used to provide this facility. In contrast to general IPC message passing, RPC messages are highly structured and formalized. They are received by an RPC daemon listening at a port, and contain the identifier of the function to be called and the parms to be passed. Stubs – client-side proxy for the actual procedure on the server. Hides communication details and makes it look like an ordinary local procedure call. The client-side stub locates the server and “marshalls” the parameters (creates a network packet for parms). The server-side stub receives this message, unpacks the marshalled parameters, and performs the procedure on the server - returned values passed back to the client using the same technique. Operating System Concepts

138 Execution of RPC A “matchmaker” daemon provides the server port that lets the client bind a procedure name to a remote server port. The client receives the server port number for the given procedure. Operating System Concepts

139 Remote Method Invocation
Remote Method Invocation (RMI) is a Java mechanism similar to RPCs. RMI allows a Java program on one machine to invoke a method on a remote object. Operating System Concepts

140 Marshalling Parameters
Stub is a proxy for the remote object. It creates a parcel consisting of the method name & marshaled parameters. The skeleton receives the parcel, unmarshals it, invokes the desired method, marshals the results, and returns the parcel to the client. Applied Operating System Concepts

141 Module 5: Threads 9/29/03+ Overview Benefits User and Kernel Threads
Multithreading Models Solaris 2 Threads Java Threads NOTE: Instructor annotations in BLUE Applied Operating System Concepts

142 Threads - Overview A thread (or lightweight process) is a basic unit of CPU utilization; it consists of: program counter, register set, stack space. A thread shares with its peer threads (in the same process) its: code section, data section, operating-system resources, collectively known as a task. Actually the three things above belong to the process; the threads in the process share them. A traditional or heavyweight process is equal to a task with one thread. Applied Operating System Concepts

143 Threads (- Overview Cont.)
In a multiple-threaded task, while one server thread is blocked and waiting, a second thread in the same task can run (assuming the blocked thread doesn’t block the whole process - if it does, everything comes to a screeching halt!). Cooperation of multiple threads in the same job confers higher throughput and improved performance. Applications that require sharing a common buffer (i.e., producer-consumer) benefit from thread utilization. Threads provide a mechanism that allows sequential processes to make blocking system calls while also achieving parallelism … a thread within the process makes the call & only it blocks while the other threads in the process still run … assuming kernel-supported threads. Kernel-supported threads – can reside in user or kernel space, but the kernel is involved in their control. User-level (supported) threads – supported by a set of library calls at the user level; reside in user space. A hybrid approach implements both user-level and kernel-supported threads (Solaris 2). Applied Operating System Concepts

144 Multiple Threads within a Task
Applied Operating System Concepts

145 Benefits Responsiveness - opens the possibility of blocking only one thread in a process on a blocking call rather than the entire process. Resource Sharing – threads share the resources of the process to which they belong – code sharing allows multiple instantiations of code execution. Economy – low overhead in thread context switching & thread management compared to process context switching & management. Utilization of MP Architectures – threads can run in parallel on a multiprocessor. On a uniprocessor, task switching is so fast that it gives the illusion of parallelism. Applied Operating System Concepts

146 Single and Multithreaded Processes
Applied Operating System Concepts

147 User Level (supported) Threads (ULTs)
Thread Management Done by User-Level Threads Library. Kernel unaware of ULTs. No kernel intervention needed for thread management, thus threads are fast to create and manage. If a thread blocks, the kernel sees it as the process blocking, and may block the entire process (all threads). Examples - POSIX Pthreads - Mach C-threads - Solaris threads … but not totally kernel independent … Applied Operating System Concepts

148 Kernel Level (supported) Threads (KLTs)
Supported by the Kernel – thread management done by the kernel in kernel space. Since the kernel manages threads, it can schedule another thread if a given thread blocks, rather than blocking the entire process. Examples - Windows 95/98/NT - Solaris - Digital UNIX Applied Operating System Concepts

149 Multithreading Models
User Threads / Kernel Threads models. Three common types of threading implementation: Many-to-One One-to-One Many-to-Many Applied Operating System Concepts

150 Many-to-One Many User-Level Threads Mapped to Single Kernel Thread.
Thread management done at user level - efficient. The entire user process blocks if one of its threads blocks – the kernel only sees the process. Bad for multiprocessor (parallel) architectures because only one thread at a time can access the kernel. Used on systems that do not support kernel threads - single-threaded kernel. Applied Operating System Concepts

151 Many-to-one Model Applied Operating System Concepts

152 One-to-One Each User-Level Thread Maps to a distinct Kernel Thread.
Other threads can run when a particular thread makes a blocking call - better concurrency. Multiple threads can run in parallel in a multiprocessing architecture. Drawback is that creating a user thread requires creating a kernel thread … overhead factor … number of threads restricted Examples - Windows 95/98/NT - OS/2 Applied Operating System Concepts

153 One-to-one Model Applied Operating System Concepts

154 Many-to-Many Model Time-multiplexes many ULT’s onto a smaller or equal number of KLT’s. The number of kernel threads may be specific to either a particular application or machine … typically a multiprocessor may get more k-threads than a uniprocessor. The programmer may create as many user threads as desired (compare to 1-to-1, in which there may be a shortage of k-threads). True concurrency is limited by scheduling (multiplexing) many user threads against a smaller number of k-threads, but the multiple k-threads can run in parallel on a multiprocessor machine. Applied Operating System Concepts

155 Many-to-many Model Applied Operating System Concepts

156 Pthreads Example of user controlled threads
Refers to the POSIX standard (IEEE 1003.1c) API - specifies behavior, not implementation. More commonly implemented on UNIX-based systems such as Solaris. Runs from a Pthread user-level library with no specified relationship to any associated kernel threads. Some implementations may allow some kernel control - ex: scheduling??? The “main” function in the C application program is the initial thread, created by default; all other threads are created using library calls from the application program - each thread has its own stack. Applied Operating System Concepts

157 Pthreads (continued) Two typical Pthread library calls:
pthread_create creates a new thread: allows you to pass the name of a function that will be the behavior of the new thread; allows you to pass parameters to that function; sets a thread id variable to the unique “tid”. pthread_join allows a thread to wait for another specified thread to complete before continuing on - a synchronization technique. There is a very large and rich set of library function calls to use in Pthread programming - lots of calls for thread synchronization - more later! See example in Figure 5.5, page 140. Applied Operating System Concepts
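A minimal compilable sketch of these two calls (build with -pthread); the function name worker and its integer argument are illustrative:

#include <stdio.h>
#include <pthread.h>

void *worker(void *arg) {            /* the behavior of the new thread */
    int id = *(int *)arg;
    printf("worker %d running\n", id);
    return NULL;
}

int main(void) {
    pthread_t tid;                   /* receives the unique "tid" */
    int arg = 1;
    /* create a new thread that runs worker(&arg) */
    if (pthread_create(&tid, NULL, worker, &arg) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        return 1;
    }
    pthread_join(tid, NULL);         /* main waits for the worker to complete */
    printf("main thread done\n");
    return 0;
}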

158 Threads Support in Solaris 2
Solaris 2 is a version of UNIX with support for threads at both the kernel and user levels, symmetric multiprocessing, and real-time scheduling. Solaris 2 implements the pthread API, in addition to supporting user-level threads with a thread library with APIs for thread creation & management ==> Solaris threads. LWP – intermediate level between user-level threads and kernel-level threads. Resource needs of the thread types: Kernel thread (lives in the kernel - does OS work or associates with an LWP): small data structure and a stack; thread switching does not require changing memory access information – relatively fast. LWP: associated with both a user process (PCB) and a kernel thread; has register data, accounting, and memory information; switching between LWPs is relatively slow - because of kernel intervention. Scheduled by the kernel. User-level thread: only needs a stack and program counter; no kernel involvement means fast switching (multiplexing onto LWPs). The kernel only sees the LWPs that are dedicated to user-level threads. Applied Operating System Concepts

159 Solaris 2 Threads Applied Operating System Concepts

160 Solaris Threads Threads in user space are similar to “pure” Pthreads but with required kernel control. Process structure/control information is completely in kernel space. LWPs are in kernel space but share the user process resources - can interface with user threads - are visible to user threads. An LWP can be thought of as a virtual CPU that is available for executing code … we have a single process in one memory space with many virtual “CPUs” running different parts of the process concurrently (Lewis & Berg). Each LWP is separately scheduled by the kernel - the kernel schedules LWPs, not user threads. The thread library for user threads is in user space - the user-level thread library schedules (multiplexes) user threads onto LWPs, and LWPs, in turn, are implemented by kernel threads and scheduled by the kernel. Applied Operating System Concepts

161 Solaris Process Applied Operating System Concepts

162 Java Threads Java Threads May be Created by: Extending Thread class
Implementing the Runnable interface Applied Operating System Concepts

163 Extending the Thread Class
class Worker1 extends Thread {
  public void run() {
    System.out.println("I am a Worker Thread");
  }
}
Applied Operating System Concepts

164 Creating the Thread
public class First {
  public static void main(String args[]) {
    Worker1 runner = new Worker1();
    runner.start();
    System.out.println("I am the main thread");
  }
}
Applied Operating System Concepts

165 The Runnable Interface
public interface Runnable { public abstract void run(); } Applied Operating System Concepts

166 Implementing the Runnable Interface
class Worker2 implements Runnable {
  public void run() {
    System.out.println("I am a Worker Thread");
  }
}
Applied Operating System Concepts

167 Creating the Thread
public class Second {
  public static void main(String args[]) {
    Runnable runner = new Worker2();
    Thread thrd = new Thread(runner);
    thrd.start();
    System.out.println("I am the main thread");
  }
}
Applied Operating System Concepts

168 Java Thread Management
suspend() – suspends execution of the currently running thread. sleep() – puts the currently running thread to sleep for a specified amount of time. resume() – resumes execution of a suspended thread. stop() – stops execution of a thread. (Note: suspend(), resume(), and stop() are deprecated in modern Java because they are deadlock-prone.) Applied Operating System Concepts

169 Java Thread States Runnable state has both actively executing
threads, and “ready” threads Applied Operating System Concepts

170 Producer Consumer Problem
public class Server {
  public Server() {
    MessageQueue mailBox = new MessageQueue();
    Producer producerThread = new Producer(mailBox);
    Consumer consumerThread = new Consumer(mailBox);
    producerThread.start();
    consumerThread.start();
  }
  public static void main(String args[]) {
    Server server = new Server();
  }
}
Applied Operating System Concepts

171 Producer Thread
class Producer extends Thread {
  public Producer(MessageQueue m) {
    mbox = m;
  }
  public void run() {
    while (true) {
      // produce an item & enter it into the buffer
      Date message = new Date();
      mbox.send(message);
    }
  }
  private MessageQueue mbox;
}
Applied Operating System Concepts

172 Consumer Thread
class Consumer extends Thread {
  public Consumer(MessageQueue m) {
    mbox = m;
  }
  public void run() {
    while (true) {
      Date message = (Date) mbox.receive();
      if (message != null) {
        // consume the message
      }
    }
  }
  private MessageQueue mbox;
}
Applied Operating System Concepts

173 Module 6: CPU Scheduling 10/19/03
Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation NOTE: Instructor annotations in BLUE Applied Operating System Concepts

174 Basic Concepts Objective: maximum CPU utilization obtained with multiprogramming. A process is executed until it must wait - typically for completion of an I/O request, or until a time quantum expires. CPU–I/O Burst Cycle – process execution consists of a cycle of CPU execution and I/O wait. CPU burst distribution. Applied Operating System Concepts

175 Alternating Sequence of CPU And I/O Bursts
Applied Operating System Concepts

176 Histogram of CPU-burst Times
“Tune” the scheduler to these statistics Applied Operating System Concepts

177 CPU Scheduler Selects from among the processes in memory that are ready to execute, and allocates the CPU to one of them. CPU scheduling decisions may take place when a process: 1. Runs until it switches from running to waiting state … stops executing only when a needed resource or service is currently unavailable. 2. Switches from running to ready state in the middle of a burst – can stop execution at any time. 3. Switches from waiting to ready. … ? 4. Runs until it terminates. Scheduling under 1 and 4 is nonpreemptive. All other scheduling is preemptive. Under nonpreemptive scheduling, once the CPU is assigned to a process, the process keeps the CPU until it releases it either by terminating or switching to the wait state - a “natural” stop of execution. Applied Operating System Concepts

178 Dispatcher Dispatcher module gives control of the CPU to the process selected by the short-term scheduler; this involves: switching context switching to user mode jumping to the proper location in the user program to restart that program Dispatch latency – time it takes for the dispatcher to stop one process and start another running. Both scheduler and dispatcher are performance “bottlenecks” in the OS, and must be made as fast and efficient as possible. Applied Operating System Concepts

179 Scheduling Criteria CPU utilization – keep the CPU as busy as possible
Throughput – # of processes that complete their execution per time unit. Turnaround time – amount of time to execute a particular process; or: the time from submission to completion – includes waits in queues in addition to execution time. Waiting time – amount of time a process has been waiting in the ready queue – the sum of its times in the ready queue. This is from the scheduler’s point of view - the scheduler does not look at CPU time or I/O wait time (only ready-queue time) when it minimizes “waiting time” … get a process through the ready queue as soon as possible. Response time – amount of time from when a request was submitted until the first response is produced, not final output (for time-sharing environment) Applied Operating System Concepts

180 Optimization Criteria
Max CPU utilization Max throughput Min turnaround time Min waiting time Min response time Applied Operating System Concepts

181 First-Come, First-Served (FCFS) Scheduling
Example: Process Burst Time: P1 24; P2 3; P3 3. Suppose that the processes arrive in the order: P1, P2, P3. The Gantt chart for the schedule is: P1 (0–24), P2 (24–27), P3 (27–30). Waiting time for P1 = 0; P2 = 24; P3 = 27. Average waiting time: (0 + 24 + 27)/3 = 17 Applied Operating System Concepts

182 FCFS Scheduling (Cont.)
Suppose that the processes arrive in the order P2, P3, P1. The Gantt chart for the schedule is: P2 (0–3), P3 (3–6), P1 (6–30). Waiting time for P1 = 6; P2 = 0; P3 = 3. Average waiting time: (6 + 0 + 3)/3 = 3. Much better than the previous case. Convoy effect: short processes stuck behind a long process. Applied Operating System Concepts
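A small C sketch reproducing the arithmetic above: with all arrivals at time 0, each FCFS process waits exactly the total burst time of those ahead of it in the queue:

#include <stdio.h>

double fcfs_avg_wait(const int burst[], int n) {
    int elapsed = 0, total_wait = 0;
    for (int i = 0; i < n; i++) {
        total_wait += elapsed;   /* waiting time of the i-th arrival */
        elapsed += burst[i];     /* CPU is busy until it finishes */
    }
    return (double)total_wait / n;
}

int main(void) {
    int order1[] = {24, 3, 3};   /* P1, P2, P3 */
    int order2[] = {3, 3, 24};   /* P2, P3, P1 */
    printf("%.1f\n", fcfs_avg_wait(order1, 3));   /* prints 17.0 */
    printf("%.1f\n", fcfs_avg_wait(order2, 3));   /* prints 3.0 */
    return 0;
}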

183 Shortest-Job-First (SJF) Scheduling
Associate with each process the length of its next CPU burst. Use these lengths to schedule the process with the shortest time. Two schemes: nonpreemptive – once the CPU is given to the process it cannot be preempted until it completes its CPU burst. Preemptive – if a new process arrives with CPU burst length less than the remaining time of the currently executing process, preempt. This scheme is known as Shortest-Remaining-Time-First (SRTF). SJF is optimal – gives minimum average waiting time for a given set of processes. Applied Operating System Concepts

184 Example of Non-Preemptive SJF
Process / Arrival Time / Burst Time: P1 0.0 7; P2 2.0 4; P3 4.0 1; P4 5.0 4. SJF (non-preemptive); FCFS is the “tie breaker” if burst times are the same. Gantt chart: P1 (0–7), P3 (7–8), P2 (8–12), P4 (12–16). Average waiting time = (0 + 6 + 3 + 7)/4 = 4 Applied Operating System Concepts

185 Example of Preemptive SJF (Also called Shortest-Remaining-Time-First (SRTF) )
In order for a new arrival to preempt, its burst must be strictly less than the current remaining time. Process / Arrival Time / Burst Time: P1 0.0 7; P2 2.0 4; P3 4.0 1; P4 5.0 4. SJF (preemptive) Gantt chart: P1 (0–2), P2 (2–4), P3 (4–5), P2 (5–7), P4 (7–11), P1 (11–16). Average waiting time = (9 + 1 + 0 + 2)/4 = 3 Applied Operating System Concepts

186 Determining Length of Next CPU Burst
Can only estimate the length. Can be done by using the length of previous CPU bursts, using exponential averaging:
τ(n+1) = α·t(n) + (1 − α)·τ(n)
τ(n) stores past history; t(n) is the length of the nth (most recent) CPU burst; α is a weighting factor, 0 ≤ α ≤ 1 … recursive in n Applied Operating System Concepts
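A one-function C sketch of the estimator, assuming the caller keeps the running prediction tau and feeds in each measured burst t (alpha = 1/2 is a common choice):

/* next predicted burst length from the current prediction and
   the burst that just finished */
double next_burst_estimate(double tau, double t, double alpha) {
    return alpha * t + (1.0 - alpha) * tau;   /* tau(n+1) */
}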

187 Examples of Exponential Averaging
α = 0: τ(n+1) = τ(n); recent history does not count.
α = 1: τ(n+1) = t(n); only the actual last CPU burst counts.
If we expand the formula, we get:
τ(n+1) = α·t(n) + (1 − α)·α·t(n−1) + … + (1 − α)^j·α·t(n−j) + … + (1 − α)^(n+1)·τ(0)
Since both α and (1 − α) are less than or equal to 1, each successive term has less weight than its predecessor. Applied Operating System Concepts

188 Priority Scheduling A priority number (integer) is associated with each process. The CPU is allocated to the process with the highest priority (smallest integer ≡ highest priority). Preemptive or nonpreemptive. SJF is priority scheduling where priority is the predicted next CPU burst time. Problem: Starvation – low priority processes may never execute. Solution: Aging – as time progresses, increase the priority of the process - a “reward” for waiting in line a long time - let the old geezers move ahead! Applied Operating System Concepts

189 Round Robin (RR) Each process gets a small unit of CPU time (time quantum), usually 10–100 milliseconds. After this time has elapsed, the process is preempted and added to the end of the ready queue. RR is a preemptive algorithm. If there are n processes in the ready queue and the time quantum is q, then each process gets 1/n of the CPU time in chunks of at most q time units at once. No process waits more than (n-1)q time units. Performance: q large ⇒ behaves like FIFO; q small ⇒ q must still be large with respect to context-switch time, otherwise overhead is too high ==> “processor sharing” - the user thinks it has its own processor running at 1/n speed (n processors) Applied Operating System Concepts

190 Example: RR with Time Quantum = 20
FCFS is the tie breaker. Process Burst Time: P1 53; P2 17; P3 68; P4 24. Assume all arrive at time 0 in the order given. The Gantt chart is: P1 (0–20), P2 (20–37), P3 (37–57), P4 (57–77), P1 (77–97), P3 (97–117), P4 (117–121), P1 (121–134), P3 (134–154), P3 (154–162). Typically, higher average turnaround than SJF, but better response. Applied Operating System Concepts
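A short C sketch that replays this schedule: a circular scan with quantum 20 over the burst times above (valid here because all four processes arrive at time 0):

#include <stdio.h>

int main(void) {
    int burst[] = {53, 17, 68, 24};   /* P1..P4 */
    int n = 4, q = 20, t = 0, remaining = n;

    while (remaining > 0) {
        for (int i = 0; i < n; i++) {
            if (burst[i] == 0) continue;            /* already finished */
            int run = burst[i] < q ? burst[i] : q;  /* at most one quantum */
            t += run;
            burst[i] -= run;
            if (burst[i] == 0) remaining--;
            printf("P%d runs until t = %d\n", i + 1, t);
        }
    }
    return 0;   /* prints the boundaries 20 37 57 77 97 117 121 134 154 162 */
}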

191 How a Smaller Time Quantum Increases Context Switches
Context switch overhead very critical for 3rd case - since overhead is independent of quanta time Applied Operating System Concepts

192 Turnaround Time Varies With The Time Quantum
No strict correlation of TAT and time quantum size - except for the following: TAT can be improved if most processes finish each CPU burst within one quantum. EX: if 3 processes each have a burst of 10, then for q = 1, avg_TAT = 29, but for q = burst = 10, avg_TAT = 20. ==> design tip: tune the quantum to the average burst. Applied Operating System Concepts

193 Multilevel Queue (no Feedback - see later)
Ready queue is partitioned into separate queues: foreground queue (interactive) background queue (batch) Each queue has its own scheduling algorithm, foreground queue – RR background queue – FCFS Scheduling must be done between the queues. Fixed priority scheduling; i.e., serve all from foreground then from background. All higher priority queues must be empty before given queue is processed. Possibility of starvation. Assigned queue is for life of the process. Time slice between queues – each queue gets a certain amount of CPU time which it can schedule amongst its processes; i.e., 80% to foreground in RR 20% to background in FCFS Applied Operating System Concepts

194 Multilevel Queue Scheduling
Applied Operating System Concepts

195 Multilevel Feedback Queue
A process can move between the various queues; aging can be implemented this way. Multilevel-feedback-queue scheduler defined by the following parameters: number of queues scheduling algorithms for each queue method used to determine when to upgrade a process method used to determine when to demote a process method used to determine which queue a process will enter when that process needs service Applied Operating System Concepts

196 Multilevel Feedback Queues
Queue 0 - High priority Queue 1 Queue 2 Low priority Higher priority queues pre-empt lower priority queues on new arrivals Fig 6.7 Applied Operating System Concepts

197 Example of Multilevel Feedback Queue
Three queues: Q0 – time quantum 8 milliseconds; Q1 – time quantum 16 milliseconds; Q2 – FCFS. Q2 holds the longest jobs, with lowest priority; Q0 the shortest jobs, with highest priority. Scheduling: A new job enters queue Q0, which is served FCFS. When it gains the CPU, the job receives 8 milliseconds. If it does not finish in 8 milliseconds, the job is moved to queue Q1 (demoted). At Q1 the job is again served FCFS and receives 16 additional milliseconds. If it still does not complete, it is preempted and moved to queue Q2. Again, Qn is not served until Qn-1 is empty. Applied Operating System Concepts

198 Multiple-Processor Scheduling
CPU scheduling more complex when multiple CPUs are available. Homogeneous processors within a multiprocessor. Load sharing Symmetric Multiprocessing (SMP) – each processor makes its own scheduling decisions. Asymmetric multiprocessing – only one processor accesses the system data structures, alleviating the need for data sharing. Applied Operating System Concepts

199 Real-Time Scheduling Hard real-time systems – required to complete a critical task within a guaranteed amount of time. Soft real-time computing – requires that critical processes receive priority over less fortunate ones. Deadline scheduling used. Applied Operating System Concepts

200 Dispatch Latency Applied Operating System Concepts

201 Solaris 2 Thread Scheduling
3 scheduling priority classes: Timesharing/interactive – lowest – for users. Within this class: multilevel feedback queues, with longer time slices in the lower priority queues. System – for kernel processes. Real time – highest. Local Scheduling – how the threads library decides which thread to put onto an available LWP. Remember, threads are time-multiplexed on LWP’s. Global Scheduling – how the kernel decides which kernel thread to run next. The local schedules for each class are “globalized” from the scheduler's point of view – all classes included. LWP’s are scheduled by the kernel. Applied Operating System Concepts

202 Solaris 2 Scheduling (Fig. 6.10)
Figure annotations: Real time – only a few in this class. System – reserved for kernel use, e.g., the scheduler & paging daemon. Timesharing – user processes go here. Applied Operating System Concepts

203 Java Thread Scheduling
JVM Uses a Preemptive, Priority-Based Scheduling Algorithm. FIFO Queue is Used if There Are Multiple Threads With the Same Priority. Applied Operating System Concepts

204 Java Thread Scheduling (cont)-omit
JVM Schedules a Thread to Run When: The Currently Running Thread Exits the Runnable State. A Higher Priority Thread Enters the Runnable State * Note – the JVM Does Not Specify Whether Threads are Time-Sliced or Not. Applied Operating System Concepts

205 Time-Slicing (Java) - omit
Since the JVM Doesn’t Ensure Time-Slicing, the yield() Method May Be Used: while (true) { // perform CPU-intensive task . . . Thread.yield(); } This Yields Control to Another Thread of Equal Priority. Cooperative multi-tasking possible using yield Applied Operating System Concepts

206 Java Thread Priorities - omit
Thread Priorities: Priority Comment Thread.MIN_PRIORITY Minimum Thread Priority Thread.MAX_PRIORITY Maximum Thread Priority Thread.NORM_PRIORITY Default Thread Priority Priorities May Be Set Using setPriority() method: setPriority(Thread.NORM_PRIORITY + 2); Applied Operating System Concepts

207 Algorithm Evaluation Deterministic modeling – takes a particular predetermined workload and defines the performance of each algorithm for that workload - what we’ve been doing in the examples - optimize various criteria. Easy, but success depends on the accuracy of the input. Queuing models - statistical - need field measurements of statistics in various computing environments. Implementation - costly - OK if a lot of pre-implementation evaluation is done first. Applied Operating System Concepts

208 Evaluation of CPU Schedulers by Simulation
-need good models - drive it with field data and/or statistical data - could be slow. Applied Operating System Concepts

209 Chapter 7: Process Synchronization 10/14/03
Background The Critical-Section Problem Synchronization Hardware Semaphores Classical Problems of Synchronization Critical Regions Monitors Synchronization in Solaris 2 & Windows NOTE: Instructor annotations in BLUE

210 Background Concurrent access to shared data may result in data inconsistency. Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes. The shared-memory solution to the bounded-buffer problem (Chapter 4) allows at most n – 1 items in the buffer at the same time. A solution where all N buffers are used is not simple. Suppose that we modify the producer-consumer code by adding a variable counter, initialized to 0, incremented each time a new item is added to the buffer and decremented each time one is removed

211 Bounded-Buffer Shared data
#define BUFFER_SIZE 10
typedef struct {
  . . .
} item;
item buffer[BUFFER_SIZE];
int in = 0;
int out = 0;
int counter = 0;

212 Bounded-Buffer Producer process
item nextProduced;
while (1) {
  while (counter == BUFFER_SIZE)
    ; /* do nothing */
  buffer[in] = nextProduced;
  in = (in + 1) % BUFFER_SIZE;
  counter++;
}

213 Bounded-Buffer Consumer process
item nextConsumed;
while (1) {
  while (counter == 0)
    ; /* do nothing */
  nextConsumed = buffer[out];
  out = (out + 1) % BUFFER_SIZE;
  counter--;
}

214 Bounded Buffer The statements counter++; counter--; must be performed atomically. Atomic operation means an operation that completes in its entirety without interruption.

215 Bounded Buffer Analysis
Although the “P” and “C” routines are logically correct in themselves, they may not work correctly running concurrently. The counter variable is shared between the two processes. When counter++ and counter-- are done concurrently, the results are unpredictable because the operations are not atomic. Assume counter++ and counter-- compile as follows:
counter++:
  register1 = counter
  register1 = register1 + 1
  counter = register1
counter--:
  register2 = counter
  register2 = register2 – 1
  counter = register2

216 Bounded Buffer Analysis
If both the producer and consumer attempt to update the buffer concurrently, the assembly language statements may get interleaved. Interleaving depends upon how the producer and consumer processes are scheduled . P and C get unpredictable access to CPU due to time slicing

217 Bounded Buffer Analysis
Assume counter is initially 5. One interleaving of statements is:
producer: register1 = counter (register1 = 5)
producer: register1 = register1 + 1 (register1 = 6)
consumer: register2 = counter (register2 = 5)
consumer: register2 = register2 – 1 (register2 = 4)
producer: counter = register1 (counter = 6)
consumer: counter = register2 (counter = 4)
The value of counter may be either 4 or 6, where the correct result should be 5. We have a “race condition” here.
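A C demonstration of this race, assuming pthreads: two threads do unsynchronized counter++ / counter-- a million times each, so the final value should be 0 but usually is not:

#include <stdio.h>
#include <pthread.h>

static volatile int counter = 0;     /* shared, unprotected */
enum { N = 1000000 };

void *producer(void *arg) {
    for (int i = 0; i < N; i++) counter++;   /* load, add, store */
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 0; i < N; i++) counter--;   /* load, subtract, store */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    printf("counter = %d\n", counter);   /* rarely the correct 0 */
    return 0;
}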

218 Race Condition Race condition: The situation where several processes access – and manipulate shared data concurrently. The final value of the shared data depends upon which process finishes last. To prevent race conditions, concurrent processes must be synchronized. Is there any logical difference in a race condition if the two “racing” processes are on two distinct CPU’s in an SMP environment vs. both running concurrently on a single uniprocessor? … is one worse than the other?

219 The Critical-Section Problem
n processes all competing to use some shared data Each process has a code segment, called critical section, in which the shared data is accessed. Problem – ensure that when one process is executing in its critical section, no other process is allowed to execute in its critical section.

220 Solution to Critical-Section Problem
1. Mutual Exclusion. If process Pi is executing in its critical section, then no other processes can be executing in their critical sections – unless it is by deliberate design by the programmer – see the readers/writers problem later. 2. *Progress. If there exist some processes that wish to enter their critical section, then processes outside of their critical sections cannot participate in the decision of which one will enter its CS next, and this selection cannot be postponed indefinitely. In other words: no process running outside its CS may block other processes from entering their CS’s – i.e., a process must be in a CS to have this privilege. 3. Bounded Waiting. A bound must exist on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted - no starvation. Assume that each process executes at a nonzero speed. No assumption concerning the relative speed of the n processes. * Item 2 is a modification of the rendition given in Silberschatz, which appears to be “muddled” – see “Modern Operating Systems”, 1992, by Tanenbaum, p. 35. See the next slide for other authors' renditions of these conditions, for clarification:

221 Solution to Critical-Section Problem
From: “Operating Systems”, by William Stallings, 4th ed., page 207: Mutual exclusion must be enforced: only one process at a time is allowed into the critical section (CS). A process that halts in its noncritical section must do so without interfering with other processes. It must not be possible for a process requiring access to a CS to be delayed indefinitely: no deadlock or starvation. When no process is in a critical section, any process that requests entry to its CS must be permitted to enter without delay. No assumptions are made about relative process speeds or the number of processors. A process remains inside its CS for a finite time only. Also from “Modern Operating Systems”, 1992, by Tanenbaum, p. 35: - No two processes may be simultaneously inside their critical sections. - No assumptions may be made about speeds or the numbers of CPUs. - No process running outside its CS may block other processes from entering their CS’s. - No process should have to wait forever to enter its CS.

222 Initial Attempts to Solve Problem
Only 2 processes, P0 and P1. General structure of process Pi (other process Pj):
do {
  entry section
  critical section
  exit section
  remainder section
} while (1);
Processes may share some common variables to synchronize their actions.

223 Algorithm 1 Assume two processes: P0 and P1 Shared variables:
int turn; initially turn = 0 /* let P0 have the turn first */
if (turn == i) ⇒ Pi can enter its critical section, i = 0, 1
For process Pi: // where the “other” process is Pj
do {
  while (turn != i) ;   // wait until it is your turn
  critical section
  turn = j;             // allow the “other” to enter after exiting the CS
  remainder section
} while (1);
Satisfies mutual exclusion, but not progress: if Pj decides not to re-enter or crashes outside its CS, then Pi cannot ever get in. A process cannot make consecutive entries to the CS even if the “other” is not in its CS - the other prevents a process from entering while not in its own CS - a “no-no”.

224 Algorithm 2 Shared variables
boolean flag[2];   // “interest” bits, initially flag[0] = flag[1] = false
flag[i] = true ⇒ Pi declares interest in entering its critical section
Process Pi: // where the “other” process is Pj
do {
  flag[i] = true;     // declare your own interest
  while (flag[j]) ;   // wait if the other guy is interested
  critical section
  flag[i] = false;    // declare that you lost interest
  remainder section   // allows the other guy to enter
} while (1);
Satisfies mutual exclusion, but not the progress requirement. If flag[i] == flag[j] == true, then deadlock - no progress; but barring this event, a non-CS process cannot block you from entering. Can make consecutive re-entries to the CS if the other is not interested. Uses “mutual courtesy”.

225 Algorithm 3 Peterson’s solution
Combined shared variables & approaches of algorithms 1 and 2.
Process Pi: // where the “other” process is Pj
do {
  flag[i] = true;   // declare your interest to enter
  turn = j;         // assume it is the other’s turn - give Pj a chance
  while (flag[j] && turn == j) ;
  critical section
  flag[i] = false;
  remainder section
} while (1);
Meets all three requirements; solves the critical-section problem for two processes (the best of all worlds - almost!). The turn variable breaks any deadlock possibility of the previous example, AND prevents “hogging” – Pi setting turn to j gives Pj a chance after each pass through Pi’s CS. The flag[] variable prevents getting locked out if the other guy never re-enters or crashes outside, and allows consecutive CS access if the other is not interested in entering. Down side is that the waits are spin loops. Question: what if the setting of flag[i] or turn is not atomic?
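A C rendering of Peterson's algorithm for threads i = 0 and j = 1. The volatile qualifiers only approximate the atomicity assumption the slide questions; a faithful version on modern hardware needs C11 atomics or memory barriers:

#include <stdbool.h>

volatile bool flag[2] = { false, false };   /* "interest" bits */
volatile int  turn = 0;

void enter_cs(int i) {
    int j = 1 - i;          /* the "other" process */
    flag[i] = true;         /* declare interest */
    turn = j;               /* give the other a chance */
    while (flag[j] && turn == j)
        ;                   /* spin loop (busy wait) */
}

void exit_cs(int i) {
    flag[i] = false;        /* lose interest; lets the other in */
}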

226 Bakery Algorithm … or the Supermarket deli algorithm
Critical section for n processes Before entering its critical section, process receives a number. Holder of the smallest number enters the critical section. If processes Pi and Pj receive the same number, if i < j, then Pi is served first; else Pj is served first. The numbering scheme always generates numbers in increasing order of enumeration; i.e., 1,2,3,3,3,3,4,5...

227 Bakery Algorithm Notation: < is lexicographical order on (ticket #, process id #): (a,b) < (c,d) if a < c, or if a == c and b < d. max(a0,…, an-1) is a number k such that k ≥ ai for i = 0, …, n – 1. Shared data: boolean choosing[n]; int number[n]; the data structures are initialized to false and 0, respectively.

228 Bakery Algorithm For Process Pi
do { /* has four states: choosing, scanning, CS, remainder section */
  choosing[i] = true;   // process does not “compete” while choosing a #
  number[i] = max(number[0], number[1], …, number[n – 1]) + 1;
  choosing[i] = false;
  /* scan all processes to see if Pi has the lowest number: */
  for (k = 0; k < n; k++) {   /* check process k; book uses dummy variable “j” */
    while (choosing[k]) ;     /* if Pk is choosing a number, wait till done */
    while ((number[k] != 0) && ((number[k], k) < (number[i], i))) ;
    /* if Pk is waiting (or in CS) and Pk is “ahead” of Pi, then Pi waits */
    /* if Pk is not in the CS and is not waiting, OR Pk is waiting with a larger number, then skip over Pk - when Pi gets to the end of the scan, it will have the lowest number and will fall through to the CS */
    /* if Pi is waiting on Pk, then number[k] will eventually go to 0 because Pk will get served – causing Pi to break out of the while loop and check the status of the next process, if any */
    /* the while loop skips over the case k == i */
  }
  critical section
  number[i] = 0;   /* no longer a candidate for CS entry */
  remainder section
} while (1);

229 Synchronization Hardware Test and Set
target is the lock: target == 1 means CS locked, else open. Test and modify the content of a word atomically:
boolean TestAndSet(boolean &target) {
  boolean rv = target;
  target = true;
  return rv;   // return the original value of target
}
See “alternate definition” in “Principles of Concurrency” notes.
if target == 1 (locked): target stays 1, return 1 ⇒ “wait”
if target == 0 (unlocked): set target to 1 (lock the door), return 0 ⇒ “enter CS”

230 Mutual Exclusion with Test-and-Set
Shared data: boolean lock = false;   // if lock == 0, door open; if lock == 1, door locked
Process Pi:
do {
  while (TestAndSet(lock)) ;
  critical section
  lock = false;
  remainder section
}
See the flow chart (“Alternate Definition”) in Instructor's notes: “Concurrency and Mutual Exclusion”, Hardware Support, for details on how this works.
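A working C equivalent, assuming GCC's __sync_lock_test_and_set builtin as the atomic TestAndSet primitive:

static volatile int lock = 0;        /* 0 = door open, 1 = locked */

void acquire(void) {
    /* atomically set lock = 1 and return its old value;
       spin while the old value was 1 (someone else is inside) */
    while (__sync_lock_test_and_set(&lock, 1))
        ;
}

void release(void) {
    __sync_lock_release(&lock);      /* lock = 0: door open again */
}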

231 Synchronization Hardware swap
Atomically swap two variables. void Swap(boolean &a, boolean &b) { boolean temp = a; a = b; b = temp; }

232 Mutual Exclusion with Swap for process Pi
Shared data (initialized to false): boolean lock = false; /* shared variable - global */ // if lock == 0, door open, if lock == 1, door locked Private data boolean key_i = true; Process Pi do { key_i = true; /* not needed if swap used after CS exit */ while (key_i == true) Swap(lock, key_i ); critical section /* remember key_i is now false */ lock = false; /* can also use: swap(lock, key_i ); */ remainder section } See Flow chart (“Alternate Definition” ) in Instructors notes: “Concurrency and Mutual exclusion”, Hardware Support for details on how this works. NOTE: The above applications for mutual exclusion using both test and set and swap do not satisfy bounded waiting requirement - see book pp for resolution..

233 Semaphores Synchronization tool that does not require busy waiting.
Semaphore S – integer variable; can only be accessed via two indivisible (atomic) operations:
wait(S):  while (S ≤ 0) do no-op;  S--;
signal(S):  S++;
Note: test before decrement. Normally never negative. No queue is used (see later). What if more than 1 process is waiting on S and a signal occurs? Employs a spin loop (see later)

234 Critical Section of n Processes
Shared data: semaphore mutex;  // initially mutex = 1
Process Pi:
do {
  wait(mutex);
  critical section
  signal(mutex);
  remainder section
} while (1);
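The same structure with real POSIX semaphores between threads (compile with -pthread); sem_wait and sem_post play the roles of wait(mutex) and signal(mutex):

#include <semaphore.h>

sem_t mutex;                    /* initialized to 1 below */

void enter_exit_cs_once(void) {
    sem_wait(&mutex);           /* wait(mutex) */
    /* ... critical section ... */
    sem_post(&mutex);           /* signal(mutex) */
    /* ... remainder section ... */
}

int main(void) {
    sem_init(&mutex, 0, 1);     /* 0 = shared between threads; value 1 */
    enter_exit_cs_once();
    sem_destroy(&mutex);
    return 0;
}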

235 Semaphore Implementation
Get rid of the spin loop, add a queue, and increase functionality (counting semaphore). Used in UNIX. Define a semaphore as a record:
typedef struct {
  int value;
  struct process *L;
} semaphore;
Assume two simple operations: block(P) suspends (blocks) the process P; wakeup(P) resumes the execution of a blocked process P. Must be atomic.

236 Implementation- as in UNIX
Semaphore operations called by process P are now defined as:
wait(S):
  S.value--;               // note: decrement before test
  if (S.value < 0) {
    add this process to S.L;
    block(P);              // block the process calling wait(S)
  }                        // the “door” is locked if S.value <= 0
signal(S):
  S.value++;
  if (S.value <= 0) {      // necessary condition for a non-empty queue
    remove a process P from S.L;
    wakeup(P);             // move P to the ready queue
  }
/* use some algorithm for choosing a process to remove from the queue, ex: FIFO */

237 Implementation- continued
Compare to original “classical” definition: Allows for negative values (“counting” semaphore) Absolute value of negative value is number of processes waiting in the semaphore Positive value is the number of processes that can call wait and not block Wait() blocks on negative Wait() decrements before testing Remember: signal moves a process to the ready queue, and not necessarily to immediate execution How is something this complex made atomic? Most common: disable interrupts Use some SW techniques such as Peterson’s solution inside wait and signal – spin-loops would be very short (wait/signal calls are brief) – see p. 204 … different than the spin-loops in classical definition!

238 Implementation- continued
The counting semaphore has two fundamental applications: Mutual exclusion for solving the “CS” problem Controlling access to resources with limited availability, for example in the bounded buffer problem: block consumer if the buffer empty block the producer if the buffer is full. Both applications are used for the bounded buffer problem

239 Semaphore as a General Synchronization Tool
Execute B in Pj only after A is executed in Pi. Serialize these events (guarantee determinism): use a semaphore flag initialized to 0. Code:
Pi:  …  A;  signal(flag);
Pj:  …  wait(flag);  B;

240 Deadlock and Starvation
Just because semaphores can solve the “CS” problem, they can still be misused and cause deadlock or starvation. Deadlock – two or more processes are waiting indefinitely for an event that can be caused by only one of the waiting processes. Let S and Q be two semaphores initialized to 1:
P0:  (1) wait(S);  (3) wait(Q);  …  signal(S);  signal(Q);
P1:  (2) wait(Q);  (4) wait(S);  …  signal(Q);  signal(S);
The interleaved sequence 1, 2, 3, 4 causes deadlock, but the sequence 1, 3, 2, 4 will not deadlock. Starvation – indefinite blocking. A process may never be removed from the semaphore queue in which it is suspended. Can be avoided by using a FIFO queue discipline.

241 Two Types of Semaphores
Counting semaphore – integer value can range over an unrestricted domain. Binary semaphore – integer value can range only between 0 and 1; can be simpler to implement. Can implement a counting semaphore S as a binary semaphore.

242 Binary Semaphore Definition
struct b-semaphore {
  int value;              /* boolean: only 0 or 1 allowed */
  struct process queue;
}
void wait-b(b-semaphore s) {   /* see alternate def. below */
  if (s.value == 1)
    s.value = 0;          // lock the “door” & let the process enter the CS
  else
    { place this process in s.queue and block it; }   // no indication of the size of the queue … compare to the general semaphore
}   // wait unconditionally leaves the b-semaphore value at 0
void signal-b(b-semaphore s) {   // s.value == 0 is necessary but not sufficient for a non-empty queue
  if (s.queue is empty)
    s.value = 1;
  else
    move a process from s.queue to the ready list;
}   // if only 1 proc is in the queue, leave the value at 0; the “moved” proc will go to the CS
*********************************************************************
Alternate definition of wait-b (simpler):
void wait-b(b-semaphore s) {
  if (s.value == 0) { place this process in s.queue and block it; }
  s.value = 0;   // value was 1, set to 0
}   // compare to test & set

243 Implementing Counting Semaphore S using Binary Semaphores
Data structures: binary-semaphore S1, S2; int C: Initialization: S1 = 1 S2 = 0 C = initial value of semaphore S

244 Implementing Counting Semaphore S using Binary Semaphores (continued)
wait operation:
  wait(S1);
  C--;   // note: decrement before test
  if (C < 0) {
    signal(S1);
    wait(S2);
  }
  /* else? */
  signal(S1);
signal operation:
  wait(S1);   // should this be a “different” S1?
  C++;
  if (C <= 0)
    signal(S2);
  else        // else should be removed?
    signal(S1);
See a typical example of semaphore operation given in “Chapter 7 notes by Guydosh” posted along with these notes.

245 Classical Problems of Synchronization
Bounded-Buffer Problem Readers and Writers Problem Dining-Philosophers Problem

246 Bounded-Buffer Problem
Uses both aspects of a counting semaphore: mutual exclusion (the mutex semaphore) and limited-resource control (the full and empty semaphores). Shared data: semaphore full, empty, mutex; where: full represents the number of “full” (used) slots in the buffer; empty represents the number of “empty” (unused) slots in the buffer; mutex is used for mutual exclusion in the CS. Initially: full = 0, empty = n, mutex = 1 /* buffer empty, “door” is open */

247 Bounded-Buffer Problem Producer Process
do {
  produce an item in nextp
  wait(empty);
  wait(mutex);
  add nextp to buffer
  signal(mutex);
  signal(full);
} while (1);
Question: what may happen if the mutex wait/signal calls were done before and after the corresponding calls for empty and full, respectively?

248 Bounded-Buffer Problem Consumer Process
do {
  wait(full);
  wait(mutex);
  remove an item from buffer to nextc
  signal(mutex);
  signal(empty);
  consume the item in nextc
} while (1);
Same question: what may happen if the mutex wait/signal calls were done before and after the corresponding calls for full and empty, respectively? See the trace of a bounded buffer scenario given in “Chapter 7 notes by Guydosh” posted along with these notes.
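A compilable C sketch of both routines, with POSIX semaphores standing in for full, empty, and mutex; the buffer size N = 5 and integer items are arbitrary choices:

#include <semaphore.h>

#define N 5
int buffer[N];
int in = 0, out = 0;
sem_t full, empty, mutex;

void init_buffer(void) {
    sem_init(&full,  0, 0);   /* no full slots yet */
    sem_init(&empty, 0, N);   /* all N slots empty */
    sem_init(&mutex, 0, 1);   /* "door" open */
}

void produce(int item) {
    sem_wait(&empty);         /* wait for a free slot */
    sem_wait(&mutex);         /* enter CS */
    buffer[in] = item;
    in = (in + 1) % N;
    sem_post(&mutex);         /* leave CS */
    sem_post(&full);          /* one more full slot */
}

int consume(void) {
    sem_wait(&full);          /* wait for an item */
    sem_wait(&mutex);
    int item = buffer[out];
    out = (out + 1) % N;
    sem_post(&mutex);
    sem_post(&empty);         /* one more empty slot */
    return item;
}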

249 Readers-Writers Problem
Semaphores used purely for mutual exclusion No limits on writing or reading the buffer Uses a single writer, and multiple readers Multiple readers allowed in CS Readers have priority over writers for these slides. As long as there is a reader(s) in the CS, no writer may enter – CS must be empty for a writer to get in An alternate (more fair approach) is: even if readers are in the CS, no more readers will be allowed in if a writer “declares an interest” to get into the CS – this is the writer preferred approach. It is discussed in more detail in Guydosh’s notes: “Readers-Writers problems Scenario” on the website. Shared data semaphore mutex, wrt; Initially mutex = 1, wrt = 1, readcount = 0 /* all “doors” open, no readers in */

250 Readers-Writers Problem Writer Process
wait(wrt); writing is performed signal(wrt);

251 Readers-Writers Problem Reader Process
wait(mutex);        // 2nd, 3rd, … readers queue on mutex if the 1st reader is
                    // waiting for a writer to leave (blocked on wrt);
                    // also guards the modification of readcount
readcount++;
if (readcount == 1)
  wait(wrt);        // 1st reader blocks if any writer is in the CS
signal(mutex);      // 1st reader is in - open the “floodgates” for its companions
reading is performed
wait(mutex);        // guards the modification of readcount
readcount--;
if (readcount == 0)
  signal(wrt);      // the last reader out lets any queued writers in
signal(mutex);
See Guydosh’s notes: “Readers-Writers problems Scenario” (website) for a detailed trace of this algorithm.
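As an aside, pthreads packages this kind of protocol into a ready-made reader-writer lock; a sketch of equivalent usage (shared_data is an illustrative name):

#include <pthread.h>

pthread_rwlock_t rw = PTHREAD_RWLOCK_INITIALIZER;
int shared_data;

int reader(void) {
    pthread_rwlock_rdlock(&rw);   /* many readers may hold this at once */
    int v = shared_data;          /* reading is performed */
    pthread_rwlock_unlock(&rw);
    return v;
}

void writer(int v) {
    pthread_rwlock_wrlock(&rw);   /* exclusive: the CS must be empty of readers */
    shared_data = v;              /* writing is performed */
    pthread_rwlock_unlock(&rw);
}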

252 Dining-Philosophers Problem
Chopstick i+1 Process i Chopstick i Shared data semaphore chopstick[5]; Initially all values are 1

253 Dining-Philosophers Problem
Philosopher i:
do {
  wait(chopstick[i]);            // left chopstick
  wait(chopstick[(i+1) % 5]);    // right chopstick
  eat
  signal(chopstick[i]);
  signal(chopstick[(i+1) % 5]);
  think
} while (1);
This scheme guarantees mutual exclusion but has deadlock – what if all 5 philosophers get hungry simultaneously and all grab for the left chopstick at the same time? … See the Monitor implementation for a deadlock-free solution … sect 7.7

254 Critical Regions - omit
High-level synchronization construct A shared variable v of type T, is declared as: v: shared T Variable v accessed only inside statement region v when B do S where B is a boolean expression. While statement S is being executed, no other process can access variable v.

255 Critical Regions - omit
Regions referring to the same shared variable exclude each other in time. When a process tries to execute the region statement, the Boolean expression B is evaluated. If B is true, statement S is executed. If it is false, the process is delayed until B becomes true and no other process is in the region associated with v.

256 Critical Regions Example – Bounded Buffer - omit
Shared data: struct buffer { int pool[n]; int count, in, out; }

257 Critical Regions Bounded Buffer Producer Process - omit
Producer process inserts nextp into the shared buffer:
region buffer when (count < n) {
  pool[in] = nextp;
  in = (in + 1) % n;
  count++;
}

258 Critical Regions Bounded Buffer Consumer Process - omit
Consumer process removes an item from the shared buffer and puts it in nextc:
region buffer when (count > 0) {
  nextc = pool[out];
  out = (out + 1) % n;
  count--;
}

259 Critical Regions: Implementation region x when B do S - omit
Associate with the shared variable x, the following variables: semaphore mutex, first-delay, second-delay; int first-count, second-count; Mutually exclusive access to the critical section is provided by mutex. If a process cannot enter the critical section because the Boolean expression B is false, it initially waits on the first-delay semaphore; moved to the second-delay semaphore before it is allowed to reevaluate B.

260 Critical Regions: Implementation – omit
Keep track of the number of processes waiting on first-delay and second-delay, with first-count and second-count respectively. The algorithm assumes a FIFO ordering in the queuing of processes for a semaphore. For an arbitrary queuing discipline, a more complicated implementation is required.

261 Monitors High-level synchronization construct that allows the safe sharing of an abstract data type among concurrent processes. Closely related to Pthread synchronization – see later.
monitor monitor-name {
  // procedures internal to the monitor
  shared variable declarations
  procedure body P1 (…) { . . . }
  procedure body P2 (…) { . . . }
  . . .
  procedure body Pn (…) { . . . }
  {
    initialization code
  }
}

262 Monitors Rationale for the properties of a monitor:
Mutual exclusion: A monitor implements mutual exclusion by allowing only one process at a time to execute in one of the internal procedures. Calling a monitor procedure will automatically cause the caller to block in the entry queue if the monitor is occupied, else it will let it in. On exiting the monitor, another waiting process will be allowed in. When inside the monitor, management of a limited resource which may not always be available for use: A process can unconditionally block on a “condition variable” (CV) by calling a wait on this CV. This also allows another process to enter the Monitor while the process calling the wait is blocked. When a process using a resource protected by a CV is done with it, it will signal on the associated CV to allow another process waiting in this CV to use the resource. The signaling process must “get out of the way” and yield to the released process receiving the signal by blocking. This preserves mutual exclusion in the monitor - Hoare’s scheme.

263 Monitors Summary To allow a process to wait within the monitor, a condition variable must be declared, as condition x, y; // condition variables Condition variable can only be used with the operations wait and signal. The operation x.wait(); means that the process invoking this operation is unconditionally suspended until another process invokes x.signal(); The x.signal operation resumes exactly one suspended process. If no process is suspended, then the signal operation has no effect.
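A C sketch of the same wait/signal pattern with pthread condition variables. Note these are "signal and continue" (Mesa-style), not Hoare-style as above, which is why the condition is re-checked in a loop:

#include <pthread.h>
#include <stdbool.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;   /* monitor entry lock */
pthread_cond_t  x = PTHREAD_COND_INITIALIZER;    /* condition variable x */
bool resource_free = false;

void acquire_resource(void) {
    pthread_mutex_lock(&m);            /* enter the "monitor" */
    while (!resource_free)             /* x.wait(): suspend until signaled */
        pthread_cond_wait(&x, &m);     /* atomically releases m while blocked */
    resource_free = false;
    pthread_mutex_unlock(&m);
}

void release_resource(void) {
    pthread_mutex_lock(&m);
    resource_free = true;
    pthread_cond_signal(&x);           /* x.signal(): resume one waiter */
    pthread_mutex_unlock(&m);
}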

264 Schematic View of a Monitor: entry queue

265 Monitor With Condition Variables
Entry queue is for Mutual exclusion Condition variable queues may be used for resource management and are the processes waiting on a condition variable.

266 Dining Philosophers Example
monitor dp {
  enum {thinking, hungry, eating} state[5];
  condition self[5];
  void pickup(int i)    // following slides
  void putdown(int i)   // following slides
  void test(int i)      // following slides
  void init() {
    for (int i = 0; i < 5; i++)
      state[i] = thinking;
  }
}
Philosopher i code (written by user):
loop at random times {
  think;
  dp.pickup(i);   // simply calling a monitor proc invokes mutual exclusion automatically
  eat;
  dp.putdown(i);
}

267 Dining Philosophers
void pickup(int i) {
  state[i] = hungry;
  test(i);   // if neighbors are not eating, set own state to eating
             // (the signal will go to waste - nobody is blocked yet)
  if (state[i] != eating)
    self[i].wait();   // test showed a neighbor was eating - wait
}
void putdown(int i) {
  state[i] = thinking;
  // test left and right neighbors … gives them a chance to eat
  test((i+4) % 5);   // signal left neighbor if she is hungry
  test((i+1) % 5);   // signal right neighbor if she is hungry
}

268 Dining Philosophers Does more than passively test: changes state and causes a philosopher to start eating.
void test(int i) {
  if ((state[(i + 4) % 5] != eating) &&   // left neighbor
      (state[i] == hungry) &&             // the process being tested
      (state[(i + 1) % 5] != eating)) {   // right neighbor
    state[i] = eating;
    self[i].signal();   // i starts eating; wakes i if blocked
  }   // self[i] is a condition variable
}     // else do nothing

269 Monitor Implementation Using Semaphores
Variables:
semaphore mutex;      // (initially = 1) for mutual exclusion on entry
semaphore next;       // (initially = 0) signalers wait on this
int next-count = 0;   // # of processes (signalers) suspended on next
Each external procedure F will be replaced by (i.e., put in the monitor as):
  // entering the monitor:
  wait(mutex);          // lemme into the monitor! … block if occupied
  body of F;            // same as a critical section
  // exiting the monitor:
  if (next-count > 0)   // any signalers suspended on next?
    signal(next);       // let suspended signalers in first
                        // (same as getting out of line to fill something out at the DMV, maybe)
  else
    signal(mutex);      // then let in any new guys
Above is the overall monitor: mutual exclusion within a monitor is ensured.

270 Monitor Implementation
For each condition variable x, we have:
semaphore x-sem;   // (initially = 0) causes a wait on x to block
int x-count = 0;   // number of processes waiting on CV x
The operation x.wait can be implemented as:
  x-count++;            // time for a nap
  if (next-count > 0)   // any signalers queued up?
    signal(next);       // release a waiting signaler first
  else
    signal(mutex);      // else let a new guy in
  wait(x-sem);          // unconditionally block on the CV
  x-count--;            // I’m outa here

271 Monitor Implementation
The operation x.signal can be implemented as:
  if (x-count > 0) {   // do nothing if nobody is waiting on the CV
    next-count++;      // I am (a signaler) gonna be suspended on next
    signal(x-sem);     // release a process suspended on the CV
    wait(next);        // get out of the way of the released process (Hoare)
    next-count--;      // released process done & it signaled me in
  }

272 Monitor Implementation - optional
Conditional-wait construct: x.wait(c); c – integer expression evaluated when the wait operation is executed. value of c (a priority number) stored with the name of the process that is suspended. when x.signal is executed, process with smallest associated priority number is resumed next. Check two conditions to establish correctness of system: User processes must always make their calls on the monitor in a correct sequence. Must ensure that an uncooperative process does not ignore the mutual-exclusion gateway provided by the monitor, and try to access the shared resource directly, without using the access protocols.

273 Solaris 2 Synchronization
Implements a variety of locks to support multitasking, multithreading (including real-time threads), and multiprocessing. Uses adaptive mutexes for efficiency when protecting data from short code segments. Uses condition variables and readers-writers locks when longer sections of code need access to data. Uses turnstiles to order the list of threads waiting to acquire either an adaptive mutex or reader-writer lock.

274 Windows 2000 Synchronization
Uses interrupt masks to protect access to global resources on uniprocessor systems. Uses spinlocks on multiprocessor systems. Also provides dispatcher objects, which may act as either mutexes or semaphores. Dispatcher objects may also provide events. An event acts much like a condition variable.

275 Module 8: Deadlocks 10/22/03+ System Model Deadlock Characterization
Methods for Handling Deadlocks Deadlock Prevention Deadlock Avoidance Deadlock Detection Recovery from Deadlock Combined Approach to Deadlock Handling NOTE: Instructor annotations in BLUE Applied Operating System Concepts

276 The Deadlock Problem P0 P1
A set of blocked processes, each holding a resource and waiting to acquire a resource held by another process in the set. Example: the system has 2 tape drives; P1 and P2 each hold one tape drive and each needs the other one. Or: semaphores A and B, initialized to 1:
P0:  wait(A);  wait(B);
P1:  wait(B);  wait(A);
Applied Operating System Concepts

277 Bridge Crossing Example
Traffic only in one direction. Each car on the bridge wants space occupied by the other to proceed. Each section of a bridge can be viewed as a resource. If a deadlock occurs, it can be resolved if one car backs up (preempt resources and rollback). Several cars may have to be backed up if a deadlock occurs. Starvation is possible. Applied Operating System Concepts

278 System Model Resource types R1, R2, . . ., Rm
CPU cycles, memory space, I/O devices. Each resource type Ri has Wi instances. Each process utilizes a resource as follows: request - if the resource is not available, the process waits until it is free; use - once the request is granted, the process is free to use it (may still need other process to use it); release - when done using the resource, return it to the system - remember project 2! Applied Operating System Concepts

279 Deadlock Characterization
Deadlock can arise if four conditions hold simultaneously. Mutual exclusion: only one process at a time can use a resource. Hold and wait: a process holding at least one resource is waiting to acquire additional resources held by other processes in order to proceed. No preemption: a resource can be released only voluntarily by the process holding it, after that process has completed its task. Circular wait: there exists a set {P0, P1, …, Pn} of waiting processes such that P0 is waiting for a resource that is held by P1, P1 is waiting for a resource that is held by P2, …, Pn–1 is waiting for a resource that is held by Pn, and Pn is waiting for a resource that is held by P0. These four conditions are not independent. The first three are necessary conditions, and the fourth is necessary and sufficient. The fourth condition incorporates the first three. Applied Operating System Concepts

280 Resource-Allocation Graph
A set of vertices V and a set of edges E. V is partitioned into two types: P = {P1, P2, …, Pn}, the set consisting of all the processes in the system; R = {R1, R2, …, Rm}, the set consisting of all resource types in the system. There are one or more instances of each resource type. A request is made for an instance of a resource type: request edge – directed edge Pi → Rj; assignment edge – directed edge Rj → Pi. Applied Operating System Concepts

281 Resource-Allocation Graph (Cont.)
Graph notation: a circle denotes a process Pi; a rectangle with 4 dots denotes a resource type Rj with 4 instances; an edge Pi → Rj means Pi requests an instance of Rj; an edge Rj → Pi means Pi is holding an instance of Rj. Applied Operating System Concepts

282 Example of a Resource Allocation Graph
P2 holds an instance of R1 and R2, and is waiting for an instance of R3 … P2 will execute when P3 releases R3 Applied Operating System Concepts

283 Resource Allocation Graph With A Deadlock
Circular wait: all three processes are mutually waiting for each other to release a resource. Applied Operating System Concepts

284 Resource Allocation Graph With A Cycle But No Deadlock
When P4 releases its instance of R2, then R2 can be allocated to P3, breaking the cycle. Applied Operating System Concepts

285 Basic Facts If graph contains no cycles ⇒ no deadlock.
If graph contains a cycle ⇒ if only one instance per resource type, then deadlock; if several instances per resource type, possibility of deadlock … a cycle is a necessary, but not a sufficient, condition for deadlock. Distinguish between a cycle and a circular wait - the former is a necessary condition, the latter is a necessary and sufficient condition. Applied Operating System Concepts

286 Methods for Handling Deadlocks
Ensure that the system will never enter a deadlock state - Deadlock prevention or avoidance. Allow the system to enter a deadlock state and then recover - Deadlock detection. Ignore the problem and pretend that deadlocks never occur in the system; used by most operating systems, including UNIX … ignore the problem and maybe it will go away! If it doesn’t - re-boot the machine! Applied Operating System Concepts

287 Deadlock Prevention Restrain the ways requests can be made - disallow a necessary condition. Mutual Exclusion – not required for sharable resources; must hold for nonsharable resources - so if it cannot be applied all of the time, what good is it? - Deadlock not prevented. Hold and Wait – must guarantee that whenever a process requests a resource, it does not hold any other resources. Require a process to request and be allocated all its resources before it begins execution, or allow a process to request resources only when the process has none. Low resource utilization; starvation possible. Applied Operating System Concepts

288 Deadlock Prevention (Cont.)
No Preemption – If a process that is holding some resources requests another resource that cannot be immediately allocated to it, then all resources currently being held are released. … how did it get to hold any resources to begin with - with this rule? Preempted resources are added to the list of resources for which the process is waiting. The process will be restarted only when it can regain its old resources, as well as the new ones that it is requesting. Circular Wait – impose a total ordering of all resource types, and require that each process requests resources in an increasing order of enumeration. See “Deadlock Principles and Algorithms” notes on prevention (p. 3). Applied Operating System Concepts
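A sketch of the circular-wait rule applied to the two-lock example earlier (names assumed, not from the notes): if every thread acquires the locks in the same total order, the wait-for cycle cannot form:

#include <pthread.h>

pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;   /* resource #1 in the ordering */
pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;   /* resource #2 in the ordering */

void *worker(void *arg) {
    /* every thread requests resources in increasing order: A before B */
    pthread_mutex_lock(&A);
    pthread_mutex_lock(&B);
    /* ... use both resources ... */
    pthread_mutex_unlock(&B);
    pthread_mutex_unlock(&A);
    return NULL;
}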

289 Deadlock Avoidance Requires that the system has some additional a priori information available. The simplest and most useful model requires that each process declare the maximum number of resources of each type that it may need. The deadlock-avoidance algorithm dynamically examines the resource-allocation state to ensure that there can never be a circular-wait condition. Resource-allocation state is defined by the number of available and allocated resources, and the maximum demands of the processes. See “Deadlock Principles and Algorithms” notes on Resource-Allocation State (p. 4). Applied Operating System Concepts

290 Safe State When a process requests an available resource, the system must decide if immediate allocation leaves the system in a safe state. The system is in a safe state if there exists a safe sequence (sequence of execution) of all processes. Sequence <P1, P2, …, Pn> is safe if for each Pi, the resources that Pi can still request can be satisfied by the currently available resources plus the resources held by all the Pj, with j < i. If Pi's resource needs are not immediately available, then Pi can wait until all previous Pj have finished. When all Pj have finished, Pi can obtain its needed resources, execute, return its allocated resources, and terminate. When Pi terminates, Pi+1 can obtain its needed resources, and so on. BASIC CONCEPT SUMMARY: As each Pi terminates, the total amount of the resource in question gets a net increase by the amount of what Pi was holding (“hoarding”) before it was granted the extra amount it needed - the loaning and returning of this “extra amount” has no net effect on the total resource - only the held amount contributes. Applied Operating System Concepts

291 Basic Facts If a system is in a safe state ⇒ no deadlocks.
If a system is in an unsafe state ⇒ possibility of deadlock. Avoidance ⇒ ensure that a system will never enter an unsafe state. Applied Operating System Concepts

292 Safe, unsafe, deadlock state spaces
Applied Operating System Concepts

293 Resource-Allocation Graph Algorithm
Graphical approach: to be used ONLY if there is only one instance of each resource. Claim edge Pi → Rj indicates that process Pi may request resource Rj; represented by a dashed line. A claim edge converts to a request edge when a process requests the resource. When a resource is released by a process, the assignment edge reconverts to a claim edge. Resources must be claimed a priori in the system. Applied Operating System Concepts

294 Resource-Allocation Graph Illustrating the Deadlock Avoidance Approach
1. Previously we had “request” and “assignment” edges; now we also have a “claim edge”: Pi ---> Rj (dotted, similar to a request edge) indicates that Pi may request Rj in the future. A claim edge represents the maximum the process could potentially request of a resource. 2. Sequence: claim edge -> request edge -> assignment edge -> claim edge. 3. Resource-allocation state: P1 holds R1 and “needs” a max of an R2, which is available. 4. Must run P1 before P2 (allocate R2 to P1) in order to be in a safe state. If R2 is allocated to P2 first, we have a cycle and the potential of deadlock if P1 eventually requests R2, i.e., an unsafe state. (Note: one instance per resource.) Applied Operating System Concepts

295 Unsafe State In A Resource-Allocation Graph
We have a cycle and the potential of deadlock if P1 eventually requests R2 - this is an unsafe state. (Note: one instance per resource.) Applied Operating System Concepts

296 Banker’s Algorithm Multiple instances - distinguish from the previous example of a single instance of each resource type. The concepts and terms introduced in the previous graphical example (one resource unit per resource type) will be generalized here for the multiple-instances case. The most general version of the avoidance algorithm. Each process must a priori claim its maximum use. When a process requests a resource it may have to wait. When a process gets all its resources it must return them in a finite amount of time. See “Deadlock Principles and Algorithms” notes on Avoidance (pp. 5-9) - also review the Resource-Allocation State section, p. 4. ==> ... Applied Operating System Concepts

297 Data Structures for the Banker’s Algorithm
See “Deadlock Notes” p. 4 by instructor. Let n = number of processes, and m = number of resource types. Available: vector of length m. If Available[j] = k, there are k instances of resource type Rj available (currently). Max: n x m matrix. If Max[i,j] = k, then process Pi may request at most k instances of resource type Rj. Note: some books call this the “Claim Matrix”. Allocation: n x m matrix. If Allocation[i,j] = k, then Pi is currently allocated k instances of Rj. Need: n x m matrix. If Need[i,j] = k, then Pi may need k more instances of Rj to complete its task. Need[i,j] = Max[i,j] – Allocation[i,j]. A derived entity – may not be part of what is given. Applied Operating System Concepts

298 Safety Algorithm This is more properly a lemma for the Banker’s Algorithm. It is a test to see if any given allocation state is safe or not. The goal is to find a “safe sequence” of process execution or show that it does not exist. 1. Let Work and Finish be vectors of length m and n, respectively. Initialize: Work := Available … a vector; Finish[i] := false for i = 1, 2, …, n. 2. Find an i such that both (find i by any means you want): (a) Finish[i] = false, (b) Needi ≤ Work … Needi is the ith row of Need – a vector. If no such i exists, go to step 4. 3. Work := Work + Allocationi; Finish[i] := true; go to step 2. 4. If Finish[i] = true for all i, then the system is in a safe state. Applied Operating System Concepts
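A sketch of the safety algorithm in C, using the data structures of the previous slide. N, M and the matrix values are assumptions for illustration (they match the classic example on the slides that follow):

#include <stdbool.h>
#include <stdio.h>

#define N 5   /* number of processes      */
#define M 3   /* number of resource types */

bool is_safe(int available[M], int alloc[N][M], int need[N][M]) {
    int work[M];
    bool finish[N] = { false };
    for (int j = 0; j < M; j++) work[j] = available[j];    /* step 1 */

    bool progress = true;
    while (progress) {                                     /* steps 2-3 */
        progress = false;
        for (int i = 0; i < N; i++) {
            if (finish[i]) continue;
            bool fits = true;
            for (int j = 0; j < M; j++)
                if (need[i][j] > work[j]) { fits = false; break; }
            if (fits) {                  /* P_i can finish: reclaim its resources */
                for (int j = 0; j < M; j++) work[j] += alloc[i][j];
                finish[i] = true;
                progress = true;
            }
        }
    }
    for (int i = 0; i < N; i++)                            /* step 4 */
        if (!finish[i]) return false;
    return true;
}

int main(void) {
    int available[M] = {3, 3, 2};
    int alloc[N][M]  = {{0,1,0},{2,0,0},{3,0,2},{2,1,1},{0,0,2}};
    int max[N][M]    = {{7,5,3},{3,2,2},{9,0,2},{2,2,2},{4,3,3}};
    int need[N][M];
    for (int i = 0; i < N; i++)          /* Need = Max - Allocation */
        for (int j = 0; j < M; j++)
            need[i][j] = max[i][j] - alloc[i][j];
    printf("state is %s\n", is_safe(available, alloc, need) ? "safe" : "unsafe");
    return 0;
}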

299 Resource-Request Algorithm for Process Pi or The Banker’s Algorithm
Requesti = request vector for process Pi. If Requesti[j] = k, then process Pi wants k instances of resource type Rj. 1. If Requesti ≤ Needi, go to step 2. Otherwise, raise an error condition, since the process has exceeded its maximum claim. 2. If Requesti ≤ Available, go to step 3. Otherwise Pi must wait, since the resources are not available. 3. Pretend to allocate the requested resources to Pi by modifying the state as follows (new state): Available := Available – Requesti; Allocationi := Allocationi + Requesti; Needi := Needi – Requesti. Now apply the Safety Algorithm: if safe ⇒ the resources are allocated to Pi; if unsafe ⇒ Pi must wait, and the old resource-allocation state is restored. Note that steps 1 and 2 are preliminary “sanity” or consistency checks, and not part of the main “loop”. Applied Operating System Concepts
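A sketch of the request check (the Banker's Algorithm proper), building on N, M and is_safe() from the sketch above; the request is granted only if the pretended state is safe, otherwise the state is rolled back:

/* returns true and commits the allocation; false means P_i must wait (or erred) */
bool request_resources(int i, int request[M],
                       int available[M], int alloc[N][M], int need[N][M]) {
    for (int j = 0; j < M; j++)                 /* step 1: within max claim?   */
        if (request[j] > need[i][j]) return false;
    for (int j = 0; j < M; j++)                 /* step 2: available now?      */
        if (request[j] > available[j]) return false;
    for (int j = 0; j < M; j++) {               /* step 3: pretend to allocate */
        available[j] -= request[j];
        alloc[i][j]  += request[j];
        need[i][j]   -= request[j];
    }
    if (is_safe(available, alloc, need))
        return true;                            /* safe: keep the new state    */
    for (int j = 0; j < M; j++) {               /* unsafe: restore old state   */
        available[j] += request[j];
        alloc[i][j]  -= request[j];
        need[i][j]   += request[j];
    }
    return false;
}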

300 Example of Banker’s Algorithm
==> See Deadlock Notes by Guydosh on website (p. 8) for details. 5 processes P0 through P4; 3 resource types A (10 instances), B (5 instances), and C (7 instances). Snapshot at time T0:
        Allocation   Max      Available
        A B C        A B C    A B C
P0      0 1 0        7 5 3    3 3 2
P1      2 0 0        3 2 2
P2      3 0 2        9 0 2
P3      2 1 1        2 2 2
P4      0 0 2        4 3 3
Applied Operating System Concepts

301 Example (Cont.) The content of the matrix Need is defined to be Max – Allocation:
        Need
        A B C
P0      7 4 3
P1      1 2 2
P2      6 0 0
P3      0 1 1
P4      4 3 1
The system is in a safe state since the sequence <P1, P3, P4, P2, P0> satisfies the safety criteria. Applied Operating System Concepts

302 Example (Cont.): P1 request (1,0,2)
Check that Request ≤ Available (that is, (1,0,2) ≤ (3,3,2)) ⇒ true. The pretended new state:
        Allocation   Need     Available
        A B C        A B C    A B C
P0      0 1 0        7 4 3    2 3 0
P1      3 0 2        0 2 0
P2      3 0 2        6 0 0
P3      2 1 1        0 1 1
P4      0 0 2        4 3 1
Executing the safety algorithm shows that the sequence <P1, P3, P4, P0, P2> satisfies the safety requirement. Can a request for (3,3,0) by P4 be granted? Can a request for (0,2,0) by P0 be granted? Applied Operating System Concepts

303 Deadlock Detection Allow system to enter deadlock state
Detection algorithm Recovery scheme Applied Operating System Concepts

304 Single Instance of Each Resource Type
When we have a single instance of each resource type, a cycle is necessary and sufficient for deadlock. Maintain a wait-for graph: nodes are processes; Pi → Pj if Pi is waiting for Pj. Periodically invoke an algorithm that searches for a cycle in the graph. An algorithm to detect a cycle in a graph requires an order of n² operations, where n is the number of vertices in the graph. … assumes the “cycle detector” would not be involved in any deadlock Applied Operating System Concepts

305 Resource-Allocation Graph And Wait-for Graph
Corresponding wait-for graph. No big deal - topologically it is essentially the same as the resource-allocation graph. Applied Operating System Concepts

306 Several Instances of a Resource Type
See “Deadlock Principles and Algorithms” notes on Deadlock Detection. Available: a vector of length m; indicates the number of available resources of each type. Allocation: an n x m matrix; defines the number of resources of each type currently allocated to each process. Request: an n x m matrix; indicates the current request of each process. If Request[i,j] = k, then process Pi is requesting k more instances of resource type Rj. … Note that the Request matrix in this detection algorithm replaces the Max matrix in the avoidance algorithm - the Request matrix is a less severe demand on the system. The Request matrix is the “current” demand, rather than the maximum demand as in the avoidance algorithm. Applied Operating System Concepts

307 Detection Algorithm See “Deadlock Principles and Algorithms” notes on Detection. 1. Let Work and Finish be vectors of length m and n, respectively. Initialize: (a) Work := Available. (b) For i = 1, 2, …, n, if Allocationi ≠ 0, then Finish[i] := false; otherwise, Finish[i] := true. 2. Find an index i such that both: (a) Finish[i] = false, (b) Requesti ≤ Work. If no such i exists, go to step 4. Applied Operating System Concepts

308 Detection Algorithm (Cont.)
3. Work := Work + Allocationi; Finish[i] := true; go to step 2. 4. If Finish[i] = false for some i, 1 ≤ i ≤ n, then the system is in a deadlock state. Moreover, if Finish[i] = false, then Pi is deadlocked. The algorithm requires an order of m x n² operations to detect whether the system is in a deadlocked state, where m = number of resource types, and n = number of processes. Applied Operating System Concepts
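A sketch of the detection algorithm; it has the same shape as the is_safe() sketch earlier (N, M and the includes as there), but it compares against the current Request matrix and pre-finishes processes that hold nothing:

bool deadlocked[N];   /* deadlocked[i] set for each P_i left unfinished */

bool detect(int available[M], int alloc[N][M], int request[N][M]) {
    int work[M];
    bool finish[N];
    for (int j = 0; j < M; j++) work[j] = available[j];    /* step 1(a) */
    for (int i = 0; i < N; i++) {                          /* step 1(b) */
        finish[i] = true;
        for (int j = 0; j < M; j++)
            if (alloc[i][j] != 0) { finish[i] = false; break; }
    }
    bool progress = true;
    while (progress) {                                     /* steps 2-3 */
        progress = false;
        for (int i = 0; i < N; i++) {
            if (finish[i]) continue;
            bool fits = true;
            for (int j = 0; j < M; j++)
                if (request[i][j] > work[j]) { fits = false; break; }
            if (fits) {
                for (int j = 0; j < M; j++) work[j] += alloc[i][j];
                finish[i] = true;
                progress = true;
            }
        }
    }
    bool any = false;                                      /* step 4 */
    for (int i = 0; i < N; i++) {
        deadlocked[i] = !finish[i];
        if (!finish[i]) any = true;
    }
    return any;   /* true => a deadlock exists */
}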

309 Example of Detection Algorithm
See Deadlock Notes by Guydosh on website (p. 12) for details. Five processes P0 through P4; three resource types A (7 instances), B (2 instances), and C (6 instances). Snapshot at time T0:
        Allocation   Request   Available
        A B C        A B C     A B C
P0      0 1 0        0 0 0     0 0 0
P1      2 0 0        2 0 2
P2      3 0 3        0 0 0
P3      2 1 1        1 0 0
P4      0 0 2        0 0 2
Sequence <P0, P2, P3, P1, P4> will result in Finish[i] = true for all i. Applied Operating System Concepts

310 Example (Cont.) P2 requests an additional instance of type C. The Request matrix becomes:
        Request
        A B C
P0      0 0 0
P1      2 0 2
P2      0 0 1
P3      1 0 0
P4      0 0 2
State of system? We can reclaim the resources held by process P0, but there are insufficient resources to fulfill the other processes’ requests. A deadlock exists, consisting of processes P1, P2, P3, and P4. Applied Operating System Concepts

311 Detection-Algorithm Usage
When, and how often, to invoke depends on: how often a deadlock is likely to occur, and how many processes will need to be rolled back (one for each disjoint cycle). If the detection algorithm is invoked arbitrarily, there may be many cycles in the resource graph, and so we would not be able to tell which of the many deadlocked processes “caused” the deadlock. Applied Operating System Concepts

312 Recovery from Deadlock: Process Termination
See Deadlock Notes by Guydosh for details. Abort all deadlocked processes, or abort one process at a time until the deadlock cycle is eliminated. In which order should we choose to abort? Priority of the process; how long the process has computed, and how much longer to completion; resources the process has used; resources the process needs to complete; how many processes will need to be terminated; is the process interactive or batch? Applied Operating System Concepts

313 Recovery from Deadlock: Resource Preemption
Selecting a victim – minimize cost. Rollback – return to some safe state, restart the process from that state. Starvation – the same process may always be picked as victim; include the number of rollbacks in the cost factor. Applied Operating System Concepts

314 Combined Approach to Deadlock Handling
Combine the three basic approaches (prevention, avoidance, detection), allowing the use of the optimal approach for each class of resources in the system. Partition resources into hierarchically ordered classes. Use the most appropriate technique for handling deadlocks within each class. Applied Operating System Concepts

315 Module 9: Memory Management
Updated 11/7/03 Distinct from Virtual memory: Chapter 10 Background Logical versus Physical Address Space Swapping Contiguous Allocation Paging Segmentation Segmentation with Paging Annotations by instructor are in blue Applied Operating System Concepts

316 Background Program must be brought into memory and loaded as a process for it to be executed. Input queue – collection of processes on the disk that are waiting to be brought into memory for execution. User programs go through several steps before being executed. Applied Operating System Concepts

317 Binding of Instructions and Data to Memory
See notes by Guydosh: “Address Binding, Loading and Linking” on this topic - high-level summary below: Address binding of instructions and data to memory addresses can happen at three different stages. Compile time: if the memory location is known a priori, absolute code can be generated; must recompile the code if the starting location changes. Load time: must generate relocatable code if the memory location is not known at compile time – if swapped out, must be re-loaded to the same location as originally. Execution time: binding is delayed until run time if the process can be moved during its execution from one memory segment to another. Needs hardware support for address maps (e.g., base and limit registers) - dynamic address translation. Applied Operating System Concepts

318 Multistep Processing of a User Program
Applied Operating System Concepts

319 Dynamic Loading and Linking
“DLLs” - See notes by Guydosh: “Address Binding, Loading and Linking”. A routine is not loaded until it is called at runtime - the “application” module is built up at run time. Obviously the newly loaded module must also be linked in real time to resolve addresses relative to the original module referencing it. Better memory-space utilization; an unused routine is never loaded. Useful when large amounts of code are needed to handle infrequently occurring cases. Logical-to-real address binding is at runtime. No special support from the operating system is required - implemented through program design - loaded modules are in relocatable format. Applied Operating System Concepts
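A sketch of run-time loading on a POSIX system via dlopen()/dlsym(); the library name and symbol ("libm.so.6", "cos") are illustrative, and the program links with -ldl:

#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *handle = dlopen("libm.so.6", RTLD_LAZY);  /* load module on demand */
    if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    /* resolve the symbol at run time - the "linking" step */
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cosine) printf("cos(0) = %f\n", cosine(0.0));

    dlclose(handle);   /* module can be unloaded when no longer needed */
    return 0;
}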

320 Dynamic Linking to common shared libraries
“DLLs” - See notes by Guydosh: “Address Binding, Loading and Linking”. Rather than building a “private module” by loading at runtime, we link to a common “library” of common functions. The library may or may not be in memory; if it is not at the time it is called, then load it. In all cases dynamically link (at run time) to it via “stubs”. Linking is postponed until execution time. A small piece of code, the stub, is used to locate the appropriate memory-resident library routine. The stub replaces itself with the address of the routine, and executes the routine. Operating system support is needed to check if the routine is in the process’ memory address space. Applied Operating System Concepts

321 Overlays Keep in memory only those instructions and data that are needed at any given time. Needed when a process is larger than the amount of memory allocated to it. Implemented by the user; no special support needed from the operating system; programming design of the overlay structure is complex. Applied Operating System Concepts

322 Logical vs. Physical Address Space
Will assume run-time binding of logical to physical The concept of a logical address space that is bound to a separate physical address space is central to proper memory management. Logical address – generated by the CPU; also referred to as virtual address. Physical address – address seen by the memory unit (not MMU). Logical and physical addresses are the same in compile-time and load-time address-binding schemes; logical (virtual) and physical addresses differ in execution-time address-binding scheme. MMU sees logical addresses: hardware assist in converting LA to PA. Applied Operating System Concepts

323 Memory-Management Unit (MMU)
Hardware device that maps virtual to physical address. In MMU scheme, the value in the relocation register is added to every address generated by a user process at the time it is sent to memory. The user program deals with logical addresses; it never sees the real physical addresses. See Figure 9.5 Applied Operating System Concepts

324 Dynamic relocation using a relocation register
The figure shows a function of the MMU (it should include error/range checking also). Applied Operating System Concepts
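A minimal sketch of what the dynamic-relocation hardware does, including the range check the annotation calls for; the register values are made-up illustrations:

#include <stdio.h>
#include <stdlib.h>

unsigned relocation = 14000;  /* smallest physical address of the process */
unsigned limit      = 3000;   /* size of the logical address space        */

unsigned translate(unsigned logical) {
    if (logical >= limit) {            /* addressing error: trap to the OS */
        fprintf(stderr, "trap: address %u out of range\n", logical);
        exit(1);
    }
    return logical + relocation;       /* physical = logical + relocation */
}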

325 Swapping A process can be swapped temporarily out of memory to a backing store, and then brought back into memory for continued execution – remember, the suspend state frees up memory when all processes block for I/O – maximizes the degree of multiprogramming. Backing store – fast disk large enough to accommodate copies of all memory images for all users; must provide direct access to these memory images. Roll out, roll in – swapping variant used for priority-based scheduling algorithms; a lower-priority process is swapped out so a higher-priority process can be loaded and executed. The major part of swap time is transfer time; total transfer time is directly proportional to the amount of memory swapped. Modified versions of swapping are found on many systems, e.g., UNIX and Microsoft Windows. Need run-time address binding to allow relocatability of swapped processes. Applied Operating System Concepts

326 Schematic View of Swapping
Applied Operating System Concepts

327 Contiguous Allocation
Contiguous means non-paged – keep the process together: each process is kept in a single contiguous section of memory. Main memory is usually divided into two partitions: the resident operating system, usually held in low memory with the interrupt vector; user processes, then held in high memory. Single-partition allocation – see later slide: multiple processes in a single partition in user space (see dynamic contiguous allocation below). A relocation-register scheme is used to protect user processes from each other, and from changing operating-system code and data. The relocation register contains the value of the smallest physical address; the limit register contains the range of logical addresses – each logical address must be less than the limit register. Applied Operating System Concepts

328 Contiguous Allocation (cont.)
Allocation could be within a series of fixed-size partitions (below), or “dynamically” within a single user partition (as above). Multi-partition allocation (fixed or variable size partitions) - “over-allocation”: Fixed-size partitions: user space is partitioned into a series of fixed partitions. Variable-size partitions: user space is partitioned into a series of partitions of varying sizes. Equal vs. variable size blocks - tradeoffs. Internal fragmentation – within a partition owned by a process - due to “over-allocation”: the allocation unit results in over-allocation; if the process is too large, use overlays. Early example: IBM’s MFT operating system. Ref: see Stallings figs. 7.2 and 7.3. Applied Operating System Concepts

329 Contiguous Allocation (cont.)
Stallings, fig. 7.2: examples of partitioning of a 64 MB memory using constant-sized and variable-sized partitions. Applied Operating System Concepts

330 Contiguous Allocation (cont.)
Stallings, fig. 7.3: examples of memory assignment for multi-partitioning with variable or constant partitions. Applied Operating System Concepts

331 Contiguous Allocation (cont.)
Single user partition using “dynamic” (term from book) contiguous allocation: memory is not “carved up” into a series of partitions; rather we have a single partition where processes of any size may be allocated, starting anywhere. The problem is external fragmentation. Early example: IBM’s MVT operating system. Need a relocation hardware scheme in any contiguous scheme (multiple or single partition) to convert logical to physical addresses at run time, since we do not know where the process will be swapped to. Applied Operating System Concepts

332 Contiguous Allocation (Cont.)
Contiguous allocation within a single user partition. Hole – block of available memory; holes of various sizes are scattered throughout memory. When a process arrives, it is allocated memory from a hole large enough to accommodate it. The operating system maintains information about: a) allocated partitions, b) free partitions (holes). (Figure: four snapshots of the single user partition above the OS, showing processes - process 5, process 8, process 9, process 10, process 2 - interleaved with holes as processes terminate and arrive.) Applied Operating System Concepts

333 Fragmentation Stallings, fig. 7.4 Examples of external
fragmentation due to “dynamic” allocation in a single user partition Applied Operating System Concepts

334 Another Example of External Fragmentation
From an “alternate” Silberschatz book Applied Operating System Concepts

335 Fragmentation External fragmentation – total memory space exists to satisfy a request, but it is not contiguous. Internal fragmentation – allocated memory may be slightly larger than the requested memory; this size difference is memory internal to a partition that is not being used. Reduce external fragmentation by compaction: shuffle memory contents to place all free memory together in one large block. Compaction is possible only if relocation is dynamic, and is done at execution time. I/O problem: latch the job in memory while it is involved in I/O, or do I/O only into OS buffers. Reference: see Stallings, figs. 7.4, 7.5. Applied Operating System Concepts

336 Dynamic Storage-Allocation Problem
How to satisfy a request of size n from a list of free holes. First-fit: allocate the first hole that is big enough. Best-fit: allocate the smallest hole that is big enough; must search the entire list, unless it is ordered by size. Produces the smallest leftover hole. Worst-fit: allocate the largest hole; must also search the entire list. Produces the largest leftover hole. Next-fit (see Stallings). See Stallings fig. 7.5. First-fit and best-fit are better than worst-fit in terms of speed and storage utilization. Applied Operating System Concepts
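A hypothetical free-list sketch contrasting first-fit and best-fit (not from the text); holes live in a singly linked list, and a real allocator would also split and coalesce them:

#include <stddef.h>

struct hole { size_t size; struct hole *next; };

struct hole *first_fit(struct hole *free_list, size_t n) {
    for (struct hole *h = free_list; h; h = h->next)
        if (h->size >= n) return h;        /* first hole big enough */
    return NULL;                           /* no hole satisfies the request */
}

struct hole *best_fit(struct hole *free_list, size_t n) {
    struct hole *best = NULL;
    for (struct hole *h = free_list; h; h = h->next)   /* search entire list */
        if (h->size >= n && (!best || h->size < best->size))
            best = h;                      /* smallest hole that still fits */
    return best;
}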

337 Fragmentation Stallings, fig. 7.5 Examples of dynamic
allocation algorithms - allocation of a 16 MB block. Applied Operating System Concepts

338 Paging – Non-contiguous Schemes
The logical address space of a process is contiguous, but its physical addresses need not be: the process is allocated physical memory wherever frames are available. Requires run-time address binding. Divide physical memory into fixed-sized blocks called frames (of relatively small size - size is a power of 2 - WHY?, typically between 512 bytes and 8192 bytes). Divide logical memory into blocks of the same size called pages. Keep track of all free frames. To run a program of size n pages, need to find n free frames and load the program. Set up a page table to translate logical to physical addresses. Internal fragmentation – but this is minor compared to external fragmentation. Reminder: just because paging is used does not necessarily mean virtual memory. Applied Operating System Concepts

339 Address Translation Scheme
Address generated by CPU is divided into: Page number (p) – used as an index into a page table which contains base address of each page in physical memory. Page offset (d) – combined with base address to define the physical memory address that is sent to the memory unit. See Stallings fig. 7.9, page 318 Applied Operating System Concepts
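A sketch of the p/d split for a power-of-two page size - this is why the size is a power of 2: the split is a shift and a mask, with no division. Values assumed: 1K pages (10 offset bits), matching the Stallings example a few slides below:

#define PAGE_BITS 10
#define PAGE_SIZE (1u << PAGE_BITS)          /* 1024-byte pages */

unsigned page_number(unsigned logical) { return logical >> PAGE_BITS; }
unsigned page_offset(unsigned logical) { return logical & (PAGE_SIZE - 1); }

/* example: logical 0x05DE -> page 1, offset 0x1DE */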

340 Page Assignment Assignment of process Pages to free frames
Stallings, fig. 7.9: page/frame tables. Key idea: each process has its own page table. Applied Operating System Concepts

341 Address Translation Architecture
Page table is directly indexed - not searched Applied Operating System Concepts

342 Logical addresses Logical addressing with Partitioning and paging
Stallings, fig. 7.11. Assume the logical address 0x05DE points to a byte boundary in memory. For partitioning: the logical address is an offset into the process. For paging: the LA specifies a page in the process and an offset into this page - with a 1K page, the low 10 bits are the offset, so the same logical address as in the partitioning case gives offset 0x1DE (10 bits). Applied Operating System Concepts

343 Logical to physical address translation
A “page” in physical memory is called a “frame”. LA 0x05DE = the ordered pair (0x1, 0x1DE). Stallings, fig. 7.12: page number = 0x01, frame number = 0x06, frame offset = 0x1DE. Physical page “frames” can be “scattered” anywhere in memory (on page boundaries) via address translation. Frame numbers are determined by memory allocation algorithms … use of a free list etc. Applied Operating System Concepts

344 Paging Example See Stallings , fig. 7.10, pg. 319
Stored in memory. See Stallings, fig. 7.10, pg. 319; also examples in figs. 7.11 – 7.12. See also fig. 9.6, p. 288 (Silberschatz) for an example of address translation. Applied Operating System Concepts

345 Implementation of Page Table
Page table is kept in main memory. Page-table base register (PTBR) points to the page table. Page-table length register (PTLR) indicates the size of the page table. In this scheme every data/instruction access requires two memory accesses: one for the page table and one for the data/instruction. Doubles memory access time! The two-memory-access problem can be solved by the use of a special fast-lookup hardware cache called associative registers or translation look-aside buffers (TLBs). See fig. 9.10, page 294 ==> Applied Operating System Concepts

346 TLB: A Cache for the Page Table
“Paging” the page table. Silber. fig. 9.10. The TLB is a logical subset of the PT; on a TLB miss, load the TLB from the PT while immediately translating to the real address. Applied Operating System Concepts

347 Associative Register (TLB)
Associative registers – parallel search: a “broadside” parallel hardware search for the page number - uses multiplexers and compare circuits. Address translation (A´, A´´): if A´ is in an associative register, get the frame # out; otherwise get the frame # from the page table in memory. (Each TLB entry holds a page # / frame # pair.) Applied Operating System Concepts

348 Effective Access Time Associative lookup = ε time units.
Assume the memory cycle time is 1 microsecond. Hit ratio – the percentage of times that a page number is found in the associative registers; the ratio is related to the number of associative registers. Hit ratio = α. Effective Access Time (EAT) = α(tTLB + tMEM_DATA) + (1 – α)(tTLB + tMEM_PT + tMEM_DATA). With a 1-time-unit memory access: EAT = (1 + ε)α + (2 + ε)(1 – α) = 2 + ε – α. Applied Operating System Concepts
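A worked example with assumed numbers (in the spirit of the text, not from this slide): with a 20 ns TLB lookup (ε), a 100 ns memory access, and α = 0.80, EAT = 0.80 x 120 + 0.20 x 220 = 140 ns - a 40% slowdown; raising the hit ratio to α = 0.98 gives EAT = 0.98 x 120 + 0.02 x 220 = 122 ns.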

349 Memory Protection Memory protection is implemented by associating protection bits with each frame – normally kept in the page table. Ex: “RW”, “RO”, or execute-only bits … like the permission bits in UNIX. A valid-invalid bit is attached to each entry in the page table: “valid” indicates that the associated page is in the process’ logical address space, and is thus a legal page; “invalid” indicates that the page is not in the process’ logical address space. Retaining invalid page entries is wasteful ==> can use a HW assist: keep only valid entries - a page-table length register (PTLR) to indicate the size of the PT. Every logical address must be < PTLR to be valid. See fig. 9.11, page 296 Applied Operating System Concepts

350 Memory Protection 14-bit logical address, page size is 2K: 11 bits for the page offset, 3 bits for the page number. Logical address space is 2^14 = 16,384 bytes = 2^3 = 8 pages. Note: although page 5 is marked valid, addresses beyond 10468 (the last byte in the process) are illegal – need range checking as in fig. 9.5. Note: “pages” 6 & 7 are not resident in memory, because they are invalid. The page table has an entry for all pages in logical memory. Applied Operating System Concepts

351 Two-Level Page-Table Scheme
One solution to the large-PT problem. For a 32-bit address and a 4K page, the non-paged PT would have 1 meg entries - at 4 bytes/entry, the PT would take up 4 MB! Solution: “demand page” in only those blocks of the page table that are needed - “page the page table”. The outer page table is “pinned” in memory. ==> See also 3-level PT, p. 299 Applied Operating System Concepts

352 Two-Level Paging Example
A logical address (on a 32-bit machine with 4K page size) is divided into: a page number consisting of 20 bits, and a page offset consisting of 12 bits. Since the page table is paged, the page number is further divided into: a 10-bit page number and a 10-bit page offset. Thus, a logical address is as follows: | p1 (10 bits) | p2 (10 bits) | d (12 bits) |, where p1 is an index into the outer page table, and p2 is the displacement within the page of the outer page table. Applied Operating System Concepts
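A sketch of the 10/10/12 split above; the shifts and masks are the whole front end of the translation:

/* split a 32-bit logical address into outer index, inner index, offset */
unsigned p1(unsigned la) { return (la >> 22) & 0x3FF; }  /* 10-bit outer PT index */
unsigned p2(unsigned la) { return (la >> 12) & 0x3FF; }  /* 10-bit inner PT index */
unsigned d (unsigned la) { return  la        & 0xFFF; }  /* 12-bit page offset    */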

353 Address-Translation Scheme
Address-translation scheme for a two-level 32-bit paging architecture Applied Operating System Concepts

354 Multilevel Paging and Performance
Four-level PT (p. 280 bottom), Motorola 68000, p. 281: since each level is stored as a separate table in memory, converting a logical address to a physical one may take four memory accesses. Even though the time needed for one memory access is quintupled, caching permits performance to remain reasonable. A cache (TLB) hit rate of 98 percent yields: effective access time = 0.98 x 120 + 0.02 x 520 = 128 nanoseconds, where TLB access is 20 ns and memory access is 100 ns; a miss costs 520 = 20 + 5 x 100 ns (4 accesses to get the physical address plus 1 to get the data). This is only a 28 percent slowdown in memory access time. Applied Operating System Concepts

355 Inverted Page Table Another solution to large PT problem
One entry for each real page (frame) of memory … one inverted PT for all processes. An entry consists of the virtual address of the page stored in that real memory location, with information about the process that owns that page, i.e., the PID; the frame # is the index of the entry in the table. Decreases the memory needed to store each page table, but increases the time needed to search the table when a page reference occurs. Use a hash table to limit the search to one — or at most a few — page-table entries. See also hashed page tables, section … improves performance - the hash table has an entry only for “used” page numbers, not all. Applied Operating System Concepts

356 Inverted Page Table Architecture
The PT is now searched, and the index of the “hit” is the frame address. Search the inverted page table using a hash search … has the flavor of associative memory - but a hash is used instead of a hardware TLB. Applied Operating System Concepts

357 Shared Pages Remember: each process has its own page table
Can implement shared “RO” data using shared pages – saves space. Shared code One copy of read-only (reentrant) code shared among processes (i.e., text editors, compilers, window systems). Shared code must appear in same location in the logical address space of all processes. Private code and data Each process keeps a separate copy of the code and data. The pages for the private code and data can appear anywhere in the logical address space. Applied Operating System Concepts

358 Shared Pages Example Applied Operating System Concepts

359 Segmentation Memory-management scheme that supports the user view of memory. Similar to a paging scheme in which the page size is variable - segments are “built” around functional units. A program is a collection of segments. A segment is a logical unit such as: main program, procedure, function, local variables, global variables, common block, stack, symbol table, arrays. Applied Operating System Concepts

360 Logical View of Segmentation
(Figure: segments 1-4 in user space map to scattered, non-adjacent locations in physical memory space.) Applied Operating System Concepts

361 Segmentation Architecture
Logical address consists of a two-tuple: <segment-number, offset>. Segment table – (an array of base and limit registers) maps two-dimensional logical addresses to one-dimensional physical addresses; each table entry has: base – contains the starting physical address where the segment resides in memory; limit – specifies the length of the segment. Segment-table base register (STBR) points to the segment table’s location in memory. Segment-table length register (STLR) indicates the number of segments used by a program (the number of entries in the table); segment number s is legal if s < STLR. Applied Operating System Concepts

362 Segmentation Architecture (Cont.)
Relocation: dynamic (at run time) by segment table. Sharing: shared segments - same segment number. Allocation: suffers from external fragmentation - as did contiguous allocation in a single partition. Use various algorithms to deal with external fragmentation, such as first fit/best fit. Applied Operating System Concepts

363 Segmentation Hardware (fig. 9.18)
The base is where the segment starts. d < limit ⇒ PA = base + d; otherwise ⇒ error (trap). Applied Operating System Concepts
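A sketch of this check in C (table contents illustrative); a real MMU raises a trap rather than returning an error value:

struct segment { unsigned base, limit; };   /* one entry per segment */

/* returns the physical address, or ~0u to stand in for a trap */
unsigned seg_translate(const struct segment table[], unsigned s, unsigned d) {
    if (d >= table[s].limit) return ~0u;    /* offset out of range: trap */
    return table[s].base + d;               /* PA = base + d */
}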

364 Segmentation Architecture (Cont.)
Protection: with each entry in the segment table associate a validation bit (= 0 ⇒ illegal segment) and read/write/execute privileges. Protection bits are associated with segments; code sharing occurs at the segment level. Since segments vary in length, memory allocation is a “dynamic” storage-allocation problem - contiguous in a single partition. A segmentation example is shown in the following diagram. Applied Operating System Concepts

365 Applied Operating System Concepts

366 Segmentation with Paging – Ex: MULTICS
In a combined paging/segmentation scheme, user space is broken up into segments (at the discretion of the programmer). Each segment, in turn, is broken up into a number of fixed-size pages, each equal in size to a main-memory frame. Associated with each process is a segment table and a number of page tables: one page table per segment. Presented with a logical address, the processor uses the segment number to index into the process segment table to find the address of the page table for that segment. <======== key! Then the page-number portion of the LA is used to index the page table and look up the corresponding frame number. Example: the MULTICS system solves the problems of external fragmentation and lengthy search times by paging the segments. The solution differs from pure segmentation in that the segment-table entry contains not the base address of the segment, but rather the base address of a page table for this segment: see next ==> Applied Operating System Concepts

367 MULTICS Address Translation Scheme
S = seg#; d = offset in segment (d ≤ seg length: legal); d = (page#, offset) = (p, d’), where d’ = offset in page. The segment-table entry points to the beginning of the page table, and p is the offset into the PT. Similar to fig. 8.13, Stallings. Applied Operating System Concepts

368 Segmentation with Paging – Intel 386
As shown in the following diagram, the Intel 386 uses segmentation with paging for memory management with a two-level paging scheme. Applied Operating System Concepts

369 Applied Operating System Concepts

370 Comparing Memory-Management Strategies
Summary: read section 9.7. Hardware support – base & limit registers, mapping tables, TLB. Performance – overhead in address mapping – TLB etc. Fragmentation - a problem with contiguous allocation. Relocation – compaction and paging require relocation. Swapping – allows more processes than can fit into memory to be multiprogrammed, increasing CPU utilization. Sharing – more efficient use of memory; allows multiprogramming more processes; needs paging or segmentation. Protection – implemented in the paging/segmentation design. Applied Operating System Concepts

371 Module 10: Virtual Memory
Background Demand Paging Performance of Demand Paging Page Replacement Page-Replacement Algorithms Allocation of Frames Thrashing Other Considerations Demand Segmentation Annotations by instructor are in blue Adapted for 6th ed, Last Updated 11/12/03++ Applied Operating System Concepts

372 Background Virtual memory – separation of user logical memory from physical memory. Only part of the program needs to be in memory for execution. Logical address space can therefore be much larger than physical address space. Paging is always used: not all pages need be in memory; pages are “swapped” in and out on demand. Virtual memory can be implemented via: demand paging, demand segmentation. Applied Operating System Concepts

373 Virtual Memory That is Larger Than Physical Memory
Fig. 10.1 Applied Operating System Concepts

374 Demand Paging - p No longer need the entire process in memory – only that portion which is needed, as opposed to swapping in chapter 9 – in which all pages of a process were required to be resident in memory when a process was loaded. Now use a “lazy swapper”: only swap a page into memory if it is needed: demand paging. Works because of locality of reference - a fundamental principle to be described later. Bring a page into memory only when it is needed: less I/O needed, less memory needed, faster response, more users. Page is needed ⇒ reference to it: invalid reference ⇒ abort; not-in-memory ⇒ bring to memory ==> page fault. See figs. 10.1, 10.2, 10.3, 10.4 (VM scheme). The ideal model is to “pause” the instruction on a page fault. The reality is that the instruction is re-started - problematic for an overlapped source-destination - see page 324. Applied Operating System Concepts

375 Transfer of a Paged Memory to Contiguous Disk Space
Paged swapping scheme as in chapter 9 - the entire process in memory; suppose now we do not require all pages to be “resident”. Fig. 10.2 Applied Operating System Concepts

376 Valid-Invalid Bit With each page table entry a valid–invalid bit is associated (1 ⇒ in memory (and a “legal” page), 0 ⇒ not in memory - legal but on disk, or illegal) … distinguish from an invalid (illegal) reference - see figs. 10.3, 10.4. Initially the valid–invalid bit is set to 0 on all entries. During address translation, if the valid–invalid bit in a page table entry is 0 ⇒ page fault (or possibly an illegal reference). (Figure: example snapshot of a page table showing the frame # and valid-invalid bit columns.) Applied Operating System Concepts

377 Page Table When Some Pages Are Not in Main Memory and Some Are Outside the Process Space
Meaning of the valid bit: valid means the page is both in memory and legal (in the address space of the process); invalid means that the page is either outside of the address space of the process (illegal), OR a legal address but not currently resident in memory (a page fault - the most common case). NOTE: refs to pages 6 & 7 are illegal; refs to pages 3 & 4 are page faults; both cases are marked invalid. Fig. 10.3 Applied Operating System Concepts

378 Page Fault If there is ever a reference to a page not in memory, the first reference will trap to the OS ⇒ page fault. The OS looks at another table to decide: illegal reference ⇒ abort, or just not in memory ==> a page fault. Get an empty frame. For a page fault: “swap” the page into the frame … “swap” is bad terminology in this context - use “page” as a verb. Reset tables, validation bit := 1. Problem: restart the instruction. An overlapped block move is a problem (see p. 324) - IBM Sys/360/370 MVC: restore memory before the trap, or make sure all pages are in memory before starting. Applied Operating System Concepts

379 Steps in Handling a Page Fault
No page replacement needed in this case - a free frame was found. Fig. 10.4 Applied Operating System Concepts

380 But what happens if there are no free frames?
Page replacement – find some page in memory, but not really in use, and “swap” it out if modified, or overlay it if not modified. Get rid of “dead wood” - come up with a replacement algorithm. The algorithm should maximize performance: want an algorithm which results in a minimum number of page faults, and does not add excessive overhead. Two killers: the same page may be brought into memory several times ... if it happens too frequently, then “thrashing” - see later; and too much overhead. Applied Operating System Concepts

381 Performance of Demand Paging- p. 305
See sect , page 325 on the page fault scenario - handled similarly to an I/O request. Page fault rate: 0 ≤ p ≤ 1.0 … 1 – p is the “hit ratio”. If p = 0, no page faults; if p = 1, every reference is a fault. In practice p should be very small: p < .001. Effective Access Time (EAT) = (1 – p) x memory access + p x {page fault overhead + [swap page out] + swap page in + restart overhead}. Applied Operating System Concepts

382 Demand Paging Example from p. 327
Memory access time = 100 nanoseconds. Page fault service time = 25 milliseconds. Let the fault rate be p. Effective access time: EAT = (1 – p)(100 ns) + p(25 ms). A 10% degradation would be: EAT = 110 ≥ 100(1 – p) + 25,000,000p nanoseconds. Solving for p gives: p < 4 x 10^-7, i.e., page faults must occur no more than 4 times per 10 million references to achieve a degradation of no more than 10%! ⇒ The fault rate must be kept extremely low … if you knew nothing else, would you say this is possible for real programs? Applied Operating System Concepts

383 Process Creation Virtual memory allows other benefits during process creation: - Copy-on-Write - Memory-Mapped Files Applied Operating System Concepts

384 Copy-on-Write If a forked child process immediately calls exec() to overlay itself with another program, then physically “cloning” (copying) the parent is unnecessary. If the child or parent never modifies itself, then there is no need to keep separate copies of the process data for both parent and child – share a single copy. Page sharing in virtual memory allows this: Copy-on-Write (COW) allows both parent and child processes to initially share the same pages in memory. If either process modifies a shared page, only then is the page copied. COW allows more efficient process creation, as only modified pages are copied. Free pages are allocated from a pool of zeroed-out pages. Applied Operating System Concepts
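A sketch of the fork-then-exec pattern that motivates COW ("ls" is an arbitrary program, an assumption for illustration): the child writes nothing before exec() throws the inherited image away, so physically duplicating the parent's pages would be wasted work:

#include <unistd.h>
#include <sys/wait.h>
#include <stdio.h>

int main(void) {
    pid_t pid = fork();              /* parent and child share pages via COW */
    if (pid == 0) {
        execlp("ls", "ls", "-l", (char *)NULL);  /* replaces the child image */
        perror("execlp");            /* reached only if exec fails */
        return 1;
    }
    wait(NULL);                      /* parent waits for the child */
    return 0;
}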

385 Memory-Mapped Files Memory-mapped file I/O allows file I/O to be treated as routine memory access by mapping a disk block to a page in memory. Part of the virtual address space is logically associated with a file and can be demand-paged into memory - an extension of the idea of paging in a portion of a process - now we apply it to files. A file is initially read using demand paging: a page-sized portion of the file is read from the file system into a physical page. Subsequent reads/writes to/from the file are treated as ordinary memory accesses. Simplifies file access by treating file I/O through memory rather than read()/write() system calls - useful for device drivers. Also allows several processes to map the same file, allowing the pages in memory to be shared. Applied Operating System Concepts
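A sketch using the POSIX mmap() call (the file name is an assumption): the mapped bytes are demand-paged in from the file and then read with ordinary memory accesses, with no read() calls:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.txt", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    fstat(fd, &st);                  /* learn the file's length */

    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    fwrite(p, 1, st.st_size, stdout); /* plain memory access pages the file in */
    munmap(p, st.st_size);
    close(fd);
    return 0;
}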

386 Memory Mapped Files Applied Operating System Concepts

387 Page Replacement - sect 10.3, p. 308
Prevent over-allocation of memory (no free frames) by modifying the page-fault service routine to include page replacement. You don’t prevent over-allocation - this is the name of the game - you just learn to live with it! ==> use page replacement. Use the modify (dirty) bit to reduce the overhead of page transfers – only modified pages are written back to disk. Page replacement completes the separation between logical memory and physical memory – a large virtual memory can be provided on a smaller physical memory. “Ideally”, the user is unaware of paging: VM paging is transparent to the user … at most a small performance hit. Applied Operating System Concepts

388 Need For Page Replacement
Options when memory is full: kill the process - not good; swap the process out - maybe; do page replacement - OK. In fig. 10.6: a reference to B is a page fault ==> B is needed but not in memory, while memory is full and the free-frame list is empty … must free up space: page replacement. Applied Operating System Concepts

389 Basic Page Replacement
Find the location of the desired page on disk. Find a free frame: if there is a free frame, use it; if there is no free frame, use a page replacement algorithm to select a victim frame … this is the likelihood. Read the desired page into the (newly) free frame. Update the page and frame tables. If the “victim” page frame is modified, it will have to be paged out to the disk. Restart the process (the process was blocked during page fault processing). Applied Operating System Concepts

390 Page Replacement The “victim” page is invalidated - its frame # entry is now garbage (0).
The victim was originally in frame f ==> the new page is brought in ==> put in frame f. Applied Operating System Concepts

391 Page-Replacement Algorithms
Want the lowest page-fault rate. Evaluate an algorithm by running it on a particular string of memory references (reference string) and computing the number of page faults on that string. In all our examples, the reference string is 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. These are “crappy” examples - SEE better examples in the text, section 10.4 of Silberschatz 6th ed. These have been added to this set of slides (marked Silb 6th ed). Applied Operating System Concepts

392 Graph of Page Faults Versus The Number of Frames (ideally- see “non-ideal” case for FIFO later)
Applied Operating System Concepts

393 First-In-First-Out (FIFO) Algorithm
Diagram below not clear – see the next slide for a clearer example. Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5. With 3 frames (3 pages can be in memory at a time per process): 9 page faults. With 4 frames: 10 page faults. FIFO replacement exhibits Belady’s Anomaly: more frames do not always mean fewer page faults (“more frames ⇒ fewer page faults” holds only during initial loading - a poor example). Student exercise: use the approach in slide 10.24 to determine the 9 and 10 faults in these two examples. Applied Operating System Concepts
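A sketch that counts FIFO faults for the slide's reference string; with FRAMES = 3 it prints 9 faults and with FRAMES = 4 it prints 10 - Belady's Anomaly:

#include <stdio.h>
#include <stdbool.h>

#define FRAMES 3   /* change to 4 to see the anomaly */

int main(void) {
    int ref[] = {1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5};
    int n = sizeof ref / sizeof ref[0];
    int frame[FRAMES];
    int used = 0, next = 0, faults = 0;   /* next = FIFO victim pointer */

    for (int i = 0; i < n; i++) {
        bool hit = false;
        for (int j = 0; j < used; j++)
            if (frame[j] == ref[i]) { hit = true; break; }
        if (hit) continue;                /* resident page: no fault */
        faults++;
        if (used < FRAMES) frame[used++] = ref[i];          /* free frame */
        else { frame[next] = ref[i]; next = (next + 1) % FRAMES; } /* evict oldest */
    }
    printf("%d page faults with %d frames\n", faults, FRAMES);
    return 0;
}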

394 FIFO Page Replacement From Silb 6th ed - fig 10.9, p. 336
Applied Operating System Concepts

395 FIFO Illustrating Belady’s Anomaly
Applied Operating System Concepts

396 Optimal Algorithm (OPT)
Diagram below not clear – see the next slide for a clearer example. Replace the page that will not be used for the longest period of time. 4-frames example, reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5: 6 page faults. Not implementable, because it requires knowledge of the future; used for measuring how well your algorithm performs. Student exercise: use the approach in slide 10.27 to determine the 6 faults in this example. Applied Operating System Concepts

397 Optimal Page Replacement
From Silb 6th ed - fig 10.11, p. 338 Applied Operating System Concepts

398 Least Recently Used (LRU) Algorithm
Diagram below not clear – see the next slide for a clearer example. Reference string: 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 (8 faults). Student exercise: use the approach in slide 10.29 to determine the 8 faults. If we use the recent past as an approximation of the near future, then we replace the page that has not been used for the longest period of time ==> the LRU algorithm … the same as the OPT algorithm looking back in time instead of into the future. A good approximation of OPT – maybe the best outside of OPT. Applied Operating System Concepts

399 LRU Page Replacement From Silb 6th ed - fig 10.12, p. 339
Applied Operating System Concepts

400 LRU Algorithm (Cont.) Counter implementation - time stamp in the page table entry. There is a hardware clock incremented by the CPU each time a memory reference is made (a count of the total number of references). Every page table entry has a counter; every time the page is referenced through this entry, copy the clock into the counter (a periodic snapshot of the clock). Updating the counter has some overhead, but it is rolled into accessing the PT for each reference. Replace the page with the smallest (oldest) time value. Logically requires a search to find the LRU page - but this is only done at page fault time - may be small compared to the disk access. A write to memory (the PT) is required on each reference to a page represented in the page table, to update the counter. Stack implementation – keep a stack of page numbers in doubly-linked form - most intuitive - but lots of overhead “flipping” pointers in memory: when a page is referenced, move it to the top and drop the entries above it down by one. The entry at the bottom is the least recently used; the entry at the top is the most recently used. Requires 6 pointers to be changed – in memory! No search needed for replacement. Applied Operating System Concepts

401 Use Of A Stack to Record The Most Recent Page References
Move address 7 to the top; drop 2, 1, 0 down. Applied Operating System Concepts

402 LRU Approximation Algorithms
Reference bit – in the page table. With each page associate a bit, initially = 0; when the page is referenced, the bit is set to 1. Replace a page whose bit is 0 (if one exists). We do not know the order, however. Need a scheme for resetting ref bits or they may all go to 1. Use history bits: shift the ref bit into the high position (from outside, when referenced) and periodically right-shift the history bits (padding with 0’s) - replace the page with the smallest history-byte integer – see page 341 … zeros are periodically shifted into the high position except if referenced, when a one is shifted in. If not referenced, the value decreases; if referenced, the value increases. Second-chance, or Clock, replacement algorithm: needs the reference bit. If a hit: the bit is updated even on a hit (set ref bit to 1). If a PF: scan the PT (in clock order) searching for the first 0 reference bit. If the page scanned has reference bit = 1, then set the reference bit to 0 and leave the page in memory. Replace the first page (in clock order) which has a 0 reference bit. Applied Operating System Concepts
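A sketch of second-chance (clock) victim selection; refbit[] stands for the hardware-set reference bits of the resident frames, and the names and NFRAMES are illustrative:

#define NFRAMES 8

int refbit[NFRAMES];   /* set to 1 by "hardware" on each reference */
int hand = 0;          /* the clock hand */

int choose_victim(void) {
    for (;;) {
        if (refbit[hand] == 0) {        /* no second chance left: victim */
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;
        }
        refbit[hand] = 0;               /* give this page a second chance */
        hand = (hand + 1) % NFRAMES;    /* advance in clock order */
    }
}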

403 Second-Chance (clock) Page-Replacement Algorithm
Applied Operating System Concepts

404 Counting Algorithms Keep a counter of the number of references that have been made to each page. LFU algorithm: replaces the page with the smallest count – rationale: an active page should have a large count – but maybe the activity was only an “initial thing” (a transient), and the page gets “old” but never gets replaced due to the high count – LRU would get rid of the page in this case. MFU algorithm: replaces the page with the largest count - based on the argument that the page with the smallest count was probably just brought in and has yet to be used – problem: if it never gets referenced again, it hangs around due to the low count. These algorithms are not commonly used: costly implementation, and effectiveness is too specific to the application - doesn’t approximate OPT. Applied Operating System Concepts

405 Allocation of Frames - new topic
Each process needs a minimum number of pages. Single user model: the process is allowed a maximum allocation which generally is smaller than the process size; the process then demand-pages what it needs; if the allocation is too small, efficiency/performance is low - an inverse trade-off between degree of MP and size of allocation ==> implications for thrashing (see later) … multi-user. Architecture requirements - p. 345. Example: IBM 370 – 6 pages to handle the MVC instruction: the instruction is 6 bytes and might span 2 pages; 2 pages to handle the from, 2 pages to handle the to. Instruction restart is problematic on a page fault - must retain all related pages. The minimum number of frames is determined by the architecture, the maximum by memory size. Three major allocation schemes - sec for the 1st two. Non-dynamic allocation – constant throughout the life of the process: fixed (equal) allocation (uniform frame assignment to all processes), or proportional allocation (allocation proportional to the size of the process) - a special case of priority allocation – allocation based on priority, which could be size. Dynamic allocation – the amount of allocation may vary dynamically during the life of the process. This results from “global” replacement – see below. Applied Operating System Concepts

406 Fixed (Equal) and proportional Allocation examples
Note that this example is still for non-dynamic allocation – it remains constant throughout the life of the process. Equal allocation – e.g., if 100 frames and 5 processes, give each process 20 frames. Proportional allocation – allocate according to the size of the process: with si = size of process pi, S = Σ si, and m = total number of frames available, the allocation for pi is ai = (si / S) x m. Example: m = 64, s1 = 10, s2 = 127 gives a1 = (10/137) x 64 ≈ 5 frames and a2 = (127/137) x 64 ≈ 59 frames. Applied Operating System Concepts

407 Global vs. Local Replacement
Local replacement – each process selects from only its own set of allocated frames. The frame allocation never changes for a process – non-dynamic allocation – could be fixed or proportional. Paging behavior is independent of other processes, but a process can hoard frames needed by other processes. Global replacement – a process selects a replacement frame from the set of all frames of all processes; one process can take frames from another. This results in a form of dynamic allocation: the allocation is variable - it may increase at the expense of other processes, or decrease because a higher-priority process steals some frames. A process cannot control its own fault rate - it depends on the paging behavior of other processes. Question: are frames stolen only from the other processes’ free-frame lists, or can a used frame (loaded with a virtual page) be confiscated? Can combine global/local in a priority scheme: first choose from your own pool; if this gets depleted, then steal frames from a lower-priority process. Applied Operating System Concepts

408 Summary (From Stallings): Allocation or “Resident Size”
Fixed allocation (non-dynamic allocation): gives a process a fixed number of frames within which to execute. Can be a fixed size for all processes or proportional, but does not change through the life of the process; when a page fault occurs, one of the pages of that process must be replaced. Variable allocation (dynamic allocation): the number of frames allocated to a process varies over the lifetime of the process. Applied Operating System Concepts

409 Allocation and Replacement Policy Summary
Variable allocation (dynamic in time) and global replacement: easiest to implement; adopted by many operating systems. The operating system keeps a list of free frames (a global list). A free frame is added to the resident set of a process when a page fault occurs. If there is no free frame, replace one from another process - a “working” frame - reducing the donor’s resident set by 1 page and increasing the recipient’s by 1 page. Fixed allocation (in time) and local replacement: when a new process is added, allocate a number of page frames based on application type, program request, or other criteria. The number of page frames allocated to a process is fixed throughout its life. The page to be replaced is chosen from among the frames allocated to the process having the fault. Reevaluate the allocation from time to time. Fixed allocation (in time) and global replacement: not possible (Why?) – the resident set is fixed – it cannot grow. Applied Operating System Concepts

410 Thrashing In all previous schemes, will a process get enough pages to run efficiently (low fault rate)? If a process does not have “enough” pages, the page-fault rate is very high. This leads to: low CPU utilization; the operating system thinks that it needs to increase the degree of multiprogramming - to improve the degree of MP. Thus another process is added to the system!!! - a vicious cycle leading to thrashing: a process is busy swapping pages in and out most of the time - very little time spent on productive work - most time spent doing paging. Applied Operating System Concepts

411 Thrashing Diagram Why does paging work? Locality model
Process migrates from one locality to another. Localities may overlap. Why does thrashing occur? Σ size of localities > total allocated memory size. A high degree of MP causes a process's allocation to get too small, and it may lose its "working set" – see below Applied Operating System Concepts

412 Locality In A Memory-Reference Pattern
Principle of locality – empirical verification: Spatial: references tend to cluster in contiguous address ranges. Temporal: if a reference is made, it will likely be made again in the near future. Applied Operating System Concepts

413 Working-Set Model A scheme to avoid thrashing - give all processes what they need; if that is not possible, limit the number of processes - an allocation scheme. Δ ≡ working-set window ≡ a fixed number of page references. Example: the last 10,000 instructions or references. WSSi (working-set size of process Pi) = total number of distinct pages referenced in the most recent Δ (varies in time): if Δ is too small, it will not encompass the entire locality; if Δ is too large, it will encompass several localities; if Δ = ∞, it will encompass the entire program. D = Σ WSSi ≡ total demand for frames = total number of allocated frames needed to keep all processes "happy". If D > m ⇒ thrashing (m = total number of available frames). Policy: if D > m, then suspend one of the processes - prevents thrashing by balancing the degree of MP against an acceptable fault rate (move its pages to the disk) … note "swapping" still has a role in virtual memory systems. Applied Operating System Concepts
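A C sketch of the D > m suspension policy (the proc_t structure and the choice of victim are hypothetical; a real kernel would pick victims by priority):

    typedef struct { int pid; int wss; int suspended; } proc_t;

    /* Suspend processes until total demand D = sum of WSS_i fits in m frames. */
    void ws_policy(proc_t p[], int n, int m) {
        int D = 0;
        for (int i = 0; i < n; i++)
            if (!p[i].suspended)
                D += p[i].wss;
        for (int i = n - 1; i >= 0 && D > m; i--) {  /* shed newest process first */
            if (!p[i].suspended) {
                p[i].suspended = 1;   /* its pages go out to disk ("swapped") */
                D -= p[i].wss;
            }
        }
    }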

414 Working-set model Δ = 10 references Applied Operating System Concepts

415 Keeping Track of the Working Set
Approximate with an interval timer + a reference bit. Example: Δ = 10,000 references. The timer interrupts (not a page fault) after every 5000 time units. Maintain a reference bit for each page in memory in the page table. Also keep in memory 2 history bits per page: copy the ref bit into the 1st history position at the 1st interrupt (the 5000th reference) and into the 2nd position at the 2nd interrupt, in circular-queue fashion – resetting all ref bits to 0 after each copy. If one of the history bits is 1, then the page is in the working set, i.e., it was referenced within the WS window - keep this page. Why is this not completely accurate? - how recent the reference is remains uncertain - choosing a "victim" is too random if all pages are referenced. Improvement: 10 history bits and an interrupt every 1000 time units - now a more refined decision criterion determines the working set. Applied Operating System Concepts
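A C sketch of the interval-timer approximation (field names assumed; shown with the 10-bit "improvement" variant):

    #include <stdint.h>

    #define HISTORY_BITS 10          /* the "improvement": 10 bits, 1000-unit timer */

    typedef struct {
        unsigned ref;                /* reference bit, set by hardware on access */
        uint16_t history;            /* one bit per past timer interval */
    } page_t;

    /* On each timer interrupt: shift the history right and copy in the ref bit. */
    void timer_tick(page_t pages[], int n) {
        for (int i = 0; i < n; i++) {
            pages[i].history = (uint16_t)((pages[i].history >> 1) |
                               ((pages[i].ref & 1u) << (HISTORY_BITS - 1)));
            pages[i].ref = 0;        /* reset for the next interval */
        }
    }

    /* In the working set if referenced in any of the tracked intervals. */
    int in_working_set(const page_t *p) { return p->history != 0; }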

416 Page-Fault Frequency Scheme
Establish “acceptable” page-fault rate. If actual rate too low, process loses frame. If actual rate too high, process gains frame. A direct “feedback” approach to control fault rate – actually a way of maintaining a working set If the fault rate increases and no free frames are available, we may have to suspend a process (swap its pages back to the disk) … note “swapping” still has a role in virtual memory. Applied Operating System Concepts
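A C sketch of the feedback rule (the two threshold values are invented for illustration, not from the text):

    /* Page-fault-frequency sketch; thresholds in faults per reference. */
    #define PFF_LOW  0.02
    #define PFF_HIGH 0.10

    int adjust_allocation(double fault_rate, int frames) {
        if (fault_rate > PFF_HIGH)
            return frames + 1;       /* too many faults: grant a frame */
        if (fault_rate < PFF_LOW && frames > 1)
            return frames - 1;       /* too few: take a frame away */
        return frames;               /* acceptable: leave allocation alone */
    }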

417 Other Considerations Prepaging
An aspect of swapping - bring in all pages, or a lot of the swapped-out pages, rather than pure demand paging - more efficient, less arm movement on disk. Example: if a process had been "swapped out" due to I/O wait or the lack of free frames, then "remember" its working set and bring in the entire working set when it is swapped back in. Page size selection - a "tuning" situation – sect Internal fragmentation – favors small. Page-table size – favors large. I/O overhead – favors large – fewer seeks – pages contiguous on disk. Locality – favors small – little "dead wood" in a page – bring in just what you need. Provide multiple page sizes: this allows applications that require larger page sizes to use them without an increase in fragmentation. Applied Operating System Concepts

418 Other Considerations (Cont.)
TLB Reach - the amount of memory accessible from the TLB. TLB Reach = (TLB Size, i.e., number of entries) X (Page Size). Ideally, the working set of each process is stored in the TLB (completely referenced from the TLB). Otherwise a high proportion of address resolutions go through the page table, which degrades the effective access time. To put it simply: beef up the TLB and increase performance, but it may be costly – done in hardware. Can also increase TLB reach by increasing the page size – this will also cut down on PT size – but it has a dark side: you will drag along more dead wood (not part of the working set) in a page. Applied Operating System Concepts
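For concreteness, a tiny C computation of TLB reach under assumed values (a 64-entry TLB and 4 KB pages – these numbers are not from the slide):

    #include <stdio.h>

    int main(void) {
        long entries = 64, page_size = 4096;
        printf("TLB reach = %ld KB\n", entries * page_size / 1024);  /* 256 KB */
        return 0;
    }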

419 Other Considerations (Cont.)
Inverted Page Table (see chapter 9 on how it works) An inverted PT does not have information about the location of a missing page (its location on disk), which is needed in the event of a page fault. A regular PT would have this information – so why is there a problem with the inverted PT? … good test question! For a non-virtual-memory system, where an entire process is always swapped in, this page information is not needed. One solution is to keep an external page table on disk which gets accessed only during a page fault – it would contain the logical address information … this complicates the picture, because when it is paged in, it could trigger more page faults! Applied Operating System Concepts

420 Other Considerations (Cont.)
Although VM is supposed to be transparent to the user, taking care in defining data structures or programming structures can improve locality – this ideally can be made a compiler task. Smart compilers can organize object data/code efficiently, being aware of paging. Program structure (the array is row-major; each 1024-int row occupies one page):
int A[][] = new int[1024][1024];
Program 1:
for (j = 0; j < A.length; j++)
    for (i = 0; i < A.length; i++)
        A[i][j] = 0;
1024 x 1024 page faults (given few frames) – the inner loop jumps from row to row, touching a different page on every reference.
Program 2:
for (i = 0; i < A.length; i++)
    for (j = 0; j < A.length; j++)
        A[i][j] = 0;
1024 page faults – the inner loop sweeps along one row (one page) at a time.
Applied Operating System Concepts

421 Other Considerations (Cont.)
I/O Interlock – Pages must sometimes be locked into memory (sometimes called “pinned”). Consider I/O. Pages that are used for copying a file from a device must be locked from being selected for eviction by a page replacement algorithm (buffering). – Why? – good test question! Alternate solution: use system memory – why? Use a “local” allocation/replacement scheme – why? Applied Operating System Concepts

422 Reason Why Frames Used For I/O Must Be In Memory
Applied Operating System Concepts

423 Demand Segmentation Used when there is insufficient hardware to implement demand paging. OS/2 allocates memory in segments, which it keeps track of through segment descriptors. A segment descriptor contains a valid bit to indicate whether the segment is currently in memory. If the segment is in main memory, access continues; if not, a segment fault occurs. Applied Operating System Concepts

424 Chapter 11: File-System Interface
File Concept Access Methods Directory Structure File System Mounting File Sharing Protection Annotated for use with Silberschatz’s book used during CS350, Fall 03 Instructor's annotation in Blue Last updated 4/25/03, 11/30/03 Operating System Concepts

425 File Concept Type or extension usually indicates meaning of contents:
Collection of related data. Contiguous logical address space – mapped onto a physical device by the OS, but presenting a view (interface) to an application that is independent of any device; usually non-volatile. User's view: smallest allotment of logical secondary storage. From a user viewpoint: data cannot be written to secondary storage unless it is in a file. Sequence of bytes, bits, lines, or records … a file has a structure defined by its file type … interpretation depends on OS and/or application. Type or extension usually indicates meaning of contents: Data numeric character, ex: rat.txt - text file binary, ex: rat.xls - spreadsheet, application supported Program, ex: rat.exe - executable module, OS supported Operating System Concepts

426 File Structure The internal representation of the data
None - sequence of words, bytes Simple record structure Lines Fixed length Variable length Complex Structures Formatted document Relocatable load file Can simulate last two with first method by inserting appropriate control characters - ex: MS Word documents. Who decides (you mean support? - see previous slide): Operating system Program Operating System Concepts

427 File Attributes Name – only information kept in human-readable form.
Identifier - a machine-readable "field" corresponding to the name - typically a number. Type – needed for systems that support different types. Location – pointer to file location on device. Size – current file size. Protection – controls who can do reading, writing, executing. Time, date, and user identification – data for protection, security, and usage monitoring. Information about files is kept in the directory structure, which is maintained on the disk. Operating System Concepts

428 File Operations File is an abstract data type - operations are associated with the file type, p. 373 Operations (usually from within a program, i.e., function calls): Create Write Read Reposition within file – file seek Delete Truncate Open(Fi) – search the directory structure on disk for entry Fi, and move the content of the entry to memory - to the open file table; the file's attributes are pointed to by a file descriptor or "handle" returned by the open call. The open file table is not searched; it is directly accessed via the file descriptor. The open file table is like a cache for the directory. Close (Fi) – move the content of entry Fi in memory to the directory structure on disk (if modified). Operating System Concepts
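The same operations expressed as POSIX calls - a sketch (rat.txt is reused from the file-types slide; error handling mostly omitted):

    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        char buf[64];
        int fd = open("rat.txt", O_RDONLY);  /* directory searched once, here */
        if (fd < 0)
            return 1;
        read(fd, buf, sizeof buf);           /* fd indexes the open-file table */
        lseek(fd, 0, SEEK_SET);              /* reposition within file (seek) */
        close(fd);                           /* entry written back / released */
        return 0;
    }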

429 File Types – Name, Extension
Big questions: Who supports file extensions (types)? … support is shared by OS and the application … varies according to the OS Also who supports the file structure (internal format, logical records, etc: OS or application? OS minimally supports executable files. UNIX provides only minimal OS support. Internal file structure: Mapping the logical record size to the physical block (sector) is the problem here. See sect Operating System Concepts

430 Access Methods Sequential Access - Tape model: must sequentially move through the file - no insertion of records, append at end - can read anywhere by searching from the beginning for the record - a current-position (cp) pointer indicates the next record to be accessed - writes append to the end of the file. Operations: read next (cp++); write next (cp++); reset (cp = 0); no read after last write (beyond EOF). Direct Access - Disk model: random access of fixed blocks - the file is seen as a numbered series of fixed-length logical blocks or records, letting programs read/write records in no particular order. Operations: read n; write n; position to n; read next / write next (simulation of sequential access); rewrite n - where n = relative block number. Operating System Concepts
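A sketch of "read n" built on a byte-stream file with lseek, assuming 512-byte logical blocks (the block size is an assumption):

    #include <fcntl.h>
    #include <unistd.h>

    #define BLOCK 512

    /* "read n": position to relative block n, then do an ordinary read. */
    ssize_t read_block(int fd, long n, char buf[]) {
        if (lseek(fd, (off_t)n * BLOCK, SEEK_SET) == (off_t)-1)
            return -1;
        return read(fd, buf, BLOCK);
    }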

431 Sequential-access File
Operating System Concepts

432 Simulation of Sequential Access on a Direct-access File … simulation of direct access on sequential more difficult Operating System Concepts

433 Example of Index access methods and Relative Files
- Build on top of direct access - must search index and then directly access via disk pointer (or convert relative block number to a physical disk address) - index written on disk and copied to memory (cache ) - can have multilevel index tables, index the index table - sound familiar? - Index will get you to a physical block containing the logical record (relatively small logical records packed in a larger physical block (ex: sector) - FAT access method (used on PC) related to this idea, and ISAM on IBM mainframes. Operating System Concepts

434 Directory Structure A collection of nodes containing information about all files. A way of locating files on the disk - by name. IBM mainframe buzzword: Volume Table of Contents (VTOC). Contains access information about the file such as name, location, size, type, etc. Example: do ls -l -a on UNIX. (The figure shows the one minimal feature of a directory, namely relating file names F1 … Fn to their locations.) Both the directory structure and the files reside on disk. Backups of these two structures are kept on tapes. Operating System Concepts

435 A Typical File-system Organization
Organize the disk into partitions (minidisks as per IBM mainframes) - each with its own directory - files and directories are stored in partitions - a partition appears as a logical disk to the user - it may be larger than a disk on multiple-disk systems (ex: RAID) - ideally the user is oblivious to partition management and sees only disk drives. (Figure: typical for PCs; example for a RAID system.) Operating System Concepts

436 Information in a Device (say disk) Directory
Name Type Address … pointer to, or indicator of physical location of file – generally not visible to the user – the main guts of the entry. Current length Maximum length Date last accessed (for archival) Date last updated (for dump) Owner ID (who pays) Protection information (discuss later) Check this stuff on UNIX doing ls -l -a Must minimally have name and address Operating System Concepts

437 Operations Performed on Directory
Search for a file Create a file Delete a file List a directory Rename a file Traverse the file system Operating System Concepts

438 Organize the Directory (Logically) to Obtain
Efficiency – locating a file quickly. Naming – convenient to users. Two users can have same name for different files. The same file can have several different names. Grouping – logical grouping of files by properties, (e.g., all Java programs, all games, …) Operating System Concepts

439 Single-Level Directory- some history
Directories were originally set up for multi-user systems. A single directory for all users. (In the figure, cat, bo, … are file names pointing to the actual data for the files.) Naming problem: just one list of all users' files, so names must be unique - problem with duplicate names. Grouping problem – difficult to organize files. Operating System Concepts

440 Two-Level Directory-some modern history
Separate directory for each user - solves the duplicate-name problem. Example: the IBM VM system - each user (on a different virtual machine) had a single-level directory. Path name. Can have the same file name for different users. Efficient searching. No grouping capability. Operating System Concepts

441 Tree-Structured Directories - what we have today
Does it all: introduces the concept of a “folder” - can have “folders” within “folders” (folders is a Windows buzz word) names in the directory entry are file names OR the names of other directories - the idea is from UNIX Operating System Concepts

442 Tree-Structured Directories (Cont.)
Efficient searching -can quickly skip to desired subdirectory before searching for data files Grouping Capability – easy to organize files Current directory (working directory) cd /spell/mail/prog type list Operating System Concepts

443 Tree-Structured Directories (Cont.)
Absolute (from root) or relative (from current directory) path name. Suppose your home directory on UNIX is /u0/users/0/cs350, and under cs350 you have a subdirectory project containing an executable rw. Executing rw from cs350 can be done as: /u0/users/0/cs350/project/rw - specify the absolute path, or ./project/rw - use the relative path ("." means current directory). Creating a new file is done in the current directory. Delete a file: rm <file-name>. Creating a new subdirectory is done in the current directory: mkdir <dir-name>. Example: if in current directory /mail, mkdir count (the figure shows mail with subdirectories prog, copy, prt, exp, count). Deleting "mail" ⇒ deleting the entire subtree rooted by "mail". In UNIX, rm -rf mail … recursively clears all subdirectories - a dangerous command! Operating System Concepts

444 Acyclic-Graph Directories
Have shared subdirectories and files <== key idea. In UNIX this is the symbolic link - a powerful method for sharing files among large groups of users over a network. Uses a path name to the shared file. Can have distinct file names for the same file – an alias. Must ensure that there are no cycles in the graph. Operating System Concepts

445 Acyclic-Graph Directories (Cont.)
A shared file may have two different names (aliasing). When could a shared file be deleted? If dict deletes list ⇒ dangling pointer (see prev. slide). Solutions: Backpointers, so we can delete all pointers - variable-size records are a problem. Backpointers using a daisy-chain organization. Entry-hold-count solution - remove the file only when all references to it are gone - increment/decrement a count. Two problems to avoid in acyclic graphs: dangling pointers - UNIX: if a shared file is deleted, the symbolic links remain - it is left up to the user to handle these links; avoiding cycles when adding links - cycle detection. Operating System Concepts

446 General Graph Directory >>>omit<<<
Operating System Concepts

447 General Graph Directory (Cont.) >>>omit<<<
How do we guarantee no cycles? Allow only links to file not subdirectories. Garbage collection. Every time a new link is added use a cycle detection algorithm to determine whether it is OK. Operating System Concepts

448 File System Mounting Mounting: associating data (files) on a device with a directory/file system in user space. Make "raw" files on a device part of a directory tree. A file system must be mounted before it can be accessed. The directory structure can be built out of multiple partitions/disks (or multiple devices, or remote networked machines), which must be "mounted" to make them available within the file-system name space. An unmounted file system is mounted at a mount point within the directory tree. Operating System Concepts

449 (a) Existing file system. (b) Unmounted Partition
Operating System Concepts

450 Mount Point New FS mounted over previous directory
subtree at "users" would obscure the previous tree. Better to create another subdirectory under users to be the mount point. See pp Operating System Concepts

451 File Sharing Sharing of files on multi-user systems is desirable.
Sharing may be done through a protection scheme. On distributed systems, files may be shared across a network. Network File System (NFS) is a common distributed file-sharing method. On UNIX - remote file systems would be mounted Operating System Concepts

452 Protection File owner/creator should be able to control:
what can be done by whom Types of access Read Write Execute Append Delete List Operating System Concepts

453 Access Lists and Groups
Mode of access: read, write, execute. Three classes of users:
a) owner access: 7 ⇒ RWX = 111
b) group access: 6 ⇒ RWX = 110
c) public access: 1 ⇒ RWX = 001
Ask the manager to create a group (unique name), say G, and add some users to the group. For a particular file (say game) or subdirectory, define an appropriate access: chmod 761 game (owner = 7, group = 6, public = 1). Attach a group to a file: chgrp G game Operating System Concepts

454 Chapter 12: File System Implementation
File System Structure File System Implementation Directory Implementation Allocation Methods Free-Space Management Efficiency and Performance Recovery Log-Structured File Systems - omit for now NFS - omit for now Annotated for use with Silberschatz’s book used during CS350, Fall 03 Instructor's annotation in blue Last updated 12/4/03+ Operating System Concepts

455 File-System Structure
File structure Logical storage unit Collection of related information File system resides on secondary storage (disks). Design problem: How should the file system look to the user … attributes, operations, directory structure Creating algorithms and data structures to map the logical file system to physical secondary storage device. File system organized into layers. File control block (FCB) – storage structure consisting of information about a file – stored on disk, copied to memory for use. Operating System Concepts

456 Layered File System Manages file attribute/control info
Manages directory structure: symbolic file name in ⇒ logical block # out; uses the File Control Block (FCB). Logical block # in ⇒ physical block # out (knows the location of the file and the allocation scheme used). Generic commands to device drivers using disk addresses: cylinder, track, sector. Device drivers: interfaces to hardware/controllers; receive the generic commands. Hardware controllers. Operating System Concepts

457 A Typical File Control Block
Location on disk Operating System Concepts

458 Overview On-disk structures In-Memory structures
Boot control block – 1st block of the partition - used for booting if this is a bootable partition. May also hold information on another partition which may be where booting starts (the Master Boot Record, MBR). Partition control block - attributes of the partition: FCB pointers, number of blocks in the partition, free-block count, free-block pointers, etc. Directory structure. The FCBs (inode in UNIX). In-Memory structures In-memory copy of the partition table. Cache of recently accessed directories. System-wide open-file table - copies of the FCBs of all open files + more. Per-process open-file table (pointers into the system-wide open-file table) – accessed via the file descriptor returned from the open call – reduces search time. Operating System Concepts
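A hypothetical C sketch of the two open-file tables just described (all field names invented for illustration):

    #define MAX_OPEN 256

    typedef struct {                 /* in-memory copy of an on-disk FCB */
        long first_block;            /* location of file data on disk */
        long size;
    } fcb_t;

    typedef struct {
        fcb_t fcb;
        int   ref_count;             /* processes that have the file open */
    } sys_file_t;

    sys_file_t system_table[MAX_OPEN];   /* system-wide open-file table */

    typedef struct {
        sys_file_t *fd[MAX_OPEN];    /* per-process table: the descriptor
                                        returned by open() indexes this array */
    } proc_open_table_t;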

459 In-Memory File System Structures
The following example (fig. 12.3) illustrates the necessary file-system structures provided by the operating system. See Sect for scenario descriptions: opening a file; reading a file. Operating System Concepts

460 Scenario for file operations see p. 415 of text
Creating a file Allocate a new FCB. Read the appropriate directory into memory, add the new file name and FCB to it, and write it back to disk. Opening a file The directory structure is searched – parts are cached in memory. The FCB is copied into the system-wide open-file table. A file descriptor is returned which indexes into the per-process open-file table. Reading a file Using the file descriptor, access the per-process open-file table, which in turn gets the FCB from the system-wide open-file table. The FCB has the information needed to access the data in the file. Operating System Concepts

461 Directory Implementation
Must search the directory for a required file. Linear list of file names with pointers to the data blocks: simple to program, time-consuming to execute. Hash table – linear list with a hash data structure: decreases directory search time; collisions – situations where two file names hash to the same location; fixed size. Operating System Concepts

462 File system allocation (new slide)
Many files of variable size are stored on the same disk. The main problem is how to allocate disk blocks for these files so that disk space is utilized effectively and files can be accessed quickly. Three major allocation methods: contiguous, linked, indexed. Operating System Concepts

463 Contiguous Allocation
Each file occupies a set of contiguous blocks on the disk. Minimal head movement – high performance - used by the IBM VM system. Simple – only the starting location (block #) and length (number of blocks) are required. Random access – for a file starting at block b, the block i blocks into the file is at block b+i. Good for sequential access also. Wasteful of space – external fragmentation – the same problem as in dynamic contiguous memory management (ch 9) – with the same solutions. Files cannot grow – and if we over-allocate, then internal fragmentation. Difficult to estimate how much space is needed for a file. Modified contiguous allocation – grant "extents" to let the file grow – just postpones the original problem. Operating System Concepts
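The b+i mapping as a one-line C function (a sketch with a bounds check added):

    /* Contiguous allocation: logical block i of a file that starts at block b
     * and is "length" blocks long lives at physical block b + i. */
    long contiguous_map(long b, long length, long i) {
        return (i >= 0 && i < length) ? b + i : -1;   /* -1 = out of range */
    }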

464 Contiguous Allocation of Disk Space
Start points to the 1st block in the file. Length is the number of blocks in the file. Operating System Concepts

465 Extent-Based Systems Many newer file systems (e.g., the Veritas File System) use a modified contiguous allocation scheme. Extent-based file systems allocate disk blocks in extents. An extent is a contiguous chunk of disk blocks; a file consists of one or more extents. This solves the growth problem but just postpones the original one: what should the size of an extent be? Too large – internal fragmentation; too small – external fragmentation. Operating System Concepts

466 Linked Allocation Each file is a linked list of disk blocks: blocks may be scattered anywhere on the disk. Each block holds a pointer to the next block – the file is treated like a linked list. Sequentially follow pointers to access a given block. Operating System Concepts

467 Linked Allocation (Cont.)
Simple – need only the starting address. Free-space management system – no waste of space. No random access. File-allocation table (FAT) – disk-space allocation used by MS-DOS and OS/2 – combines linked with indexed (see later). Incorrect block-location calculations removed. Operating System Concepts

468 Linked Allocation (Cont.)
Solves the external fragmentation problem (internal fragmentation is minimal) and the growth problem – the only limit is the free-space pool (as paging did in memory management, but with no random access). Only sequential access – direct-access support is poor (must be "simulated"). Access is slow and variable because all pointers in the blocks before the desired one must be read. Each block has space wasted on the pointer – minimized by clustering blocks – but this may increase internal fragmentation. Reliability is a problem: lost or damaged pointers. Operating System Concepts

469 Linked Allocation Operating System Concepts

470 Indexed Allocation Brings all pointers together into the index block.
One "logical" index block per file … in general there may be multiple linked index blocks – see later. (Logical view: file blocks reached through an index table.) Solves internal fragmentation, external fragmentation, and the growth problem ⇒ in addition allows direct/random access. In the simplest form, the size of the index block determines the maximum number of blocks in a file. If the index block is too large, space is wasted for small files. If the index block is not large enough, index blocks must be linked together – see later. Operating System Concepts

471 Example of Indexed Allocation
Operating System Concepts

472 Indexed Allocation (Cont.)
Need an index table (i.e., block). Random access. Dynamic access without external fragmentation, but with the overhead of the index block. The index block is permanently stored on disk, but may be cached in memory – to further improve disk performance. If cached, we have the old problem of keeping the disk copy updated – especially if we have a crash. Incorrect block-location calculations removed. Operating System Concepts

473 Indexed Allocation – Mapping (Cont.) (new slide)
The problem arises when the file is so large that there are not enough entries in a single index block. We want to map from logical to physical in a file of unbounded length. Use a scheme in which the index blocks are themselves linked: link blocks of the index table (no limit on size). The last entry of a linked index block is a pointer to the next index block. Now there is no logical limit on the size of a file. A multilevel index block is another approach to this problem – see below. Can combine the linked index-block idea with multi-level: the UNIX i-node scheme – see below. Operating System Concepts
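A C sketch of the linked-index-block walk (read_index_block and the 127-pointer layout are assumptions for illustration):

    #define PTRS 127                 /* data pointers per index block (assumed);
                                        a 128th entry links to the next block */

    typedef struct index_block {
        long ptr[PTRS];              /* physical data-block numbers */
        long next;                   /* next index block on disk, or -1 */
    } index_block_t;

    /* Assumed helper that fetches a disk block into memory. */
    extern index_block_t *read_index_block(long blockno);

    long map_logical(long first_index_block, long n) {
        index_block_t *ib = read_index_block(first_index_block);
        while (n >= PTRS) {          /* walk the chain of index blocks */
            if (ib->next == -1)
                return -1;           /* past end of file */
            n -= PTRS;
            ib = read_index_block(ib->next);
        }
        return ib->ptr[n];           /* physical block number */
    }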

474 Indexed Allocation – Mapping (Cont.)
Multi-level index blocks - see text. (Figure: outer index ⇒ index table ⇒ file.) Operating System Concepts

475 File-Allocation Table
Used for Windows/OS/2 PC systems. Combines the linked and indexed concepts. ===> See instructor's notes, "Comments on File System Implementation", for a detailed description – available on the CS350 website. Operating System Concepts

476 Combined Scheme: UNIX (4K bytes per block) See text
The UNIX inode. The performance tradeoff favors small files. Operating System Concepts
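A C sketch of deciding which inode level serves a given file offset, assuming 12 direct pointers and 4-byte block pointers in the 4K blocks of the figure (the pointer counts are assumptions):

    #include <stdio.h>

    #define BLK     4096L            /* 4K bytes per block, as in the figure */
    #define NDIRECT 12L              /* direct pointers (assumed count) */
    #define PERBLK  (BLK / 4)        /* 4-byte pointers: 1024 per index block */

    const char *level(long offset) {
        long b = offset / BLK;       /* logical block number */
        if (b < NDIRECT)                               return "direct";
        if (b < NDIRECT + PERBLK)                      return "single indirect";
        if (b < NDIRECT + PERBLK + PERBLK * PERBLK)    return "double indirect";
        return "triple indirect";
    }

    int main(void) {
        printf("%s\n", level(1L << 20));  /* a 1 MB offset: single indirect */
        return 0;
    }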

477 Free-Space Management
See instructor's notes, "Comments on File System Implementation", from the web site. Bit vector (n blocks): bit[i] = 1 ⇒ block[i] free; bit[i] = 0 ⇒ block[i] occupied. A bit for every block on disk – the offset into the map is the block number. Block-number calculation (moving from left to right): (number of bits per word) * (number of 0-value words) + offset of first 1 bit in the 1st nonzero word. Note: "number of 0-value words" = # of leading all-zero words. Operating System Concepts
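The block-number calculation in C - a sketch assuming 32-bit words with the least-significant bit first (1 = free, as above):

    #include <stdint.h>

    /* Skip all-zero words, then locate the first 1 bit in the first
     * nonzero word.  Bit k of word w maps to block 32*w + k. */
    long first_free_block(const uint32_t map[], long nwords) {
        for (long w = 0; w < nwords; w++) {
            if (map[w] != 0) {
                uint32_t x = map[w];
                int offset = 0;
                while (!(x & 1u)) {   /* offset of first 1 bit in the word */
                    x >>= 1;
                    offset++;
                }
                return 32 * w + offset;
            }
        }
        return -1;                    /* no free blocks: disk full */
    }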

478 Free-Space Management (Cont.)
Bit map requires extra space. Example: block size = 2^12 bytes (4 KB, common); disk size = 2^30 bytes (1 gigabyte - on the small side); n = 2^30/2^12 = 2^18 bits (or 32 KB). A 1.3 GB disk with 512-byte blocks has a bit map of 332 KB - stresses memory if cached. But easy to get contiguous files. Linked list (free list) Link together all free disk blocks. Keep a pointer to the 1st free block in a special location & cache it. Cannot get contiguous space easily. No waste of space. Grouping – a modification of the free-list approach – an indexed approach. The 1st block in the partition (an index block) contains pointers to n free data blocks. The last of those n free blocks is another index block pointing to another n free blocks, with the last of these being another pointer block, etc. Similar to the linked index-block approach for file allocation. Advantage: the addresses of a large number of free blocks can be found quickly. Counting Keep track of groups of contiguous blocks. Just need the address of the first block in a group & the number of free contiguous blocks following – similar to extents. Shorter list. Operating System Concepts

479 Free-Space Management (Cont.)
Need to protect: Pointer to free list. Bit map Must be kept on disk. Copy in memory and on disk may differ. Cannot allow a situation where bit[i] = 1 (free) on disk while block[i] has already been allocated via the memory copy - a crash could hand the same block out twice. Solution - keep the memory copy & disk copy in synch: Set bit[i] = 0 on disk. Allocate block[i]. Set bit[i] = 0 in memory. Operating System Concepts

480 Linked Free Space List on Disk
Free (unused) blocks are physically linked together. The pointer to the 1st block may be cached in memory. The FAT method has free-space management built into the access scheme. The text unjustifiably criticizes this method as inefficient due to the need to traverse the list. We should never have to traverse the list: as long as we have a pointer to the first block in the list, we need only access the head - take the first block when a new block is needed, or add a freed block to the head of the list. Operating System Concepts

481 Efficiency and Performance
Efficiency dependent on: disk allocation and directory algorithms; the types of data kept in the file's directory entry - maintaining "statistics" in the directory - see p. 390. UNIX: vary cluster size as a file grows (minimizes internal fragmentation). UNIX: pre-allocates inodes on a disk to distribute this structure; wastes some space but improves performance - see below. Performance disk cache – separate section of main memory for frequently used blocks. free-behind and read-ahead – techniques to optimize sequential access. Improve PC performance by dedicating a section of memory as a virtual disk, or RAM disk. History: this was frequently done on PCs when all you had was a floppy. UNIX: distribute inodes across the partition - keep data blocks near their file's inode - minimize seeks. Operating System Concepts

482 Various Disk-Caching Locations
Note the difference between a RAM disk and a disk cache: the contents of a RAM disk are under user control - same as a real disk; the contents of a disk cache are under control of the OS. Cache referenced disk blocks on the assumption that they will be needed again (temporal locality). Use the I/O protocol for the disk cache. (Figure: disk controller track buffer - page 434.) Operating System Concepts

483 Page Cache A page cache caches pages rather than disk blocks, using virtual-memory techniques. More efficient than caching disk blocks (disk cache) - higher performance. Uses the memory-management protocol for the page cache – memory calls or instructions. Memory-mapped I/O uses a page cache. Routine I/O (I/O reads and writes) through the file system uses the buffer (disk) cache. Problem: what if you want to do memory-mapped "paged" caching when the underlying architecture uses ordinary I/O? This leads to the following figures. Operating System Concepts

484 I/O Without a Unified Buffer Cache
Page caching mapped onto buffer caching. Because the virtual memory system cannot interface with the buffer cache, data in the buffer cache is copied into the page cache – double caching - inefficient. Virtual memory interface: uses memory references. File I/O interface: uses read/write calls. (Fig p. 435: pages, disk blocks.) Operating System Concepts

485 I/O Using a Unified Buffer Cache
A unified buffer cache uses the same page cache to cache both memory-mapped pages and ordinary file system I/O: only a single buffer (the page cache) is needed – more efficient. Operating System Concepts

486 Recovery "Lazy" disk update may be used - this can leave the disk in an inconsistent state if a crash occurs before a disk update is made. Consistency checking – compares data in the directory structure with data blocks on disk, and tries to fix inconsistencies - a scandisk function. Use system programs to back up data from disk to another storage device (floppy disk, magnetic tape). Recover a lost file or disk by restoring data from backup. Ex: the Master Boot Record is backed up - better to restore an obsolete version than none at all. Operating System Concepts

487 Log Structured File Systems <<omit>>
Log structured (or journaling) file systems record each update to the file system as a transaction. All transactions are written to a log. A transaction is considered committed once it is written to the log. However, the file system may not yet be updated. The transactions in the log are asynchronously written to the file system. When the file system is modified, the transaction is removed from the log. If the file system crashes, all remaining transactions in the log must still be performed. Operating System Concepts

488 Network File System - NFS: Overview
NFS views a set of interconnected machines as having independent file systems. Goal: allow sharing of file systems among machines on the network in a transparent manner. Sharing is not mandatory – it is upon request of a machine. Sharing is based on a client/server relationship. Sharing is allowed between any pair of machines rather than only with a dedicated server. The file-mounting protocol is used to make a remote directory accessible in a transparent manner. Operating System Concepts

489 The Sun Network File System (NFS) Omit NFS slides for now
An implementation and a specification of a software system for accessing remote files across LANs (or WANs). The implementation is part of the Solaris and SunOS operating systems running on Sun workstations, using an unreliable datagram protocol (UDP/IP) and Ethernet. Operating System Concepts

490 NFS (Cont.) Omit NFS slides for now
Interconnected workstations viewed as a set of independent machines with independent file systems, which allows sharing among these file systems in a transparent manner. A remote directory is mounted over a local file system directory. The mounted directory looks like an integral subtree of the local file system, replacing the subtree descending from the local directory. Specification of the remote directory for the mount operation is nontransparent; the host name of the remote directory has to be provided. Files in the remote directory can then be accessed in a transparent manner. Subject to access-rights accreditation, potentially any file system (or directory within a file system), can be mounted remotely on top of any local directory. Operating System Concepts

491 NFS (Cont.) Omit NFS slides for now
NFS is designed to operate in a heterogeneous environment of different machines, operating systems, and network architectures; the NFS specification is independent of these media. This independence is achieved through the use of RPC primitives built on top of an External Data Representation (XDR) protocol used between two implementation-independent interfaces. The NFS specification distinguishes between the services provided by a mount mechanism and the actual remote-file-access services. Operating System Concepts

492 Three Independent File Systems Omit NFS slides for now
Operating System Concepts

493 Mounting in NFS Omit NFS slides for now
Mounts Cascading mounts Operating System Concepts

494 NFS Mount Protocol Omit NFS slides for now
Establishes initial logical connection between server and client. Mount operation includes name of remote directory to be mounted and name of server machine storing it. Mount request is mapped to corresponding RPC and forwarded to mount server running on server machine. Export list – specifies local file systems that server exports for mounting, along with names of machines that are permitted to mount them. Following a mount request that conforms to its export list, the server returns a file handle—a key for further accesses. File handle – a file-system identifier, and an inode number to identify the mounted directory within the exported file system. The mount operation changes only the user’s view and does not affect the server side. Operating System Concepts

495 NFS Protocol Omit NFS slides for now
Provides a set of remote procedure calls for remote file operations. The procedures support the following operations: searching for a file within a directory reading a set of directory entries manipulating links and directories accessing file attributes reading and writing files NFS servers are stateless; each request has to provide a full set of arguments. Modified data must be committed to the server’s disk before results are returned to the client (lose advantages of caching). The NFS protocol does not provide concurrency-control mechanisms. Operating System Concepts

496 Three Major Layers of NFS Architecture Omit NFS slides for now
UNIX file-system interface (based on the open, read, write, and close calls, and file descriptors). Virtual File System (VFS) layer – distinguishes local files from remote ones, and local files are further distinguished according to their file-system types. The VFS activates file-system-specific operations to handle local requests according to their file-system types. Calls the NFS protocol procedures for remote requests. NFS service layer – bottom layer of the architecture; implements the NFS protocol. Operating System Concepts

497 Schematic View of NFS Architecture Omit NFS slides for now
Operating System Concepts

498 NFS Path-Name Translation Omit NFS slides for now
Performed by breaking the path into component names and performing a separate NFS lookup call for every pair of component name and directory vnode. To make lookup faster, a directory name lookup cache on the client’s side holds the vnodes for remote directory names. Operating System Concepts

499 NFS Remote Operations Omit NFS slides for now
Nearly one-to-one correspondence between regular UNIX system calls and the NFS protocol RPCs (except opening and closing files). NFS adheres to the remote-service paradigm, but employs buffering and caching techniques for the sake of performance. File-blocks cache – when a file is opened, the kernel checks with the remote server whether to fetch or revalidate the cached attributes. Cached file blocks are used only if the corresponding cached attributes are up to date. File-attribute cache – the attribute cache is updated whenever new attributes arrive from the server. Clients do not free delayed-write blocks until the server confirms that the data have been written to disk. Operating System Concepts

500 Chapter 13: I/O Systems- 6th ed
I/O Hardware Application I/O Interface Kernel I/O Subsystem Transforming I/O Requests to Hardware Operations Streams Performance Review Chapters 2 and 3, and instructors notes on: “Interrupt schemes and DMA” This chapter gives more focus to these chapters and topics. Instructor’s annotations in blue Updated 12/5/03 Operating System Concepts

501 I/O Hardware Incredible variety of I/O devices Common concepts
Port - basic interface to the CPU - status, control, data. Bus (daisy chain or shared direct access) - main and specialized local (ex: PCI for main and SCSI for disks). Controller (host adapter) - HW interface between device and bus - an adapter card or motherboard module. A controller has special-purpose registers (commands, etc.) which, when written to, cause actions to take place - may be memory mapped. I/O instructions control devices - ex: in, out for Intel. Devices have addresses, used by: Direct I/O - uses I/O instructions. Memory-mapped I/O - uses memory instructions. Operating System Concepts

502 A Typical PC Bus Structure
Operating System Concepts

503 Device I/O Port Locations on PCs (partial)
The various ranges for a device include both control and data ports. Operating System Concepts

504 Polling
Handshaking determines the state of the device: command-ready, busy, error. Busy-wait cycle to wait for I/O from the device. When not busy - set data in the data port, set the command in the control port, and let 'er rip. Not desirable if excessive - it is a busy wait which ties up the CPU & interferes with productive work. Remember the CS220 labs. Operating System Concepts

505 Interrupts CPU Interrupt request line (IRQ) triggered by I/O device
Interrupt handler receives interrupts. Maskable to ignore or delay some interrupts. Interrupt vector to dispatch the interrupt to the correct handler. Based on priority. Some are unmaskable. The interrupt mechanism is also used for exceptions. The application can go away after the I/O request, but the CPU is still responsible for transferring data to memory when it becomes available from the device. Can have "nested" interrupts (with priorities). See instructor's notes: "Use of Interrupts and DMA". Soft interrupts or "traps" are generated from the OS in system calls. Operating System Concepts

506 Interrupt-Driven I/O Cycle
Go away & do Something else ==> Operating System Concepts

507 Intel Pentium Processor Event-Vector Table
Interrupts 0-31 are non-maskable - cannot be disabled Operating System Concepts

508 Direct Memory Access With a pure interrupt scheme, the CPU was still responsible for transferring data from the controller to memory (on interrupt) when the device made it available. Now DMA does this - all the CPU has to do is set up the DMA and use the data when the DMA-complete interrupt arrives. … Interrupts are still used - but only to signal DMA complete. Used to avoid programmed I/O for large data movement. Requires a DMA controller. Bypasses the CPU to transfer data directly between the I/O device and memory. Cycle stealing: interference with CPU memory instructions during a DMA transfer - DMA takes priority - the CPU pauses on the memory portion of an instruction. Operating System Concepts

509 Six Step Process to Perform DMA Transfer
Operating System Concepts

510 Application I/O Interface
The OS software interface to the I/O devices (an API for the programmer). Attempts to abstract the characteristics of the many I/O devices into a few general classes. I/O "system calls" encapsulate device behaviors in generic classes. The device-driver layer hides differences among I/O controllers from the kernel. Devices vary in many dimensions: Character-stream or block units for data transfer - bytes vs. blocks. Sequential or random access - access methods. Synchronous (predictable response times) vs. asynchronous (unpredictable response times). Sharable or dedicated - implications for deadlock. Speed of operation - device/software issue. Read-write, read-only, or write-only - permissions. Operating System Concepts

511 A Kernel I/O Structure Fig. 13.6 System calls ==>
… “user” API ==> Example: ioctl(…) generic call (roll your own) in UNIX (p. 468), and other more specific commands or calls open, read, ... Fig. 13.6 Operating System Concepts

512 Characteristics of I/O Devices
Device driver must deal with these at a low level Use of I/O buffering Operating System Concepts

513 Block and Character Devices
Block devices include disk drives - example: sectors or sector clusters on a disk. Commands/calls include read, write, seek. Access is typically through a file-system interface. Raw I/O or file-system access - "binary xfr" of file data - interpretation is in the application (the personality of the file is lost). Memory-mapped (to VM) file access is possible - uses memory instructions rather than I/O instructions - very efficient (ex: swap space on disk). The device driver xfrs blocks at a time - as in paging; DMA transfer is block oriented. Character devices include keyboards, mice, serial ports. The device driver xfrs a byte at a time. Commands include get, put - a character at a time. Libraries layered on top allow line editing - ex: keyboard input can be beefed up to use a line at a time (buffering). Block & character devices also determine the two general device-driver categories. Operating System Concepts

514 Network Devices Varying enough from block and character devices to have their own interface - the OS makes the network device interface distinct from the disk interface - due to significant differences between the two. UNIX and Windows NT/9x/2000 include the socket interface: separates the network protocol from network operations; encapsulates the details of various network devices for the application … analogous to a file and the disk??? Includes select functionality - used to manage and access sockets - returns info on packets waiting or the ability to accept packets - avoids polling. Approaches vary widely (pipes, FIFOs, streams, queues, mailboxes) … you saw some of these! Operating System Concepts

515 Clocks and Timers Provide current time, elapsed time, timer
If programmable, an interval timer is used for timings and periodic interrupts. ioctl (on UNIX) covers odd aspects of I/O such as clocks and timers - a back door for device-driver writers (roll your own). Can implement "secret" calls which may not be documented in a user's or programming manual. Operating System Concepts

516 Blocking and Nonblocking I/O
Blocking - the process making the request is suspended until the I/O completes (lets other processes execute). Easy to use and understand. Insufficient for some needs - multi-threading - depends on the role of the OS in thread management. Nonblocking - the I/O call returns with as much as is available. User interface, data copy (buffered I/O). Implemented via multi-threading. Returns quickly with a count of bytes read or written - ex: read a "small" portion of a file very quickly, use it, and go back for more; ex: displaying video "continuously" from a disk. Asynchronous - the process making the asynch request runs while the I/O executes. Difficult to use - can it continue without the results of the I/O? The I/O subsystem signals the process when the I/O completes - via a (soft) interrupt, or the setting of a shared variable which is periodically tested. Operating System Concepts
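A sketch of a nonblocking read using the POSIX O_NONBLOCK flag (the helper name is invented):

    #include <fcntl.h>
    #include <unistd.h>
    #include <errno.h>

    /* Return immediately with whatever is available (possibly 0 bytes)
     * instead of suspending the caller. */
    ssize_t read_some(int fd, char *buf, size_t len) {
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);   /* mark fd nonblocking */
        ssize_t n = read(fd, buf, len);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            n = 0;                                /* nothing available yet */
        return n;
    }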

517 Kernel I/O Subsystem See A Kernel I/O Structure slide - Fig 13.6
Scheduling Some I/O request ordering via per-device queues. Some OSs try fairness. Buffering - store data in memory while transferring between devices: To cope with device speed mismatch - de-couples the application from the device action. To cope with device transfer-size mismatch. To maintain "copy semantics" - guarantee that the version of data written to the device from a buffer is identical to what was there at the time of the "write" call - even if the user modifies the buffer after the system call returns - the OS copies the data to a kernel buffer before returning control to the user. Double or "ping-pong" buffers - write into one and read from the other - decouples devices and applications … the idea can be extended to multiple buffers accessed in a circular fashion. Operating System Concepts

518 Sun Enterprise 6000 Device-Transfer Rates
Operating System Concepts

519 Kernel I/O Subsystem - (continued)
Caching - fast memory holding a copy of data. Always just a copy. Key to performance. How does this differ from a buffer? Spooling - a buffer holding output (or input) for a device: If the device can serve only one request at a time. Avoids queuing the applications making requests. Data from an application is saved in a unique file associated with the application AND the particular request. Could be saved in files on a disk, or in memory. Example: printing. Device reservation - provides exclusive access to a device. System calls for allocation and deallocation. Watch out for deadlock - why? Operating System Concepts

520 Error Handling The OS can recover from a failed disk read, device unavailable, or transient write failures. Most calls return an error number or code when an I/O request fails. System error logs hold problem reports. CRC checks - especially over network transfers of a lot of data, for example video in real time. Operating System Concepts

521 Kernel Data Structures
The kernel keeps state info for I/O components, including open-file tables, network connections, and character-device state: used by device drivers in manipulating devices, in data transfer, and for error recovery; data that has images on the disk must be kept in synch with the disk copy. Many, many complex data structures track buffers, memory allocation, "dirty" blocks. Some OSs use object-oriented methods and message passing to implement I/O - make the data structures object-oriented classes to encapsulate the low-level nature of the "device" - UNIX provides a seamless interface such as this. Operating System Concepts

522 UNIX I/O Kernel Data Structure
Refer to chapter 11 and 12 on files Fig. 13.9 Operating System Concepts

523 Mapping I/O Requests to Hardware Operations
Consider reading a file from disk for a process: How is connection made from file-name to disk controller: Determine device holding file Translate name to device representation Physically read data from disk into buffer Make data available to requesting process Return control to process See the 10 step scenario on pp (Silberschatz, 6th ed.) for a clear description. Operating System Concepts

524 Life Cycle of An I/O Request
Data already in buffer Ex read ahead Operating System Concepts

525 STREAMS (?) STREAM – a full-duplex communication channel between a user-level process and a device A STREAM consists of: - STREAM head interfaces with the user process - driver end interfaces with the device - zero or more STREAM modules between them. Each module contains a read queue and a write queue Message passing is used to communicate between queues Operating System Concepts

526 The STREAMS Structure Operating System Concepts

527 Performance sect 13.7 I/O a major factor in system performance:
I/O places demands on the CPU to execute device-driver and kernel I/O code, resulting in context switching and interrupt overhead. Data copying loads down the memory bus. Network traffic is especially stressful. See the bulleted list on page 485 (Silberschatz, 6th ed.). Improving performance: Reduce the number of context switches. Reduce data copying. Reduce interrupts by using large transfers, smart controllers, polling. Use DMA. Move processing primitives into hardware. Balance CPU, memory, bus, and I/O performance for highest throughput. Operating System Concepts

528 Intercomputer Communications- omit for now
Operating System Concepts

529 Device-Functionality Progression
Where should I/O functionality be implemented? Application level … device hardware Decision depends on trade-offs in the design layers: Operating System Concepts

530 Module 13: Secondary-Storage Chapter 14 – Silberschatz – 6th ed.
Annotations in Blue by instructor Last updated 5/10/02 ==> updated 12/10/03 Disk Structure Disk Scheduling Disk Management Swap-Space Management <<< coverage only to here Disk Reliability Stable-Storage Implementation Tertiary Storage Devices Operating System Issues Performance Issues Applied Operating System Concepts

531 Disk Structure Disk drives are addressed as large 1-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer. The 1-dimensional array of logical blocks is mapped into the sectors of the disk sequentially. Sector 0 is the first sector of the first track on the outermost cylinder. Mapping proceeds in order through that track, then the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost. Applied Operating System Concepts
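A C sketch of that mapping with made-up geometry values (real drives hide the true geometry behind the logical-block interface):

    #define SECTORS_PER_TRACK 63     /* assumed geometry */
    #define TRACKS_PER_CYL    16

    void block_to_chs(long lba, long *cyl, long *track, long *sector) {
        *sector = lba % SECTORS_PER_TRACK;                    /* within track */
        *track  = (lba / SECTORS_PER_TRACK) % TRACKS_PER_CYL; /* within cylinder */
        *cyl    = lba / (SECTORS_PER_TRACK * TRACKS_PER_CYL);
    }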

532 Disk Scheduling The operating system is responsible for using the hardware efficiently – for disk drives, this means having fast access time and high disk bandwidth. Access time has two major components: Seek time - the time for the disk arm to move the heads to the cylinder containing the desired sector. Rotational latency - the additional time waiting for the disk to rotate the desired sector to the disk head. Minimize seek time. Seek time ∝ seek distance. Disk bandwidth is the total number of bytes transferred, divided by the total time between the first request for service and the completion of the last transfer. Applied Operating System Concepts

533 Disk Scheduling (Cont.)
Several algorithms exist to schedule the servicing of disk I/O requests. Definition: a disk request is for data on a per cylinder basis. If a request is for data on two different tracks in the same cylinder, then it would be considered as two requests to the same cylinder. …. Analogy to byte references within a page in memory management. We illustrate them with a request queue (0-199) of cylinder references: 98, 183, 37, 122, 14, 124, 65, 67 Head pointer 53 Applied Operating System Concepts

534 FCFS
Illustration shows total head movement of 640 cylinders. Applied Operating System Concepts

535 SSTF Selects the request with the minimum seek time from the current head position. SSTF scheduling is a form of SJF scheduling; may cause starvation of some requests. Illustration shows total head movement of 236 cylinders. Applied Operating System Concepts
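A C sketch that simulates SSTF on the request queue from the previous slide and totals the head movement (prints 236, matching the illustration):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        int q[] = { 98, 183, 37, 122, 14, 124, 65, 67 };
        int n = 8, head = 53, total = 0;
        for (int served = 0; served < n; served++) {
            int best = -1;                   /* closest pending request */
            for (int i = 0; i < n; i++)
                if (q[i] >= 0 && (best < 0 ||
                    abs(q[i] - head) < abs(q[best] - head)))
                    best = i;
            total += abs(q[best] - head);
            head = q[best];
            q[best] = -1;                    /* mark as served */
        }
        printf("total head movement = %d\n", total);   /* 236 */
        return 0;
    }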

536 SSTF (Cont.) head movement of 236 cylinders
Applied Operating System Concepts

537 SCAN The disk arm starts at one end of the disk, and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed and servicing continues. Sometimes called the elevator algorithm or bus route algorithm. Illustration shows total head movement of 208 (?) cylinders. Applied Operating System Concepts

538 SCAN (Cont.) head movement of 208? cylinders
Applied Operating System Concepts

539 C-SCAN Provides a more uniform wait time than SCAN.
The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip. Treats the cylinders as a circular list that wraps around from the last cylinder to the first one. A fast reset of the arm avoids wasting time checking disk regions that have just been serviced. Applied Operating System Concepts

540 C-SCAN (Cont.) Applied Operating System Concepts

541 C-LOOK Version of C-SCAN
Arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk. Applied Operating System Concepts

542 C-LOOK (Cont.) Applied Operating System Concepts

543 Selecting a Disk-Scheduling Algorithm
SSTF is common and has a natural appeal SCAN and C-SCAN perform better for systems that place a heavy load on the disk. Performance depends on the number and types of requests. Requests for disk service can be influenced by the file-allocation method - remember the tradeoffs in the various allocation schemes - see instructors notes: “Comments on File System Implementation”, and the text. The disk-scheduling algorithm should be written as a separate module of the operating system, allowing it to be replaced with a different algorithm if necessary. Either SSTF or LOOK is a reasonable choice for the default algorithm. Applied Operating System Concepts

544 Disk Management Low-level formatting, or physical formatting — Dividing a disk into sectors that the disk controller can read and write. How are sectors identified? Sector made up of a data structure: header, data area and a trailer - disk controller/device drivers must be capable of identifying and interpreting this. To use a disk to hold files, the operating system still needs to record its own data structures on the disk. Partition the disk into one or more groups of cylinders. Logical formatting or “creating a file system”. Includes: maps of free & allocated space (FAT or inodes), & initial empty directory. Boot block initializes system. The bootstrap is stored in ROM. Bootstrap loader program - stored in boot block. Methods such as sector sparing used to handle bad blocks - usually SCSI. DOS based systems identify bad blocks in the FAT Applied Operating System Concepts

545 Swap-Space Management
How is virtual memory implemented or created on a disk? All we have in the beginning are files and "empty" memory. Swap space — virtual memory uses disk space as an extension of main memory - this is where the entire virtual memory is mapped and stored. Must have a disk sufficiently large to handle this address space. Swap space is not necessarily preserved once the machine is re-booted; it generally grows as the system gets used. Swap space can be carved out of the normal file system, or, more commonly, it can be in a separate disk partition. Swap-space management: 4.3BSD (a UNIX version) allocates swap space when a process starts; it holds the text segment (the program) and the data segment. In 4.3BSD, when a process starts, its text is paged in from the file system, written out to swap space when necessary, and read back in from there, so the file system is consulted only once for each text page. The kernel uses swap maps to track swap-space use. Solaris 2 allocates swap space only when a page is forced out of physical memory, not when the virtual-memory page is first created. Applied Operating System Concepts

546 Disk Reliability OMIT remaining slides after this one - end of course!
ECC for reading sectors, CRC checks, ... Several improvements in disk-use techniques involve the use of multiple disks working cooperatively. Disk striping uses a group of disks as one storage unit. RAID schemes improve both the performance and the reliability of the storage system by storing redundant data. Mirroring or shadowing keeps a duplicate of each disk. Block interleaved parity uses much less redundancy. Applied Operating System Concepts
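The block-interleaved-parity idea can be seen in a few lines of Python: the parity block is the XOR of the data blocks, so any single lost block can be reconstructed from the survivors (toy 4-byte blocks here, not a realistic stripe size):

```python
def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)            # stored on the parity disk

# Disk 1 fails: its block is the XOR of the parity and the survivors.
recovered = xor_blocks([parity, data[0], data[2]])
assert recovered == data[1]
```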

547 Stable-Storage Implementation
Write-ahead log scheme requires stable storage. To implement stable storage: Replicate information on more than one nonvolatile storage device with independent failure modes. Update information in a controlled manner to ensure that we can recover the stable data after any failure during data transfer or recovery. Applied Operating System Concepts
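A toy sketch of the controlled-update rule, with the two replicas modeled as ordinary files (the file names are invented): each copy is written and forced to the medium before the next is touched, so a crash corrupts at most one copy. A real implementation would also checksum each copy so recovery can tell which one survived a mid-write crash.

```python
import os

# Stand-ins for two nonvolatile devices with independent failure modes.
REPLICAS = ["stable_copy_0.bin", "stable_copy_1.bin"]

def stable_write(block):
    """Update the replicas one at a time, syncing each before moving
    on, so that a crash can leave at most one copy mid-update."""
    for path in REPLICAS:
        with open(path, "wb") as f:
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the write to the medium

stable_write(b"write-ahead log record")
```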

548 Tertiary Storage Devices
Low cost is the defining characteristic of tertiary storage. Generally, tertiary storage is built using removable media. Common examples of removable media are floppy disks and CD-ROMs; other types are available. Applied Operating System Concepts

549 Removable Disks Floppy disk — thin flexible disk coated with magnetic material, enclosed in a protective plastic case. Most floppies hold about 1 MB; similar technology is used for removable disks that hold more than 1 GB. Removable magnetic disks can be nearly as fast as hard disks, but they are at a greater risk of damage from exposure. Applied Operating System Concepts

550 Removable Disks (Cont.)
A magneto-optic disk records data on a rigid platter coated with magnetic material. Laser heat lets a large but weak magnetic field record a bit: only the laser-heated spot is susceptible to the field. Laser light is also used to read data (the Kerr effect). The magneto-optic head flies much farther from the disk surface than a magnetic disk head, and the magnetic material is covered with a protective layer of plastic or glass, so the medium is resistant to head crashes. Optical disks do not use magnetism; they employ special materials that are altered by laser light. Applied Operating System Concepts

551 WORM Disks The data on read-write disks can be modified over and over.
WORM (“Write Once, Read Many Times”) disks can be written only once. A thin aluminum film is sandwiched between two glass or plastic platters. To write a bit, the drive uses a laser to burn a small hole through the aluminum; information can be destroyed but not altered. Very durable and reliable. Read-only disks, such as CD-ROM and DVD, come from the factory with the data pre-recorded. Applied Operating System Concepts

552 Tapes Compared to a disk, a tape is less expensive and holds more data, but random access is much slower. Tape is an economical medium for purposes that do not require fast random access, e.g., backup copies of disk data, holding huge volumes of data. Large tape installations typically use robotic tape changers that move tapes between tape drives and storage slots in a tape library. stacker – a library that holds a few tapes; silo – a library that holds thousands of tapes. A disk-resident file can be archived to tape for low-cost storage; the computer can stage it back into disk storage for active use. Applied Operating System Concepts

553 Operating System Issues
Major OS jobs are to manage physical devices and to present a virtual machine abstraction to applications. For hard disks, the OS provides two abstractions: Raw device – an array of data blocks. File system – the OS queues and schedules the interleaved requests from several applications. Applied Operating System Concepts

554 Application Interface
Most OSs handle removable disks almost exactly like fixed disks — a new cartridge is formatted and an empty file system is generated on the disk. Tapes are presented as a raw storage medium, i.e., an application does not open a file on the tape; it opens the whole tape drive as a raw device. Usually the tape drive is then reserved for the exclusive use of that application. Since the OS does not provide file system services, the application must decide how to use the array of blocks. Since every application makes up its own rules for how to organize a tape, a tape full of data can generally only be used by the program that created it. Applied Operating System Concepts

555 Tape Drives The basic operations for a tape drive differ from those of a disk drive. locate positions the tape to a specific logical block, not an entire track (corresponds to seek). The read position operation returns the logical block number where the tape head is. The space operation enables relative motion. Tape drives are “append-only” devices; updating a block in the middle of the tape also effectively erases everything beyond that block. An EOT mark is placed after a block that is written. Applied Operating System Concepts
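A toy model of those semantics in Python (the class and method names mirror the operations above but are otherwise invented): note how a write in the middle truncates everything beyond it, just as the EOT behavior describes.

```python
class TapeDrive:
    """Toy tape: a list of logical blocks plus a movable head."""
    def __init__(self):
        self.blocks, self.head = [], 0

    def locate(self, n):         # position to logical block n (like seek)
        self.head = n

    def read_position(self):     # logical block number under the head
        return self.head

    def space(self, count):      # relative motion, forward or backward
        self.head += count

    def write(self, block):      # append-only: the old tail is lost
        self.blocks[self.head:] = [block]  # EOT now follows this block
        self.head += 1

t = TapeDrive()
for b in (b"a", b"b", b"c"):
    t.write(b)
t.locate(1)
t.write(b"B")
print(t.blocks)  # [b'a', b'B']: everything past the rewrite is gone
```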

556 File Naming The issue of naming files on removable media is especially difficult when we want to write data on a removable cartridge on one computer, and then use the cartridge in another computer. Contemporary OSs generally leave the name space problem unsolved for removable media, and depend on applications and users to figure out how to access and interpret the data. Some kinds of removable media (e.g., CDs) are so well standardized that all computers use them the same way. Applied Operating System Concepts

557 Hierarchical Storage Management (HSM)
A hierarchical storage system extends the storage hierarchy beyond primary memory and secondary storage to incorporate tertiary storage — usually implemented as a jukebox of tapes or removable disks. Tertiary storage is usually incorporated by extending the file system. Small and frequently used files remain on disk. Large, old, inactive files are archived to the jukebox. HSM is usually found in supercomputing centers and other large installations that have enormous volumes of data. Applied Operating System Concepts
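The migration policy fits in a few lines. A sketch under stated assumptions: archive_to_jukebox() is a hypothetical backend supplied by the caller, and the 90-day idle threshold is an invented policy knob.

```python
import os
import time

AGE_LIMIT = 90 * 24 * 3600  # archive files idle for 90 days (policy knob)

def migrate(directory, archive_to_jukebox):
    """Move long-idle files to tertiary storage; hot files stay on disk."""
    now = time.time()
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getatime(path) > AGE_LIMIT:
            archive_to_jukebox(path)  # real HSMs leave a stub for transparent recall
```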

558 Speed Two aspects of speed in tertiary storage are bandwidth and latency. Bandwidth is measured in bytes per second. Sustained bandwidth – average data rate during a large transfer; # of bytes/transfer time. The data rate when the data stream is actually flowing. Effective bandwidth – average over the entire I/O time, including seek or locate, and cartridge switching. The drive’s overall data rate. Applied Operating System Concepts
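The difference is easy to see numerically; here is a small worked example with made-up figures:

```python
transfer_bytes = 8 * 2**30    # an 8 GB read from tape
sustained_rate = 100 * 2**20  # 100 MB/s while data is actually flowing
locate_time = 60.0            # seconds to wind to the first block
switch_time = 120.0           # the robot swaps the cartridge in first

flow_time = transfer_bytes / sustained_rate  # ~82 s of actual transfer
effective = transfer_bytes / (switch_time + locate_time + flow_time)
print(f"sustained: {sustained_rate / 2**20:.0f} MB/s, "
      f"effective: {effective / 2**20:.1f} MB/s")
# sustained: 100 MB/s, effective: 31.3 MB/s; the overhead eats the difference
```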

559 Speed (Cont.) Access latency – amount of time needed to locate data.
Access time for a disk – move the arm to the selected cylinder and wait for the rotational latency; typically under 35 milliseconds. Access on tape requires winding the tape reels until the selected block reaches the tape head; this takes tens or hundreds of seconds. As a rule of thumb, random access within a tape cartridge is about a thousand times slower than random access on disk. The low cost of tertiary storage is a result of having many cheap cartridges share a few expensive drives. A removable library is best devoted to the storage of infrequently used data, because the library can only satisfy a relatively small number of I/O requests per hour. Applied Operating System Concepts

560 Reliability A fixed disk drive is likely to be more reliable than a removable disk or tape drive. An optical cartridge is likely to be more reliable than a magnetic disk or tape. A head crash in a fixed hard disk generally destroys the data, whereas the failure of a tape drive or optical disk drive often leaves the data cartridge unharmed. Applied Operating System Concepts

561 Cost Main memory is much more expensive than disk storage
The cost per megabyte of hard disk storage is competitive with magnetic tape if only one tape is used per drive. The cheapest tape drives and the cheapest disk drives have had about the same storage capacity over the years. Tertiary storage gives a cost savings only when the number of cartridges is considerably larger than the number of drives. Applied Operating System Concepts
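That break-even point can be sketched with toy prices (all figures invented purely for illustration):

```python
# Invented prices: when does a tape library beat plain disk per gigabyte?
DRIVE_COST, CARTRIDGE_COST, CARTRIDGE_GB = 2000.0, 20.0, 400.0
DISK_COST, DISK_GB = 80.0, 500.0  # disk: $0.16 per GB

def tape_cost_per_gb(n_cartridges, n_drives=4):
    total = n_drives * DRIVE_COST + n_cartridges * CARTRIDGE_COST
    return total / (n_cartridges * CARTRIDGE_GB)

for n in (10, 100, 1000):
    print(f"{n:5d} cartridges: ${tape_cost_per_gb(n):.2f}/GB "
          f"vs disk ${DISK_COST / DISK_GB:.2f}/GB")
# With few cartridges the drives dominate the cost; with many, tape wins.
```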

