Linux: The Guts By Sam Evans and John Massey
History Of Linux ❖ 1984: Richard Stallman quits his job at MIT and starts working on the GNU project. ❖ 1985: The Free Software Foundation is founded by Richard Stallman. The GNU Manifesto, advocating the cause of the free software movement, is published. ❖ 1991: The Linux kernel is publicly announced on 25 August by the 21-year-old Finnish student Linus Benedict Torvalds. On 17 September the first public version appears on an FTP server. Some developers are interested in the project and contribute improvements and extensions. ❖ 1992: The Linux kernel is relicensed under the GNU GPL. The first Linux distributions are created. ❖ 1994: Torvalds judges all components of the kernel to be fully matured: he releases version 1.0 of Linux. This version of the kernel is, for the first time, networkable. The XFree86 project contributes a graphical user interface (GUI). Red Hat and SUSE publish version 1.0 of their Linux distributions. ❖ 1995: In March the next stable branch of Linux appears, the 1.2 series. Later in the year Linux is ported to the DEC Alpha and to the Sun SPARC. Over the following years it is ported to an ever greater number of platforms. ❖ 1996: Version 2.0 of the kernel is released. The kernel can now serve several processors at the same time, and thereby becomes a serious alternative for many companies. ❖ 1999: The 2.2 series appears in January with improved network code and improved SMP support. ❖ 2001: The 2.4 series is released in January. The kernel now supports up to 64 GB of RAM, 64-bit systems, USB and a journaling filesystem. ❖ 2003: At the end of the year the 2.6 kernel is released, after which Linus Torvalds goes to work for the OSDL. Linux becomes used more widely on embedded systems. ❖ Present Day: Different distributions of Linux are available to suit your every need. Most versions are completely free; there are even groups which will mail you an installation CD free of charge.
Ubuntu Linux is now available on computers from several major retailers, and Xandros is the distribution of choice for the new $200 Eee PC. ❖ The Linux kernel is distributed under the GNU General Public License (GPL). This means that people can copy it, modify it, use it in any manner that they want, and give away their own copies, provided that redistributed copies and derived works stay under the GPL and come with source code. A derivative of Linux cannot be made proprietary or shipped as a binary-only product.
Design Principles ❖ Multiuser, multitasking system with a full set of UNIX-compatible tools. The internal details of Linux's design have been influenced by the history of the operating system's development. ❖ The problem with the many different flavors of UNIX is that source code written for one flavor may not necessarily compile or run correctly on another. Even when the same system calls are present on two different systems, they might not behave in the same way. ❖ Linux is designed to be compliant with the relevant POSIX documents. The POSIX standards define a set of specifications of different aspects of operating-system behavior. There are POSIX documents for common operating-system functionality and for extensions like process threads and real-time operations. In addition to the basic POSIX standard, Linux supports the POSIX threading extension, Pthreads, and a subset of the POSIX extensions for real-time control.
Components Of A Linux System ❖ Kernel - Responsible for maintaining all of the important abstractions of the operating system, including things such as virtual memory and processes. ❖ System Libraries - Define a standard set of functions through which applications can interact with the kernel. These functions implement much of the operating-system functionality that does not need the full privileges of kernel code. ❖ System Utilities - Programs that perform individual, specialized management tasks. Some may be invoked only once to initialize and configure some aspect of the system. Others, known as daemons, run permanently, handling tasks such as responding to incoming network traffic, accepting logon requests from terminals, and updating log files.
Kernel Modules ❖ The Linux kernel can load and unload modules dynamically at run time. The kernel does not necessarily need to know in advance which modules may be loaded; they are truly independent loadable components. ❖ Kernel modules run in privileged kernel mode, so they have full access to the hardware capabilities of the machine on which they run. Typically, a module might implement a device driver, a file system, or a networking protocol. ❖ Modules let users compile their own code and insert it into the already running kernel. Without modules, every change means building a new kernel: recompile, relink, and reload. Users also benefit because they can share things like device drivers without having to rebuild their kernel.
Components of Module Support ❖ Module Management - allows modules to be loaded into memory and to talk to the rest of the kernel ❖ Driver Registration - allows modules to tell the rest of the kernel that a new driver has become available ❖ Conflict Resolution Mechanisms - allow different device drivers to reserve hardware resources and to protect those resources from accidental use by another driver
Process Management ❖ fork() and exec() are treated as separate operations: fork() creates a new process, exec() runs a new program ❖ There is no need to specify every detail of the new program's environment in the call that creates the process ❖ Each process gets a unique PID, which names the process to the kernel when an application makes a system call to signal, modify, or wait for it ❖ Additional IDs associate the process with a process group and login session ❖ Credentials - Each process must have an associated user ID and one or more group IDs that determine the rights of a process to access system resources and files ❖ Personality - Each process has an associated personality that can modify the semantics of certain system calls, used mainly by emulation libraries for compatibility with certain flavors of UNIX
Process Management cont. ❖ Environment Inherited from Parent ❖ Argument vector ❖ List of command-line arguments used to invoke the running program ❖ Environment vector ❖ List of NAME=VALUE pairs that associates environment variables with arbitrary textual values; it is stored in the process's user-mode address space, not in the kernel ❖ fork(): the child inherits both vectors unchanged from its parent ❖ exec(): the caller supplies the argument and environment vectors for the new program, replacing the current ones; user-mode libraries are responsible for interpreting the environment.
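The split between fork() and exec() can be tried from user space. The following is a minimal sketch, assuming a Unix system with Python available; the function name `run_child` and the variable `DEMO_VAR` are made up for illustration. The child is created with `os.fork()`, then replaces itself with a fresh interpreter via `os.execve()`, which receives an explicit argument vector and environment vector.

```python
import os
import sys


def run_child(env_for_exec):
    """Fork, exec a fresh interpreter in the child, and return what it printed."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: still running the parent's program with the parent's environment.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)  # route the new program's stdout into the pipe
            code = "import os; print(os.environ.get('DEMO_VAR', 'unset'))"
            # exec() takes an explicit argument vector and environment vector.
            os.execve(sys.executable, [sys.executable, "-c", code], env_for_exec)
        finally:
            os._exit(127)  # reached only if execve() failed
    # Parent: read everything the child wrote, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe_reader:
        output = pipe_reader.read().strip()
    os.waitpid(pid, 0)
    return output
```

Calling `run_child(dict(os.environ, DEMO_VAR="from-parent"))` hands the new program an environment that contains the extra variable; passing an environment without it makes the child report `unset`, because exec() uses only the vector it is given.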
Process Management cont... ❖ Process Context ❖ Changes constantly as it is the current running state of the program ❖ Scheduling Context ❖ Accounting ❖ File Table ❖ File-system Context ❖ Signal handler Table ❖ Virtual memory context
Dear God, It's... More Process Management ❖ Processes and Threads ❖ Create threads using the clone() system call ❖ Table of flags (in slide) ❖ Arguments used with clone() decide which parts of the process are shared and which are copied ❖ Linux does not distinguish between processes and threads because it does not hold a process's entire context within the main process data structure ❖ Holds the context within independent sub contexts (e.g., file-system context, file table, signal-handler table, and virtual memory are each held in separate data structures) ❖ fork() is a special case of clone() where all sub contexts are copied
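The share-or-copy idea behind clone() can be modelled in a few lines. This is a toy sketch, not kernel code: the `Task` class and the `clone` function here are hypothetical, and only the four flag names and values mirror the kernel's `CLONE_VM`, `CLONE_FS`, `CLONE_FILES`, and `CLONE_SIGHAND`. Each sub context is either shared by reference or deep-copied, depending on the flags.

```python
import copy

# Flag values as defined in the Linux headers; everything else is a toy model.
CLONE_VM = 0x00000100       # share virtual memory
CLONE_FS = 0x00000200       # share file-system context
CLONE_FILES = 0x00000400    # share the open-file table
CLONE_SIGHAND = 0x00000800  # share the signal-handler table


class Task:
    """A process whose context lives in independent sub contexts."""

    def __init__(self, vm, fs, files, sighand):
        self.vm = vm
        self.fs = fs
        self.files = files
        self.sighand = sighand


def clone(parent, flags):
    """Create a new task, sharing each sub context whose flag is set."""

    def share_or_copy(context, flag):
        return context if flags & flag else copy.deepcopy(context)

    return Task(
        vm=share_or_copy(parent.vm, CLONE_VM),
        fs=share_or_copy(parent.fs, CLONE_FS),
        files=share_or_copy(parent.files, CLONE_FILES),
        sighand=share_or_copy(parent.sighand, CLONE_SIGHAND),
    )
```

With all four flags set, the result behaves like a thread (every sub context is the parent's own object); with `flags=0` it behaves like fork(), where every sub context is a private copy.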
Process Scheduling ❖ Two separate process-scheduling algorithms ❖ Time-sharing algorithm for fair, preemptive scheduling among multiple processes ❖ Runs in constant time (known as O(1)) ❖ Priority based ❖ Real-time range ❖ nice'd range ❖ Time quantum based on priority ❖ Lower numeric priority value (i.e., higher priority) = longer time quantum ❖ Separate algorithm for real-time tasks
Process Scheduling cont. ❖ Run Queue ❖ Process is considered for execution until its time quantum has expired ❖ Maintained in a run queue data structure ❖ SMP - each processor has its own run queue - scheduled independently ❖ Contains two priority arrays, "active" and "expired" ❖ Active = has time slice left ❖ Expired = no time slice left ❖ Both indexed by priority ❖ Next job chosen from the Active array ❖ When Active is empty, the two arrays are exchanged: Expired becomes Active ❖ Process priority re-assigned upon entering the Expired array ❖ Priorities assigned as the nice value ±5 depending on the interactivity of the task ❖ Interactivity determined by how long the process sleeps waiting for I/O ❖ Highly interactive tasks most likely adjusted closer to -5 ❖ CPU-bound processes will most likely have their priority lowered
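The active/expired mechanism can be sketched as a small simulation. This is a toy model under stated assumptions: the names `RunQueue` and `simulate` are invented, 140 priority levels are assumed, every task is assumed to use its whole time slice, and a linear scan stands in for the constant-time lookup a real scheduler would need. Priority recalculation is left out.

```python
from collections import deque

NUM_PRIORITIES = 140  # assumed number of levels; lower index = higher priority


class RunQueue:
    """Toy per-CPU run queue with 'active' and 'expired' priority arrays."""

    def __init__(self):
        self.active = [deque() for _ in range(NUM_PRIORITIES)]
        self.expired = [deque() for _ in range(NUM_PRIORITIES)]

    def enqueue(self, name, priority):
        self.active[priority].append(name)

    @staticmethod
    def _highest_priority(array):
        # Linear scan for clarity only; this step is what must be O(1) in practice.
        for priority, queue in enumerate(array):
            if queue:
                return priority
        return None

    def pick_next(self):
        priority = self._highest_priority(self.active)
        if priority is None:
            # Active array is empty: exchange the two arrays.
            self.active, self.expired = self.expired, self.active
            priority = self._highest_priority(self.active)
            if priority is None:
                return None
        return priority, self.active[priority].popleft()

    def expire(self, name, priority):
        # The task has used up its time slice.
        self.expired[priority].append(name)


def simulate(tasks, rounds):
    """Run `rounds` scheduling decisions; each task exhausts its slice."""
    run_queue = RunQueue()
    for name, priority in tasks:
        run_queue.enqueue(name, priority)
    order = []
    for _ in range(rounds):
        picked = run_queue.pick_next()
        if picked is None:
            break
        priority, name = picked
        order.append(name)
        run_queue.expire(name, priority)
    return order
```

For three tasks at priorities 10, 100, and 120, six rounds run them in priority order twice: once from the original active array and once after the arrays are exchanged.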
Process Scheduling cont... ❖ Real-time ❖ Uses the two real-time scheduling classes required by POSIX ❖ FCFS ❖ Process runs until it exits or blocks ❖ Round-Robin ❖ Will eventually be preempted ❖ Round-robin processes of equal priority automatically time-share among themselves ❖ Both use static priority ❖ Scheduler always runs the process with the highest priority; ties are broken by longest wait ❖ Soft Real Time Scheduling ❖ Scheduler offers strict guarantees about the relative priorities of real-time processes, but the kernel does not offer any guarantees about how quickly a real-time process will be scheduled after it becomes runnable.
Kernel Synchronization ❖ Requests for kernel-mode execution happen in two ways ❖ A running program requests an operating-system service ❖ Explicitly via a system call ❖ Implicitly (for example, a page fault) ❖ A device driver delivers a hardware interrupt, causing the CPU to run a kernel-defined handler ❖ Critical Section Problem
Kernel Synchronization ❖ Kernel is fully preemptible (as of the 2.6 kernel) ❖ Single Processor Machines ❖ Preemption is disabled and re-enabled to provide a lock ❖ preempt_disable() and preempt_enable() serve as the locking functions ❖ Each kernel-mode task has a preempt_count in its thread-info structure ❖ If preempt_count is greater than 0, it is not safe to preempt the kernel ❖ SMP ❖ Spin locks are used for short-duration locks ❖ Semaphores are used for longer-duration locks ❖ Downsides of disabling interrupts to protect a critical section ❖ Expensive ❖ I/O is suspended ❖ Devices waiting on servicing will wait, degrading performance ❖ Locking summary: Single processor: disable kernel preemption / enable kernel preemption. Multiple processors: acquire spin lock / release spin lock.
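The role of preempt_count can be illustrated with a toy counter. This sketch is a user-space model only: the `ThreadInfo` class is hypothetical, and it assumes the counter goes up on each disable and down on each enable, so nested critical sections stay protected until the outermost one ends.

```python
class ThreadInfo:
    """Toy model of the per-task preemption counter."""

    def __init__(self):
        self.preempt_count = 0

    def preempt_disable(self):
        # Entering a critical section; calls may nest.
        self.preempt_count += 1

    def preempt_enable(self):
        if self.preempt_count == 0:
            raise RuntimeError("preempt_enable() without matching preempt_disable()")
        self.preempt_count -= 1

    def preemptible(self):
        # Preempting is safe only when no critical section is open.
        return self.preempt_count == 0
```

After two nested `preempt_disable()` calls, a single `preempt_enable()` still leaves the task non-preemptible; only the second one brings the count back to zero.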
Kernel Synchronization ❖ Kernel has a synchronization architecture that allows long critical sections to run without disabling interrupts for their whole duration ❖ Top Half ❖ Normal interrupt service routines ❖ Runs with recursive interrupts disabled ❖ Higher priority can still interrupt ❖ Same or lower priority cannot interrupt ❖ Bottom Half ❖ Runs with all interrupts enabled ❖ Mini scheduler keeps bottom halves from interrupting themselves ❖ Mini scheduler runs automatically whenever an interrupt service routine exits ❖ If an interrupt requests a bottom half that is already running, that execution is deferred until the running one completes ❖ Top half can interrupt bottom half
Kernel Synchronization cont. ❖ 2.0 kernel first to offer SMP ❖ Only one processor at a time could execute kernel-mode code ❖ 2.2 kernel has true SMP ❖ Single spin lock for the kernel ❖ Big Kernel Lock ❖ Very coarse locking granularity ❖ Later: multiple spin locks ❖ Each locks a small subset of the kernel's data structures (top and bottom halves) ❖ 2.6 kernel improvements ❖ Processor affinity ❖ Load-balancing algorithms
Memory Management: Management of Physical Memory ❖ Because of hardware characteristics, Linux splits physical memory into three zones ❖ ZONE_DMA ❖ ZONE_NORMAL ❖ ZONE_HIGHMEM ❖ The zones are architecture specific ❖ 80x86 (ISA standard) ❖ First 16 MB of memory is ZONE_DMA ❖ ZONE_NORMAL is physical memory mapped into the CPU's address space ❖ Other systems (no DMA restrictions) ❖ ZONE_DMA is not present ❖ ZONE_NORMAL is used ❖ ZONE_HIGHMEM (high memory) ❖ physical memory not mapped into the kernel address space ❖ 32-bit Intel, where 4 GB of address space is provided ❖ the first 896 MB is mapped to the kernel ❖ the rest is high memory ❖ Zone / physical memory: ZONE_DMA: < 16 MB; ZONE_NORMAL: 16 MB to 896 MB; ZONE_HIGHMEM: > 896 MB
Memory Management cont. ❖ Kernel maintains a list of free pages in each zone ❖ Kernel satisfies each request for physical memory from the appropriate zone ❖ Each zone has its own page allocator, which can allocate ranges of physically contiguous pages ❖ Allocator uses a buddy system to track available pages ❖ Adjacent allocatable units are paired as buddies; two free buddies combine into a larger region ❖ A larger region can be split to fulfill a small request ❖ Smallest allocatable unit is a single page ❖ Memory can be assigned ❖ Statically at boot time (drivers/kernel modules) ❖ Dynamically by the page allocator ❖ Kernel functions do not have to use the basic page allocator directly ❖ Several specialized subsystems use the page allocator to manage their own memory pools ❖ Virtual memory ❖ kmalloc() variable-length allocator ❖ Slab allocator (allocates memory for kernel data structures) ❖ Page cache ❖ Figure (buddy system): a 16 KB region splits into two 8 KB buddies, and one 8 KB buddy splits into two 4 KB buddies
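A buddy allocator is compact enough to sketch in full. The version below is a minimal illustration, not the kernel's implementation: the class name `BuddyAllocator`, its byte-offset interface, and the free-list layout are all assumptions, and the total size is assumed to be the minimum block size times a power of two. It shows the two operations from the slide: splitting a larger region to serve a small request, and merging two free buddies back together.

```python
class BuddyAllocator:
    """Toy buddy allocator over byte offsets in one contiguous region."""

    def __init__(self, total_size, min_block):
        # Assumes total_size == min_block * 2**k for some k >= 0.
        self.total_size = total_size
        self.min_block = min_block
        self.free = {}       # block size -> set of free block offsets
        self.allocated = {}  # offset -> block size
        size = min_block
        while size <= total_size:
            self.free[size] = set()
            size *= 2
        self.free[total_size].add(0)

    def alloc(self, request):
        """Return the offset of a block of at least `request` bytes, or None."""
        size = self.min_block
        while size < request:
            size *= 2
        # Find the smallest free block that is large enough.
        candidate = size
        while candidate <= self.total_size and not self.free[candidate]:
            candidate *= 2
        if candidate > self.total_size:
            return None
        offset = min(self.free[candidate])
        self.free[candidate].remove(offset)
        # Split repeatedly, keeping the lower half and freeing the upper buddy.
        while candidate > size:
            candidate //= 2
            self.free[candidate].add(offset + candidate)
        self.allocated[offset] = size
        return offset

    def free_block(self, offset):
        """Release a block and merge it with its buddy while the buddy is free."""
        size = self.allocated.pop(offset)
        while size < self.total_size:
            buddy = offset ^ size  # the buddy differs only in the 'size' bit
            if buddy not in self.free[size]:
                break
            self.free[size].remove(buddy)
            offset = min(offset, buddy)
            size *= 2
        self.free[size].add(offset)
```

Allocating 4 KB from a fresh 16 KB region leaves one free 8 KB block and one free 4 KB block; freeing everything merges the pieces back into a single 16 KB block.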
Memory Management cont.. ❖ kmalloc() ❖ Allocates entire pages on demand but then splits them into smaller pieces ❖ Kernel maintains a set of lists of pages in use by kmalloc() ❖ An allocation first looks through those lists for a free piece of suitable size, or takes a new page and splits it ❖ Memory allocated by kmalloc() stays allocated until it is freed explicitly ❖ kmalloc() cannot relocate or reclaim these regions to help in memory shortages
Slab Allocation ❖ A slab is used for allocating memory for kernel data structures and is made up of one or more physically contiguous pages ❖ Cache consists of one or more slabs ❖ Single cache for each unique kernel data structure ❖ Process descriptors ❖ File objects ❖ Semaphores ❖ Each cache is populated with objects that are instantiations of the kernel data structure they represent ❖ Slab Allocation Algorithm ❖ Cache is created to store kernel objects ❖ A number of objects are created in the cache and marked as free ❖ New kernel objects are assigned memory from the cache and the object is marked as used ❖ Slabs can be ❖ Full ❖ Empty ❖ Partial ❖ Kernel tries to satisfy a request from a Partial slab first, then from an Empty slab, before creating a new one ❖ Figure: caches, their slabs, and the kernel objects stored in them (3 KB objects and 7 KB objects)
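The slab states and the allocation order can be modelled with simple counters. This is a toy sketch with invented names (`Slab`, `SlabCache`): it tracks only how many objects each slab holds, and it assumes the order partial slab, then empty slab, then a newly created slab.

```python
class Slab:
    """A slab holding a fixed number of equally sized object slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    @property
    def state(self):
        if self.used == 0:
            return "empty"
        if self.used == self.capacity:
            return "full"
        return "partial"


class SlabCache:
    """Toy cache for one kind of kernel object."""

    def __init__(self, objects_per_slab):
        self.objects_per_slab = objects_per_slab
        self.slabs = []

    def alloc(self):
        """Mark one object as used and return the index of the slab it came from."""
        for wanted in ("partial", "empty"):
            for index, slab in enumerate(self.slabs):
                if slab.state == wanted:
                    slab.used += 1
                    return index
        # No partial or empty slab is available: grow the cache by one slab.
        new_slab = Slab(self.objects_per_slab)
        new_slab.used = 1
        self.slabs.append(new_slab)
        return len(self.slabs) - 1

    def free(self, index):
        """Mark one object in the given slab as free again."""
        self.slabs[index].used -= 1
```

With two objects per slab, the first two allocations fill slab 0, the third creates slab 1, and an object freed from slab 0 is the next slot to be reused.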
Page Cache ❖ The kernel's main cache for block-oriented devices and memory-mapped files ❖ Main mechanism through which I/O to those devices is performed ❖ Native Linux and NFS file-systems use the page cache ❖ Closely tied to Virtual Memory
Virtual Memory ❖ Manages the contents of each process's virtual address space; interacts closely with the page cache because reading a page of data into the page cache requires mapping pages in the page cache using the virtual memory system ❖ Maintains the address space visible to each process and loads pages to and from disk ❖ Two views ❖ Separate set of regions ❖ Logical view ❖ Describes instructions that the system has received concerning the layout of the address space ❖ Consists of a set of non-overlapping regions, each representing a continuous, page-aligned subset of the address space ❖ Each region is described internally by a single vm_area_struct ❖ Defines properties of the region ❖ Process read/write permissions ❖ Any associated files ❖ Regions for each address space are linked into a balanced binary tree to allow fast look-up of the region corresponding to any virtual address
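Looking up the region that contains an address is an ordered-search problem. The sketch below swaps the balanced binary tree for a sorted list plus binary search, which gives the same logarithmic lookup in a few lines. The `Region` and `AddressSpace` names and their fields are hypothetical stand-ins for vm_area_struct, not its real layout.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Stand-in for one region descriptor (fields are illustrative)."""
    start: int    # first address in the region
    end: int      # one past the last address
    perms: str    # e.g. "r-x" or "rw-"
    backing: str  # associated file name, or "anonymous"


class AddressSpace:
    """Non-overlapping regions kept sorted by start address."""

    def __init__(self, regions):
        self.regions = sorted(regions, key=lambda region: region.start)
        self.starts = [region.start for region in self.regions]

    def find(self, address):
        """Return the region containing `address`, or None if it is unmapped."""
        index = bisect.bisect_right(self.starts, address) - 1
        if index >= 0 and address < self.regions[index].end:
            return self.regions[index]
        return None
```

An address inside a region returns that region's descriptor; an address in a gap between regions returns `None`, which is the case a real kernel would treat as an invalid access.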
Virtual Memory cont. ❖ Set of pages ❖ Physical view ❖ Stored in the hardware page tables for the process ❖ Determine the exact current location of each page of virtual memory ❖ Disk ❖ Physical memory ❖ Managed by a set of routines invoked from the kernel's software-interrupt handlers when a process asks for a page not currently in the page table (page fault) ❖ vm_area_struct ❖ Contains a field that points to a table of functions that implement the key page-management functions for any given VM region ❖ All page faults are eventually dispatched to the appropriate handler in the function table of vm_area_struct ❖ This keeps the central memory management routines from having to know the details of managing each possible type of memory region
Virtual Memory cont. ❖ Virtual Memory Regions ❖ Regions are backed by ❖ File ❖ Acts as a viewport onto a section of the file ❖ Gets the address of the page from the kernel page cache ❖ Multiple processes can map the same region of a file, and they will all use the same page of physical memory ❖ Nothing - demand-zero memory ❖ when a process tries to read a page in this region, it is returned a page filled with zeros ❖ Reaction to writes ❖ Private - copy on write ❖ Shared - the object is updated in place, visible to other processes immediately
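The difference between private and shared mappings is visible from user space through mmap. A minimal sketch, assuming a Unix system; the helper name `write_through_mapping` is made up. It maps a temporary file, writes through the mapping, and then reads the file back with ordinary file I/O.

```python
import mmap
import tempfile


def write_through_mapping(flags):
    """Map a small file with the given flags, write via the mapping, re-read the file."""
    with tempfile.TemporaryFile() as backing_file:
        backing_file.write(b"original")
        backing_file.flush()
        with mmap.mmap(backing_file.fileno(), 8, flags=flags,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as mapping:
            mapping[:8] = b"modified"
            if flags == mmap.MAP_SHARED:
                mapping.flush()  # push the shared pages back to the file
        backing_file.seek(0)
        return backing_file.read()
```

With `mmap.MAP_PRIVATE` the write lands in a private copy and the file still reads `b"original"`; with `mmap.MAP_SHARED` the file itself now reads `b"modified"`.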
Lifetime of a Virtual Address Space ❖ Created two ways ❖ exec() ❖ new, completely empty virtual address space ❖ up to the routines for loading the program to populate the space ❖ fork() ❖ creates a complete copy of the process's virtual address space ❖ copies the vm_area_struct descriptors and creates a new set of page tables for the child ❖ parent's page tables copied directly to the child ❖ reference count of each page is incremented ❖ parent and child share the same physical memory ❖ if a region is mapped as private ❖ uses copy on write
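The effect of fork() on private memory can be observed directly, even though the page sharing itself is invisible to the program. A minimal sketch, assuming a Unix system; the function name `fork_and_modify` is made up. The child overwrites a buffer it inherited and reports success through its exit status, while the parent checks its own copy afterwards.

```python
import os


def fork_and_modify():
    """Let a forked child overwrite inherited data; return (child status, parent's data)."""
    data = bytearray(b"parent")
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            data[:] = b"child!"  # the write goes to the child's private copy
            status = 0 if bytes(data) == b"child!" else 1
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status), bytes(data)
```

The parent still sees `b"parent"` after the child has finished, because the child's write changed only the child's view of the address space.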
Swapping and Paging ❖ Linux uses paging exclusively rather than swapping out entire processes ❖ Two Sections ❖ Policy algorithm decides which pages to write out to disk and when to write them ❖ Paging mechanism carries out the transfer and pages the data back into physical memory when it is needed again ❖ Page-out Policy ❖ Uses a modified version of the standard clock (second-chance) algorithm ❖ Every page has an age that is adjusted on every pass of the clock ❖ Measures activity on the page ❖ The age of unused pages will approach 0 and frequently used pages will have a higher age ❖ Policy approximates LFU (least frequently used) ❖ Paging Mechanism ❖ Supports paging to dedicated swap devices and partitions and to files ❖ Paging to a file is much slower because of file-system overhead ❖ A bitmap of used swap blocks is kept in physical memory ❖ Uses a next-fit algorithm to try to write pages to contiguous disk blocks for performance ❖ Marks a paged-out page with the page-not-present bit in its page-table entry and stores the index of where the page was written in the rest of the entry for faster look-ups
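The aging idea behind the page-out policy can be sketched as two small functions. All specifics here are assumptions chosen for illustration: the function names, the maximum age, and the amounts by which an age rises or decays per pass are not taken from the kernel.

```python
def clock_pass(ages, referenced, max_age=20, boost=3, decay=1):
    """One pass of the clock: raise the age of referenced pages, decay the rest.

    max_age, boost, and decay are illustrative values, not kernel constants.
    """
    for page in ages:
        if page in referenced:
            ages[page] = min(max_age, ages[page] + boost)
        else:
            ages[page] = max(0, ages[page] - decay)


def choose_victim(ages):
    """Pick the page with the lowest age, i.e. the least recent activity."""
    return min(ages, key=ages.get)
```

Starting from equal ages, a page that is never referenced drifts toward zero over successive passes and becomes the first candidate to be written out.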
Kernel Virtual Memory ❖ Kernel reserves its own constant region of the virtual address space of every process ❖ Page-table entries that map to these kernel pages are marked protected ❖ Contains two regions ❖ Static page-table references to every available physical page in the system ❖ Used for quick physical-to-virtual page look-ups when kernel code is run ❖ The kernel core and all pages allocated by the normal page allocator reside in this region ❖ The rest is reserved for no specific use ❖ Its page-table entries can be modified to point to any other memory areas ❖ vmalloc() ❖ allocates an arbitrary number of physical pages of memory that may not be physically contiguous into a single region of virtually contiguous kernel memory ❖ vremap() ❖ maps a sequence of virtual addresses to point to an area of memory used by a device driver for memory-mapped I/O
Execution and Loading of Programs ❖ Execution triggered by the exec() system call ❖ Kernel checks that the calling process has permission to execute the file ❖ Loader routine starts running the program - will at least set up the mapping of the program into virtual memory ❖ Linux has a loader table that allows each loader to try launching a program ❖ a.out ❖ Older and not extensible without problems ❖ ELF ❖ Newer, extensible and more flexible (easier to add debugging and other features, with fewer conflicts)
Mapping of Programs Into Memory ❖ Coverage of ELF binaries ❖ Binary loader only maps the program into virtual memory; it does not load it into physical memory ❖ The first access to each page causes a page fault that loads the page (demand paging) ❖ Loader reads the header and maps the sections of the file into separate regions of virtual memory ❖ Does the initial memory mapping to allow the program to start ❖ Sets up stack, text, and data regions ❖ After setting up all the mappings, the loader initializes the process's program counter register with the starting point recorded in the ELF header, and the process can then be run.
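The starting point that the loader places in the program counter is a field of the ELF header, and reading it takes only a fixed-size unpack. This sketch handles only 64-bit little-endian files; the function name `read_entry_point` is made up, while the field order follows the ELF64 header layout.

```python
import struct

# ELF64 header, little-endian: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx (64 bytes in total).
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")


def read_entry_point(header_bytes):
    """Return e_entry, the virtual address where execution starts."""
    if header_bytes[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header_bytes[4] != 2 or header_bytes[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    fields = ELF64_HEADER.unpack(header_bytes[:ELF64_HEADER.size])
    return fields[4]  # e_entry
```

On a 64-bit little-endian Linux system, passing the first 64 bytes of an executable (for example, read from a file opened in binary mode) returns the address at which that program begins running.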
Static and Dynamic Linking ❖ Static Linking ❖ The necessary library functions are embedded directly in the program's executable file, so the program can start running as soon as it is loaded ❖ Every program generated must contain copies of exactly the same common system library functions ❖ Costs memory and disk space ❖ Dynamic Linking ❖ Implemented by using a special linker library ❖ Every dynamically linked program contains a small, statically linked function that maps the link library into memory and runs the code that the library contains ❖ The link library determines which dynamic libraries the program needs and maps them into the middle of virtual memory ❖ The libraries are compiled as position-independent code, which can run at any address in memory
Input and Output ❖ To the extent possible, all device drivers appear as normal files. Devices can appear as objects within the file system. The system admin can create special files within a file system that contain references to a specific device driver. A user opening such a file can read from and write to the device referenced. Normal file-protection can be used to set the access permissions for each device. ❖ Devices are split into three classes: ❖ Block Devices ❖ All devices that allow random access to completely independent, fixed-sized blocks of data, including hard disks and floppy disks, CD-ROMs, and flash memory. Direct access to a block device is allowed so that programs can create and repair the filesystem that the device contains. Also, applications can access these block devices to perform their own, fine-tuned laying out of data, rather than using the general purpose filesystem. ❖ vocab: block, buffer, request manager, deadline I/O scheduler, sorted queue, read queue, write queue ❖ Character Devices ❖ Most other devices, like mice and keyboards. The difference is that character devices are only accessed serially. Seeking to a certain position in a file might be supported for a DVD, but makes no sense to a device such as a mouse. ❖ vocab: line disciplines ❖ Network Devices ❖ Users cannot directly transfer data to network devices; instead, they must communicate indirectly by opening a connection to the kernel's networking subsystem.
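Because devices appear as special files, an ordinary `stat` call is enough to tell the classes apart. A minimal sketch, assuming a Unix system; the function name `device_class` is made up. Network devices have no entry in the file system, so they never show up here.

```python
import os
import stat


def device_class(path):
    """Classify a path as a block device, a character device, or neither."""
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block"
    if stat.S_ISCHR(mode):
        return "character"
    return "not a device"
```

On Linux, `/dev/null` is reported as a character device, while a directory such as `/` is not a device at all.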
Interprocess Communication ❖ Synchronization and Signals ❖ Signals can be sent from any process to any other process, with restrictions on signals sent to processes owned by another user. ❖ Signals cannot carry information. Only the fact that a signal occurred is available to a process. ❖ When a process wants to wait for some event to occur, it places itself on a wait queue associated with that event and tells the scheduler that it is no longer eligible for execution. Once the event has completed, the kernel wakes every process on the wait queue. ❖ Passing of data among processes ❖ The pipe mechanism allows a child process to inherit a communication channel from its parent. Data written to one end of the pipe can be read at the other. ❖ Under Linux, pipes use a pair of wait queues to synchronize the reader and the writer. ❖ Shared memory offers another, very fast way to pass data between processes.
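The inherited-channel behaviour of pipes is easy to demonstrate. A minimal sketch, assuming a Unix system; the function name `receive_from_child` is made up, and the message is assumed to be short enough for a single write. The pipe is created before fork(), so the child inherits both ends and uses the write end.

```python
import os


def receive_from_child(message):
    """Create a pipe, fork, let the child write `message`, and read it in the parent."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherited both ends of the pipe; it only needs the write end.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
        finally:
            os._exit(0)
    # Parent: close the unused write end so that read() can see end-of-file.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

The parent's `os.read` blocks until the child has written or exited, which is the reader/writer synchronization the slide attributes to the pipe's wait queues.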
Network Structure ❖ Since Linux was originally implemented primarily on PCs, it supports many of the protocols typically used on PC networks, such as AppleTalk and IPX. ❖ Networking is implemented by three layers of software: ❖ The socket interface ❖ Protocol drivers ❖ Network-device drivers ❖ User applications perform all networking requests through the socket interface. ❖ The next layer of software is the protocol stack. When data arrives at this layer, it is expected to have been tagged with an identifier specifying which network protocol it contains. The protocol layer may rewrite packets, create new packets, split packets into fragments or reassemble them, or just throw away incoming data. Once it has finished processing a set of packets, it passes them on to the socket interface if the data is meant for a local connection, or to a device driver if the packet needs to be transmitted remotely. ❖ skbuff structures are used for communication between the layers of the networking stack. ❖ The TCP/IP protocol suite ❖ Incoming IP packets are delivered to the IP driver, whose job it is to perform routing. ❖ At various stages, IP software passes packets to a section of code for firewall management (selective filtering of packets according to arbitrary criteria). ❖ The IP driver also performs the disassembly and reassembly of large packets. ❖ If an outgoing packet is too large to be queued to a device, it is broken into smaller fragments. At the receiving host, these fragments are then reassembled.
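Splitting a large packet and putting it back together can be sketched in two functions. This is a simplification with invented names (`fragment`, `reassemble`): each fragment carries its byte offset, whereas real IP fragments carry offsets in 8-byte units plus a more-fragments flag inside the IP header.

```python
def fragment(payload, mtu):
    """Split `payload` into (offset, data) pieces of at most `mtu` bytes each."""
    return [(offset, payload[offset:offset + mtu])
            for offset in range(0, len(payload), mtu)]


def reassemble(fragments):
    """Rebuild the payload from fragments that may arrive in any order."""
    return b"".join(data for _, data in sorted(fragments))
```

Because every fragment records where it belongs, the receiver can sort by offset and recover the original payload even when the pieces arrive out of order.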
Security - Authentication ❖ Making sure that nobody can access the system without first proving that he/she has entry rights. ❖ History ❖ Publicly readable password file ❖ Password combined with a random salt value and encoded with a one-way function = entry stored in the file ❖ Problems ❖ Easy to hack ❖ Limited number of salt values ❖ Password length restrictions ❖ Fixes ❖ Change permissions on the password file ❖ Allow for longer passwords ❖ More salt values ❖ Added access limits ❖ Current ❖ PAM (Pluggable Authentication Modules) ❖ Based on a shared library ❖ Used as a system library that can be used by any system component that needs to authenticate users ❖ Modules are loaded on demand as specified in a system-wide configuration file. ❖ New authentication methods can be added to the configuration and all programs can use the new feature ❖ Modules ❖ Authentication methods ❖ Account restrictions ❖ Session-setup functions ❖ Password-changing functions (all passwords updated at once)
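The salt-plus-one-way-function scheme can be sketched with Python's standard library. This is an illustration of the idea only: it uses PBKDF2 from `hashlib` rather than the historical UNIX password algorithm or anything PAM-specific, and the function names and iteration count are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is drawn when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Only the salt and the digest need to be stored. A larger salt space means the same password hashes to many different stored values, which addresses the limited-salt problem listed on the slide.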
Security - Access Control ❖ Providing a mechanism for checking whether a user has the right to access a certain object and preventing access to objects as required ❖ Unique numeric identifiers (UID) ❖ Identifies a single user or a single set of access rights ❖ Unique to the user ❖ Group identifier (GID) ❖ Defines access rights for more than one user ❖ Users can have multiple ❖ Access controls apply to ❖ Every file on the system ❖ Shared objects ❖ Shared-memory segments ❖ Semaphores
Security - Access Control cont. ❖ Each access-controlled object in the system has a single uid and a single gid assigned to it ❖ If the process's uid matches the object's uid, the process has user (owner) rights to the object ❖ Otherwise, if one of the process's gids matches the object's gid, the process has group rights to the object ❖ Otherwise the process has world rights to the object ❖ Protection mask ❖ Determines read/write/execute permissions on the object for each of the three classes ❖ The root user has full access rights to the entire system regardless of the permissions assigned ❖ Most kernel processes run as the root uid
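The owner/group/world decision can be written as one small function over the protection mask. This is a simplified sketch with a made-up name and signature (`access_allowed`); it treats uid 0 as always allowed, which glosses over details such as execute permission on files that have no execute bit set.

```python
def access_allowed(mode, object_uid, object_gid, process_uid, process_gids, wanted):
    """Check `wanted` (a string drawn from "rwx") against a protection mask."""
    if process_uid == 0:
        return True  # simplification: root bypasses the mask entirely
    if process_uid == object_uid:
        shift = 6    # owner bits
    elif object_gid in process_gids:
        shift = 3    # group bits
    else:
        shift = 0    # world bits
    granted = (mode >> shift) & 0o7
    needed = sum({"r": 4, "w": 2, "x": 1}[letter] for letter in wanted)
    return granted & needed == needed
```

For an object with mode `0o640`, the owner may read and write, a process that shares the object's group may only read, and everyone else is refused. Note that only one class applies: an owner is judged by the owner bits even if the group bits would grant more.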
Security - Access Control Functions ❖ setuid ❖ Lets a program give a user access to resources they normally could not use (lpr - needs access to the printing system) ❖ Real uid is that of the user who is running the program ❖ Effective uid is that of the file's owner ❖ Process can switch between its uids while running, so the program holds the effective uid only when necessary ❖ fsuid and fsgid ❖ Process properties used when file access rights are checked; they can be set independently of the effective uid and gid ❖ Used in servers to serve files on behalf of a user without taking on that user's identity in any other way ❖ Further enhancements ❖ A process can pass access to a single open file to another process ❖ Example: the Linux print system. When a user sends a print job, the submitting program passes along the right to read that one file, removing the need for the print system to have access to every file in the user's home folder.
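Passing access to a single open file can be tried with file-descriptor passing over a local socket. A minimal sketch, assuming a Unix system and Python 3.9 or later for `socket.send_fds` and `socket.recv_fds`; the function name `pass_open_file` is made up, and both ends live in one process here purely to keep the example short.

```python
import os
import socket
import tempfile


def pass_open_file(contents):
    """Send an open file descriptor across a local socket and read through it."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with sender, receiver, tempfile.TemporaryFile() as job_file:
        job_file.write(contents)
        job_file.flush()
        # The sender hands over only this one descriptor, not a path or a directory.
        socket.send_fds(sender, [b"job"], [job_file.fileno()])
        message, descriptors, _flags, _address = socket.recv_fds(receiver, 16, 1)
        received_fd = descriptors[0]
        try:
            os.lseek(received_fd, 0, os.SEEK_SET)
            return message, os.read(received_fd, len(contents))
        finally:
            os.close(received_fd)
```

The receiver never learns the file's name and needs no permission on the directory that holds it; it can read exactly the file it was handed, which matches the print-job example on the slide.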