Outline of the Paper: Introduction. Overview of L4. Design and Implementation of the Linux Server. Evaluating Compatibility Performance. Evaluating Extensibility Performance. Alternative concepts from a performance point of view. Conclusion.
Introduction. Motivation: microkernel-based systems have been found too slow. Goal: show that microkernel-based systems can be practical with good performance. Method: conduct experiments on L4, a lean second-generation microkernel, with Linux running on top of it; the resulting system is called L4Linux. Compare the performance of L4Linux to native Linux and to MkLinux, a Linux running on a Mach-derived first-generation microkernel.
L4 Essentials. Based on two concepts: address spaces and threads. Address spaces are constructed recursively by user-level servers called pagers, outside the kernel. The initial address space represents physical memory; further address spaces are created by granting, mapping and unmapping flexpages. Flexpages are logical pages of size 2^n, ranging from one physical page up to an entire address space. Pagers act as main-memory managers, enabling the implementation of memory-management policies. Threads are activities executing inside an address space and can dynamically associate with individual pagers. IPC refers to cross-address-space communication. I/O ports are treated as part of the address space. Hardware interrupts are handled as IPC.
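The flexpage idea can be sketched in a few lines. This is a minimal conceptual model, not the real L4 API (all names here are hypothetical): a flexpage is a base address plus a power-of-two size, self-aligned, and a pager "maps" one by making the region appear in the receiver's address space too.

```python
# Minimal model of L4 flexpages: a region of size 2^n that must be
# aligned to its own size. Names are illustrative, not the real L4 API.

PAGE_BITS = 12  # one physical page = 4 KiB

def make_fpage(base, log2size):
    """Build a flexpage descriptor; sizes range from one page upward."""
    assert log2size >= PAGE_BITS, "smallest flexpage is one physical page"
    assert base % (1 << log2size) == 0, "flexpage must be self-aligned"
    return (base, log2size)

def map_fpage(sender_space, receiver_space, fpage):
    """A pager 'maps' a flexpage: the region appears in the receiver too."""
    assert fpage in sender_space, "can only map what you own"
    receiver_space.add(fpage)

def unmap_fpage(receiver_space, fpage):
    receiver_space.discard(fpage)

# The initial address space stands for physical memory; user-level
# pagers hand parts of it out recursively.
sigma0 = {make_fpage(0x0, 22)}          # 4 MiB of "physical memory"
linux_server = set()
map_fpage(sigma0, linux_server, (0x0, 22))
print(linux_server)
```

Because address spaces are built only from flexpages already owned by some pager, memory-management policy lives entirely in user-level servers.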
Linux Design and Implementation. L4 is implemented on the Pentium, Alpha and MIPS architectures. Linux has architecture-dependent and architecture-independent parts; all modifications were made to the architecture-dependent part. The application binary interface of Linux is unmodified. No Linux-specific modifications were made to L4.
The Linux Kernel. On booting, the Linux server requests memory from its pager, which maps physical memory into the server's address space. The Linux server then acts as the pager for the user processes it creates. Hardware page tables are kept inside L4 and cannot be accessed directly by user processes, so the Linux server keeps additional logical page tables. A single L4 thread is multiplexed by L4Linux to handle system calls and page faults. Interrupts are disabled for synchronization in critical sections.
Interrupt Handling and Device Drivers. Interrupt handlers in native Linux are subdivided into top halves (run immediately) and bottom halves (run later). L4 maps hardware interrupts into messages. Top-half interrupt handlers are implemented as threads waiting for such messages, one thread per interrupt source. Another thread handles all bottom halves once the corresponding top half has completed.
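The structure described on this slide, one top-half thread per interrupt source blocked waiting for an interrupt message plus a single thread running all bottom halves, can be emulated with ordinary threads and queues. This is a sketch of the structure only, not the L4Linux implementation; the queues stand in for L4 IPC.

```python
import queue
import threading

irq_msgs = queue.Queue()          # L4 turns hardware interrupts into messages
bottom_half_work = queue.Queue()  # work deferred by top halves
log = []

def top_half(irq_source):
    """One thread per interrupt source, blocked waiting for its message."""
    while True:
        msg = irq_msgs.get()
        if msg is None:                        # shutdown sentinel
            bottom_half_work.put(None)
            return
        log.append(("top", irq_source, msg))   # minimal, immediate work
        bottom_half_work.put(msg)              # defer the rest

def bottom_half_handler():
    """A single thread runs every bottom half after its top half finished."""
    while True:
        msg = bottom_half_work.get()
        if msg is None:
            return
        log.append(("bottom", msg))

t = threading.Thread(target=top_half, args=("timer",))
b = threading.Thread(target=bottom_half_handler)
t.start(); b.start()
irq_msgs.put(1); irq_msgs.put(2); irq_msgs.put(None)
t.join(); b.join()
print(log)
```

Note the invariant the slide describes: a bottom half for a given message can only run after its top half has handed it over.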
Linux User Processes. Each Linux user process is implemented as an L4 task. The Linux server creates the task and associates it with a pager. L4 converts any page fault of a Linux user process into an RPC to the Linux server. The server replies by mapping or unmapping pages in the address space of the process.
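The fault path above can be mocked up as a tiny pager loop: L4 turns a user fault into an RPC carrying the faulting address, and the Linux server replies with a mapping. All names and the demand-allocation policy here are illustrative assumptions, not the actual server code.

```python
PAGE_SIZE = 4096

class LinuxServerPager:
    """The Linux server acts as the pager of the user tasks it creates."""
    def __init__(self):
        self.next_frame = 0
        self.mappings = {}              # page number -> frame number

    def handle_fault_rpc(self, fault_addr):
        """Reply to a page-fault RPC by establishing a mapping."""
        page = fault_addr // PAGE_SIZE
        if page not in self.mappings:   # demand-allocate a frame
            self.mappings[page] = self.next_frame
            self.next_frame += 1
        return self.mappings[page]      # the 'map' reply to L4

pager = LinuxServerPager()
# L4 converts two user faults into RPCs to the server; both addresses
# lie in the same page, so they resolve to the same frame.
f1 = pager.handle_fault_rpc(0x1234)
f2 = pager.handle_fault_rpc(0x1FFF)
print(f1, f2)
```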
System Call Mechanisms. L4Linux system calls are implemented as RPCs between user processes and the Linux server. There are three system-call interfaces: 1. a modified version of libc.so which uses L4 IPC primitives to call the Linux server; 2. a corresponding libc.a; 3. a user-level exception handler which emulates the system-call trap instruction by calling the corresponding routine in the modified shared library. To avoid TLB flushes, L4Linux uses physical copyin and copyout to exchange data between the kernel and user processes instead of relying on address translation by the hardware.
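A hedged sketch of this RPC-style system-call path: the modified libc packs the call number and arguments into a message, the server dispatches it, and data crosses address spaces via an explicit copyin rather than hardware translation. The syscall number, memory layout and helper names are assumptions for illustration only.

```python
# 'Physical' memory of the user process and the server, as byte arrays.
user_mem = bytearray(b"hello from user\x00" + bytes(48))
kernel_buf = bytearray(64)

def copyin(dst, src, n):
    """Server copies data physically out of the user address space."""
    dst[:n] = src[:n]

def sys_write(fd, buf_addr, count):
    copyin(kernel_buf, user_mem[buf_addr:], count)
    return count                     # bytes 'written'

SYSCALLS = {4: sys_write}            # hypothetical syscall-number table

def libc_stub(number, *args):
    """Modified libc.so: an IPC/RPC to the Linux server, not a trap."""
    message = (number, args)         # marshal into an IPC message
    number, args = message           # ...the server unmarshals it...
    return SYSCALLS[number](*args)   # ...and dispatches the handler

n = libc_stub(4, 1, 0, 15)           # write(1, buf, 15) as an RPC
print(n, kernel_buf[:15])
```

The point of the physical copy is that the server never needs the user's pages mapped at matching virtual addresses, so no TLB flush is required on the call path.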
Signaling. The native Linux kernel signals user processes by manipulating their stack, stack pointer (SP) and program counter (PC). In L4Linux, each user process has an additional signal-handler thread. Upon receiving a signal message from the Linux server, the signal-handler thread causes the user process's main thread to save its state and enter the Linux server, which then resumes the main thread.
Scheduling. All threads are scheduled by L4's internal scheduler. The Linux server's schedule() operation is used only for multiplexing the Linux server thread across its coroutines when concurrent system calls are made. The number of coroutine switches is minimized by sleeping until a new system call or a wakeup message is received.
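The coroutine multiplexing can be illustrated with Python generators: one server thread switches between per-call coroutines only when a call would block, and with nothing runnable it would sleep. This is a conceptual sketch, not the actual schedule() code; the call names and step counts are made up.

```python
from collections import deque

def handle_call(name, steps):
    """A system call as a coroutine: yields whenever it would block."""
    for i in range(steps):
        yield f"{name}:step{i}"

def server_loop(calls):
    """The single Linux server thread, multiplexed across coroutines."""
    ready = deque(handle_call(n, s) for n, s in calls)
    trace = []
    while ready:                      # with no pending calls it would sleep
        co = ready.popleft()
        try:
            trace.append(next(co))    # run the call until it blocks...
            ready.append(co)          # ...then switch to the next coroutine
        except StopIteration:
            pass                      # this system call has finished
    return trace

trace = server_loop([("read", 2), ("write", 1)])
print(trace)
```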
Supporting Tagged TLBs or Small Spaces. Tagged TLBs avoid the TLB flushes otherwise required on context switches. However, TLB conflicts can have the same effect as TLB flushes, due to the extensive use of shared libraries and the identical virtual allocation of code and data across address spaces. In L4Linux, a special library permits the customization of code and data addresses; the emulation library and the signal thread can also be mapped close to the application. Thus, servers executing in small address spaces can be built.
Compatibility Performance. Three questions: What is the penalty of using L4Linux instead of native Linux? Answered by running benchmarks on native Linux and L4Linux on the same hardware. Does the performance of the underlying microkernel matter? Answered by comparing L4Linux to MkLinux. How much does co-location improve performance? Answered by comparing user-mode L4Linux to the in-kernel version of MkLinux.
Micro Benchmarks. Used to analyze the detailed behavior of L4Linux mechanisms. getpid, the shortest system call, was repeated in a tight loop.
Micro Benchmarks. The lmbench benchmark suite measures system calls, context switches, memory accesses, pipe operations, networking operations, etc. hbench is a revised version of lmbench.
Macro Benchmarks. Measure the system's overall performance. The time needed to recompile the Linux server was 6-7% slower under L4Linux than under native Linux, and 10-20% faster than under both MkLinux versions. The commercial AIM multiuser benchmark was used for a more systematic evaluation: system performance was measured under different application loads.
Compatibility Performance Analysis. The current implementation of L4Linux comes close to native Linux even under high load, with penalties ranging from 5-10%. Both the macro and micro benchmarks show that the performance of the microkernel matters. All benchmarks suggest that co-location by itself does not improve performance.
Extensibility Performance. Main advantage of a microkernel: extensibility/specialization. Three questions: 1. Can we add services outside L4Linux to improve performance by specializing Unix functionality? 2. Can we improve certain applications by using native microkernel mechanisms in addition to the classical API? 3. Can we achieve high performance for non-classical, Unix-compatible systems coexisting with L4Linux? These three questions are answered by specific examples.
Pipes and RPC. Four variants of data exchange are compared: 1. the standard pipe mechanism; 2. asynchronous pipes on L4, which run only on L4 and need no Linux kernel; 3. synchronous RPC, which uses blocking IPC directly without buffering data; 4. synchronous mapping RPC, where the sender maps pages into the receiver's address space. lmbench was used to measure latency and bandwidth.
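Much of the difference between these variants comes down to how many times the payload is copied: a standard pipe copies user to kernel buffer and kernel buffer to user, synchronous RPC copies once directly between the partners, and mapping RPC transfers pages without copying the payload at all. A copy-counting sketch (not lmbench, and not the actual implementations):

```python
copies = 0

def copy(data):
    """Stand-in for a cross-address-space data copy; counts each one."""
    global copies
    copies += 1
    return bytes(data)

payload = b"x" * 4096

def standard_pipe(data):
    kernel_buffer = copy(data)     # write(): user -> kernel buffer
    return copy(kernel_buffer)     # read():  kernel buffer -> user

def synchronous_rpc(data):
    return copy(data)              # blocking IPC: one direct copy

def mapping_rpc(data):
    return data                    # page mapped into receiver: zero copies

for fn, expected in [(standard_pipe, 2), (synchronous_rpc, 1), (mapping_rpc, 0)]:
    copies = 0
    out = fn(payload)
    assert out == payload and copies == expected
print("pipe = 2 copies, sync RPC = 1, mapping RPC = 0")
```

Fewer copies is why the RPC variants can beat the pipe in bandwidth; mapping RPC additionally trades copy cost for mapping cost.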
Cache Partitioning. L4's hierarchical user-level pagers allow the L4Linux memory system and a dedicated real-time system to run in parallel. In real-time systems, the worst-case execution time is the optimization criterion. A memory manager on top of L4 is used to partition the cache between multiple real-time tasks, minimizing cache-interference costs. The time for a matrix multiplication was measured: 1. uninterrupted: 10.9 ms; 2. interrupted, with cache conflicts: 96.1 ms; 3. with cache partitioning avoiding secondary-cache interference: 24.9 ms.
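Cache partitioning of this kind is commonly done by page coloring: a user-level memory manager hands each real-time task only physical frames whose cache color belongs to that task's partition, so the tasks can never evict each other's cache lines. The sketch below uses assumed parameters (a 256 KiB physically indexed cache) and is not the actual manager from the paper.

```python
PAGE_SIZE  = 4096
CACHE_SIZE = 256 * 1024                  # assumed physically indexed L2 size
NUM_COLORS = CACHE_SIZE // PAGE_SIZE     # 64 page colors

def color_of(frame):
    """Cache color: which slice of the cache this physical frame maps to."""
    return frame % NUM_COLORS

def allocate(task_colors, frames):
    """User-level manager: give a task only frames of its own colors."""
    return [f for f in frames if color_of(f) in task_colors]

frames = range(256)                              # some free physical frames
rt_task_a = allocate(set(range(0, 32)), frames)  # colors 0..31
rt_task_b = allocate(set(range(32, 64)), frames) # colors 32..63

# The partitions are disjoint in cache colors, so task B cannot evict
# task A's cache lines (no secondary-cache interference).
assert not {color_of(f) for f in rt_task_a} & {color_of(f) for f in rt_task_b}
print(len(rt_task_a), len(rt_task_b))
```

Partitioning trades average-case capacity (each task sees a smaller cache, hence 24.9 ms rather than 10.9 ms) for a predictable worst case, which is exactly the real-time optimization criterion.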
The time taken (in microseconds) for selected virtual-memory operations in L4Linux and native Linux is compared:

Operation   L4Linux   Linux
fault       6.2       n/a
trap        3.4       12
Appel1      12        55
Appel2      10        44
Extensibility Performance Analysis. Unix-compatible functionality can be improved by microkernel primitives, e.g. pipes and VM operations. Unix-compatible or partially compatible functions that outperform implementations based on the Unix API can be added to the system, e.g. RPC and user-level pagers for VM operations. The microkernel offers possibilities for coexisting systems based on different paradigms, e.g. a real-time system with its own memory management (cache partitioning).
Alternative Basic Concepts. Can a mechanism at a lower level than IPC, or a grafting model, improve the performance of a microkernel? Protected Control Transfer (PCT): a parameterless cross-address-space procedure call via a callee-defined gate. The times taken for PCT and IPC were compared; PCT does not offer a significant improvement. Grafting: downloading extensions into the kernel. Its performance impact is still an open question.
Conclusion. The performance of L4 is significantly better than that of first-generation microkernels. The throughput of L4Linux is only 5% less than native Linux, whereas first-generation microkernel systems were 5-7 times worse than native Linux. The overall system performance does depend on the performance of the microkernel. Modifications to Linux to better suit L4 would further improve performance. L4 provides an apt platform for building specialized systems.