The Performance of μ-Kernel-Based Systems H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter Presenter: Sunita Marathe.

Introduction
- First-generation μ-kernels have a reputation for being slow and inflexible
- The paper demonstrates through experiments the performance and flexibility of L4, a second-generation μ-kernel
- Experiments include porting Linux to the L4 μ-kernel and comparing its performance with both native Linux and MkLinux running on Mach 3.0, a first-generation μ-kernel

L4 Essentials
- Task: an address space and a set of threads executing in that space
- IPC primitive: a cross-address-space communication mechanism
- Supports recursive construction of address spaces by user-level pagers
  - The initial space, σ0, represents physical memory
  - Further spaces are constructed using primitives for granting, mapping, and unmapping logical pages
  - The owner of an address space can grant or map a page to another address space
  - All address spaces are maintained by user-level pagers
  - Enables implementation of different memory management policies
- I/O ports are treated as part of the address space and can be mapped and unmapped
- Interrupts are converted to messages to interrupt-handler threads, which allows device drivers to be implemented as user-level servers
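
The three page primitives can be pictured with a toy model. This is an illustrative sketch, not L4's actual interface: the class `AddressSpace` and its methods `map_to`, `grant_to`, and `unmap` are hypothetical names. It assumes the usual L4 semantics, where mapping shares a page while the mapper keeps it, granting moves the page to the receiver, and unmapping revokes the page from every space that received it directly or indirectly.

```python
class AddressSpace:
    """Toy address space (hypothetical model, not the L4 API)."""

    def __init__(self, name):
        self.name = name
        self.pages = {}    # virtual page -> physical frame
        self.derived = {}  # own page -> [(space, page)] mapped from it
        self.origin = {}   # own page -> (space, page) it was mapped from

    def map_to(self, page, target, target_page):
        """Share a page; the mapper keeps it and can revoke it later."""
        target.pages[target_page] = self.pages[page]
        target.origin[target_page] = (self, page)
        self.derived.setdefault(page, []).append((target, target_page))

    def grant_to(self, page, target, target_page):
        """Move a page; the granter loses it."""
        target.pages[target_page] = self.pages.pop(page)
        # Mappings derived from this page now hang off the grantee.
        moved = self.derived.pop(page, [])
        target.derived[target_page] = moved
        for space, pg in moved:
            space.origin[pg] = (target, target_page)
        # Whoever mapped the page to the granter now tracks the grantee.
        if page in self.origin:
            src, src_page = self.origin.pop(page)
            target.origin[target_page] = (src, src_page)
            entries = src.derived[src_page]
            entries[entries.index((self, page))] = (target, target_page)

    def unmap(self, page):
        """Revoke the page from all spaces that received it, recursively."""
        for space, pg in self.derived.pop(page, []):
            space.unmap(pg)
            space.pages.pop(pg, None)
            space.origin.pop(pg, None)


# sigma0 stands in for the initial space that represents physical memory.
sigma0 = AddressSpace("sigma0")
sigma0.pages[0] = 0x1000
pager = AddressSpace("pager")
app = AddressSpace("app")

sigma0.map_to(0, pager, 5)   # sigma0 -> pager
pager.map_to(5, app, 9)      # pager -> app (recursive construction)
shared_frame = app.pages[9]
sigma0.unmap(0)              # revokes from pager and, transitively, from app
```

After the final `unmap`, only `sigma0` still holds the frame, which is the property that lets a pager reclaim memory it handed out.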

L4Linux – Design & Implementation
- Fully binary compatible with Linux/x86
- Modifications are restricted to the architecture-dependent part of Linux; Linux was not fine-tuned for L4
- No Linux-specific modifications were made to the L4 kernel, which tests the generality and flexibility of the L4 interface

L4Linux – Design & Implementation
- Linux services are provided by a single Linux server running in a μ-kernel task
- μ-kernel tasks are used for Linux user processes
- On booting, the Linux server requests memory from its pager, which maps physical memory into the server’s address space
- The Linux server then acts as the pager for the user processes it creates
- L4 converts user-process page faults into an RPC to the Linux server, which maps pages from its address space to the user process
- A single L4 thread in the Linux server handles system calls and page faults. This thread is multiplexed among user processes to avoid blocking in the kernel

L4Linux – Design & Implementation: Interrupt Handling
- Linux interrupt handlers are divided into top and bottom halves
  - Top halves are implemented as one server thread per interrupt source
  - Bottom halves all execute in a single thread
- L4 maps a hardware interrupt to a message sent to an interrupt thread
- Linux interrupt threads have a higher priority than the main thread, which avoids concurrent execution

L4Linux – Design & Implementation: System Call Mechanisms
- System calls are implemented as IPC between the user process and the Linux server
- Three system call interfaces:
  - A modified version of libc.so that uses L4 IPC primitives to call the Linux server
  - A correspondingly modified version of libc.a
  - A user-level exception handler (trampoline) that emulates the native system call trap instruction by calling the corresponding routine in the modified shared library
- The first two options are the fastest; the third provides binary compatibility
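
The relationship between the library stubs and the trampoline can be sketched as follows. This is a simplified model under stated assumptions, not L4Linux code: `LinuxServer`, `UserProcess`, `ipc_call`, `libc_getpid`, and `trap` are hypothetical names, and an ordinary method call stands in for an L4 IPC round trip.

```python
class LinuxServer:
    """Stand-in for the Linux server task (hypothetical model)."""

    def __init__(self):
        # System call name -> handler taking the calling process.
        self.handlers = {"getpid": lambda caller: caller.pid}

    def handle(self, caller, name, *args):
        return self.handlers[name](caller, *args)


class UserProcess:
    """Stand-in for a Linux user process running in its own task."""

    def __init__(self, pid, server):
        self.pid = pid
        self.server = server

    def ipc_call(self, name, *args):
        # Stand-in for an IPC send to the server followed by a receive.
        return self.server.handle(self, name, *args)

    def libc_getpid(self):
        # Modified libc stub: calls the server via IPC instead of trapping.
        return self.ipc_call("getpid")

    def trap(self, name, *args):
        # Trampoline: an unmodified binary executes the trap instruction,
        # and the user-level exception handler forwards to the libc stub.
        stub = getattr(self, "libc_" + name)
        return stub(*args)


proc = UserProcess(pid=42, server=LinuxServer())
direct = proc.libc_getpid()      # relinked or shared-library path
emulated = proc.trap("getpid")   # binary-compatible path
```

Both paths end in the same IPC; the trampoline path simply adds an extra hop through the exception handler, which is consistent with the slide's note that the first two options are the fastest.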

L4Linux – Design & Implementation: Signalling
- Each user process has a signal-handler thread
- The Linux server’s main thread delivers a signal to a user process by sending a message to that process’s signal-handler thread
- The signal-handler thread causes the user process’s main thread to save its state and enter Linux by manipulating the main thread’s SP and PC

L4Linux – Design & Implementation: Scheduling
- All thread scheduling is done by the L4 kernel
- The Linux server’s schedule() routine is used only to multiplex the Linux server’s main thread across concurrent Linux system calls
- On completion of a system call, if there is nothing urgent to do, the Linux server resumes the corresponding user thread and sleeps, waiting for a new system call message or a wakeup message from an interrupt thread
- This allows a user process to make several system calls per time slice without blocking
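
The server's receive-handle-resume cycle described above can be sketched as a simple message loop. This is a rough model only: the function `linux_server_loop`, the message tuples, and the `urgent` callback are all hypothetical, and a deque stands in for messages that would really arrive via L4 IPC.

```python
from collections import deque


def linux_server_loop(inbox, urgent=lambda: False):
    """Drain (kind, sender, payload) messages and return a log of actions."""
    actions = []
    while inbox:
        kind, sender, payload = inbox.popleft()
        if kind == "syscall":
            actions.append(f"handle {payload} for {sender}")
            if urgent():
                # Other work needs the main thread first; this is where
                # the server's schedule() routine would multiplex it.
                actions.append("schedule()")
            # Nothing urgent left: let the caller continue.
            actions.append(f"resume {sender}")
        elif kind == "wakeup":
            # Wakeup message sent by an interrupt thread.
            actions.append(f"woken by {sender}")
    # No messages left: block until the next system call or wakeup.
    actions.append("sleep until next message")
    return actions


inbox = deque([
    ("syscall", "proc-1", "getpid"),
    ("wakeup", "irq-thread", None),
    ("syscall", "proc-1", "read"),
])
log = linux_server_loop(inbox)
```

In this model `proc-1` completes two system calls back to back and is resumed after each, mirroring the point that a process can issue several calls within one time slice.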

Compatibility Performance
Questions:
- What is the penalty of using L4Linux? Compare L4Linux to native Linux
- Does the performance of the underlying micro-kernel matter? Compare L4Linux to MkLinux
- Does co-location improve performance? Compare L4Linux to an in-kernel version of MkLinux

Microbenchmarks
- System call overhead for getpid(), the shortest system call

Microbenchmarks (cont.)
- The lmbench suite measures operations such as system calls, context switches, memory accesses, and network operations
- Results are normalized to native Linux
- Results are shown as slowdowns: a shorter bar is a better result

Macrobenchmarks
- The AIM multiuser benchmark suite tests how well a multiuser system performs under different loads
- It successively increases the load until the maximum throughput of the system is determined
[Figures: results of the AIM multiuser benchmark suite, showing required time and achieved throughput (jobs/min)]

Performance Analysis
- Low penalty for using L4Linux
  - L4Linux performance is reasonably close to native Linux, even under high load
  - Averaged across all loads: 8.3% slower; at maximum load: 6.8% slower
- Performance of the underlying micro-kernel matters
  - User-level MkLinux: 49% slower on average, 60% at maximum load
- Co-location on its own does not close the performance gap
  - Co-located MkLinux: 29% slower on average, 37% at maximum load

Extensibility Performance
What added value can a micro-kernel provide?
- Specialization – improved implementation of special OS functionality using μ-kernel primitives
- Extensibility – permits implementation of new services and policies that cannot easily be added to a conventional OS

Pipes and RPC
- (1) to (1d): use the standard asynchronous pipe mechanism of the Linux kernel
- The remaining variants use L4 IPC primitives:
  - (2) Emulation of POSIX pipes: based on synchronous L4 IPC; uses an additional thread for buffering and cross-address-space communication with the receiver
  - (3) Synchronous RPC: uses blocking IPC directly, without buffering data
  - (4) Synchronous mapping RPC: the sender maps pages into the receiver’s address space

Real-time memory management
- L4’s hierarchical user-level pagers allow multiple memory systems with different policies to run in parallel
- Real-time tasks cannot afford the performance loss from cache misses caused by cache interference, which occurs when interleaved threads use the same cache lines
- A main-memory manager can be built on top of L4 to partition the 2nd-level cache between multiple real-time tasks and to isolate real-time from timesharing applications
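
One common way a memory manager can partition a physically indexed cache is page coloring; the slides do not say which technique is used, so the sketch below is an assumption. The cache parameters (256 KiB, direct-mapped, 4 KiB pages), the task names, and all function names are hypothetical and chosen only for illustration.

```python
def num_colors(cache_bytes, associativity, page_bytes):
    """Number of page colors in a physically indexed cache."""
    return cache_bytes // (associativity * page_bytes)


def frame_color(frame_number, colors):
    """Frames with the same color compete for the same cache lines."""
    return frame_number % colors


def build_pools(frames, colors, owner_of_color):
    """Group physical frames by the task that owns each frame's color."""
    pools = {}
    for frame in frames:
        owner = owner_of_color(frame_color(frame, colors))
        pools.setdefault(owner, []).append(frame)
    return pools


# Hypothetical cache geometry: 256 KiB, direct-mapped, 4 KiB pages.
COLORS = num_colors(256 * 1024, 1, 4096)


def owner_of_color(color):
    # Hypothetical policy: a quarter of the cache per real-time task,
    # the remaining half for timesharing applications.
    if color < 16:
        return "rt_task_a"
    if color < 32:
        return "rt_task_b"
    return "timesharing"


pools = build_pools(range(256), COLORS, owner_of_color)
```

A pager that hands each task frames only from its own pool keeps the tasks on disjoint cache lines, so a timesharing application cannot evict a real-time task's cached data.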

Conclusion
- The L4 μ-kernel shows a significant performance improvement over previous μ-kernels. Fast IPC and efficient mapping abstractions are more effective than techniques such as co-location
- The penalty for using such a μ-kernel can be kept between 5% and 10%
- It provides a foundation for building specialized applications (such as real-time applications) that can run alongside a normal OS and its applications on a single machine
- Optimizations such as co-locating the Linux server could improve L4Linux further

Acknowledgements
- Thanks to Seungweon Park, whose presentation slides were the source for all the diagrams
