윤 준 기, Embedded System Lab.
Xen
What is Xen?
A high-performance, resource-managed virtual machine monitor (VMM).
Xen enables applications such as server consolidation, co-located hosting facilities, distributed web services, secure computing platforms, and application mobility.
Xen 4.5
INTRODUCTION
VMM: a high-performance, resource-managed virtual machine monitor.
Requirements for successfully partitioning a machine:
1. Virtual machines must be isolated from one another.
2. A variety of different operating systems must be supported.
3. The performance overhead introduced by virtualization should be small.
Xen lets users dynamically instantiate an operating system.
Performance isolation: without it, the scheduling, memory demand, network traffic, and disk accesses of one VM impact the performance of the others.
XEN: APPROACH & OVERVIEW
Paravirtualization vs. full virtualization
XEN: APPROACH & OVERVIEW
Design principles
1. Support for unmodified application binaries is essential, or users will not transition to Xen.
2. Supporting full multi-application operating systems is important, as this allows complex server configurations to be virtualized within a single guest OS instance.
3. Paravirtualization is necessary to obtain high performance and strong resource isolation on uncooperative machine architectures such as x86.
4. Even on cooperative machine architectures, completely hiding the effects of resource virtualization from guest OSes risks both correctness and performance.
XEN: APPROACH & OVERVIEW
Xen design
Xen is intended to scale to approximately 100 virtual machines running industry-standard applications and services.
Xen differs from Denali in several ways:
1) Denali does not target existing ABIs, and so can elide certain architectural features from its VM interface.
2) The Denali implementation does not address the problem of supporting application multiplexing, nor multiple address spaces, within a single guest OS.
3) In the Denali architecture, the VMM performs all paging to and from disk.
4) Denali virtualizes the ‘namespaces’ of all machine resources, taking the view that no VM can access the resource allocations of another VM if it cannot name them.
The Virtual Machine Interface
Although certain parts of the implementation, such as memory management, are specific to the x86, many other aspects (such as the virtual CPU and I/O devices) can be readily applied to other machine architectures.
Typical system call
Xen: CPU
x86 supports four privilege levels: 0 for the OS and 3 for applications.
Xen downgrades the privilege of guest OSes.
System-call and page-fault handlers are registered with Xen.
Guest OSes can register “fast handlers” so that frequent exceptions such as system calls are delivered without involving Xen.
Control transfer
Virtualizing Device I/O
Xen exposes device abstractions.
I/O data is transferred to and from each domain via Xen, using shared-memory, asynchronous buffer-descriptor rings.
Data Transfer: I/O Rings
Two main factors have shaped the design of the I/O-transfer mechanism:
1. Resource management
2. Event notification
I/O buffers are protected during data transfer by pinning the underlying page frames within Xen.
Data Transfer: I/O Rings
The rings decouple the production of requests or responses from the notification of the other party.
1. Requests: a domain may enqueue multiple entries before invoking a hypercall to alert Xen.
2. Responses: a domain can defer delivery of a notification event by specifying a threshold number of responses.
This allows each domain to trade off latency and throughput requirements, similarly to the flow-aware interrupt dispatch in the ArseNIC Gigabit Ethernet interface.
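The decoupling of production from notification can be sketched roughly as below. This is a toy model and not Xen's actual ring layout or hypercall interface: the class `IORing`, its index names, and the method `notify_xen` are hypothetical names chosen for illustration, and the "Xen side" is simulated inside the same object.

```python
class IORing:
    """Toy descriptor ring: requests flow guest -> Xen, responses Xen -> guest."""

    def __init__(self, size=8, response_threshold=1):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0   # advanced by the guest when it enqueues a request
        self.req_cons = 0   # advanced by Xen when it picks a request up
        self.rsp_prod = 0   # advanced by Xen when it writes a response
        self.rsp_cons = 0   # advanced by the guest when it reads a response
        self.response_threshold = response_threshold
        self.hypercalls = 0  # notifications sent guest -> Xen
        self.events = 0      # notification events sent Xen -> guest

    def enqueue_request(self, descriptor):
        # A slot is reusable only after the guest has consumed its response.
        if self.req_prod - self.rsp_cons >= self.size:
            raise BufferError("ring full")
        self.slots[self.req_prod % self.size] = descriptor
        self.req_prod += 1   # note: no notification is sent here

    def notify_xen(self):
        # One (simulated) hypercall covers every request enqueued so far.
        self.hypercalls += 1
        while self.req_cons < self.req_prod:
            idx = self.req_cons % self.size
            self.slots[idx] = ("done", self.slots[idx])
            self.req_cons += 1
            self.rsp_prod += 1
        # The event is deferred until enough responses are pending.
        if self.rsp_prod - self.rsp_cons >= self.response_threshold:
            self.events += 1

    def collect_responses(self):
        out = []
        while self.rsp_cons < self.rsp_prod:
            out.append(self.slots[self.rsp_cons % self.size])
            self.rsp_cons += 1
        return out


ring = IORing(size=8, response_threshold=4)
for block in range(3):
    ring.enqueue_request(("read", block))
ring.notify_xen()                 # 3 responses pending, below the threshold
ring.enqueue_request(("read", 3))
ring.notify_xen()                 # 4 responses pending, event delivered
responses = ring.collect_responses()
```

In this run four requests cost two notifications to Xen and one event back to the guest. Raising `response_threshold` trades latency for fewer events.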
Memory management
Virtualizing memory is easiest when the hardware provides a software-managed TLB.
Unfortunately, x86 does not have a software-managed TLB.
Its TLB is also not tagged, so address-space switches typically require a complete TLB flush.
Given these limitations, Xen makes two decisions:
1. Guest OSes are responsible for allocating and managing the hardware page tables.
2. Xen exists in a 64MB section at the top of every address space, thus avoiding a TLB flush when entering and leaving the hypervisor.
The OS must relinquish direct write privileges to the page-table memory: all subsequent updates must be validated by Xen.
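As a worked example of the 64MB reservation, and assuming a 32-bit x86 virtual address space of 4GB (an assumption, since the slide gives only the 64MB figure), the boundary of the region reserved for Xen can be computed as follows. The names are illustrative only.

```python
ADDRESS_SPACE = 2 ** 32        # assumed: 32-bit x86 virtual address space
XEN_RESERVED = 64 * 2 ** 20    # 64MB section at the top (from the slide)
XEN_BASE = ADDRESS_SPACE - XEN_RESERVED


def guest_may_map(virtual_address):
    """True if the address lies below the region reserved for Xen."""
    return 0 <= virtual_address < XEN_BASE


print(hex(XEN_BASE))  # 0xfc000000
```

Because the same region is mapped in every address space, a transition into the hypervisor does not need to switch page tables.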
Xen Control and Management
Subsystem Virtualization
CPU scheduling
Xen currently schedules domains according to the Borrowed Virtual Time (BVT) scheduling algorithm.
Fast dispatch is particularly important to minimize the effect of virtualization on OS subsystems that are designed to run in a timely fashion.
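The core idea of BVT can be sketched as follows. This is a simplified illustration and not Xen's scheduler: each domain accumulates virtual time at a rate inversely proportional to its weight, the domain with the lowest effective virtual time runs next, and a recently woken domain may "warp" (borrow virtual time) to be dispatched quickly. The context-switch allowance and warp limits of the full algorithm are omitted, and all names are hypothetical.

```python
class Domain:
    def __init__(self, name, weight, warp=0):
        self.name = name
        self.weight = weight       # larger weight means a larger CPU share
        self.warp = warp           # virtual time the domain may borrow
        self.warp_enabled = False
        self.avt = 0.0             # actual virtual time

    @property
    def evt(self):
        """Effective virtual time: actual virtual time minus any active warp."""
        return self.avt - (self.warp if self.warp_enabled else 0)


def pick_next(domains):
    # Lowest effective virtual time wins; ties go to the earlier domain.
    return min(domains, key=lambda d: d.evt)


def run(domains, slices, slice_ms=10):
    history = []
    for _ in range(slices):
        d = pick_next(domains)
        d.avt += slice_ms / d.weight   # heavier domains age more slowly
        history.append(d.name)
    return history


a = Domain("a", weight=2)
b = Domain("b", weight=1, warp=50)
domains = [a, b]
history = run(domains, slices=6)

# A latency-sensitive domain warps back in time to be dispatched next.
b.warp_enabled = True
next_domain = pick_next(domains)
```

Over the six slices, domain `a` receives twice as many slices as `b`, matching the 2:1 weights. Enabling warp on `b` then moves it to the front of the queue.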
Subsystem Virtualization
Time and timers
Each guest OS can program a pair of alarm timers, one for real time and the other for virtual time.
Real time: expressed in nanoseconds passed since machine boot, maintained to the accuracy of the processor’s cycle counter, and can be frequency-locked to an external time source.
Virtual time: a domain’s virtual time only advances while it is executing.
Guest OSes are expected to maintain internal timer queues and use the Xen-provided alarm timers to trigger the earliest timeout.
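A guest-side timer queue of this kind might look like the sketch below. It is a minimal illustration: `set_alarm` is a hypothetical callback standing in for the Xen-provided alarm timer, and the queue is kept as a binary heap so that the earliest deadline is always at the front.

```python
import heapq


class GuestTimerQueue:
    def __init__(self, set_alarm):
        self._heap = []               # (deadline_ns, name), earliest first
        self._set_alarm = set_alarm   # stand-in for the Xen alarm timer

    def add(self, deadline_ns, name):
        heapq.heappush(self._heap, (deadline_ns, name))
        self._reprogram()

    def _reprogram(self):
        # Only the earliest timeout is ever handed to the alarm timer.
        if self._heap:
            self._set_alarm(self._heap[0][0])

    def on_alarm(self, now_ns):
        """Fire every timer that is due, then re-arm for the next one."""
        fired = []
        while self._heap and self._heap[0][0] <= now_ns:
            fired.append(heapq.heappop(self._heap)[1])
        self._reprogram()
        return fired


programmed = []                       # records each alarm value requested
queue = GuestTimerQueue(programmed.append)
queue.add(5000, "tcp-retransmit")
queue.add(2000, "scheduler-tick")
queue.add(9000, "disk-timeout")
fired = queue.on_alarm(now_ns=5000)
```

After the alarm at 5000 ns, the two due timers fire in deadline order and the alarm is re-armed for the remaining 9000 ns timeout.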
Subsystem Virtualization
Virtual address translation
No shadow page tables (unlike VMware).
Xen provides constrained but direct MMU updates.
Guest OSes have read-only access to their page tables.
Updates are batched into a single hypercall.
Updates must be validated by Xen.
Guest OSes are responsible for allocating and managing pages within their own domain.
Xen exists in a generally unused section at the top of every address space, so entering and leaving the hypervisor does not require a TLB flush.
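The batched, validated update path can be illustrated with a toy model. The class `ToyHypervisor` and its `mmu_update` method are hypothetical names, and the only check performed here is frame ownership, which is an assumption made to keep the sketch short; the real validation is more involved.

```python
class ToyHypervisor:
    def __init__(self, frame_owner):
        self.frame_owner = frame_owner   # machine frame -> owning domain id
        self.page_tables = {}            # (domain, virtual page) -> machine frame
        self.hypercalls = 0

    def mmu_update(self, domain, updates):
        """Validate and apply a batch of (virtual page, machine frame) updates."""
        self.hypercalls += 1             # the whole batch costs one hypercall
        applied = 0
        for virtual_page, frame in updates:
            if self.frame_owner.get(frame) != domain:
                continue                 # rejected: frame belongs to someone else
            self.page_tables[(domain, virtual_page)] = frame
            applied += 1
        return applied


hv = ToyHypervisor(frame_owner={10: 1, 11: 1, 20: 2})
# Domain 1 submits three updates at once; the last targets domain 2's frame.
applied = hv.mmu_update(domain=1, updates=[(0, 10), (1, 11), (2, 20)])
```

Two of the three updates are applied in a single call, and the attempt to map a frame owned by another domain is dropped.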
Subsystem Virtualization
Physical memory
Reserved at domain creation time.
Memory is statically partitioned among domains.
Xen does not guarantee contiguous regions of memory.
Xen supports hardware-to-physical mapping by providing a shared translation array readable by all domains.
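A rough sketch of the two mappings follows, assuming the guest keeps its own physical-to-machine table while the machine-to-physical direction lives in the shared array. All names are hypothetical and frame numbers are arbitrary.

```python
class TranslationArray:
    """Toy shared machine-to-physical array, readable by every domain."""

    def __init__(self, machine_frames):
        self.machine_to_physical = [None] * machine_frames


def create_domain(shared, machine_frames):
    """Assign the given machine frames; return the guest's physical-to-machine table."""
    physical_to_machine = list(machine_frames)
    for physical_frame, machine_frame in enumerate(physical_to_machine):
        shared.machine_to_physical[machine_frame] = physical_frame
    return physical_to_machine


shared = TranslationArray(machine_frames=16)
# The domain receives non-contiguous machine frames but sees frames 0, 1, 2.
p2m = create_domain(shared, [7, 3, 12])
```

The guest sees a contiguous physical range even though the underlying machine frames are scattered.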
Subsystem Virtualization
Network
A virtual firewall-router is attached to all domains.
Round-robin packet scheduler.
To send a packet, a domain enqueues a buffer descriptor onto the transmit ring.
Uses scatter-gather DMA (no packet copying):
1. A domain needs to exchange page frames to avoid copying
2. Page-aligned buffering
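Round-robin scheduling across per-domain transmit rings can be sketched as below. This is a generic round-robin illustration under the assumption of one queue per domain; the function name and the packet labels are made up for the example.

```python
from collections import deque


def round_robin(transmit_rings):
    """Take one packet per domain in turn until every ring is empty."""
    queues = [deque(packets) for packets in transmit_rings.values()]
    order = []
    while any(queues):
        for queue in queues:
            if queue:
                order.append(queue.popleft())
    return order


order = round_robin({
    "dom1": ["a1", "a2", "a3"],
    "dom2": ["b1"],
})
print(order)  # ['a1', 'b1', 'a2', 'a3']
```

A domain with a long queue cannot starve a domain that has only one packet to send.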
Subsystem Virtualization
Disk
Only Domain0 has direct access to disks.
Other domains need to use virtual block devices:
1. Use the I/O ring
2. Reorder requests prior to enqueuing them on the ring
3. If permitted, Xen will also reorder requests to improve performance
Uses DMA (zero copy).
The Cost of Porting an OS to Xen
More changes were required in Windows XP, mainly due to the presence of legacy 16-bit emulation code and the need for a somewhat different boot-loading mechanism.
Evaluation
Based on Linux 2.4.21 (neither XP nor NetBSD was fully functional).
Thoroughly compared to 2 other systems:
1. VMware Workstation (binary translation)
2. UML (runs Linux as a Linux process)
Performs better than solutions with restrictive licenses (ESX Server).
Relative Performance
Operating System Benchmarks
As expected, fork, exec and sh require a large number of page-table updates, which slows things down.
On the upside, these can be batched (up to 8MB of address space constructed per hypercall).
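To give a feel for the effect of batching, the arithmetic below assumes 4KB pages and one page-table update per page (both assumptions, not figures from the slide) and compares the number of individual updates with the number of batched hypercalls for a hypothetical 100MB address space.

```python
import math

PAGE_SIZE = 4 * 1024             # assumed x86 page size
BATCH_BYTES = 8 * 1024 * 1024    # address space constructed per hypercall (slide)


def page_updates(address_space_bytes):
    """Page-table updates needed, assuming one update per page."""
    return math.ceil(address_space_bytes / PAGE_SIZE)


def batched_hypercalls(address_space_bytes):
    """Hypercalls needed when each one constructs up to BATCH_BYTES."""
    return math.ceil(address_space_bytes / BATCH_BYTES)


size = 100 * 1024 * 1024         # hypothetical 100MB process image
print(page_updates(size), batched_hypercalls(size))  # 25600 13
```

Under these assumptions each hypercall carries up to 2048 updates, so the per-update cost of entering Xen is amortized across the batch.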
Operating System Benchmarks
Overhead comes from the hypercall needed to change the page-table base when a guest OS switches context.
The larger the working set, the smaller the relative overhead.
Operating System Benchmarks
Two transitions into Xen:
1. One for the page-fault handler
2. One to actually get the page
Operating System Benchmarks
Page flipping really pays off: no unnecessary data copying.
More overhead for smaller packets: every header still has to be dealt with.
Concurrent Virtual Machines
Unexpectedly low SMP performance for one instance of Apache.
As expected, adding another domain leads to a sharp jump in performance under Xen.
More domains: more overhead.
Concurrent Virtual Machines
Performance differentiation works as expected with IR but fails with OLTP, probably due to inefficiencies in the disk scheduling algorithm.
Isolation
Run uncooperative user applications and see whether they bring down the system.
2 “bad” domains vs. 2 “good” ones.
Xen delivers good performance even in this case.
Scalability
Very low footprint per domain (4-6MB of memory, 20KB of state).
The benchmark is compute-bound and Linux assigns long time slices, so Xen needs some tweaking.
Even without tweaking it does pretty well (but no absolute values are given).
Thank you! Any questions?