Differentiated I/O services in virtualized environments


1 Differentiated I/O services in virtualized environments
Tyler Harter, Salini SK & Anand Krishnamurthy. The project we have been working on is "Differentiated I/O services in virtualized environments".

2 Overview Provide differentiated I/O services for applications in guest operating systems running in virtual machines. Applications in virtual machines tag their I/O requests, and the hypervisor's I/O scheduler uses these tags to provide the requested quality of I/O service. The objective of this project is to provide differentiated I/O services for applications in guest operating systems running in virtual machines. We do this by tagging I/O requests from applications and then using those tags at the hypervisor to provide good quality of I/O service. Network I/O already has mechanisms for end-to-end QoS, such as DSCP (Differentiated Services Code Point) bits and MPLS. In virtualized settings, software switches like Open vSwitch from VMware/Nicira offer QoS, rate limiting, etc., and can operate at IP-address or transport-port granularity. Leaving the network part aside, this presentation focuses on storage I/O.

3 Motivation Varied applications with different I/O requirements are hosted in clouds. It is not optimal if I/O scheduling is agnostic of the semantics of the request. Many types of applications with different I/O requirements run in clouds, and it is not a good idea to do I/O scheduling while ignoring the semantics of the request. Consider the following example.

4 Motivation (diagram: VM 1, VM 2, and VM 3 running on the hypervisor)

5 Motivation (diagram: VM 2 and VM 3 running on the hypervisor)

6 Motivation We want high and low priority processes to correctly receive differentiated service, both within a VM and between VMs. Can my webserver's or DHT's log pusher's I/O be served differently from the webserver's or DHT's own I/O?

7 Existing work & Problems
VMware's ESX server offers Storage I/O Control (SIOC), which provides I/O prioritization for virtual machines that access a shared storage pool. But it supports prioritization only at host granularity! Let us now look at the current state-of-the-art technologies for I/O prioritization. VMware's ESX server provides I/O prioritization for a shared datastore across virtual machines via SIOC, but the problem is that it supports only host-level granularity.

8 Existing work & Problems
Xen's credit scheduler also works at the domain level. Linux's CFQ I/O scheduler supports I/O prioritization, and it is possible to use priorities at both the guest's and the hypervisor's I/O schedulers, but there are a couple of issues that we will see in the forthcoming slides.

9 Original Architecture
(Diagram: high- and low-priority applications in the guest VMs issue syscalls against a virtual SCSI disk through the guest I/O scheduler, e.g. CFQ; QEMU in the host forwards the I/O to the host's I/O scheduler.) This is the vanilla KVM/QEMU architecture, with CFQ I/O schedulers at both the guest and the hypervisor.

10 Original Architecture
Virtual machines do not expose per-guest-process priorities to the host, and hence I/O priorities have to be enforced at the guest's I/O scheduler level.

11 Problem 1: low and high may get same service
A high-priority process in one VM and a low-priority process in another VM may be treated equally, since the hypervisor has no global view of I/O priorities.

12 Problem 2: does not utilize host caches
In addition, if we block low-priority read requests in the guest in order to serve high-priority ones first, we may fail to take advantage of the file-system buffer cache in the hypervisor, leading to unintended increases in latency.

13 Existing work & Problems
Xen's credit scheduler also works at the domain level. Linux's CFQ I/O scheduler supports I/O prioritization, and it is possible to use priorities at both the guest's and the hypervisor's I/O schedulers. The takeaway is that the current state of the art does not provide differentiated services at guest-application-level granularity.

14 Solution Tag I/O and prioritize in the hypervisor

15 Outline KVM/Qemu, a brief intro… KVM/Qemu I/O stack
Multi-level I/O tagging. I/O scheduling algorithms. Evaluation. Summary. This will be the outline of the presentation. Before delving into the details of multi-level I/O tagging, I would like to take two minutes to walk through the QEMU I/O stack for people who have worked with other virtualization technologies like Xen or VirtualBox and are not familiar with QEMU-KVM. Then we will explain the modifications we made to provide differentiated I/O services: the multi-level I/O tagging and the scheduling algorithms. We will then analyze how our system behaves for different workloads and configurations. After that we will conclude with our learnings and take questions.

16 KVM/Qemu, a brief intro.. The KVM module has been part of the Linux kernel since version 2.6. Linux has all the mechanisms a VMM needs to operate several VMs, and KVM relies on a virtualization-capable CPU with either the Intel VT or the AMD SVM extensions. There are three modes: kernel, user, and guest. kernel-mode: switch into guest-mode and handle exits due to I/O operations. user-mode: perform I/O when the guest needs to access devices. guest-mode: execute guest code, which is the guest OS except for I/O. (Diagram: Linux standard kernel with KVM as the hypervisor, on top of the hardware.) In a normal Linux environment each process runs either in user-mode or in kernel-mode; KVM introduces a third mode, guest-mode, and therefore relies on a virtualization-capable CPU with either Intel VT or AMD SVM extensions.

17 KVM/Qemu, a brief intro..

18 Linux Standard Kernel with KVM - Hypervisor
KVM/Qemu, a brief intro.. Each virtual machine is a user-space process. (Diagram: Linux standard kernel with KVM as the hypervisor, on top of the hardware.)

19 Linux Standard Kernel with KVM - Hypervisor
KVM/Qemu, a brief intro.. (Diagram: libvirt and other user-space processes run alongside the VM processes on the Linux standard kernel with KVM as the hypervisor, on top of the hardware.)

20 KVM/Qemu I/O stack: application in guest OS
An application in the guest OS issues an I/O-related system call (e.g. read(), write(), stat()) within the user-space context of the virtual machine. This system call leads to an I/O request being submitted from within the kernel-space of the VM. (Diagram: system-call layer, VFS, file system, buffer cache, block layer, SCSI and ATA drivers.) The I/O request will reach a device driver, either an ATA-compliant (IDE) or a SCSI driver.

21 KVM/Qemu I/O stack: application in guest OS
The device driver then issues privileged instructions to read/write the memory regions exported over PCI by the corresponding device. (Diagram: system-call layer, VFS, file system, buffer cache, block layer, SCSI and ATA drivers.)

22 Linux Standard Kernel with KVM - Hypervisor
KVM/Qemu I/O stack: QEMU emulator. A VM-exit takes place for each of the privileged instructions resulting from the original I/O request in the VM. These VM-exits are handled by the core KVM module within the host's kernel-space context, and the privileged I/O-related instructions are then passed by the hypervisor to the QEMU machine emulator. (Diagram: Linux standard kernel with KVM as the hypervisor, on top of the hardware.)

23 Linux Standard Kernel with KVM - Hypervisor
KVM/Qemu I/O stack: QEMU emulator. These instructions are then emulated by device-controller emulation modules within QEMU (either as ATA or as SCSI commands). QEMU generates block-access I/O requests in a special block-device emulation module, so the original I/O request ends up generating I/O requests to the kernel-space of the host. Upon completion of those system calls, QEMU "injects" an interrupt into the VM that originally issued the I/O request. (Diagram: Linux standard kernel with KVM as the hypervisor, on top of the hardware.)

24 Multi-level I/O tagging modifications

25 Modification 1: pass priorities via syscalls
We support passing priorities via system calls at two levels: at the file-descriptor level via fcntl, and at the individual read/write level via a new pread_p call, as sketched below.
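A minimal usage sketch of the two tagging paths. This is illustrative only: the F_SET_IO_PRIO fcntl command, the priority values, and the pread_p wrapper with its syscall number are assumptions standing in for the project's actual interface.

/* Sketch: tagging I/O with priorities from a guest application.
 * F_SET_IO_PRIO, the priority constants, and the pread_p syscall
 * number are hypothetical placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

#define F_SET_IO_PRIO 1025   /* assumed custom fcntl command */
#define IO_PRIO_HIGH  1
#define IO_PRIO_LOW   7

/* assumed wrapper for the project's new pread_p system call */
static ssize_t pread_p(int fd, void *buf, size_t count, off_t offset, int prio)
{
    return syscall(400 /* assumed __NR_pread_p */, fd, buf, count, offset, prio);
}

int main(void)
{
    char buf[4096];
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0)
        return 1;

    /* Option 1: tag all subsequent I/O on this descriptor as high priority. */
    fcntl(fd, F_SET_IO_PRIO, IO_PRIO_HIGH);
    pread(fd, buf, sizeof(buf), 0);

    /* Option 2: tag a single read explicitly as low priority. */
    pread_p(fd, buf, sizeof(buf), 4096, IO_PRIO_LOW);

    close(fd);
    return 0;
}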

26 Modification 2: NOOP+ at guest I/O scheduler
The next modification we made is to the I/O scheduler in the guest. We call it "NOOP+" because, apart from behaving like NOOP, it has extra logic to pick up the priorities from the block I/O layer and pass them down to the SCSI driver layer. Salini will explain in detail the rationale behind using NOOP+.

27 Modification 3: extend SCSI protocol with prio
Now that we have the tag in the SCSI layer of the guest, we need to pass it down to the hypervisor. According to the SCSI specification, there is an unused byte in the SCSI command descriptor block (CDB) for READ_10, and we leverage that. By editing the SCSI driver in the guest, we take the priority from the I/O request and set it in the SCSI command, as sketched below. After all the interactions in the QEMU I/O stack mentioned earlier, this tag is recovered in QEMU and passed down to the hypervisor's I/O scheduler via the pread_p system call. Salini will now take over for the scheduling algorithms and evaluation.
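A sketch of how the priority tag can ride in a READ(10) CDB. Assumptions: byte 6 (reserved/group-number byte in the SCSI spec) is taken to be the unused byte the slide refers to, and the real change patches the guest kernel's SCSI disk driver rather than building CDBs in user space as this standalone demo does.

/* Sketch: stashing a priority tag in the spare byte of a READ(10) CDB. */
#include <stdint.h>
#include <stdio.h>

#define READ_10 0x28

static void build_read10(uint8_t cdb[10], uint32_t lba, uint16_t blocks, uint8_t prio)
{
    cdb[0] = READ_10;
    cdb[1] = 0;                    /* flags (DPO/FUA/...) */
    cdb[2] = (lba >> 24) & 0xff;   /* logical block address, big-endian */
    cdb[3] = (lba >> 16) & 0xff;
    cdb[4] = (lba >> 8) & 0xff;
    cdb[5] = lba & 0xff;
    cdb[6] = prio;                 /* priority tag in the otherwise unused byte (assumption) */
    cdb[7] = (blocks >> 8) & 0xff; /* transfer length in blocks, big-endian */
    cdb[8] = blocks & 0xff;
    cdb[9] = 0;                    /* control */
}

int main(void)
{
    uint8_t cdb[10];
    build_read10(cdb, 2048, 8, 1 /* high priority */);
    for (int i = 0; i < 10; i++)
        printf("%02x ", cdb[i]);
    printf("\n");
    return 0;
}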

28 Modification 2: NOOP+ at guest I/O scheduler

29 Modification 4: share-based prio sched in host

30 Modification 5: use new calls in benchmarks

31 Scheduler algorithm-Stride
ID_i: ID of application A_i
Share_i: shares assigned to ID_i
VIO_i: virtual I/O counter for ID_i
Stride_i = Global_shares / Share_i

Dispatch_request() {
    Select the ID k which has the lowest virtual I/O counter
    Increase VIO_k by Stride_k
    if (VIO_k reaches the threshold)
        Reinitialize all VIO_i to 0
    Dispatch the request at the head of queue k
}
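A small C sketch of the same dispatch logic. The constants (MAX_IDS, GLOBAL_SHARES, VIO_THRESHOLD) and the per-queue bookkeeping are illustrative assumptions; the real implementation lives inside the host's I/O scheduler.

/* Sketch of the stride-based dispatch from the slide. */
#include <limits.h>
#include <stdbool.h>

#define MAX_IDS       64
#define GLOBAL_SHARES 1000
#define VIO_THRESHOLD 1000000L

struct app_queue {
    long shares;        /* Share_i: shares assigned to ID_i */
    long stride;        /* Stride_i = GLOBAL_SHARES / Share_i */
    long vio;           /* VIO_i: virtual I/O counter */
    bool has_requests;  /* queue i currently has pending requests */
};

static struct app_queue q[MAX_IDS];

/* Returns the queue index to dispatch from, or -1 if all queues are empty. */
int dispatch_request(void)
{
    int k = -1;
    long min_vio = LONG_MAX;

    /* Select the ID k with the lowest virtual I/O counter. */
    for (int i = 0; i < MAX_IDS; i++) {
        if (q[i].has_requests && q[i].vio < min_vio) {
            min_vio = q[i].vio;
            k = i;
        }
    }
    if (k < 0)
        return -1;

    /* Charge the winner by its stride; more shares means a smaller stride. */
    q[k].vio += q[k].stride;

    /* Once a counter reaches the threshold, reinitialize all counters. */
    if (q[k].vio >= VIO_THRESHOLD)
        for (int i = 0; i < MAX_IDS; i++)
            q[i].vio = 0;

    return k;   /* dispatch the request at the head of queue k */
}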

32 Scheduler algorithm cntd
Problem: a process that has been sleeping for a long time can monopolize the resource once it wakes up, because its virtual I/O counter has fallen far behind the others. Solution: if a sleeping process k wakes up, set VIO_k = max( min(all non-zero VIO_i), VIO_k ).
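Continuing the sketch above (same q[] array and MAX_IDS), the wake-up fix clamps the waking process's counter so it cannot monopolize the device.

/* Sketch: clamp a waking process's counter, per the rule on the slide. */
void on_wakeup(int k)
{
    long min_nonzero = LONG_MAX;

    for (int i = 0; i < MAX_IDS; i++)
        if (q[i].vio > 0 && q[i].vio < min_nonzero)
            min_nonzero = q[i].vio;

    /* VIO_k = max( min(all non-zero VIO_i), VIO_k ) */
    if (min_nonzero != LONG_MAX && q[k].vio < min_nonzero)
        q[k].vio = min_nonzero;
}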

33 Evaluation Tested on HDD and SSD. Configuration:
Guest RAM size: 1 GB
Host RAM size: 8 GB
Hard disk: 7200 RPM
SSD: 35000 IOPS Rd, IOPS Wr
Guest OS: Ubuntu Server, LK 3.2
Host OS: Kubuntu, LK 3.2
Filesystem (host/guest): ext4
Virtual disk image format: qcow2

34 Results Metrics: throughput, latency. Benchmarks: Filebench, Sysbench, Voldemort (distributed key-value store).

35 Shares vs Throughput for different workloads : HDD

36 Shares vs Latency for different workloads : HDD
Priorities are better respected if most of the read requests hit the disk.

37 Effective Throughput for various dispatch numbers : HDD
Priorities are respected only when the disk's dispatch number is lower than the number of read requests generated by the system at a time. Downside: the disk's dispatch number is directly proportional to the effective throughput.

38 Shares vs Throughput for different workloads : SSD

39 Shares vs Latency for different workloads : SSD
Priorities in SSDs are respected only under heavy load, since SSDs are faster

40 Comparison b/w different schedulers
Only Noop+LKMS respects priority! (Has to be, since we did it)

41 Results (table: webserver, mailserver, random-read, sequential-read, and Voldemort DHT read workloads on hard disk and flash)

42 Summary It works!!! Preferential service is possible only when the disk's dispatch number is lower than the number of read requests generated by the system at a time. But a lower dispatch number reduces the effective throughput of the storage. On SSDs, preferential service is only possible under heavy load. Scheduling at the lowermost layer yields better differentiated services.

43 Future work Get it working for writes
Evaluate VMware ESX SIOC and compare the results with ours
