
Arne Wiebalck -- VM Performance: I/O



2 VM Performance: I/O
Arne Wiebalck, LHCb Computing Workshop, CERN, May 22, 2015

3 In this talk: I/O only
- Most issues we have seen were I/O related; the typical symptom is high IOwait.
- You can optimize this: understand the service offering, then tune your VM.
- CPU performance is being looked at as well (host-side), but it is too early to report.

4 Hypervisor limits
- Each hypervisor has a two-disk RAID-1, so effectively one disk.
- That disk is a shared resource: its IOPS are split pro rata across the VMs (IOPS / #VMs).
- A typical VM (4 cores, 8 GB RAM) is co-hosted with 7 other VMs and gets about 25 IOPS.
- User expectation: ephemeral disk ≈ local disk.
[Diagram: virtual machine on a hypervisor, sharing RAM, CPUs and the disk.]
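The pro-rata arithmetic behind the 25 IOPS figure, assuming (an illustrative number, not from the slide) that the mirrored spinning-disk pair delivers on the order of 200 IOPS in total:

    IOPS per VM ≈ total disk IOPS / #VMs ≈ 200 / 8 = 25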

5 Ask not only what the VM can do for you …

6 Tip 1: Minimize disk usage
- Use tmpfs (see the sketch after this list).
- Reduce logging.
- Configure AFS memory caches (supported in Puppet).
- Use volumes.
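A minimal sketch of the tmpfs idea, assuming a standard Linux guest where /dev/shm is a tmpfs mount (path and workload are illustrative): scratch files placed there live in RAM and never touch the shared virtual disk.

```python
# Minimal sketch: keep short-lived scratch data on tmpfs instead of the
# ephemeral disk, so it never consumes the VM's small IOPS share.
import os
import tempfile

# /dev/shm is a tmpfs mount on most Linux systems; fall back to the default
# temp directory if it is not available.
scratch_dir = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()

with tempfile.NamedTemporaryFile(dir=scratch_dir) as tmp:
    tmp.write(b"intermediate data that need not survive a reboot\n")
    tmp.flush()
    # ... process the scratch file; it is removed on close ...
```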

7 Tip 2: Check I/O scheduling
The "lxplus problem": VMs use the 'deadline' elevator, set by the 'virtual-guest' tuned profile, Red Hat's default for VMs. This is not always ideal for interactive machines: 'deadline' prefers reads and can delay writes (by default up to 5 seconds), a design made to keep reads flowing under heavy load (e.g. on a web server). On lxplus, sssd makes database updates during login, so logins suffered. The I/O scheduler on the VMs was therefore changed to CFQ (Completely Fair Queuing); the benchmark was a login loop. The knob involved is sketched below.
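A minimal sketch of the runtime knob, assuming a virtio guest whose system disk is vda (a hypothetical device name; the production change was made persistent via tuned, not like this). It requires root and does not survive a reboot:

```python
# Read and switch the I/O scheduler of /dev/vda through sysfs.
SCHED = "/sys/block/vda/queue/scheduler"

with open(SCHED) as f:
    print("before:", f.read().strip())   # e.g. "noop [deadline] cfq"

with open(SCHED, "w") as f:
    f.write("cfq")                       # switch the elevator at runtime

with open(SCHED) as f:
    print("after: ", f.read().strip())   # e.g. "noop deadline [cfq]"
```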

8 'deadline' vs. 'CFQ'
[Chart: I/O benchmark starts; scheduler switched from 'deadline' to 'CFQ'; I/O benchmark continues.]

9 Tip 3: Use volumes
- Volumes are networked virtual disks: they show up as block devices and are attached to one VM at a time.
- Arbitrary size (within your quota).
- Provided by Ceph (and NetApp).
- QoS for IOPS and bandwidth makes it possible to offer different volume types (see the sketch after this list).
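A hedged sketch of the workflow, assuming the unified `openstack` CLI is available and configured; the volume name, size, type and server name are placeholders:

```python
# Create a 100 GB volume of type 'io1' and attach it to a VM; inside the
# guest it then appears as an additional block device (e.g. /dev/vdb).
import subprocess

subprocess.check_call(
    ["openstack", "volume", "create", "--size", "100", "--type", "io1", "myvol"]
)
subprocess.check_call(
    ["openstack", "server", "add", "volume", "myserver", "myvol"]
)
```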

10 Volume types

  Name       Bandwidth   IOPS   Comment
  standard    80 MB/s     100   std disk performance
  io1        120 MB/s     500   quota on request
  cp1                           critical power
  cp2                           critical power; Windows only (in preparation)

11 Ceph volumes at work
Two examples where the IOPS QoS of a volume was raised from 100 to 500 IOPS:
- ATLAS TDAQ monitoring application. [Chart: y-axis is the CPU % spent in IOwait; blue: CVI VM (h/w RAID-10 with cache); yellow: OpenStack VM.]
- EGI Message Broker monitoring. [Chart: y-axis is the scaled CPU load (5 min load / #cores).]

12 Tip 4: Use SSD block-level caching
SSDs as disks in the hypervisors would solve all IOPS and latency issues, but they are still (too expensive and) too small. The compromise is SSD block-level caching:
- flashcache (from Facebook; used at CERN for AFS before)
- dm-cache (in-kernel since 3.9; recommended by Red Hat, in CentOS 7)
- bcache (in-kernel since 3.10)

13 bcache
- Cache mode can be changed at runtime (think SSD replacements).
- Strong error handling: flush and bypass on error.
- Easy setup, transparent for the VM; the hypervisor needs a special kernel.
[Diagram: hypervisor with RAM, CPUs, a disk (~100 IOPS) and an SSD (~20k IOPS).]
A setup sketch follows.
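A hypothetical hypervisor-side setup sketch; device names are placeholders, and it assumes bcache-tools plus a bcache-capable kernel (the "special kernel" above). This is not the exact CERN procedure:

```python
# Format a backing HDD and a caching SSD for bcache, attach the cache set,
# and select the cache mode at runtime via sysfs.
import subprocess

subprocess.check_call(["make-bcache", "-B", "/dev/sdb"])  # backing device (HDD)
subprocess.check_call(["make-bcache", "-C", "/dev/sdc"])  # cache device (SSD)

# Attach the cache set to the backing device; the cache-set UUID is printed
# by 'make-bcache -C' (placeholder below).
with open("/sys/block/bcache0/bcache/attach", "w") as f:
    f.write("<cset-uuid>")

# The cache mode can be changed at any time, without unmounting:
with open("/sys/block/bcache0/bcache/cache_mode", "w") as f:
    f.write("writeback")  # also: writethrough, writearound, none
```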

14 bcache in action (2)
On a 4-VM hypervisor: ~25 IOPS/VM → ~1000 IOPS/VM.
Benchmarking a caching system is non-trivial:
- SSD performance can vary over time.
- SSD performance can vary between runs.
- The data distribution is important (cf. Zipf).
[Chart annotations: cache mode switched from 'none' to 'writeback'; benchmark ended, #threads decreased.]
An illustrative benchmark invocation is sketched below.
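For illustration only (not the benchmark used in the talk): a random-write fio run against the cached device, to be repeated before and after the cache-mode switch. All parameters are placeholders, and writing to the raw device destroys its contents:

```python
# Launch a 60-second 4k random-write benchmark with 4 jobs against bcache0.
import subprocess

subprocess.check_call([
    "fio", "--name=randwrite", "--filename=/dev/bcache0",
    "--rw=randwrite", "--bs=4k", "--direct=1", "--ioengine=libaio",
    "--iodepth=32", "--numjobs=4", "--runtime=60", "--time_based",
])
```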

15 bcache and latency
SSD block-level caching is sufficient for the IOPS and latency demands: use a VM on a bcache hypervisor. Caveat: SSD failures are fatal!
Clients: lxplus, ATLAS build service, CMS Frontier, root, … (16 tenants).
[Chart: blue: CVI VM (h/w RAID-10 with cache); yellow: OpenStack VM; red: OpenStack VM on a bcache hypervisor.]

16 "Tip" 5: KVM caching
- By default, I/O from the VM goes directly to the disk: required for live migration, but not optimal for performance.
- I/O can instead be cached on the hypervisor: operationally difficult (no live migration), but done for batch nodes.
[Diagram: virtual machine (4 cores, 8 GB RAM) on a hypervisor with RAM, CPUs and disk.]
A libvirt sketch follows.
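In libvirt, the caching behaviour is selected per disk via the cache attribute of the <driver> element in the domain XML. A sketch (file names are placeholders; apply the result with 'virsh define', and remember that hypervisor caching rules out live migration):

```python
# Flip every qemu disk driver in a saved domain definition from
# cache='none' to cache='writeback'.
import xml.etree.ElementTree as ET

tree = ET.parse("domain.xml")                # e.g. from 'virsh dumpxml <vm>'
for disk in tree.iter("disk"):
    driver = disk.find("driver")
    if driver is not None and driver.get("name") == "qemu":
        driver.set("cache", "writeback")     # was: cache='none'
tree.write("domain-cached.xml")
```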

17 KVM caching in action
[Charts: ATLAS SAM VM with the cache mode changed from 'none' to 'write-back'; batch nodes with and without KVM caching.]

18 Take home messages
The Cloud service offers various options to improve the I/O performance of your VMs; you need to analyze your use case and pick the right one:
- Reduce I/O.
- Check the I/O scheduler.
- Use volumes.
- Use SSD hypervisors.
- (Use KVM caching.)
Get in touch with the Cloud team in case you need assistance!
