Diagnosing Performance Overheads in the Xen Virtual Machine Environment
Aravind Menon, Willy Zwaenepoel (EPFL, Lausanne); Jose Renato Santos, Yoshio Turner (HP Labs)


Diagnosing Performance Overheads in the Xen Virtual Machine Environment
Aravind Menon, Willy Zwaenepoel (EPFL, Lausanne)
Jose Renato Santos, Yoshio Turner, G. (John) Janakiraman (HP Labs, Palo Alto)

Virtual Machine Monitors (VMM)
- Increasing adoption for server applications
  - Server consolidation, co-located hosting
- Virtualization can affect application performance in unexpected ways

Web server performance in Xen
- 25-66% lower peak throughput than Linux, depending on Xen configuration
- Need VM-aware profiling to diagnose causes of performance degradation

Contributions
- Xenoprof: a framework for VM-aware profiling in Xen
- Understanding network virtualization overheads in Xen
- Debugging a performance anomaly using Xenoprof

Outline
- Motivation
- Xenoprof
- Network virtualization overheads in Xen
- Debugging using Xenoprof
- Conclusions

Xenoprof: profiling for VMs
- Profile applications running in VM environments
- Attributes execution cost to the different domains (VMs) and to VMM (Xen) routines
- Profiles various hardware events
- Example output:

  Function name      %Instructions   Module
  mmu_update         13              Xen (VMM)
  br_handle_frame    8               driver domain (Dom 0)
  tcp_v4_rcv         5               guest domain (Dom 1)
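The per-function breakdown above amounts to aggregating performance-counter samples by the function and module they land in. A minimal sketch of that aggregation, using made-up sample data shaped like the example output (the function and module names are taken from the slide; the sample counts are illustrative only):

```python
from collections import Counter

# Hypothetical Xenoprof-style samples: one (function, module) pair is
# recorded each time the hardware performance counter overflows.
samples = (
    [("mmu_update", "Xen (VMM)")] * 13
    + [("br_handle_frame", "driver domain (Dom 0)")] * 8
    + [("tcp_v4_rcv", "guest domain (Dom 1)")] * 5
    + [("other", "guest domain (Dom 1)")] * 74
)

def profile_report(samples):
    """Return {(function, module): percent of total samples}."""
    counts = Counter(samples)
    total = len(samples)
    return {key: 100 * n // total for key, n in counts.items()}

report = profile_report(samples)
assert report[("mmu_update", "Xen (VMM)")] == 13
```

The point of Xenoprof is that samples taken while the CPU is executing in the VMM or in another domain can still be attributed correctly, which a profiler confined to one guest kernel cannot do.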

Xenoprof: architecture (brief)
- Extends existing profilers (OProfile) to use Xenoprof
- Xenoprof collects samples and coordinates profilers running in multiple domains
[Figure: Domain 0, Domain 1, and Domain 2 each run an extended OProfile; Xenoprof in the Xen VMM mediates their access to the hardware performance counters]

Outline
- Motivation
- Xenoprof
- Network virtualization overheads in Xen
- Debugging using Xenoprof
- Conclusions

Xen network I/O architecture
- Privileged driver domain controls the physical NIC
- Each unprivileged guest domain uses a virtual NIC connected to the driver domain via a Xen I/O channel
  - Control: I/O descriptor ring (shared memory)
  - Data transfer: page remapping (no copying)
[Figure: NIC and bridge in the driver domain; vif1 and vif2 connect guest domains to the driver domain over I/O channels]
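The control path above is a circular buffer of descriptors in shared memory, with producer and consumer indices advanced by the two sides. A simplified model of the index arithmetic (real Xen rings pair request and response entries and notify the peer through event channels, none of which is shown here):

```python
class DescriptorRing:
    """Toy model of a shared-memory I/O descriptor ring.

    Free-running producer/consumer indices; an entry's slot is the
    index modulo the ring size. In Xen the size is a power of two so
    the modulo reduces to a mask.
    """
    def __init__(self, size=8):
        self.size = size
        self.ring = [None] * size
        self.prod = 0   # producer index (e.g. guest posting buffers)
        self.cons = 0   # consumer index (e.g. driver domain)

    def post(self, desc):
        if self.prod - self.cons == self.size:
            return False                     # ring full
        self.ring[self.prod % self.size] = desc
        self.prod += 1
        return True

    def take(self):
        if self.cons == self.prod:
            return None                      # ring empty
        desc = self.ring[self.cons % self.size]
        self.cons += 1
        return desc

ring = DescriptorRing(size=4)
ring.post("page0")
ring.post("page1")
assert ring.take() == "page0"
```

Because only descriptors travel through the ring, the packet data itself can move by remapping the underlying page instead of copying it.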

Evaluated configurations
- Linux: no Xen
- Xen Driver: run the application in the privileged driver domain
- Xen Guest: run the application in an unprivileged guest domain interfaced to the driver domain via the I/O channel

Networking micro-benchmark
- One streaming TCP connection per NIC (up to 4)
- Driver domain receive throughput: 75% of Linux throughput
- Guest throughput: 1/3 to 1/5 of Linux throughput

Receive: Xen Driver overhead
- Profiling shows slower instruction execution with Xen Driver than with Linux (both use 100% CPU)
  - Data TLB miss count 13 times higher
  - Instruction TLB miss count 17 times higher
- Xen: 11% more instructions per byte transferred (Xen virtual interrupts, driver hypercall)
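A back-of-envelope model shows how the instruction-count overhead translates into lost throughput on a saturated CPU. Only the 11% figure comes from the slide; the clock rate, baseline instructions-per-byte, and IPC below are hypothetical:

```python
def peak_throughput(cycles_per_sec, instrs_per_byte, ipc):
    """Bytes/sec a fully-busy CPU can sustain at a given instruction cost."""
    return cycles_per_sec * ipc / instrs_per_byte

CPU_HZ = 2.4e9                 # hypothetical CPU clock
LINUX_IPB = 10.0               # hypothetical instructions per byte under Linux
XEN_IPB = LINUX_IPB * 1.11     # 11% more instructions per byte (from profile)

linux = peak_throughput(CPU_HZ, LINUX_IPB, ipc=1.0)
xen = peak_throughput(CPU_HZ, XEN_IPB, ipc=1.0)

# Extra instructions alone account for roughly a 10% throughput loss;
# the measured 25% receive gap also reflects the much higher TLB miss
# rates, which lower the effective IPC.
assert round(xen / linux, 2) == 0.90
```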

Receive: Xen Guest overhead
- Xen Guest configuration executes twice as many instructions as the Xen Driver configuration
  - Driver domain (38%): overhead of bridging
  - Xen (27%): overhead of page remapping

Transmit: Xen Guest overhead
- Xen Guest executes 6 times as many instructions as the Xen Driver configuration
  - Factor of 2, as in the receive case
  - Guest instructions increase 2.7 times: the virtual NIC (vif2) in the guest does not support the TCP offload capabilities of the physical NIC

Suggestions for improving Xen
- Enable virtual NICs to utilize the offload capabilities of the physical NIC
- Efficient support for packet demultiplexing in the driver domain

Outline
- Motivation
- Xenoprof
- Network virtualization overheads in Xen
- Debugging using Xenoprof
- Conclusions

Anomalous network behavior in Xen
- TCP receive throughput in Xen changes with the application buffer size (on a slow Pentium III)

Debugging using Xenoprof
- 40% of kernel execution overhead is incurred in socket buffer de-fragmenting routines

De-fragmenting socket buffers
- Linux: insignificant fragmentation with a streaming workload
- Xenolinux (Linux on Xen):
  - Received packets fill 1500 bytes (MTU) of a 4 KB socket buffer
  - Page-sized socket buffers support remapping over the I/O channel
[Figure: socket receive queue of MTU-sized data packets in 4 KB socket buffers being de-fragmented]
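The arithmetic behind the anomaly: each received packet carries at most 1500 bytes but pins an entire 4 KB page-sized buffer (so the page can be remapped over the I/O channel), so the receive queue fills its byte budget with mostly-empty buffers unless the kernel compacts them. A small sketch of the buffer accounting, assuming for illustration that de-fragmentation packs buffers completely:

```python
MTU = 1500        # data bytes per received packet
SKB_SIZE = 4096   # page-sized socket buffer, remappable over the I/O channel

def buffers_needed(queue_bytes, defragmented):
    """Socket buffers pinned to hold queue_bytes of received data."""
    per_buffer = SKB_SIZE if defragmented else MTU
    return -(-queue_bytes // per_buffer)   # ceiling division

# For a 64 KB receive queue of MTU-sized packets, ~63% of each page is
# wasted, so Xenolinux compacts ("de-fragments") the buffers -- the
# routines that showed up as 40% of kernel overhead in the profile.
assert buffers_needed(64 * 1024, defragmented=False) == 44
assert buffers_needed(64 * 1024, defragmented=True) == 16
```

This is why the overhead, and hence throughput, tracks the application buffer size: how much data sits queued determines how much copying the de-fragmentation routines must do.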

Conclusions
- Xenoprof is useful for identifying major overheads in Xen
- Xenoprof to be included in official Xen and OProfile releases
- Where to get it: