Xen Virtual Machine Monitor Performance Isolation E0397 Lecture 17/8/2010 Many slides based verbatim on “Xen Credit Scheduler Wiki”

Recall: Xen Architecture

Hypervisor Core Functions  Scheduling Domains  Allocating Memory  Driver Domain (Domain 0) –All access to devices needs to go through this domain

CPU Sharing between domains: Credit Scheduler  Proportional fair-share CPU scheduler - work conserving on SMP hosts  Definitions: –Each domain (including the host OS) is assigned a weight and a cap. –Weight – A domain with a weight of 512 will get twice as much CPU as a domain with a weight of 256 on a contended host. Legal weights range from 1 to 65535 and the default is 256. –Cap – The cap optionally fixes the maximum amount of CPU a domain will be able to consume, even if the host system has idle CPU cycles. The cap is expressed as a percentage of one physical CPU: 100 is 1 physical CPU, 50 is half a CPU, 400 is 4 CPUs, etc. The default, 0, means there is no upper cap. –VCPUs: number of virtual CPUs given to a domain – exactly replaces the concept of the number of CPUs in a physical machine  More VCPUs: threads can run in parallel (only if physical CPUs > 1)  Number of VCPUs should equal the number of physical CPUs – fewer will limit performance
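
To make the weight and cap arithmetic concrete, here is a small Python sketch (not Xen code; the function name and the simple proportional model are illustrative only) of how two busy domains split a contended CPU and how a cap clips the result:

```python
# Illustrative sketch (not Xen code): how credit-scheduler weights and caps
# translate into CPU share on a contended host.

def cpu_shares(weights, caps, n_cpus=1):
    """weights: {domain: weight}; caps: {domain: cap in % of one physical CPU, 0 = uncapped}.
    Returns each domain's share in % of one physical CPU under full contention."""
    total_weight = sum(weights.values())
    shares = {}
    for dom, w in weights.items():
        proportional = n_cpus * 100.0 * w / total_weight   # fair share by weight
        cap = caps.get(dom, 0)
        shares[dom] = proportional if cap == 0 else min(proportional, cap)
    return shares

# Two busy domains on one physical CPU: weight 512 gets twice the CPU of weight 256,
# unless its cap (here 50% of a CPU) clips it first. The real scheduler is
# work-conserving, so the leftover would actually go to the uncapped domain.
print(cpu_shares({"dom1": 512, "dom2": 256}, {"dom1": 50, "dom2": 0}))
```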

….Credit Scheduler –SMP load balancing  The credit scheduler automatically load balances guest VCPUs across all available physical CPUs on an SMP host. The administrator does not need to manually pin VCPUs to load balance the system. However, she can restrict which CPUs a particular VCPU may run on using the generic vcpu-pin interface.

Usage The xm sched-credit command may be used to tune the per-VM guest scheduler parameters. xm sched-credit -d <domain> lists the weight and cap; xm sched-credit -d <domain> -w <weight> sets the weight; xm sched-credit -d <domain> -c <cap> sets the cap

Credit Scheduler Algorithm  Each CPU manages a local run queue of runnable VCPUs. –The queue is sorted by VCPU priority. –A VCPU's priority can be one of two values: over or under  Over: has exceeded its fair share of CPU resource in the ongoing accounting period.  Under: has not exceeded it –When inserting a VCPU onto a run queue, it is put after all other VCPUs of equal priority.  As a VCPU runs, it consumes credits. –Every so often (every 10 ms), 100 credits are deducted –Unless its credits go negative, a VCPU gets three such rounds to run (30 ms) –Negative credits imply a priority of over. Until a VCPU consumes its allotted credits, its priority is under. –Credits are refreshed periodically (every 30 ms)  Active vs inactive VMs –Credit deduction/refreshing happens only for active VMs (those that keep using their credits) –Those not using their credits are marked “inactive”
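
As a rough illustration of the accounting described above, the following Python sketch (illustrative only, not the Xen source; the class and the tick interleaving are invented for the example) debits 100 credits per 10 ms tick from the running VCPU and refreshes 300 credits per physical CPU every 30 ms in proportion to weight:

```python
# Minimal sketch of the credit accounting (illustrative only, not Xen source).
# Every 10 ms tick debits the running VCPU 100 credits; every 30 ms the host's credits
# (300 per physical CPU) are redistributed to active VCPUs in proportion to weight.
# The sign of the balance gives the priority: UNDER (>= 0) or OVER (< 0).

UNDER, OVER = "under", "over"
TICK_MS, REFRESH_MS, CREDITS_PER_TICK = 10, 30, 100

class VCPU:
    def __init__(self, name, weight):
        self.name, self.weight = name, weight
        self.credits, self.active = 0, True

    @property
    def priority(self):
        return UNDER if self.credits >= 0 else OVER

def tick(running, vcpus, now_ms, n_cpus=1):
    """One 10 ms accounting tick: debit the running VCPU, refresh credits every 30 ms."""
    if running is not None:
        running.credits -= CREDITS_PER_TICK
    if now_ms % REFRESH_MS == 0:
        active = [v for v in vcpus if v.active]
        total_weight = sum(v.weight for v in active) or 1
        budget = CREDITS_PER_TICK * (REFRESH_MS // TICK_MS) * n_cpus   # 300 per pCPU
        for v in active:
            v.credits += budget * v.weight // total_weight

# Example: weight 512 vs 256 on one CPU; run dom1 on two ticks out of every three.
a, b = VCPU("dom1", 512), VCPU("dom2", 256)
for t in range(TICK_MS, 30 * TICK_MS + 1, TICK_MS):
    tick(a if (t // TICK_MS) % 3 else b, [a, b], t)
print(a.name, a.credits, a.priority, "|", b.name, b.credits, b.priority)
```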

…Credit Scheduler Algorithm  On each CPU, at every scheduling decision (when a VCPU blocks, yields, completes its time slice, or is awakened), the next VCPU to run is picked off the head of the run queue (UNDER).  When a CPU doesn't find a VCPU of priority under on its local run queue, it will look on other CPUs for one. This load balancing guarantees each VM receives its fair share of CPU resources system-wide.  Before a CPU goes idle, it will look on other CPUs to find any runnable VCPU. This guarantees that no CPU idles when there is runnable work in the system.

…Credit Scheduler [Diagram: run queue on CPU 1 holding VCPUs V1–V5. At the end of V5's time slice, or when its domain blocks on I/O, credits are recalculated for V5: with positive credits it is reinserted among the UNDER VCPUs; with negative credits it is marked OVER and queued behind them.]

…Credit Scheduler  SMP load balancing – find next runnable VCPU in following order. –UNDER domain in local run queue –UNDER domain in run queue of other CPUs –OVER domain in local run queue –OVER domain in run queue of other CPUs
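
The selection order above can be written as a short sketch; the following Python is illustrative only (the queue representation is invented), but follows the four-step preference exactly:

```python
# Illustrative sketch of the VCPU selection order (not Xen source): prefer UNDER VCPUs
# on the local run queue, then steal an UNDER VCPU from another CPU, then fall back to
# OVER VCPUs, again local first.

def pick_next_vcpu(local_queue, remote_queues):
    """local_queue: list of (vcpu, priority); remote_queues: list of such lists.
    priority is the string 'under' or 'over'. Returns the VCPU to run next, or None."""
    def take(queue, wanted):
        for i, (vcpu, prio) in enumerate(queue):
            if prio == wanted:
                return queue.pop(i)[0]
        return None

    for wanted in ("under", "over"):
        vcpu = take(local_queue, wanted)               # local run queue first
        if vcpu is not None:
            return vcpu
        for rq in remote_queues:                       # then steal from other CPUs
            vcpu = take(rq, wanted)
            if vcpu is not None:
                return vcpu
    return None                                        # nothing runnable anywhere -> idle

# Example: the local queue only has an OVER VCPU, but CPU 2 has an UNDER one.
print(pick_next_vcpu([("v3", "over")], [[("v7", "under")]]))   # -> 'v7'
```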

Performance Studies  CPU Sharing is predictable  I/O sharing is not  See:

Study 1: Cherkasova paper. Applications running in Domain 1 (one of):  Web server: requests for 10 KB files –Measure: requests/sec  iperf: measures maximum achievable network throughput –Measure: Mbps  dd: reads KB blocks –Measure: disk throughput (MB/s)

Schedulers with three workloads: performance varies between schedulers; sensitive to Dom0 weight; disk reads are least sensitive to both the scheduler and the Dom0 weight

Study 2: Ongaro paper, Experiment 3: burn x 7, ping x 1. Processor sharing is fair for all schedulers; SEDF provides low latency; boosting, sorting, and tickling together work best

Study 3: Sujesha et al. (IITB): Impact of resource allocation on application performance. Response time vs. resource (CPU) allocated, for different loads. Desired operation: at the “knee” of the curve.

Study 4 (Gundecha/Apte): Performance of multiple domains vs scheduler parameters  Is resource management fair to every domain irrespective of – type of load – scheduling parameters of the domain  Scheduler used – default credit scheduler  Two virtual machines, one with a CPU-intensive workload and the other with an I/O-intensive workload (Apache web server).

Results – Table 1: Web server performance stats with and without CPU load on the other VM. Columns: VM, weight, cap, load, %CPU usage, requests per sec, time per request, transfer rate (Kbytes/sec). Experiment 1: one VM with the web server running (vm1: no load; vm2: webserver). Experiment 2: mixed load – one VM with CPU load, one VM with the web server (vm1: CPU; vm2: webserver).

Study 4: Conclusions  Performance isolation is imperfect when high-level performance measures (application throughput) are considered

Live Migration in Xen VMM

Live Migration: Advantages  Avoids difficulties of process migration –“Residual dependencies” – original host has to remain available and network-connected to service some calls from migrated OS*  In-memory state can be transferred consistently –TCP state –“this means that we can migrate an on-line game server or streaming media server without requiring clients to reconnect:....” –“Allows a separation of concerns between the users and operator of a data center or cluster. Users need not provide the operator with any OS-level access at all (e.g. a root login to quiesce processes or I/O prior to migration). “

*Residual dependencies: “The problem of the residual dependencies that a migrated process retains on the machine from which it migrated. Examples of residual dependencies include open file descriptors, shared memory segments, and other local resources. These are undesirable because the original machine must remain available, and because they usually negatively impact the performance of migrated processes.”

In summary: Live migration is an extremely powerful tool for cluster administrators – allowing separation of hardware and software considerations – consolidating clustered hardware into a single coherent management domain. –If a host needs to be removed from service  Move its guest domains, shut down the machine –Relieve bottlenecks  If a host is overloaded, move guest domains to idle hosts “virtualization + live migration = improved manageability”

Goals of live migration  Reduce the time during which services are totally unavailable  Reduce total migration time – the time during which state is being synchronized between hosts (which may be unreliable)  Migration should not create unnecessary resource contention (CPU, disk, network, etc.)

Xen live migration highlights  SPECweb benchmark migrated with 210ms unavailability  Quake 3 server migrated with 60ms downtime  Can maintain network connections and application state  Seamless migration

Xen live migration: Key ideas  Pre-copy approach –Pages of memory iteratively copied without stopping the VM –Page-level protection hardware used to ensure consistent snapshot –Rate adaptive algorithm to control impact of migration traffic –VM pauses only in final copy phase

Details: Time definitions  Downtime: period during which the service is unavailable due to there being no currently executing instance of the VM; this period will be directly visible to clients of the VM as service interruption.  Total migration time: duration between when migration is initiated and when the original VM may be finally discarded and hence, the source host may potentially be taken down for maintenance, upgrade or repair.

Memory transfer phases  Push –The VM keeps sending pages from source to destination while running (modified pages have to be resent)  Stop-and-Copy –Halt the VM, copy the entire image, start the VM on the destination  Pull –Start the VM on the destination; pages that are not found locally are retrieved from the source

Migration approaches  Combinations of phases –Pure stop-and-copy:  Downtime & total migration time proportional to physical memory size (we want downtime to be much smaller) –Pure demand-migration:  Move the minimal data required, start the VM on the destination  The VM will page-fault heavily at first – total migration time will be long, and performance may be unacceptable –This paper: “Bounded push phase followed by stop-and-copy”

Xen “pre-copy” key ideas  Pre-copying (push) happens in rounds –In round n, pages that changed in round n-1 are copied –Pages that are written to frequently are poor candidates for pre-copy  “Writable working set” (WWS) analysis for server workloads  Short stop-and-copy phase  Awareness of impact on actual services –Control resources used by migration (e.g. network bandwidth, CPU, etc.)
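
A minimal sketch of the pre-copy loop, assuming stand-in callbacks for dirty-page tracking and page transfer (the names, the round limit, and the WWS threshold are invented for illustration; the real Xen implementation uses shadow page tables and measured WWS behaviour):

```python
# Illustrative sketch of iterative pre-copy (not the Xen implementation).

def precopy_migrate(all_pages, get_dirty_pages, send_pages, max_rounds=5, wws_threshold=3):
    """all_pages: set of page numbers; get_dirty_pages(): pages written since last call;
    send_pages(pages): transfers them. Frequently dirtied pages (the writable working set)
    are deferred to the final stop-and-copy instead of being re-sent every round."""
    dirty_count = {p: 0 for p in all_pages}
    deferred = set()                               # hot (WWS) pages left for stop-and-copy
    to_send = set(all_pages)                       # round 0: copy everything
    for _ in range(max_rounds):
        send_pages(to_send - deferred)
        to_send = get_dirty_pages()                # pages written while we were copying
        if not to_send:
            break
        for p in to_send:
            dirty_count[p] += 1
            if dirty_count[p] >= wws_threshold:    # frequently written -> poor pre-copy candidate
                deferred.add(p)
    # Stop-and-copy: pause the VM, send everything still inconsistent, resume on destination.
    send_pages(to_send | deferred | get_dirty_pages())

# Example with fake callbacks: 8 pages, pages {1, 2} are written in every round (hot).
sent, hot = [], {1, 2}
precopy_migrate(set(range(8)),
                get_dirty_pages=lambda: set(hot),
                send_pages=lambda pages: sent.append(sorted(pages)))
print(sent)   # hot pages stop being re-sent once deferred, then go in the final copy
```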

Decoupling local resources  Network: assume the source and destination hosts are on the same LAN –The migrating VM moves with its TCP/IP state and IP address –Generate an unsolicited ARP reply – the ARP reply advertises the IP-address-to-MAC-address mapping, and all receiving hosts update their mapping  A few packets in transit may be lost; minimal impact expected
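
For illustration, an unsolicited ARP reply like the one described can be crafted with the third-party scapy library (an assumption; Xen's toolstack builds the packet itself, and the IP, MAC, and interface below are made-up example values):

```python
# Sketch of the unsolicited ("gratuitous") ARP reply: broadcast "10.0.0.42 is-at
# 00:16:3e:aa:bb:cc" so LAN peers update their ARP caches after the VM moves.
# Requires scapy and root privileges; values are illustrative only.

from scapy.all import ARP, Ether, sendp

VM_IP, VM_MAC, IFACE = "10.0.0.42", "00:16:3e:aa:bb:cc", "eth0"

reply = Ether(dst="ff:ff:ff:ff:ff:ff", src=VM_MAC) / ARP(
    op=2,                # 2 = "is-at", i.e. an ARP reply
    psrc=VM_IP,          # the migrated VM's IP address...
    hwsrc=VM_MAC,        # ...now reachable at this MAC / new physical port
    pdst=VM_IP,          # gratuitous: sender and target protocol address are the same
    hwdst="ff:ff:ff:ff:ff:ff",
)
sendp(reply, iface=IFACE, verbose=False)
```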

… Decoupling local resources  Network… Problem: routers may not accept broadcast ARP replies (note: ARP requests are expected to be broadcast – not replies) –Alternative: send ARP replies individually to the addresses in the host's own ARP cache –On a switched network, the migrating OS keeps its original Ethernet MAC address and relies on the network switch to detect that the MAC has moved to a new port

… Decoupling local resources  Disk –Assume Network Attached Storage –Source/Dest machine on the same storage network –Problem not directly addressed

Migration Steps If there are not enough resources on the destination, no migration. At the end of the copy phase, two consistent copies of the VM exist. OK message from B → A and ack from A → B; only then can the VM on A be stopped. Failure management: one consistent VM is active at all times. If the “commit” step does not happen, the migration is aborted and the original VM continues
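
A toy sketch of the commit handshake and its failure path (illustrative only; the callables and return values are invented, and the real protocol is part of Xen's migration tools). The point is the invariant stated above: exactly one consistent VM is active at all times:

```python
# Illustrative sketch of the commitment step: destination B reports a consistent image
# with "OK", source A answers with an ack, and only after that ack is the suspended VM
# on A discarded; any failure before commit aborts the migration and resumes A's VM.

def commit_handoff(send_ok_from_B, send_ack_from_A, discard_vm_on_A,
                   resume_vm_on_A, activate_vm_on_B):
    """Each argument is a callable standing in for one message or local action;
    a callable returning False models a lost message or a crashed peer."""
    try:
        if not send_ok_from_B():          # B -> A: "I hold a consistent copy"
            raise RuntimeError("no OK from destination")
        if not send_ack_from_A():         # A -> B: "go ahead, it's yours now"
            raise RuntimeError("no ack from source")
    except RuntimeError:
        resume_vm_on_A()                  # abort: the original VM keeps running
        return "aborted"
    activate_vm_on_B()                    # B becomes the single active copy
    discard_vm_on_A()                     # only now is A's copy released
    return "committed"

# Example: a lost ack leaves exactly one consistent VM running (the original on A).
print(commit_handoff(lambda: True, lambda: False,
                     lambda: None, lambda: print("VM resumed on A"), lambda: None))
```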

Writable Working Sets  Copying VM memory is the largest migration overhead  Stop-and-copy is the simplest approach –Downtime unacceptable  Pre-copy migration may reduce downtime –What about pages that are dirtied while copying? –What if the rate of dirtying > the rate of copying?  Key observation: in most VMs, large parts of memory are not modified; the small part that is modified repeatedly is called the writable working set (WWS)

WWS Measurements Different for different workloads

Estimated effect on downtime: each successive line (top → bottom) shows an increasing number of pre-copy rounds. Pre-copy reduces downtime (confirmed by many experiments)

Details: Managed Migration (migration by Domain 0)  First round: all pages copied  Subsequent rounds: only dirtied pages copied (maintain a dirty bitmap). How: –Insert shadow page tables –Populated by translating sections of the guest OS page tables  Page table entries are initially read-only mappings  If the guest tries to modify a page, a fault is created and trapped by Xen  If write access is permitted by the original page table, permit it here too – and set the page's bit in the dirty bitmap –At the start of each pre-copy round, the dirty bitmap is given to the control software, Xen's bitmap is cleared, and the shadow page tables are destroyed and recreated, so all write permissions are lost again  When pre-copy is to stop, a message is sent to the guest OS to suspend itself, the dirty bitmap is checked, and the remaining pages are synchronized.
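
A minimal sketch of the write-fault bookkeeping (illustrative only; the real mechanism is implemented in Xen's shadow page tables and hypervisor fault handler, and the class below is invented for the example):

```python
# Shadow entries start read-only; a write fault marks the page dirty and, if the guest's
# own page table allowed the write, restores write permission in the shadow entry.

class DirtyTracker:
    def __init__(self, n_pages, guest_writable):
        self.guest_writable = guest_writable        # guest OS page-table write permissions
        self.shadow_writable = [False] * n_pages    # shadow entries: initially read-only
        self.dirty_bitmap = [False] * n_pages

    def on_write_fault(self, page):
        """Called when the guest writes a page whose shadow entry is read-only."""
        if not self.guest_writable[page]:
            raise PermissionError("genuine write fault, forward to guest OS")
        self.dirty_bitmap[page] = True              # remember it must be re-sent
        self.shadow_writable[page] = True           # further writes this round fault no more

    def start_new_round(self):
        """Hand the bitmap to the control software and reset all write permissions."""
        dirtied = [p for p, d in enumerate(self.dirty_bitmap) if d]
        self.dirty_bitmap = [False] * len(self.dirty_bitmap)
        self.shadow_writable = [False] * len(self.shadow_writable)   # shadows recreated
        return dirtied

# Example: pages 3 and 7 are written during a round, so only they are re-copied next round.
t = DirtyTracker(16, guest_writable=[True] * 16)
t.on_write_fault(3); t.on_write_fault(7)
print(t.start_new_round())                          # -> [3, 7]
```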

Self-Migration  Major difficulty: the OS itself has to run to transfer itself – what is the correct state to transfer? Solution: –Suspend all activities except those related to migration –Scan for dirty pages –Copy dirty pages to a “shadow buffer” –Transfer the shadow buffer  Page dirtying during this final transfer is ignored.

Dynamic Rate-Limiting Algorithm  Select minimum and maximum bandwidth limits  The first pre-copy round transfers pages at the minimum bandwidth  The dirty-page rate is calculated (pages dirtied / duration of the round)  Bandwidth limit of the next round = dirty rate + 50 Mbit/sec (higher bandwidth if the dirty rate is high)  Terminate pre-copy when the calculated rate exceeds the maximum or less than 256 KB remains to be sent  The final stop-and-copy is done at the maximum bandwidth
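
A sketch of this rate-adaptation rule (the 50 Mbit/s increment and 256 KB threshold follow the text above; the 4 KB page size and the function signature are assumptions for illustration):

```python
# Returns the bandwidth limit for the next pre-copy round, or None when pre-copy should
# stop and the final stop-and-copy should run at the maximum bandwidth.

PAGE_SIZE_BITS = 4096 * 8          # assume 4 KB pages
HEADROOM_MBPS = 50                 # fixed increment over the observed dirty rate

def next_round_bandwidth(pages_dirtied, round_secs, remaining_bytes,
                         min_mbps, max_mbps):
    dirty_rate_mbps = pages_dirtied * PAGE_SIZE_BITS / round_secs / 1e6
    proposed = max(min_mbps, dirty_rate_mbps + HEADROOM_MBPS)
    if proposed > max_mbps or remaining_bytes < 256 * 1024:
        return None                # terminate pre-copy; stop-and-copy at max_mbps
    return proposed

# Example: 2000 pages dirtied in a 1 s round with ~50 MB still to send.
print(next_round_bandwidth(2000, 1.0, 50 << 20, min_mbps=100, max_mbps=500))
```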

Live Migration: Some results

…Results  Downtime: 210 ms  Total migration time: 71 secs  No perceptible impact on performance during uptime