Reading Report: Cost of VM Live Migration. By William Voorsluys, James Broberg, Srikumar Venugopal, and Rajkumar Buyya. CLOUD 2009
The paper's goal Migration overhead is acceptable but cannot be disregarded, especially when the SLA is strict. The paper gives a performance evaluation of live migration.
Related Works (1) Multicore [8,9], paravirtualization, hardware-assisted virtualization, and live migration. Individual measurements of the VM runtime overhead imposed by hypervisors on a variety of workloads [1, 11, 12]. The impact of consolidating several applications on a single server running Xen.
Related Works (2) Earlier studies report performance degradation when migrating CPU- and memory-intensive workloads, and when migrating multiple VMs at the same time in a stop-and-copy way. Other work quantifies the effects of live migration on a set of four applications common to hosting environments, focusing primarily on downtime and total migration time and demonstrating the viability of live migration. These studies did not evaluate the effect of migration on the performance of modern Internet workloads, such as multi-tier and social-network-oriented applications.
Related Works Other work evaluates the efficacy of migrating VMs across long distances, such as over the Internet. The vConsolidate benchmark: a Web server, a database server, a Java server, a mail server, and an idle server. The Cloudstone benchmark aims at computing the monetary cost, in dollars/user/month, of hosting Web 2.0 applications in cloud computing platforms.
Background: advantage of live migration Live (or hot) migration: hypervisors allow migrating an OS as it continues to run. Stop-and-copy (or cold) migration: halting the VM, copying all its memory pages to the destination, then restarting the new VM. The advantage of live migration: the possibility of migrating an OS with near-zero downtime.
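To see why near-zero downtime matters, a rough lower bound on stop-and-copy downtime can be sketched as memory size divided by link bandwidth, since the VM is halted for the whole transfer. The function name and the figures used (a 2 GB VM and a 1 Gbit/s link) are illustrative assumptions, not measurements from the paper; real transfers add protocol overhead and restart time.

```python
def stop_and_copy_downtime_s(memory_bytes: int, link_bits_per_s: float) -> float:
    """Lower bound on downtime: the VM stays halted while all memory is copied."""
    return memory_bytes * 8 / link_bits_per_s


# Assumed example: 2 GB of VM memory over a 1 Gbit/s link.
downtime = stop_and_copy_downtime_s(2 * 1024**3, 1e9)
print(f"{downtime:.1f} s")  # 17.2 s
```

Even this optimistic bound is several seconds, which is the gap live migration tries to close.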
Background: characteristics of modern Internet applications Highly dynamic and interactive features have driven the rapid growth of Web 2.0 applications. Social networking features make each user's actions affect many other users, which makes static load partitioning unsuitable as a scaling strategy. By means of blogs, photostreams and tagging, users now publish content to one another rather than just consuming static content.
Testbed specifications A cluster composed of 6 servers: 1 head-node and 5 VM hosts. Each is equipped with an Intel Xeon E5410 (2.33 GHz quad-core with 2x6 MB L2 cache and Intel VT technology), 4 GB memory, and a 7200 rpm hard drive, and all are connected through a Gigabit Ethernet switch. Head-node: Ubuntu Server 7.10 with no hypervisor. Other nodes: Citrix XenServer Enterprise Edition. All VMs run 64-bit Ubuntu Linux 8.04 Server Edition with a paravirtualized kernel. The installed web server is Apache running in prefork mode, with PHP. MySQL uses the InnoDB engine.
Workload Olio as a Web 2.0 application, combined with the Faban load generator. Olio's PHP implementation is used, employing the popular LAMP stack (Linux, Apache, MySQL, PHP). Olio/Faban was originally proposed as part of the Cloudstone benchmark. The main metric: the Service Level Agreement defined in Cloudstone.
Cloudstone's SLA Table 1: The 90th/99th percentile of response times measured in any 5-minute window during steady state should not exceed the following values (in seconds)
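The SLA rule above can be sketched as a percentile check per window. This is a minimal illustration under stated assumptions: it uses the nearest-rank percentile definition and fixed, consecutive 5-minute windows (the SLA says "any" window, which a stricter check would treat as sliding), and the function names, sample data, and the 1-second limit are hypothetical placeholders.

```python
import math
from typing import Dict, List, Tuple


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def violating_windows(samples: List[Tuple[float, float]], pct: int,
                      limit_s: float, window_s: float = 300.0) -> List[int]:
    """samples are (completion_time_s, response_time_s) pairs.

    Returns the indices of the fixed windows whose pct-th percentile exceeds limit_s.
    """
    windows: Dict[int, List[float]] = {}
    for t, rt in samples:
        windows.setdefault(int(t // window_s), []).append(rt)
    return sorted(i for i, rts in windows.items() if percentile(rts, pct) > limit_s)


# Synthetic data (not from the paper): one request per second for 10 minutes,
# with the last 30 seconds slowed to 3.0 s per request.
samples = [(float(t), 0.2) for t in range(570)] + [(float(t), 3.0) for t in range(570, 600)]
print(violating_windows(samples, 90, 1.0))  # []
print(violating_windows(samples, 99, 1.0))  # [1]
```

In this synthetic run the slow tail covers 10% of the second window, so the 90th percentile still passes while the 99th percentile does not, which is why the stricter percentile is the more sensitive of the two.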
Benchmarking architecture (2) MySQL tends to be CPU-bound when serving the Olio database. Apache/PHP tends to be memory-bound. All nodes share an NFS (Network File System) mounted storage device, which resides in the head-node and stores VM images and virtual disks. A local virtual disk is hosted in the server that hosts MySQL. The load is driven from the head-node, where the multi-threaded workload drivers run, along with Faban's master component.
Experimental objective To quantify the slowdown and downtime experienced by the application when VM migrations are performed in the middle of a run. A series of runs did not consist of migrating a VM back and forth between the same two machines.
Preliminary experiments Goal: to define exact VM sizes. Without migration, drive load against Olio and gradually increase the number of concurrent users. By analyzing the SLA, they found that 600 is the maximum number of concurrent users. From memory and CPU usage, they found that the minimum VM sizes for serving 600 users are: VM hosting Apache/PHP: 1 vCPU, 2 GB memory; VM hosting MySQL: 2 vCPUs, 1 GB memory. Hosting MySQL on NFS can support only 400 users, so the experiments do not include database server migration.
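The ramp-up procedure described here can be sketched as a loop that raises the load until the SLA first fails. The names below are hypothetical, and `fake_run` is only a stand-in for a real benchmark run; its 600-user cutoff mirrors the reported result rather than measuring anything.

```python
from typing import Callable, Iterable, Optional


def max_supported_users(levels: Iterable[int],
                        meets_sla: Callable[[int], bool]) -> Optional[int]:
    """Return the highest load level reached before the first SLA failure."""
    best = None
    for users in levels:
        if not meets_sla(users):
            break
        best = users
    return best


def fake_run(users: int) -> bool:
    """Hypothetical stand-in for one benchmark run at the given load."""
    return users <= 600


print(max_supported_users(range(100, 1001, 100), fake_run))  # 600
```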
Migration Experiment First set of experiments with Olio: 10-minute and 20-minute benchmark runs with 600 concurrent users. The aim is to evaluate how the SLA is violated when the system is nearly oversubscribed but not overloaded, and to quantify the downtime when live migrations happen. Then, run the benchmark with smaller numbers of concurrent users, namely 100, 200, ..., 500, searching for a "safe" level (lower risk of SLA violation).
Result and Discussion (1) Results show that the overhead due to live migration is acceptable but cannot be disregarded, especially in SLA-oriented environments requiring more demanding service levels.
Result and Discussion (2) Fig. 2. Effects of a live migration on Olio's homepage loading activity
Result and Discussion (3) Figure 2 shows the effect of a single migration performed after five minutes in steady state of one run. A downtime of 3 seconds is experienced near the end of a 44-second migration. The highest peak observed in response times takes place immediately after the VM resumes in the destination node; 5 seconds elapse until the system can fully serve all requests that were initiated during the downtime. In spite of that, no requests were dropped or timed out due to application downtime. The downtime experienced by Olio when serving 600 concurrent users is well above the expected millisecond level previously reported in the literature for a range of workloads. This interesting result suggests that the workload complexity imposes an unusual memory
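One way to read a downtime figure off a request log is to look for the longest interval in which no request completed. The sketch below is an assumption about how such a measurement could be made, not the authors' tooling; the function name and the timeline are synthetic. Note that the gap between two completions is an upper bound on the actual downtime.

```python
from typing import List, Tuple


def longest_gap(completion_times_s: List[float]) -> Tuple[float, float]:
    """Return (start, length) of the longest interval with no completed request."""
    if len(completion_times_s) < 2:
        raise ValueError("need at least two completion times")
    ordered = sorted(completion_times_s)
    best_start, best_len = ordered[0], 0.0
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > best_len:
            best_start, best_len = prev, cur - prev
    return best_start, best_len


# Synthetic timeline (not measured data): one completion per second,
# with nothing completing at seconds 341, 342 and 343.
times = [float(t) for t in range(300, 360) if t not in (341, 342, 343)]
start, length = longest_gap(times)
print(start, length)  # 340.0 4.0
```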
Result and Discussion (4) Fig. 3. 90th and 99th percentile SLA computed for the homepage loading response time with 600 concurrent users. The maximum allowed response time is 1 second
Result and Discussion (5) Figure 3 presents the effect of multiple migrations on the homepage loading response times. These results correspond to the average of 5 runs. It is paramount that this information be employed by SLA-oriented VM-allocation mechanisms with the objective of reducing the risk of SLA non-compliance in situations when VM migrations are inevitable.
Result and Discussion(6)
Result and Discussion (7) Table 2 presents more detailed results, listing maximum response times for all user actions as computed by the 99th percentile SLA formula when one migration was performed in the middle of a 10-minute run. These results indicate that a workload of 500 users is the load level at which a live migration of the Web server should be carried out (e.g. to the least loaded server) in order to decrease the risk of SLA violation.