Black-box and Gray-box Strategies for Virtual Machine Migration Timothy Wood, Prashant Shenoy, Arun Venkataramani, and Mazin Yousif † Univ. of Massachusetts.

Black-box and Gray-box Strategies for Virtual Machine Migration
Timothy Wood, Prashant Shenoy, Arun Venkataramani, and Mazin Yousif†
Univ. of Massachusetts Amherst, †Intel
4th USENIX Symposium on Networked Systems Design & Implementation (NSDI 2007)

Introduction
Operating applications in a data center
– Effective management of data center resources while meeting SLAs
Benefits of virtualization
– Application isolation
– Server consolidation (multiplexing)
– Handling workload dynamics

Motivation
Efficient data center resource management
– Live migration
However, detecting workload hotspots and initiating a migration is currently handled manually
– This lacks the agility to respond to sudden workload changes
– Multiple resources need to be considered: CPU, network, and memory

Solution
Automated black-box and gray-box strategies for virtual machine migration (Sandpiper)
– Monitoring system resource usage
– Hotspot detection
– Determining a new mapping
– Initiating the necessary migrations

The Sandpiper Architecture
– Gathers resource usage statistics on the server it runs on
– Gathers processor, network and memory swap statistics for each VM
– Implements a daemon to gather OS-level statistics and application logs
– Constructs resource usage profiles for each virtual server (to predict PM workload)
– Monitors usage profiles to detect hotspots. Hotspot: any resource exceeds a threshold (or an SLA is violated) for a sustained period
– Determines what virtual servers should migrate, where to move them, and how much of a resource to allocate to the virtual servers after migration

Black-box monitoring (1/4)

Black-box monitoring (2/4): CPU monitoring
VM CPU usage can be determined by tracking scheduling events in the hypervisor.
– This does not include the CPU overhead of the VM's disk I/O and network processing, which is accounted to Domain-0.
Each VM is therefore also charged:
– Domain-0's CPU usage × (VM I/O requests / total I/O requests)
Assumption: the overhead of the monitoring engine and the nucleus is negligible.
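The charging rule above can be sketched in a few lines. This is a minimal illustration rather than Sandpiper's implementation: the function names, the dictionary layout, and the sample numbers are all assumptions introduced here.

```python
def charge_dom0_cpu(dom0_cpu, io_requests):
    """Split Domain-0's CPU usage across VMs in proportion to their I/O requests.

    dom0_cpu:    Domain-0 CPU usage over the interval (e.g. percent).
    io_requests: hypothetical mapping of VM name -> I/O request count.
    """
    total = sum(io_requests.values())
    if total == 0:
        # No I/O was issued, so there is nothing to attribute.
        return {vm: 0.0 for vm in io_requests}
    return {vm: dom0_cpu * count / total for vm, count in io_requests.items()}


def total_vm_cpu(own_cpu, dom0_cpu, io_requests):
    """Add each VM's share of Domain-0 overhead to its directly observed usage."""
    overhead = charge_dom0_cpu(dom0_cpu, io_requests)
    return {vm: own_cpu[vm] + overhead.get(vm, 0.0) for vm in own_cpu}
```

For example, with Domain-0 at 20% CPU and two VMs issuing 300 and 100 I/O requests, the first VM would be charged an extra 15% and the second an extra 5%.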

Black-box monitoring (3/4): Network monitoring
Background:
– Domain-0 in Xen implements the network interface driver
– VMs access the driver via clean device abstractions (the virtual firewall-router (VFR) interface)
The monitoring engine can use the Linux /proc interface to read each VNIC's usage
– /proc/net/dev
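As a rough sketch of how per-interface usage could be read from /proc/net/dev: the file has two header lines, then one line per interface whose first and ninth counters are received and transmitted bytes. The function names and the sample interface name `vif1.0` are assumptions for illustration; this is not the monitoring engine's actual code.

```python
def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev contents."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # malformed or truncated line
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def read_net_dev(path="/proc/net/dev"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_net_dev(f.read())


def bandwidth(before, after, seconds):
    """Bytes per second (rx, tx) for interfaces present in both samples."""
    rates = {}
    for iface, (rx1, tx1) in after.items():
        if iface in before:
            rx0, tx0 = before[iface]
            rates[iface] = ((rx1 - rx0) / seconds, (tx1 - tx0) / seconds)
    return rates
```

Sampling `read_net_dev()` twice, a known interval apart, and passing both samples to `bandwidth` gives an approximate per-interface rate.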

Black-box monitoring (4/4): Memory monitoring
Challenge:
– Domain-0 cannot directly monitor each VM's actual memory usage; it only knows the amount of memory assigned to the VM.
Solution:
– Working set sizes can be inferred by observing swap activity from Domain-0 [11].
[11] S. Jones, A. Arpaci-Dusseau, and R. Arpaci-Dusseau. Geiger: Monitoring the buffer cache in a virtual machine environment. In Proc. ASPLOS'06, pages 13–23, October 2006.

Gray-box monitoring
Motivation:
– Black-box monitoring cannot "peek inside" a VM to gather usage statistics.
Solution:
– Install a lightweight monitoring daemon inside each virtual server
– Use the /proc interface to gather OS-level statistics: CPU, network, memory
– Gather application-level statistics: the daemon obtains them through functions provided by the application itself, e.g. for a web or database server: request rate, request drop rate, service time

Profile Generation (1/2)
Profile: a compact description of a server's resource usage over a sliding time window W.
Profile content:
– Black-box parameters: CPU utilization, network bandwidth utilization, and memory swap rate
– Gray-box parameters: memory utilization, service time, request drop rate and incoming request rate (assuming an Apache web server)

Profile Generation (2/2)
Profile types:
– Distribution profile: the probability distribution of the resource usage over the window W.
– Time series profile: captures temporal fluctuations; it is simply a list of all observations reported within the window W.
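The two profile types can be illustrated with a small sliding-window structure. This is a sketch under stated assumptions: the class name `ResourceProfile`, the fixed-width histogram bins, and the window expressed as a count of observations are all choices made here, not details from the slides.

```python
from collections import Counter, deque


class ResourceProfile:
    """Sliding-window profile of one resource (e.g. CPU utilization in percent)."""

    def __init__(self, window, bin_width=10.0):
        # deque with maxlen drops the oldest observation automatically,
        # which gives the sliding window W.
        self.observations = deque(maxlen=window)
        self.bin_width = bin_width

    def report(self, value):
        self.observations.append(value)

    def time_series(self):
        """Time series profile: all observations within the window, in order."""
        return list(self.observations)

    def distribution(self):
        """Distribution profile: {bin lower edge: probability} over the window."""
        n = len(self.observations)
        if n == 0:
            return {}
        counts = Counter(
            int(v // self.bin_width) * self.bin_width for v in self.observations
        )
        return {edge: c / n for edge, c in sorted(counts.items())}
```

With a window of four observations and reports of 5, 15, 25, 35 and 38, the oldest value falls out of the window and the remaining values land in the 10, 20 and 30 bins.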

Hotspot detection
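A hotspot was defined earlier as any resource exceeding a threshold for a sustained period. One simple way to express "sustained" is to require that at least k of the n most recent observations exceed the threshold. The sketch below uses that rule as an assumption; the function name and the parameters `k`, `n` and `threshold` are hypothetical.

```python
def is_hotspot(observations, threshold, k, n):
    """Flag a hotspot if at least k of the last n observations exceed threshold.

    observations: time series profile for one resource, oldest first.
    """
    recent = observations[-n:]
    exceeded = sum(1 for value in recent if value > threshold)
    return exceeded >= k
```

Requiring k out of n violations, instead of reacting to a single sample, avoids triggering migrations on brief transient spikes.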

Resource Provisioning
Goal:
– Ensure that SLAs are not violated even in the presence of peak workloads.
Estimate the peak CPU, network and memory requirements of each overloaded VM
– Black-box provisioning
– Gray-box provisioning

Black-box provisioning (1/3)
Estimating peak CPU and network bandwidth needs:
– Distribution profile: use historical data to predict the peak.
– Challenge: estimation error
Background:
– Both the CPU scheduler and the network packet scheduler in Xen are work-conserving.

Black-box provisioning (2/3)
Estimation error:
– Example: two virtual machines are assigned CPU weights of 1:1 (50% each). Assume that VM1 is overloaded and requires 70% of the CPU to meet its peak needs.
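The example can be made concrete with a toy model of a work-conserving scheduler for two equally weighted VMs. The function `observed_cpu` and the load levels chosen for the second VM (20% and 50%) are illustrative assumptions. The sketch suggests why observed usage can underestimate the true peak: a VM can only exceed its share when the other VM leaves capacity idle.

```python
def observed_cpu(demand, other_demand, share=50.0, capacity=100.0):
    """CPU a VM actually receives under a work-conserving scheduler.

    The VM is guaranteed its share and may also use whatever the other
    VM leaves idle; it never receives more than it demands.
    """
    available = max(share, capacity - other_demand)
    return min(demand, available)


# VM1 needs 70%. If VM2 is lightly loaded, the need is fully visible;
# if VM2 uses its whole share, VM1 is capped at 50% and the
# black-box observation underestimates its peak.
lightly_loaded = observed_cpu(demand=70, other_demand=20)
fully_loaded = observed_cpu(demand=70, other_demand=50)
```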

Black-box provisioning (3/3)
Handling estimation error:
– Add a constant Δ to scale up the estimate.
Estimating peak memory needs:
– If swap activity exceeds the threshold, the current allocation is deemed insufficient and is increased by a constant amount Δm.
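A minimal sketch of the black-box estimates described above. The choice of a high percentile of the distribution profile as the "peak", the cap at full capacity, and all function and parameter names are assumptions made here for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def estimate_peak(samples, delta, pct=95, capacity=100.0):
    """Peak CPU or network need: high percentile of observed usage plus delta."""
    return min(capacity, percentile(samples, pct) + delta)


def next_memory_allocation(current_mb, swap_rate, swap_threshold, delta_m_mb):
    """Grow the allocation by delta_m when swap activity exceeds the threshold."""
    if swap_rate > swap_threshold:
        return current_mb + delta_m_mb
    return current_mb
```

For instance, a VM with 256 MB whose swap rate exceeds the threshold would move to 288 MB with a step of 32 MB, the step size used later in Experiment 3.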

Gray-box provisioning (1/3)
The gray-box approach has access to application-level logs.
– It can estimate the peak resource needs of the application even when the resource is fully utilized.
Estimating peak CPU needs:
– An application model is necessary to estimate the peak CPU needs.

Gray-box provisioning (2/3)
[23] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal. Dynamic provisioning for multi-tier internet applications. In Proc. ICAC '05, June 2005.
[13] L. Kleinrock. Queueing Systems, Volume 2: Computer Applications. John Wiley and Sons, Inc., 1976.

Gray-box provisioning (3/3)

Hotspot mitigation (1/3)
Hotspot mitigation algorithm:
– Goal: determine which VMs should be migrated, and to where, to dissipate the hotspot.
– Challenge: NP-hard (a multi-dimensional bin packing problem)
– Bin = physical server, dimensions = resource constraints
– Solution: a heuristic that decides:
– Which overloaded VMs to migrate
– Where to migrate them such that migration overhead is minimized
» Migration overhead cannot be neglected

Hotspot mitigation (2/3)
Hotspot mitigation algorithm (cont.):
– Intuition: move load from the most overloaded servers to the least-loaded servers, while minimizing the data copying incurred during migration
– Volume: captures the degree of load along multiple dimensions in a unified fashion: Vol = 1/(1 − cpu) × 1/(1 − net) × 1/(1 − mem), where cpu, net and mem are the corresponding utilizations of that resource for the virtual or physical server

Hotspot mitigation (3/3)
Hotspot mitigation algorithm (cont.):
– Volume-to-size ratio (VSR): Volume / Size, where Size is the memory size of the VM
– Migration decision: take the highest-VSR VM from the highest-volume server and determine whether it can be housed on the lowest-volume physical server.
– Swap decision (only 2-way swaps are considered): activated when simple migration cannot resolve the hotspot. Swap the highest-VSR VM on the highest-volume hotspot server with the k lowest-VSR VMs on the lowest-volume server
– If a swap cannot be found, the next least-loaded server is considered
Note: a swap may require a third server (because of RAM constraints)
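The migration decision can be sketched as a greedy search. This is an illustration under several assumptions, not the authors' code: volume is taken as the product 1/(1 − cpu) × 1/(1 − net) × 1/(1 − mem) from the Sandpiper paper, utilizations are fractions of a homogeneous server's capacity, and the class names, the 0.75 threshold, and the clamp that avoids division by zero are all hypothetical. Swaps are not modelled.

```python
from dataclasses import dataclass, field

RESOURCES = ("cpu", "net", "mem")


@dataclass
class VM:
    name: str
    cpu: float      # utilizations as fractions of one server's capacity
    net: float
    mem: float
    size_mb: int    # memory footprint, i.e. data copied during migration


@dataclass
class PM:
    name: str
    vms: list = field(default_factory=list)

    def load(self):
        return tuple(sum(getattr(vm, r) for vm in self.vms) for r in RESOURCES)


def volume(cpu, net, mem, cap=0.99):
    # Clamp so a fully utilized resource does not divide by zero.
    cpu, net, mem = (min(u, cap) for u in (cpu, net, mem))
    return 1 / (1 - cpu) * 1 / (1 - net) * 1 / (1 - mem)


def vsr(vm):
    """Volume-to-size ratio: load relieved per MB of data copied."""
    return volume(vm.cpu, vm.net, vm.mem) / vm.size_mb


def fits(pm, vm, threshold):
    return all(used + getattr(vm, r) <= threshold
               for used, r in zip(pm.load(), RESOURCES))


def plan_migration(pms, threshold=0.75):
    """Return (vm, source, target) names for one migration, or None."""
    by_volume = sorted(pms, key=lambda p: volume(*p.load()), reverse=True)
    source = by_volume[0]
    for vm in sorted(source.vms, key=vsr, reverse=True):
        for target in reversed(by_volume[1:]):  # least-loaded server first
            if fits(target, vm, threshold):
                return vm.name, source.name, target.name
    return None
```

Preferring a high VSR means the heuristic tries to move the VM that relieves the most load per megabyte copied, which is one way to keep migration overhead low.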

Implementation

Evaluation Environment
Data center:
– 20 servers (2.4 GHz Pentium 4)
– Connected with gigabit Ethernet
– At least 1 GB RAM
OS:
– Linux 2.6.16 + Xen 3.0.2-3
Workload generator:
– A cluster of Pentium 3 Linux servers

Experiment 1 - Migration Effectiveness
– Experiment 1 uses 3 physical servers and 5 VMs, with memory allocations listed in a table on the slide.
– All VMs run Apache serving dynamic PHP web pages.
– httperf is used to inject the workload.

Experiment 1 - Migration Effectiveness (cont.)
– t=166: hotspot detected. VM1 has the highest VSR; PM3 has the lowest volume.
– t=362: hotspot detected. VM4 has the second-highest VSR (no PM has enough capacity to host VM3); PM1 has the lowest volume.
– Final phase: VM1 and VM5 have the same volume, but VM5 uses less memory; PM2 has the lowest volume.

Experiment 2 - Virtual Machine Swaps
Experiment setting: as before, clients use httperf to request dynamic PHP pages.
VM ID   RAM (MB)   Host Machine   CPU load type
VM1     384        PM1            Steadily increasing
VM2     384        PM1            Constant
VM3     256        PM2            Constant
VM4     256        PM2            Constant

Experiment 2 - Virtual Machine Swaps (cont.)
– A hotspot is detected on PM1. The only viable solution is to swap VM2 with VM4 (a 3-party swap). VM4 uses the least memory, so it is the VM that is migrated twice.
– Once the migration of VM4 is complete, VM2 starts migrating to PM2.
– Once the migration of VM2 is complete, VM4 starts migrating to PM1.
– Migration overhead

Experiment 3 - Mixed resource workloads
Experiment setting:
– VM2 is a database that stores its tables in memory
– PM2 has more physical memory
VM ID   Description                                    Host Machine   RAM (MB)
VM1     Network intensive                              PM1            256
VM2     Network intensive + memory grows over time     PM1            256
VM3     CPU intensive                                  PM2            256
VM4     CPU intensive                                  PM2            256

Experiment 3 - Mixed resource workloads (cont.)
– PM1 has a network hotspot and PM2 has a CPU hotspot
– Sandpiper swaps a network-intensive VM for a CPU-intensive VM at t=130

Experiment 3 - Mixed resource workloads (cont.)
– Sandpiper responds by increasing the RAM allocation in steps of 32 MB every time swapping is observed.
– When no additional RAM is available, the VM is swapped to the second physical server at t=430.
– This swaps the two network-intensive VMs (VM1 and VM2)

Experiment 4 - Gray vs. Black: Memory Allocation
Goal:
– Compare the effectiveness of the black-box and gray-box approaches in mitigating memory hotspots
– The SPECjbb 2005 benchmark is used to generate memory usage.
Settings:
Host Machine   PM RAM (MB)   VM ID   VM RAM (MB)   Description
PM1            384           VM1     256           Gray-box strategy
PM2            384           VM2     256           Black-box strategy
PM3            1024          none                  Idle server (waits for migration)

Experiment 4 - Gray vs. Black: Memory Allocation (cont.)
Experiment result, observation:
– The gray-box system can reduce or eliminate swapping without significant overprovisioning of memory.

Experiment 4 - Gray vs. Black: Apache Performance
Settings: httperf generates requests for CPU-intensive PHP scripts on all VMs.
Host Machine   Hosting VM   Actual VM CPU requirement
PM1            VM1, VM2     70%
PM1            VM3          33%
PM2            VM4          7%
PM3            none         none

Experiment 4 - Gray vs. Black: Apache Performance (cont.)
– The black-box strategy guesses the resource requirements incorrectly (figure)

Experiment 4 - Gray vs. Black: Apache Performance (cont.)
Comparing the gray-box strategy with the black-box strategy:
– The gray-box strategy can migrate VM3 to PM2 and VM1 to PM3 concurrently

Experiment 5 - Prototype Data Center Evaluation
Data center environment:
– 16 servers that run a total of 35 VMs
– 1 additional server runs the control plane
– 1 additional server is reserved as a scratch node for swaps
Settings:
– Six physical servers, running a total of 14 VMs, are overloaded: four servers see a CPU hotspot and two see a network hotspot

Experiment 5 - Prototype Data Center Evaluation (cont.)
Result:
– Sandpiper eliminates hotspots on all six servers by interval 60.
– Migration overhead

Sandpiper overhead and scalability
Sandpiper's CPU and network overhead:
– Depends on the number of PMs and VMs in the data center
– The overhead of the gray-box strategy may also be affected by the amount of application-level statistics gathered

Sandpiper overhead and scalability (cont.)
Nucleus overhead:
– Network: each report uses only 288 bytes per VM; the resulting overhead on a gigabit LAN is negligible
– CPU: measured by comparing the performance of a CPU benchmark with and without the resource monitors running
– On a single physical server running 24 concurrent VMs, nucleus overhead reduces CPU benchmark performance by approximately 1%

Sandpiper overhead and scalability (cont.)
Control plane scalability:
– Source of computational complexity: computing a new mapping of virtual machines to physical servers after a hotspot is detected

Conclusion and future work
In this paper, we proposed Sandpiper, an automated system that:
– monitors resource usage and detects hotspots
– determines a new mapping of physical to virtual resources
– initiates the necessary migrations
We discussed a black-box strategy and a gray-box strategy. The evaluation showed that Sandpiper achieves rapid hotspot elimination in data center environments.
Future work:
– Support replicated services by automatically determining whether to migrate a VM or to spawn a replica.

Comment
Advantages:
– Separating the monitoring strategy into black-box and gray-box is a good idea.
– Sandpiper's architecture and strategy may fit our "Plan A"
Shortcomings:
– The relationship between CPU utilization and request rate may not be linear
– The hotspot mitigation algorithm only considers balancing the workload across physical machines; it should also consider how to drive each PM to its highest utilization without creating a hotspot
