Alexander Stage and Thomas Setzer, Technische Universität München (TUM), Chair for Internet-based Information Systems. ICSE Workshop on Software Engineering Challenges in Cloud Computing, Vancouver, Canada, May 2009.
Server virtualization based workload consolidation is increasingly used to raise server utilization levels and to ensure cost-efficient data center operations. Unforeseen spikes or shifts in workloads require dynamic workload management to avoid server overload: the placements of virtual machines (VMs) are continuously realigned by means of VM live migration.
Phase 1: Setup. Create a TCP connection between source and destination. Copy the VM's profile to the destination. Create a VM on the destination. [Figure labels: .BIN, .VSV, .XML, .VHD; Source Node (Host A); Destination Node (Host B); Network Storage; Configuration Data]
Phase 2: Memory migration. Transfer the memory to the destination. Track the pages that change while the memory is being transferred. Pause the VM on the source node when the last transfer starts. [Figure labels: .BIN, .VSV, .XML, .VHD; Source Node; Destination Node; Network Storage; Memory Content]
Phase 3: State migration. Migrate the register state of the VM from the source node. Start the VM on the destination node. Clean up the old VM on the source node. [Figure labels: .BIN, .VSV, .XML, .VHD; Source Node; Destination Node; Network Storage; Running State]
VM live migration realizes dynamic resource provisioning and load balancing, but it imposes significant overheads that need to be considered and controlled: CPU overhead; network overhead and network topology.
Reference: T. Wood, P. Shenoy, A. Venkataramani, and M. Yousif. Black-box and gray-box strategies for virtual machine migration. In 4th USENIX Symp. on Networked Systems Design and Impl., pages 229-242, 2007.
Phase 2 of live migration uses iterative, bandwidth-adapting pre-copy memory page transfer algorithms. Objectives: minimize VM downtime, keep the total migration time low, and lower the aggregate bandwidth consumption of a migration. The network overhead is non-negligible: 500 Mb/s for 10 seconds for a trivial web server VM.
Reference: C. Clark, K. Fraser, S. H, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield. Live migration of virtual machines. In Proc. of 2nd ACM/USENIX Symp. on Network Systems Design and Implementation, pages 273-286, 2005.
Example: Suppose 20 VM migrations must be executed within 5 minutes, and each migration consumes 1 Gb/s for 20 seconds. Scheduling them sequentially over a single 10 Gb/s (1 Gb/s) link saturates the link completely for 40 (400) seconds. Outcome: sudden increases in network load that could lead to resource shortages.
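The arithmetic behind this example can be checked with a short sketch. The function name and parameters are hypothetical; the only inputs are the figures from the slide (20 migrations, 1 Gb/s for 20 seconds each, a 10 Gb/s or 1 Gb/s link).

```python
def saturation_seconds(num_migrations, rate_gbps, duration_s, link_gbps):
    """Seconds a link is fully occupied by the total migration traffic."""
    # Total volume in gigabits: each migration moves rate * duration.
    total_gigabits = num_migrations * rate_gbps * duration_s
    # Time needed to carry that volume at the full link rate.
    return total_gigabits / link_gbps


ten_gig = saturation_seconds(20, 1, 20, 10)   # 400 Gb over a 10 Gb/s link
one_gig = saturation_seconds(20, 1, 20, 1)    # 400 Gb over a 1 Gb/s link
print(ten_gig, one_gig)
```

On the 1 Gb/s link the 400 seconds of transfer already exceed the 5-minute (300-second) window, so the migrations could not all finish in time even with the link fully dedicated to them.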
To deal with the network overhead of live migration, we propose a migration scheduling architecture. Working on the data center, it performs the following functions: collect performance parameters; classify workload types and predict host utilization; determine expected resource bottlenecks and low utilization levels; handle unexpected situations such as sudden surges in resource demand; decide on an operational live migration plan that avoids migration-related SLA violations.
We identify the following main workload attributes for our classification:
Predictability: a workload is predictable if its behavior can be reliably forecast for a given period of time, i.e. forecasting errors are tightly bounded.
Trend: the degree of upward or downward demand trends.
Periodicity: the length (time scale) and the power of recurring patterns.
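As an illustration only, two of these attributes can be estimated from a demand time series with standard statistics. The sketch below is not the classifier of the paper; the function names are hypothetical, trend is approximated by a least-squares slope and periodicity by the lag with the highest autocorrelation. Predictability is not covered.

```python
from statistics import mean


def trend_slope(series):
    """Least-squares slope of the series against its sample index."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = mean(series)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    y_mean = mean(series)
    var = sum((y - y_mean) ** 2 for y in series)
    cov = sum((series[t] - y_mean) * (series[t + lag] - y_mean)
              for t in range(len(series) - lag))
    return cov / var


def dominant_period(series, max_lag):
    """Return (lag, strength) of the strongest recurring pattern."""
    best = max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))
    return best, autocorrelation(series, best)


demand = [0, 1, 2, 3] * 6          # synthetic demand repeating every 4 samples
period, strength = dominant_period(demand, max_lag=8)
print(period, round(strength, 3), trend_slope([2 * x + 1 for x in range(10)]))
```

The lag plays the role of the pattern length and the autocorrelation value stands in for its power.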
For example:
1. Predictive, low-variability workloads with little trend can be co-hosted more aggressively by exploiting workload complementarities.
2. Highly non-predictive workloads require a certain buffer capacity on hosts to guarantee overload avoidance.
Classification procedure: supervise the target for a period of time, then make the class-assignment decision.
Note: The implementation of the workload classifier is outside the scope of this paper.
For predictive workload classes. Intuition: co-hosting VMs with complementary workloads achieves high resource utilization. Method: during runtime, use live migration to execute a VM re-allocation plan that optimizes the VM allocation. Objective: decrease the number of required hosts and achieve high resource utilization without overload.
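Why complementary workloads help can be seen by comparing the peak of the combined demand with the sum of the individual peaks. The sketch below uses made-up demand profiles (percent of one host's capacity per time slot); the names are hypothetical.

```python
def peak_of_sum(profiles):
    """Peak demand when all profiles share one host."""
    return max(sum(slot) for slot in zip(*profiles))


def sum_of_peaks(profiles):
    """Capacity needed if every VM is provisioned for its own peak."""
    return sum(max(profile) for profile in profiles)


day_vm = [10, 20, 60, 70]      # busy in the later slots
night_vm = [70, 60, 20, 10]    # busy in the earlier slots

print(peak_of_sum([day_vm, night_vm]), sum_of_peaks([day_vm, night_vm]))
```

Here the two VMs together never need more than 80 percent of a host, although their individual peaks add up to 140 percent, so both fit on one host without overload.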
For non-predictive workload classes. Method: set a rather conservative threshold on overall host utilization to avoid overload. If the threshold is exceeded, one or more VMs are selected as migration candidates. Objective: avoiding overload is the first priority.
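A minimal sketch of such a threshold rule follows. The slide does not say how candidates are chosen; picking the most heavily loaded VMs first is an assumption made here for illustration, and all names and numbers are hypothetical.

```python
def select_migration_candidates(vm_loads, threshold):
    """Pick VMs to migrate until host utilization is at or below the threshold.

    vm_loads maps a VM name to its load in percent of host capacity.
    Assumption: the largest VMs are selected first.
    """
    total = sum(vm_loads.values())
    candidates = []
    for vm, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        if total <= threshold:
            break
        candidates.append(vm)
        total -= load
    return candidates


loads = {"vm1": 40, "vm2": 25, "vm3": 20}   # 85 percent in total
print(select_migration_candidates(loads, threshold=70))
```

With a threshold of 70 percent, moving vm1 away is enough to bring the host back under the limit.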
Bandwidth-adapting pre-copy memory page transfer algorithm:
1. All main memory pages are transferred.
2. Only the memory pages that were written to (dirtied) during the previous iteration are transferred. Bandwidth usage is adaptively increased in each iteration.
3. If the set of dirtied memory pages is sufficiently small or the upper bandwidth limit is reached, go to step 4. Otherwise go to step 2.
4. The last pre-copy iteration is started; this iteration determines the service downtime.
Notation:
t_i^q = the duration of the q-th iteration of VM i
b_i = the constant bandwidth adaptation rate of VM i
m_i = the memory size of VM i
r_i = the constant memory dirtying rate of VM i
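The four steps can be simulated with a rough model. This is a sketch under stated assumptions, not the behavior of any particular hypervisor: the dirtying rate is constant, the bandwidth grows by a fixed step per iteration up to a limit, and every number in the example is made up. Sizes are in MB and rates in MB/s.

```python
def simulate_precopy(mem_mb, dirty_mb_s, start_bw, bw_step, max_bw,
                     small_enough_mb=5.0, max_iterations=30):
    """Simulate iterative pre-copy and return summary figures."""
    to_send = mem_mb                 # step 1: the first iteration sends all pages
    bw = start_bw
    durations = []
    sent = 0.0
    while True:
        duration = to_send / bw
        durations.append(duration)
        sent += to_send
        # Pages dirtied while this iteration ran (cannot exceed the memory size).
        dirtied = min(mem_mb, dirty_mb_s * duration)
        # Step 3: stop once the dirty set is small or the bandwidth limit is hit.
        if (dirtied <= small_enough_mb or bw >= max_bw
                or len(durations) >= max_iterations):
            break
        to_send = dirtied                    # step 2: resend only dirtied pages
        bw = min(max_bw, bw + bw_step)       # bandwidth is increased each round
    # Step 4: last transfer with the VM paused; its length is the downtime.
    downtime = dirtied / bw
    sent += dirtied
    return {
        "iterations": len(durations),
        "downtime_s": downtime,
        "total_time_s": sum(durations) + downtime,
        "sent_mb": sent,
    }


result = simulate_precopy(mem_mb=1024, dirty_mb_s=10, start_bw=25,
                          bw_step=25, max_bw=100)
print(result)
```

The model makes the trade-off visible: more bandwidth shortens each iteration and therefore the dirty set and the downtime, while the total volume sent always exceeds the memory size because dirtied pages are transferred again.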
Currently, bandwidth usage cannot be controlled during a migration; only the maximum bandwidth usage level can be set. As a result, only 2 migrations can be launched simultaneously (D is rejected). Deadlines: A: t1/t5; B: t1/t6; C: ignored/ignored; D: t2/t5.
The migration scheduler should therefore exercise control over migration bandwidth usage. With such control, 3 migrations can be launched simultaneously. Deadlines: A: t1/t5; B: t1/t6; C: ignored/ignored; D: t2/t5.
Assumption 1: A fixed amount of bandwidth on each link is reserved for VM migrations; the reserved amounts may differ between links. Offline scheduling can be used for predictive VM workload clusters with periodicity or with trend. Objective: avoid the risk of overloading network links with migration-related bandwidth consumption.
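To make the idea of offline scheduling against per-link reservations concrete, here is a simple greedy sketch. It is not the scheduling model of the paper: it assumes discrete time slots, a constant bandwidth per migration, and an earliest-deadline-first placement, and all names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Migration:
    name: str
    links: tuple       # links on the path from source to destination
    bandwidth: float   # Gb/s, assumed constant for the whole migration
    duration: int      # number of time slots needed
    release: int       # earliest start slot
    deadline: int      # the migration must end before this slot


def schedule_offline(migrations, reserved, horizon):
    """Greedy earliest-deadline-first placement within reserved bandwidth."""
    # Residual reserved bandwidth per link and time slot.
    residual = {link: [cap] * horizon for link, cap in reserved.items()}
    plan, rejected = {}, []
    for m in sorted(migrations, key=lambda mig: mig.deadline):
        latest_start = min(m.deadline, horizon) - m.duration
        for start in range(m.release, latest_start + 1):
            slots = range(start, start + m.duration)
            if all(residual[l][t] >= m.bandwidth for l in m.links for t in slots):
                for l in m.links:
                    for t in slots:
                        residual[l][t] -= m.bandwidth
                plan[m.name] = start
                break
        else:
            # No feasible start slot before the deadline.
            rejected.append(m.name)
    return plan, rejected


requests = [
    Migration("m1", ("uplink",), 1.0, 2, 0, 2),
    Migration("m2", ("uplink",), 1.0, 2, 0, 2),
    Migration("m3", ("uplink",), 1.0, 2, 0, 3),
    Migration("m4", ("uplink",), 1.0, 2, 0, 6),
]
plan, rejected = schedule_offline(requests, reserved={"uplink": 2.0}, horizon=6)
print(plan, rejected)
```

With 2 Gb/s reserved on the link, two migrations run at once, the one with the tight deadline that no longer fits is rejected, and the one with a loose deadline is delayed.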
Without Assumption 1. Objective: minimize the migration-related risk of network congestion with respect to bandwidth demand fluctuations, since the available bandwidth is not known exactly in advance. Solution: predict the average utilization of network links for all time slots (e.g. via the Network Weather Service), and constantly adjust the bandwidth usable for migrations to match the observed bandwidth utilization. A more conservative available-bandwidth prediction is advisable.
Reference: R. Wolski. Dynamically forecasting network performance using the network weather service. Journal of Cluster Computing, 1:119-132, 1998.
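One simple way to make an available-bandwidth prediction conservative is to add a safety margin to the predicted utilization. The sketch below is an assumption for illustration, not the forecasting method of the Network Weather Service: it predicts utilization as the mean of recent samples plus a multiple of their standard deviation, and the names and numbers are hypothetical.

```python
from statistics import mean, pstdev


def migration_budget(capacity, utilization_history, safety_factor=2.0):
    """Bandwidth left for migrations after a conservative utilization estimate."""
    # Pessimistic estimate: average load plus a margin for fluctuations.
    predicted = mean(utilization_history) + safety_factor * pstdev(utilization_history)
    return max(0.0, capacity - predicted)


steady = migration_budget(10.0, [4.0, 4.0, 4.0, 4.0])   # no fluctuation
bursty = migration_budget(10.0, [2.0, 6.0])             # same mean, high variance
print(steady, bursty)
```

Both links carry 4 Gb/s on average, but the fluctuating link is granted a smaller migration budget.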
Characteristics: The sequence of migrations is not defined in advance. Migrations can be delayed as long as their finishing deadlines are still met. A migration might be rejected if it cannot be executed.
Solution: Emergency migrations may temporarily supersede the bandwidth allocations of lower-priority migrations, as shown in Figure 3. This issue is similar to the prioritization problem in network revenue management. [Figure 3]
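The superseding of allocations can be sketched as a strict-priority bandwidth split. This is a hypothetical illustration rather than the mechanism of the paper; the names, the priority encoding (lower number means more urgent), and the figures in Mb/s are all assumptions.

```python
def allocate_by_priority(capacity, requests):
    """Grant bandwidth in priority order; lower priority number is served first.

    requests is a list of (name, priority, demand) tuples.
    """
    remaining = capacity
    allocation = {}
    for name, _priority, demand in sorted(requests, key=lambda r: r[1]):
        granted = min(demand, remaining)
        allocation[name] = granted
        remaining -= granted
    return allocation


running = [("A", 1, 600), ("B", 1, 400)]
before = allocate_by_priority(1000, running)
# An emergency migration arrives and takes precedence over A and B.
after = allocate_by_priority(1000, running + [("E", 0, 700)])
print(before, after)
```

While the emergency migration runs, A is throttled and B is paused; re-running the allocation without E restores their original shares.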
In this paper we propose network topology aware scheduling models for VM live migrations that explicitly take bandwidth requirements and the network topology into account, and a scheme for classifying VM workloads. Future work: in co-operation with a commercial data center operator, we are currently implementing the proposed architecture.
Comments: Considering bandwidth management for live migration is a good point, but the paper gives no mathematical model for migration scheduling. The prediction model is simple and impractical. How to predict workloads is the direction I intend to research in depth.