© 2010 VMware Inc. All rights reserved Distributed Resource Management for Virtualized System Clusters VMware, Inc.

2 Outline  Distributed Resource Management for Virtualized System Clusters  Infrastructure Live Migration Architecture for Distributed Communication and Control  Resource Management Basics  Distributed Resource Scheduling  Distributed Power Management  Distributed High Availability  Summary

3 Virtualized Systems Cluster Infrastructure: Live Migration  Example: VMware VMotion “Hot” migrate VM across hosts Transparent to guest OS, apps Minimal downtime (sub-second) Requirements (current) Globally accessible storage (SAN/NAS) Same subnet (no forwarding proxy) Compatible processors Details Bitmap tracks modified pages Pre-copy iteration sends modified pages Repeatedly pre-copy “diff” until convergence Exploit meta-data (shared, swapped)
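
The pre-copy details on this slide can be sketched as a small simulation. This is only an illustrative model, not VMotion's implementation: memory is a dict of pages, the dirty-page bitmap is a set, and the names `precopy_migrate`, `stop_threshold`, and `max_rounds` are hypothetical. Guest writes are replayed from a scripted list so the run is deterministic.

```python
def precopy_migrate(source, writes_per_round, stop_threshold=2, max_rounds=10):
    """Simulate iterative pre-copy of `source` (page -> content) to a new host.

    writes_per_round: list of dicts; entry k holds the pages the still-running
    guest modifies while pre-copy round k is in flight (hypothetical input).
    Returns (destination memory, pre-copy rounds used, pages sent while paused).
    """
    dest = {}
    to_send = set(source)            # first round sends every page
    writes = iter(writes_per_round)
    rounds = 0
    while len(to_send) > stop_threshold and rounds < max_rounds:
        for page in to_send:         # copy this round's pages to the destination
            dest[page] = source[page]
        dirty = set()                # "bitmap" of pages modified during the copy
        for page, value in next(writes, {}).items():
            source[page] = value
            dirty.add(page)
        to_send = dirty              # next round sends only the "diff"
        rounds += 1
    # Converged (or gave up): pause the VM briefly and send what is left.
    for page in to_send:
        dest[page] = source[page]
    return dest, rounds, len(to_send)


memory = {page: 0 for page in range(8)}
guest_writes = [{0: 1, 1: 1, 2: 1, 3: 1}, {0: 2, 1: 2}]
dest, rounds, paused_pages = precopy_migrate(memory, guest_writes)
```

In this run the dirty set shrinks from 8 pages to 4 to 2, at which point the remaining pages are sent during the short pause, which is what keeps downtime small.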

4 Virtualized Systems Cluster Infrastructure: Example Architecture [Architecture diagram; labels: clients (SDK, UI), Host+VM Mgmt, DB, ResMgr 1..n, cluster 1..n, stats + actions]

5 Outline Distributed Resource Management for Virtualized System Clusters  Infrastructure  Resource Management Basics Goals Controls  Distributed Resource Scheduling  Distributed Power Management  Distributed High Availability  Summary

6 Resource Management Goals  Performance isolation Prevent virtual machines (VMs) from monopolizing resources Guarantee predictable service rates  Efficient utilization Exploit undercommitted resources Overcommit with graceful degradation Exploit opportunities to reduce power consumption  Easy administration Flexible dynamic partitioning Meet absolute service-level agreements Control relative importance of VMs  Respect availability constraints

7 Resource Controls Overview  Useful Features Express absolute service rates [e.g., 512MHz, 1GB] Express relative importance [e.g., VM A to get 2x CPU of VM B] Grouping VMs for isolation, sharing [e.g., VMs A,B to share 1GHz]  Challenges Simple enough for novices Powerful enough for experts Mapping application-level metrics to physical resource consumption E.g., What MHz is needed to guarantee 100 transactions/second? Scaling from single host to cluster of (say) 32/64/128 servers

8 Basic Resource Controls  Shares Specify relative importance Entitlement directly proportional to shares Abstract relative units, only ratios matter  Reservation Minimum guarantee, even when system overcommitted Concrete absolute units (MHz, MB) Admission control: sum of reservations ≤ capacity  Limit Upper bound on consumption, even when undercommitted Concrete absolute units (MHz, MB)
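
One way to see how the three controls combine is the sketch below. It is an assumption-laden illustration, not the scheduler's actual algorithm: it searches for a single per-share rate such that each VM receives `rate * shares`, clamped between its reservation and its limit, with the total matching capacity. The `VM` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    shares: int         # relative units; must be > 0 in this sketch
    reservation: float  # MHz, guaranteed minimum
    limit: float        # MHz, upper bound (>= reservation)


def admission_ok(vms, capacity):
    """Admission control from the slide: sum of reservations <= capacity."""
    return sum(vm.reservation for vm in vms) <= capacity


def compute_entitlements(vms, capacity):
    """Return per-VM MHz: proportional to shares, within [reservation, limit]."""
    if not admission_ok(vms, capacity):
        raise ValueError("reservations exceed capacity")
    target = min(capacity, sum(vm.limit for vm in vms))

    def allocated(rate):
        return [min(max(rate * vm.shares, vm.reservation), vm.limit) for vm in vms]

    lo, hi = 0.0, 1.0
    while sum(allocated(hi)) < target:   # grow until the rate is large enough
        hi *= 2.0
    for _ in range(100):                 # bisect on the per-share rate
        mid = (lo + hi) / 2.0
        if sum(allocated(mid)) < target:
            lo = mid
        else:
            hi = mid
    return allocated(hi)


vms = [VM("A", 2000, 0.0, 4000.0), VM("B", 1000, 0.0, 4000.0)]
print(compute_entitlements(vms, 4000.0))  # A receives about twice what B does
```

With equal shares, a 3000 MHz reservation on one VM and 4000 MHz of capacity, the same function gives that VM 3000 MHz and the other 1000 MHz: the reservation holds even though shares alone would split the host evenly.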

9 Resource Pools  Motivation Allocate aggregate resources for sets of VMs Isolation between pools, sharing within pools Flexible hierarchical organization Access control and delegation  What is a resource pool? Named object with permissions Reservation, limit, and shares for each resource Parent pool, child pools, VMs
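
A resource pool hierarchy can be modelled as a tree in which each node divides what it receives among its children. The sketch below is a simplification under stated assumptions: it uses shares only (no reservations, limits, or permissions), and the `Node` class and `divide` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A resource pool (has children) or a VM (no children)."""
    name: str
    shares: int
    children: list = field(default_factory=list)


def divide(node, amount, out=None):
    """Split `amount` (MHz) down the tree in proportion to sibling shares."""
    if out is None:
        out = {}
    if not node.children:            # leaf: treated as a VM in this sketch
        out[node.name] = amount
        return out
    total = sum(child.shares for child in node.children)
    for child in node.children:
        divide(child, amount * child.shares / total, out)
    return out


root = Node("root", 1, [
    Node("prod", 2, [Node("a", 1), Node("b", 1)]),
    Node("test", 1, [Node("c", 1)]),
])
print(divide(root, 6000.0))
```

Shares only compete among siblings: adding VMs to "prod" changes how prod's 4000 MHz is split, but leaves the "test" pool's 2000 MHz untouched, which is the isolation-between-pools property.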

10 Resource Controls: Exploration Areas  Additional controls Real-time latency guarantees Application-level metrics Users think in terms of transaction rates, response times Labor-intensive, requires detailed domain/app-specific knowledge Automate mapping to physical resource controls

11 Outline Distributed Resource Management for Virtualized System Clusters  Infrastructure  Resource Management Basics  Distributed Resource Scheduling Overview Example Exploration Areas  Distributed Power Management  Distributed High Availability  Summary

12 Distributed Resource Scheduling Overview  Useful features Choose initial host for VM power on Dynamic rebalancing by migrating running VMs between hosts Configurable automation and migration threshold levels Provide host evacuation for flexible host downtime Support optional constraints on VM colocation on hosts Preserve resources for failover  Challenges Placement and migration decisions involve multiple resources Resource pools can span multiple hosts Determining appropriate migration threshold controls Assorted failure modes (hosts, connectivity, etc.)

13 Distributed Resource Scheduling Example: VMware DRS  Cluster-wide resource management Hierarchical organization and delegation Flexible grouping, sharing, and isolation Configurable automation levels, migration aggressiveness Configurable VM affinity/anti-affinity rules Preserves unfragmented spare resources for failover  Automatic virtual machine placement and migration Choose initial host when VM powers on Optimize load balance across hosts Dynamic rebalancing using VMotion React to dynamic load changes Evacuate hosts for maintenance and/or power-off

14 Example: VMware DRS Balancing Details  Compute VM entitlements Based on resource pool and VM resource settings VM demand includes usage and unsatisfied demand Don’t give VM more than it demands Reallocate extra resources fairly  Compute host loads Load ≠ utilization unless all VMs equally important Sum entitlements for VMs on host Normalize by host capacity  Consider possible migrations Evaluate cluster balance impact and risk-adjusted cost/benefit Incorporate migration cost for involved hosts  Recommend best moves (meeting specified threshold)

15 VMware DRS Simple Balancing Example (all VMs in same resource pool with same shares) Host with 4GHz capacity running VM1 and VM2 (entitlements 3GHz and 2GHz): host normalized entitlement = 1.25 (5/4) Host with 4GHz capacity running VM3 and VM4 (entitlements 1GHz each): host normalized entitlement = 0.5 (2/4)  Recommendation to improve imbalance: migrate VM2
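
The balancing steps and the example above can be reproduced with a short sketch. Several things here are assumptions rather than facts from the slides: imbalance is measured as the population standard deviation of host normalized entitlements, migration cost and risk are ignored, VM2 is taken to be the 2GHz VM, and the function names are hypothetical.

```python
from statistics import pstdev


def normalized_entitlements(hosts):
    """Host load = sum of VM entitlements on the host / host capacity."""
    return {name: sum(h["vms"].values()) / h["capacity"] for name, h in hosts.items()}


def imbalance(hosts):
    """Assumed cluster imbalance metric: std dev of host loads."""
    return pstdev(list(normalized_entitlements(hosts).values()))


def best_move(hosts, threshold=0.0):
    """Try every single-VM migration; return (gain, vm, src, dst) or None."""
    base = imbalance(hosts)
    best = None
    for src, src_host in hosts.items():
        for vm, entitlement in src_host["vms"].items():
            for dst in hosts:
                if dst == src:
                    continue
                # What-if copy of the cluster with this VM moved.
                trial = {n: {"capacity": h["capacity"], "vms": dict(h["vms"])}
                         for n, h in hosts.items()}
                del trial[src]["vms"][vm]
                trial[dst]["vms"][vm] = entitlement
                gain = base - imbalance(trial)
                if gain > threshold and (best is None or gain > best[0]):
                    best = (gain, vm, src, dst)
    return best


# Entitlements in GHz; which of VM1/VM2 has 3 vs 2 GHz is an assumption.
cluster = {
    "host1": {"capacity": 4.0, "vms": {"VM1": 3.0, "VM2": 2.0}},
    "host2": {"capacity": 4.0, "vms": {"VM3": 1.0, "VM4": 1.0}},
}
print(normalized_entitlements(cluster))
print(best_move(cluster))
```

Under these assumptions the only move that lowers imbalance is migrating VM2 to the second host, leaving loads of 0.75 and 1.0; moving the 3GHz VM would merely swap which host is overloaded.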

16 Distributed Resource Scheduling: Exploration Areas  I/O resource management Quality of service for networking, storage End-to-end control difficult, complex switching/routing fabric Lack of standards, even in non-virtualized environments  Proactive migrations Detect longer-term trends Move VMs based on predicted load while minimizing impact on current load  Large-scale WAN/Grid management

17 Outline Distributed Resource Management for Virtualized System Clusters  Infrastructure  Resource Management Basics  Distributed Resource Scheduling  Distributed Power Management Overview Example Exploration Areas  Distributed High Availability  Summary

18 Distributed Power Management Overview  Useful features Migrate VMs to allow hosts to be powered-off when demand low Power on hosts when demand rises or needed to satisfy constraints Work in concert with Distributed Resource Scheduling and Distributed High Availability goals and constraints Configurable automation and utilization threshold levels  Challenges Non-homogeneous hosts can result in utilization hot or cold spots Sudden unpredicted rise in demand can cause performance impact Benefits depend on demand valleys of non-trivial duration Variety of host wake methods with usability pros/cons

19 Distributed Power Management Example: VMware DPM  Implements useful features: Consolidates virtual machines (VMs) onto fewer hosts & powers hosts off when demand is low Powers hosts back on when needed to meet workload demand or to satisfy constraints Optional add-on to VMware Distributed Resource Scheduler (DRS) [Figure: DRS cluster with DPM enabled; response to reduced demand: power off]

20 Example: VMware DPM Operation in VC2.5/ESX 3.5  Lightly-used Hosts  Consider Host Power-Off Conservative: considers 40-minute load history All VMs on selected host are migrated to other hosts Weighs trade-offs between costs and benefits of power-off Host is powered off  Heavily-used Hosts  Consider Host Power-On Responsive: considers 5-minute load history Send wake-on-LAN packet or BMC command [2009] to host Host boots up DRS load-balancing kicks in and some VMs are migrated to host  Target Host Utilization Range centered around 63% by default
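
The asymmetry between the conservative power-off check and the responsive power-on check can be sketched as follows. The 40-minute and 5-minute windows and the 63% center come from the slide; the width of the range (`half_width`), the use of one sample per minute, and the exact comparison rules are assumptions made for illustration, and the cost/benefit weighing is not modelled.

```python
def dpm_recommendation(samples, target=0.63, half_width=0.18,
                       off_window=40, on_window=5):
    """Suggest a power action from per-minute utilization samples (newest last).

    half_width is a hypothetical parameter: the slide gives only the center
    of the target utilization range, not its width.
    """
    low, high = target - half_width, target + half_width
    recent = samples[-on_window:]
    history = samples[-off_window:]
    # Responsive: a short burst above the range is enough to consider power-on.
    if recent and sum(recent) / len(recent) > high:
        return "consider power-on"
    # Conservative: require a full window that stayed below the range.
    if len(history) >= off_window and max(history) < low:
        return "consider power-off"
    return "no action"


print(dpm_recommendation([0.20] * 40))
print(dpm_recommendation([0.50] * 35 + [0.90] * 5))
```

Using a long window for power-off and a short one for power-on reflects the slide's intent: shedding capacity should wait for a sustained demand valley, while restoring capacity should react quickly to rising load.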

21 Example: VMware DRS/DPM Interactions  Responsibilities DRS balances load to satisfy service-level agreements DPM reduces running cluster capacity to save power Both DRS and DPM respect any resources needed for failover  Interactions DRS rebalances with DPM-recommended power actions in what-if simulations DPM evaluates the impact of potential power actions based on DRS rebalancing results  Recommendations Final host power actions and VMotions Recommended to user, or applied automatically

22 Distributed Power Management: Exploration Areas  Use VM demand prediction to drive proactive host power-on  Incorporate additional metrics in host off/on selection Examples: host power efficiency, temperature  Operate in cooperation with host-level power management Example: Choose hosts for off/on based on power-management features

23 Outline Distributed Resource Management for Virtualized System Clusters  Infrastructure  Resource Management Basics  Distributed Resource Scheduling  Distributed Power Management  Distributed High Availability Overview Example Exploration Areas  Summary

24 Distributed High Availability Overview  Useful features Provide various methods to specify resources to reserve to restart VMs upon failures of their hosts in a virtualized system cluster Express whether failover resource reservation is strict or best-effort Decentralized host failure detection and quick VM restart Work in concert with DRS and DPM goals and constraints  Challenges Preserving unfragmented failover resources across hosts Avoiding conservative allocation of spare resources when running VMs have widely different resource reservations and needs Support robust failover operation in many possible failure situations Provide cluster failover status information in a user-friendly way

25 Distributed High Availability Example: VMware HA  Specify resources to be set aside for failover Number of host failures to tolerate Percentage of cluster capacity [2009] Specific hosts to set aside for failover [2009]  Detect failure and respond Cluster hosts send each other heartbeats; when a host fails to do so for some period, failover response action is launched For failed hosts, their running VMs are restarted on other hosts Safe restart supported by locking in ESX server storage system  At failover, want unfragmented powered-on resources CPU and memory resources for each VM to fail over must be available on a single host Resources must be on powered-on host DRS/DPM/HA proactively maintain appropriate spare resources
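
The heartbeat check and the "unfragmented resources" requirement can be illustrated with a minimal sketch. This is not VMware HA's protocol or placement logic: the timeout value, the first-fit placement order, and the names `failed_hosts` and `plan_restarts` are hypothetical.

```python
def failed_hosts(last_heartbeat, now, timeout):
    """Hosts whose last heartbeat is older than `timeout` seconds."""
    return sorted(h for h, t in last_heartbeat.items() if now - t > timeout)


def plan_restarts(vms, spare):
    """Place each VM wholly on one surviving host, or return None.

    vms:   {vm_name: (cpu_mhz, mem_mb)} needed to restart each VM
    spare: {host_name: (cpu_mhz, mem_mb)} free on powered-on hosts
    """
    free = {host: list(capacity) for host, capacity in spare.items()}
    plan = {}
    # Largest VMs first, so big VMs are not starved by small ones.
    for vm, (cpu, mem) in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        for host in sorted(free):
            if free[host][0] >= cpu and free[host][1] >= mem:
                free[host][0] -= cpu
                free[host][1] -= mem
                plan[vm] = host
                break
        else:
            return None   # no single host can hold this VM
    return plan


print(failed_hosts({"h1": 100, "h2": 88, "h3": 99}, now=100, timeout=10))
# Fragmented spare capacity: 2000 MHz free in total, but split across hosts.
print(plan_restarts({"a": (2000, 2048)}, {"h1": (1000, 4096), "h2": (1000, 4096)}))
```

The second call returns `None` even though the cluster has enough spare CPU in aggregate, which is why spare failover capacity has to be kept unfragmented.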

26 Distributed High Availability: Exploration Areas  Provide alternative to VM restart on host failure via continuing the VM from a secondary copy already executing on another cluster host [2009]; enabled by VM snapshot and record/replay  Support DRS/DPM aided failover if decentralized restart fails; use migrations and host power-on if needed to get resources  Explore detecting/responding to partial host failure modes; run on host with diminished working capacity

27 Outline Distributed Resource Management for Virtualized System Clusters  Infrastructure  Resource Management Basics  Distributed Resource Scheduling  Distributed Power Management  Distributed High Availability  Summary

28 Summary  Virtualized System clusters with appropriate supporting infrastructure can benefit from distributed resource scheduling, distributed power management, and distributed high availability.  Each of these technologies has a number of areas open for exploration and innovation.
