VMware vCenter Server High Availability

Presentation transcript:

VMware vCenter Server High Availability Product Support Engineering VMware Confidential

Module 2 Lessons
- Lesson 1: vCenter Server High Availability
- Lesson 2: Distributed Resource Scheduler
- Lesson 3: Fault Tolerance Virtual Machines
- Lesson 4: Enhanced vMotion Compatibility
- Lesson 5: DPM - IPMI
- Lesson 6: vApps
- Lesson 7: Host Profiles
- Lesson 8: Reliability, Availability, Serviceability (RAS)
- Lesson 9: Web Access
- Lesson 10: vCenter Update Manager
- Lesson 11: Guided Consolidation
- Lesson 12: Health Status
VI4 - Mod 2-1 - Slide

Module 2-1 Lessons
- Lesson 1: Overview of High Availability
- Lesson 2: VMware HA Clusters
- Lesson 3: Creating HA Clusters
- Lesson 4: Monitoring HA Clusters
- Lesson 5: HA Clusters Best Practices
- Lesson 6: Troubleshooting VMware HA
- Lesson 7: Customizing VMware HA

VMware High Availability
This section provides information about solutions that provide business continuity, including how to establish VMware High Availability (HA) and VMware Fault Tolerance. VMware Infrastructure provides multiple ways to ensure that the information and services in your business infrastructure are consistently available. VMware offerings fall into the following categories:
- High availability through reduced planned and unplanned downtime. Solutions include Fault Tolerance and VMware High Availability (HA).
- Data protection, including nondisruptive backup and restore processes. A solution is VMware Consolidated Backup (VCB).
- Disaster recovery with hardware-independent recovery. A solution is Site Recovery Manager (SRM).

High Availability Solutions: Unplanned Downtime
VMware Infrastructure builds fault tolerance capabilities into datacenter infrastructure. These features can be easily configured, which reduces the cost and complexity of providing higher availability. Key fault-tolerance capabilities built into VMware Infrastructure include:
- Network interface (NIC) teaming to tolerate individual network card failures
- Storage multipathing to tolerate storage path failures

High Availability Solutions
VMware High Availability and VMware Fault Tolerance, implemented through VMware Infrastructure, offer simple, cost-effective solutions that help mitigate situations that could otherwise make data or services unavailable to users.
- VMware HA: Checks that ESX/ESXi hosts are functioning. If an ESX/ESXi host fails, another ESX/ESXi host restarts any virtual machines that were running on the failed host.
- VMware Fault Tolerance (FT): Checks that individual virtual machines are functioning and deals with failures without any interruption in service. VMware FT creates hidden duplicate copies of running virtual machines, so if a virtual machine fails due to hardware or software failures, the duplicate virtual machine can immediately replace the one that was lost.

High Availability Solutions
High availability and fault tolerance differ from other business continuity offerings in that the solution:
- Exists within a single datacenter. Other solutions exist across physical locations.
- Uses shared storage for holding the machines' data. Other solutions use multiple copies of the data, which are regularly replicated.
Fault tolerance addresses a number of common problematic situations. Fault Tolerance is discussed in Module 2-3.

Providing High Availability with VMware HA
When using VMware HA, a set of ESX/ESXi hosts is combined into a cluster with a shared pool of resources. vCenter Server monitors all hosts in the cluster. If one of the hosts fails, each associated virtual machine is promptly restarted on a different host. Using VMware HA has a number of advantages:
- Minimal setup: The New Cluster Wizard is used for initial setup.
- Reduced hardware cost and setup: The virtual machine acts as a portable container for the applications, and the virtual machine can be moved among hosts.
- Increased application availability: Any application running inside a virtual machine has access to increased availability.
- Full Distributed Resource Scheduler (DRS) integration: If a host has failed and virtual machines have been restarted on other hosts, DRS can provide migration recommendations or migrate virtual machines for balanced resource allocation.

Virtual Machine Resource Requirements and VMware HA When establishing a VMware HA cluster, decide how to make the best use of resources available on hosts while providing a high level of availability for virtual machines. Consider the following factors: Each host has some amount of memory and CPU that it can make available for use by virtual machines. Each virtual machine must be guaranteed its CPU and memory reservation requirements. VI4 - Mod 2-1 - Slide

Understanding the Resource Allocation Tab in Clusters
If the host used to start a virtual machine is in a cluster, you can view information about reserved resources on the Resource Allocation tab for that cluster. The CPU and Memory reservation information indicates that reservations have been made. The summary reservation information covers the cluster root, where all reservations occurred. Individual virtual machines do not actually have any reservations.

VMware HA Cluster Prerequisites
This section describes the prerequisites for establishing VMware HA clusters. The following conditions must be met before VMware HA can be used:
- All virtual machines and their configuration files must reside on shared storage (such as a SAN).
- Hosts must be configured to have access to the same virtual machine network.
- Each host in a VMware HA cluster must have a host name assigned and a static IP address.
- VMware recommends redundant Service Console and VMkernel networking.
NOTE: After you add a NIC to a host in your VMware HA cluster, you must reconfigure VMware HA on that host.
The requirements for HA have not changed between ESX 3 and ESX 4.

A note on VMware HA ‘slot’ Calculation
Slot calculation is still done by the vCenter HA service. It gives the HA service the capacity of the cluster as a whole. For Virtual Center 2.x, the VM with the maximum resource consumption was chosen as the basis of the slot calculation. This posed a problem if there was only one heavily resourced virtual machine and the other VMs used far fewer resources: the calculation of remaining resources was skewed. This has been changed for vCenter 4. The slot size is shown in the UI.

A note on VMware HA ‘slot’ Calculation When you use the Host failures cluster tolerates option, it is most effective if all virtual machines have a similar CPU and memory requirement. If you have highly variable configurations, consider using the Percentage of cluster resources reserved as failover spare capacity option. When tolerating a specific number of host failures, VMware HA plans for a worst-case scenario by considering all powered-on virtual machines in a cluster and finding the maximum memory and CPU reservations. These maximums are the basis for what is called a slot, which is a logical representation of the largest virtual machine in the cluster. If no reservations are set on a virtual machine, default requirements of 256MB and 256MHz are assigned. VI4 - Mod 2-1 - Slide

A note on VMware HA ‘slot’ Calculation
VMware HA determines how many slots are available in each ESX/ESXi host based on the host’s CPU and memory capacity. VMware HA then determines how many ESX/ESXi hosts could fail with the cluster still having at least as many slots as powered-on virtual machines.
When you use the Percentage of cluster resources reserved as failover spare capacity option, each time a request is made to power on a virtual machine, admission control determines the amount of resources the virtual machine needs and how much uncommitted capacity remains in the cluster for failovers. If sufficient resources are available, the virtual machine is powered on. This process does not guarantee a level of service if a number of hosts fail, but it is a more flexible and less conservative approach to deciding whether to power on machines. This policy does not use slots. It uses the actual reservations of the virtual machines. If a virtual machine does not have reservations, meaning that the reservation is 0, a default of 256MB and 256MHz is applied. This is controlled by the same HA advanced options used for the failover level policy.
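The slot arithmetic described in these notes can be sketched in Python. This is a simplified illustration under stated assumptions, not VMware's implementation: the class and function names are hypothetical, a host's slot count is assumed to be limited by its scarcer resource, the worst case is assumed to be the hosts with the most slots failing first, and memory overhead is ignored. Only the 256MB and 256MHz defaults and the "largest reservation" rule come from the slides.

```python
from dataclasses import dataclass

DEFAULT_CPU_MHZ = 256  # default from the slides when no CPU reservation is set
DEFAULT_MEM_MB = 256   # default from the slides when no memory reservation is set


@dataclass
class VM:
    cpu_reservation_mhz: int = 0
    mem_reservation_mb: int = 0


@dataclass
class Host:
    cpu_capacity_mhz: int
    mem_capacity_mb: int


def slot_size(powered_on_vms):
    """Slot = largest CPU and largest memory reservation among powered-on VMs."""
    cpu = max((vm.cpu_reservation_mhz or DEFAULT_CPU_MHZ for vm in powered_on_vms),
              default=DEFAULT_CPU_MHZ)
    mem = max((vm.mem_reservation_mb or DEFAULT_MEM_MB for vm in powered_on_vms),
              default=DEFAULT_MEM_MB)
    return cpu, mem


def slots_per_host(host, slot):
    """Assumption: a host holds as many slots as its scarcer resource allows."""
    cpu, mem = slot
    return min(host.cpu_capacity_mhz // cpu, host.mem_capacity_mb // mem)


def host_failures_tolerated(hosts, powered_on_vms):
    """Count how many hosts can fail while slots >= powered-on VMs.

    Assumption: worst case, so the hosts with the most slots fail first.
    """
    slot = slot_size(powered_on_vms)
    counts = sorted((slots_per_host(h, slot) for h in hosts), reverse=True)
    remaining = sum(counts)
    failures = 0
    for count in counts:
        if remaining - count < len(powered_on_vms):
            break
        remaining -= count
        failures += 1
    return failures


# Hypothetical cluster: three identical hosts, one large VM, four small VMs.
hosts = [Host(10000, 16384) for _ in range(3)]
vms = [VM(2000, 4096)] + [VM() for _ in range(4)]
print(slot_size(vms), host_failures_tolerated(hosts, vms))
```

The example shows why one heavily reserved virtual machine makes the slot policy conservative: the single 2000MHz/4096MB machine sets the slot size for all five machines.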

Monitoring Availability You can monitor changes in your high-availability deployment using events and alarms Use the functionality included in Alarms and Actions to determine what actions are taken when VMware HA events occur. VI4 - Mod 2-1 - Slide

Creating a VMware HA Cluster
Clusters enable a collection of ESX/ESXi hosts to work together. This provides higher levels of availability for virtual machines. You can create a new cluster using the Cluster Creation Wizard:
- Start by right-clicking and adding a new cluster (note that some clusters have already been created).
- Name the cluster and select the checkbox for HA.
- Select your admission control settings. Note the Host monitoring checkbox at the top, which allows you to do network maintenance, and the option at the bottom for a Percentage of cluster resources that can be reserved for failover (more on that later).
- Set the restart priority and isolation response (nothing new).
- Configure VM monitoring. Individual VM restarts are now supported in vCenter: the VM is restarted if the heartbeat is lost.
- Confirm the swap space settings (nothing new).
- Click Finish, and HA is set up.

Creating a New VMware HA Cluster
Some key settings include:
- Admission Control specifies the kind of policy to enforce to ensure that there are enough resources to perform a failover in the case of a host failure. If enabled, admission control prevents certain operations on virtual machines (such as powering them on) if doing so would violate the policy.
- Monitoring Sensitivity determines how long VMware HA waits between heartbeats before restarting virtual machines that were hosted on the ESX/ESXi host from which a heartbeat has not been received.
- Default Cluster Settings determine how quickly virtual machines are restarted and what action to take if a host becomes isolated. Nothing is new here.

Cluster Features
The first panel in the New Cluster wizard allows you to specify basic options for the cluster, including:
- Name: Specifies the name of the cluster. This name appears in the VI Client inventory panel. You must specify a name to continue with cluster creation.
- Turn On VMware HA: If this box is selected, virtual machines are restarted on a different host in the cluster if one of the cluster's hosts fails.
- Turn On VMware DRS: If this box is selected, DRS uses load distribution information for initial placement and load balancing recommendations, or to place and migrate virtual machines automatically.
In this panel you can specify the name and choose one or both cluster features. You can change any of these cluster features at a later time.

Host Monitoring Status and Admission Control Configure host monitoring and Virtual Machine admission control. High Availability services check if ESX/ESXi hosts and virtual machines are available VI4 - Mod 2-1 - Slide

Admission Control
Admission control helps ensure that sufficient resources remain to provide high availability, even after some number of concurrent host failures. Some resources in a cluster are reserved until they are needed for recovery from failure. To prevent too many virtual machines from starting, which would leave too few resources available for failover, VMware HA uses admission control.
Nothing is new here. The question is which matters more: uptime or resource fairness. This is the only area where the VMkernel will ignore reservations, and only if you select the bottom option. More on the next slide.

Admission Control
You can enable or disable admission control by selecting one of the following options:
- Prevent VMs from being powered on if they violate availability constraints: Enforcing availability constraints preserves failover capacity. If this option is selected (the default), the following operations are also not allowed if they would violate admission control:
  - Reverting a powered-off virtual machine to a powered-on snapshot
  - Migrating a virtual machine into the cluster
  - Reconfiguring a virtual machine to increase its CPU or memory reservation
- Allow VMs to be powered on even if they violate availability constraints

Admission Control Policy
VMware HA provides options for which policy is enforced if admission control is enabled.
- Host failures cluster tolerates: VMware HA reserves a certain amount of resources across a set of hosts. These reserved resources are sufficient to sustain performance even if the specified number of hosts fail.
- Percentage of cluster resources reserved as failover spare capacity: VMware HA reserves a certain percentage of aggregate resources in the cluster to accommodate failures.
- Specify a failover host: VMware HA reserves a specific host to accommodate failures. This is a more static solution, where a single host is designated as the target for virtual machines if one of the other hosts fails.
Because this setup can tolerate only one host failure, you cannot specify a specific host. You can enter an advanced option to specify which host to fail over to (covered later in the module).

Tolerate Some Number of Host Failures
You can configure VMware HA to tolerate a specified number of host failures. The “Host failures cluster tolerates” option is most effective if all virtual machines have similar CPU and memory requirements. If you have highly variable configurations, consider using the “Percentage of cluster resources reserved as failover spare capacity” option. Each host has some amount of memory and CPU that it can make available for use by virtual machines. Each virtual machine must be guaranteed its CPU and memory reservation requirements.

Tolerate Some Number of Host Failures
Open question: Does HA restart VMs on multiple hosts after a failure, or does it function the same as in 3.x, where the host with the most unreserved capacity is chosen to restart the VMs and DRS then load balances those VMs? A PSE response is pending. Until then, assume that it functions the same as ESX 3.x and that the picture on this slide does not show the functionality accurately.
Examples

Tolerate Some Number of Host Failures Examples VI4 - Mod 2-1 - Slide

Reserve a Percentage of Cluster Resources
You can configure VMware HA to reserve a specific percentage of cluster resources for recovery from host failures. When you use the “Percentage of cluster resources reserved as failover spare capacity” option, each time a request is made to power on a virtual machine, admission control determines the amount of resources the virtual machine would need and how much uncommitted capacity remains in the cluster for failovers. This policy does not use slots. Instead, it uses the actual reservations of the virtual machines.
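A minimal sketch of the percentage-based check, assuming a single resource measured in one unit (for example MHz or MB). The function name, parameters, and the exact comparison are illustrative assumptions, not the product's algorithm; a real check would run once for CPU and once for memory.

```python
def admits_power_on(cluster_total, already_reserved, vm_reservation, failover_percent):
    """Return True if powering on the VM leaves the failover share untouched.

    cluster_total, already_reserved, and vm_reservation share one unit.
    failover_percent is the share of the cluster held back for failover.
    """
    reserved_for_failover = cluster_total * failover_percent / 100
    uncommitted = cluster_total - already_reserved - reserved_for_failover
    return vm_reservation <= uncommitted


# Hypothetical cluster: 30000 MHz total, 20000 MHz already reserved, 25% held back.
print(admits_power_on(30000, 20000, 2000, 25))
print(admits_power_on(30000, 20000, 3000, 25))
```

With these hypothetical numbers, 7500 MHz is held back, so 2500 MHz remains uncommitted: the first request fits and the second does not.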

Specify a Failover Host
You can configure VMware HA to reserve a specific host as failover capacity. When you use the Specify a failover host option and one host fails, VMware HA first attempts to restart the virtual machines on the reserved host. If this is not possible, for example because of insufficient resources or because the reserved host has failed, it attempts to restart the virtual machines on any available host in the cluster. This option does not guarantee a level of availability. It establishes a spare host to use in case of failover. If a failover host is specified, HA admission control prevents users from powering on a virtual machine on the failover host or VMotioning virtual machines to the failover host.
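The fallback order for this policy can be sketched as follows. The `Host` class, the single-number capacity model, and the function name are hypothetical simplifications for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    alive: bool
    free_capacity: int  # hypothetical single-number capacity, e.g. MB of memory


def pick_restart_host(vm_demand: int, failover_host: Host, other_hosts: list) -> Optional[str]:
    """Try the reserved failover host first, then any other available host."""
    for host in [failover_host, *other_hosts]:
        if host.alive and host.free_capacity >= vm_demand:
            return host.name
    return None  # no host can take the virtual machine


spare = Host("esx-spare", alive=True, free_capacity=8192)
others = [Host("esx01", True, 2048), Host("esx02", True, 4096)]
print(pick_restart_host(4096, spare, others))
```

Returning `None` when nothing fits mirrors the statement that this option does not guarantee a level of availability.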

Default Cluster Settings and Virtual Machine Overrides
Default virtual machine settings control:
- The order in which virtual machines are restarted
- How VMware HA responds if hosts lose network connectivity with other hosts
Default settings apply to all virtual machines in this cluster in the case of a host failure or isolation, though exceptions can be configured for each virtual machine.

VM Restart Priority
VM restart priority determines the relative order in which virtual machines are restarted after a host failure. Assign higher restart priority to the virtual machines that host the most important services. For example, in the case of a multi-tier application you might rank assignments according to the functions hosted on the virtual machines:
- High: Database servers that will provide data for applications.
- Medium: Application servers that consume data in the database and provide results on web pages.
- Low: Web servers that receive user requests, pass queries to application servers, and return results to users.
This is still done alphabetically.
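The ordering can be illustrated with a short sort. The tuple layout and the tie-break by virtual machine name are assumptions made for this sketch; the slide only states that priority determines the relative order and mentions alphabetical handling without detail.

```python
# Lower number restarts earlier.
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def restart_order(vms):
    """Sort (name, priority) pairs by priority, then by name (assumed tie-break)."""
    return sorted(vms, key=lambda vm: (PRIORITY_ORDER[vm[1]], vm[0]))


# Hypothetical multi-tier application.
vms = [("web01", "Low"), ("db01", "High"), ("app01", "Medium"), ("db00", "High")]
print([name for name, _ in restart_order(vms)])
```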

Host Isolation Response
The host isolation response determines what happens when a host in a VMware HA cluster loses its service console network (or VMkernel network, in ESXi) connection but continues running. Values are: Leave VM powered on (the default), Power off VM, and Shut down VM. When a host in an HA cluster loses its console network (or VMkernel network, in ESXi) connectivity, the host is isolated from other hosts in the cluster.
Virtual Machine Settings
You can override the default settings established for the cluster. For each virtual machine, you can establish individual settings for Restart Priority and Isolation Response.

Virtual Machine Monitoring Sensitivity
The degree to which VMware HA is sensitive to virtual machine failures can be configured to different levels. If you select Enable VM Monitoring, the VM monitoring service evaluates whether each virtual machine in the cluster is running by checking for regular VMware Tools heartbeats from the guest operating system (GOS). If the heartbeats stop, the VM monitoring service determines that the virtual machine has failed, and the virtual machine is rebooted to restore service. Click the Custom box to configure advanced features for Monitoring Sensitivity.
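The heartbeat check amounts to a timeout comparison. In this sketch the function name and the `failure_interval` parameter are hypothetical; the actual intervals behind each sensitivity level are not given in the slides.

```python
def vm_needs_restart(last_heartbeat, now, failure_interval):
    """True when no heartbeat has arrived within failure_interval seconds.

    All arguments are in seconds; last_heartbeat and now are timestamps.
    """
    return (now - last_heartbeat) > failure_interval


# Hypothetical values: last heartbeat at t=100 s, checked at t=145 s.
print(vm_needs_restart(last_heartbeat=100, now=145, failure_interval=30))
print(vm_needs_restart(last_heartbeat=100, now=145, failure_interval=60))
```

A shorter interval corresponds to higher sensitivity: the same silence triggers a restart sooner.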

Best Practices for Configuring VMware HA Clusters
Networking Best Practices
If your switches support the PortFast (or an equivalent) setting, enable it on the physical network switches that connect servers. This helps to prevent a host from incorrectly determining that a network is isolated during the execution of lengthy spanning-tree algorithms.
On ESX hosts, HA automatically opens the firewall ports that it needs to function:
- Incoming port: TCP/UDP 8042-8045
- Outgoing port: TCP/UDP 2050-2250
Question: Why can PortFast be enabled on the physical switch ports that are connected to a vSwitch? Answer: vSwitches do not connect to other vSwitches via ISL.
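When auditing an external firewall between hosts, the port ranges listed on this slide can be encoded as a small lookup. The helper name and the `direction` strings are hypothetical; only the port ranges come from the slide.

```python
# Port ranges from the slide (both apply to TCP and UDP). range() excludes its end.
HA_PORTS = {
    "incoming": range(8042, 8046),  # 8042-8045
    "outgoing": range(2050, 2251),  # 2050-2250
}


def is_ha_port(port, direction):
    """Return True if the port falls in the HA range for the given direction."""
    return port in HA_PORTS[direction]


print(is_ha_port(8045, "incoming"), is_ha_port(2049, "outgoing"))
```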

Best Practices for Configuring VMware HA Clusters
- For better heartbeat reliability, configure end-to-end dual network paths between servers for service console or VMkernel networking.
- Disable VMware HA (clear the Enable VMware HA checkbox in the Settings dialog box for the cluster) when you perform any networking maintenance that might disable all heartbeat paths between hosts.
- While DNS is not required, you may choose to use DNS for name resolution rather than the error-prone method of manually editing the local /etc/hosts file on ESX/ESXi hosts. If you do edit /etc/hosts, you must include both long and short names.

Best Practices for Configuring VMware HA Clusters
- Use consistent port group names on VLANs for public networks. Port group names are used to reconfigure access to the network by virtual machines. If you use inconsistent names between the original server and the failover server, virtual machines are disconnected from their networks after failover.
- Use valid virtual machine network labels on all servers in a VMware HA cluster. Virtual machines use these labels to reestablish network connectivity upon restart.

Best Practices for Configuring VMware HA Clusters
Selection of Networks
The networks that HA uses by default are:
- ESX: all Service Console networks
- ESXi: all VMkernel networks, except the VMotion network, unless there is only one network and it is a VMotion network
By default, the network isolation address is the default gateway, so it is a best practice to add a das.isolationaddress[...] for each network. To override the default network selection, use the advanced option das.allowNetwork[...]; HA then uses only networks whose port group names match. On ESXi, use das.AllowVmotionNetworks to override the default exclusion of the VMotion network.

Clusters with both ESX and ESXi hosts
In mixed ESX and ESXi clusters, using the das.allowNetwork[...] advanced options may be necessary to ensure that compatible networks are selected for hosts.
- HA configuration enforces that all hosts in the cluster have compatible networks. The first node added to the cluster dictates the networks that all subsequent hosts must have to be allowed into the cluster.
- Networks are deemed compatible if the IP address and subnet mask combine to result in a network that matches another host's.
- Use das.allowNetwork[...] advanced options to control which networks are used, to ensure compatibility between all hosts in the cluster.
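The compatibility rule (IP address combined with subnet mask yields a network that must match) can be sketched with Python's standard `ipaddress` module. The function names and the subset test are assumptions for illustration; the slide does not say whether a joining host may have extra networks.

```python
import ipaddress


def network_of(ip, netmask):
    """Combine an IP address and subnet mask into the network they belong to."""
    return ipaddress.ip_interface(f"{ip}/{netmask}").network


def can_join(first_host_ifaces, new_host_ifaces):
    """Assumption: the new host must have every network the first host has."""
    required = {network_of(ip, mask) for ip, mask in first_host_ifaces}
    offered = {network_of(ip, mask) for ip, mask in new_host_ifaces}
    return required <= offered


# Hypothetical addresses.
first = [("10.0.0.5", "255.255.255.0"), ("192.168.1.5", "255.255.255.0")]
good = [("10.0.0.6", "255.255.255.0"), ("192.168.1.6", "255.255.255.0")]
bad = [("10.0.1.6", "255.255.255.0")]
print(can_join(first, good), can_join(first, bad))
```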

Setting Up Networking Redundancy
Networking redundancy between cluster nodes is important for VMware HA reliability. Redundant service console networking on ESX 4 (or VMkernel networking on ESXi) allows the reliable detection of failures and prevents isolation conditions from occurring.
NIC Teaming
Using a team of two NICs connected to separate physical switches improves the reliability of a service console (or, in ESXi, VMkernel) network. To configure a NIC team for the service console, configure the vNICs in the vSwitch configuration for Active or Standby configuration. The recommended parameter settings for the vNICs are:
- Default load balancing = route based on originating port ID
- Failback = No

Secondary Service Console Network
You can create a secondary service console (or VMkernel port for ESXi), which is attached to a separate virtual switch.
- The primary service console is used for network and management purposes. With a secondary service console network created, VMware HA sends heartbeats over both the primary and secondary service consoles.
- When you set up service console redundancy, you must specify an additional isolation response address (das.isolationaddress2) for the service console networks.
- When you specify a secondary isolation address, you should increase the das.failuredetectiontime setting to 20000 milliseconds or greater.
- You can add a secondary service console network to the VMotion vSwitch. A virtual switch can be shared between VMotion networks and a secondary service console network.
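The relationship between the two advanced options can be expressed as a small consistency check. The function name, the dict layout, the example address, and the treatment of an unset timeout as too low are all assumptions for this sketch; only the option names and the 20000 ms threshold come from the slide.

```python
def check_isolation_settings(options):
    """Return warnings for a dict of HA advanced options (name -> string value)."""
    warnings = []
    if "das.isolationaddress2" in options:
        # Assumption: an unset timeout is treated as below the threshold.
        timeout_ms = int(options.get("das.failuredetectiontime", 0))
        if timeout_ms < 20000:
            warnings.append("das.failuredetectiontime should be 20000 ms or greater")
    return warnings


# Hypothetical isolation address on the secondary service console network.
options = {
    "das.isolationaddress2": "192.168.20.1",
    "das.failuredetectiontime": "20000",
}
print(check_isolation_settings(options))
```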

Other VMware HA Cluster Considerations
- Use larger groups of homogeneous servers to allow higher levels of utilization (on average) across a VMware HA-enabled cluster.
- More nodes per cluster can tolerate multiple host failures while still guaranteeing failover capacities.
- The failover level policy used in admission control heuristics is conservatively weighted, so that virtual machines on large servers can fail over to smaller servers. For a definition of heuristics, see http://www.answers.com/heuristics

Modifying Cluster Settings You can modify cluster settings using the Edit Cluster Settings dialog box To complete this task, you must use an account with cluster administrator privileges. Right-click the cluster in the VI Client inventory panel and click Edit Settings. In the left pane of the cluster Settings dialog box, click the set of options you want to modify. VI4 - Mod 2-1 - Slide

Viewing Information about VMware HA Clusters
You can view current settings for a cluster. The cluster Summary page displays summary information for the cluster. Note that this screenshot is outdated: there is an Advanced link in the VMware HA portion that shows the slot calculation.

How VMware HA Works VMware HA designates hosts for particular roles and responds to problems as they arise. In a VMware HA deployment, a set of ESX/ESXi hosts is combined into a cluster with a shared pool of resources. Hosts in the cluster are monitored and if one of the hosts fails, virtual machines that were running on the failed host are restarted on a different host. VMware HA can be used in conjunction with VMware Fault Tolerance to provide preservation of run-time state and even shorter application downtime. VI4 - Mod 2-1 - Slide

Using VMware HA with Admission Control
Admission control helps ensure that VMware HA performs as expected. When you enable a cluster for VMware HA, you can specify how much host capacity to reserve through one of the following strategies:
- Prepare the cluster to tolerate some number of host failures.
- Reserve some percentage of total capacity.
- Specify a particular host exclusively for failover purposes.
If a host fails, VMware HA attempts to restart virtual machines on one of the remaining hosts. By default, you cannot power on a virtual machine if doing so violates the admission control policy. This is the same functionality as in ESX 3.

Primary and Secondary Hosts
Some hosts in a VMware HA cluster are designated as primary hosts. They maintain information about the cluster, such as membership.
- The first five hosts in the cluster are designated primary hosts, and all subsequent hosts are designated secondary hosts.
- When you add a host to a VMware HA cluster, that host communicates with an existing primary host in the same cluster to complete its configuration.
- When a primary host becomes unavailable or is removed from the cluster, VMware HA promotes one of the secondary hosts to primary status.
- Primary hosts help provide redundancy by replicating the cluster's configuration information and virtual machine states, and they are used to initiate failover actions.
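The role assignment can be modeled with two lists. The class and method names are hypothetical, and promoting the longest-standing secondary is an assumption; the slide only says that one of the secondary hosts is promoted.

```python
MAX_PRIMARIES = 5  # from the slide: the first five hosts become primaries


class HACluster:
    def __init__(self):
        self.primaries = []
        self.secondaries = []

    def add_host(self, name):
        """The first five hosts become primaries; later hosts become secondaries."""
        if len(self.primaries) < MAX_PRIMARIES:
            self.primaries.append(name)
        else:
            self.secondaries.append(name)

    def remove_host(self, name):
        """Remove a host; if it was a primary, promote a secondary in its place."""
        if name in self.primaries:
            self.primaries.remove(name)
            if self.secondaries:
                # Assumption: promote the longest-standing secondary.
                self.primaries.append(self.secondaries.pop(0))
        else:
            self.secondaries.remove(name)


cluster = HACluster()
for i in range(1, 8):
    cluster.add_host(f"esx{i:02d}")
cluster.remove_host("esx02")
print(cluster.primaries, cluster.secondaries)
```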

VMware HA Clusters and Maintenance Mode
Put a host in maintenance mode in preparation for administrative tasks that would otherwise cause unwanted HA responses. Putting a host into maintenance mode effectively disables the HA service on that host. You cannot power on a virtual machine on a host that is in maintenance mode, and VMware HA does not fail over any virtual machines to a host that is in maintenance mode. When a host exits maintenance mode, the VMware HA service is re-enabled on that host, so it becomes available for failover again. If the host is in a cluster, the user is given the option to evacuate powered-off virtual machines when the host enters maintenance mode.

VMware HA Clusters and Disconnected Hosts
Users may initiate state changes, for example during network maintenance. An ESX/ESXi host in a cluster may then no longer be able to communicate with the other hosts in the cluster, and that host becomes disconnected. The unresponsive host continues to function, but its state is unknown. When a host is disconnected, VMware HA cannot use it as a guaranteed failover target, and VMware HA does not consider disconnected hosts when making calculations related to admission control. When the host is reconnected, it becomes available for failover again.

VMware HA Clusters and Disconnected Hosts (continued)
A disconnected host differs from a host that is not responding as follows:
- A disconnected host has been explicitly disconnected by the user. As part of disconnecting a host, VMware HA is disabled on that host. The virtual machines on that host are not failed over and are not considered when the current failover level is computed.
- If a host is not responding, no other hosts receive heartbeats from it. This might happen, for example, because of a network problem or because the host failed.
Neither disconnected nor unresponsive hosts are included in computations of the current failover level, but any virtual machines running on an unresponsive host are failed over if the host fails.
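
The distinction can be summarized as a small lookup in Python. The `HostState` enum and the two helper functions are hypothetical names used only for illustration; they restate the rules on this slide and are not part of any VMware API.

```python
from enum import Enum


class HostState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"      # explicitly disconnected by the user
    NOT_RESPONDING = "not responding"  # no heartbeats received by other hosts


def counts_toward_failover_level(state: HostState) -> bool:
    # Disconnected and unresponsive hosts are both excluded.
    return state is HostState.CONNECTED


def vms_failed_over_on_host_failure(state: HostState) -> bool:
    # HA is disabled on a disconnected host, so its VMs are not failed over.
    return state is not HostState.DISCONNECTED


for state in HostState:
    print(state.value,
          counts_toward_failover_level(state),
          vms_failed_over_on_host_failure(state))
```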

VMware HA Clusters and Host Network Isolation
Network isolation occurs when a host is still running but can no longer communicate with other hosts in the cluster. Hosts are determined to have failed after the time limit between heartbeats has elapsed and no heartbeat has been received. The default heartbeat monitoring sensitivity detects failures after 15 seconds have elapsed. The host declares itself isolated from the network after it has lost network connectivity for more than 12 seconds and is unable to ping its isolation address or addresses (by default, the default gateway). If the isolated host has SAN access, it retains the disk lock on the virtual machine files, and any attempt to fail over the virtual machine to another host fails. The virtual machine continues to run on the isolated host. VMFS disk locking prevents simultaneous write operations to the virtual machine disk files and potential corruption. Note: same as ESX 3.x.

VMware HA Clusters and Host Network Isolation (continued)
If the network connection is restored before 12 seconds have elapsed, other hosts in the cluster do not treat this as a host failure, but rather as a transient issue that has been resolved. If the network connection is not restored for 15 seconds or longer, the other hosts in the cluster treat the host as failed and attempt to fail over the virtual machines on that host. Note: same as ESX 3.x.
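
The two default timers on these slides (12 seconds for isolation, 15 seconds for failure detection) can be written down as two checks. This is a sketch using hypothetical function names; it only encodes the thresholds quoted above and says nothing about how the heartbeat protocol itself works.

```python
ISOLATION_THRESHOLD_S = 12   # host's own view: "more than 12 seconds"
FAILURE_DETECTION_S = 15     # cluster's view: "15 seconds or longer"


def host_declares_isolation(outage_s: float, can_ping_isolation_address: bool) -> bool:
    # The host must have lost connectivity for more than 12 seconds AND
    # be unable to reach its isolation address (the default gateway by default).
    return outage_s > ISOLATION_THRESHOLD_S and not can_ping_isolation_address


def cluster_treats_host_as_failed(outage_s: float) -> bool:
    # The other hosts attempt failover once no heartbeat has arrived for 15 seconds.
    return outage_s >= FAILURE_DETECTION_S


for outage in (10, 13, 15):
    print(outage,
          host_declares_isolation(outage, can_ping_isolation_address=False),
          cluster_treats_host_as_failed(outage))
```

An outage between 12 and 15 seconds is the interesting window: the host already considers itself isolated, but the rest of the cluster has not yet started a failover.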

Setting Up Virtual Machine-Level Overrides
You can configure overrides for individual virtual machines so that their behavior differs from the cluster defaults. Virtual machine-level overrides are useful for machines that are used for special tasks. For example, virtual machines that are especially important, such as those that provide infrastructure services like DNS or DHCP, may need to be powered on before other virtual machines in the cluster. Set the priority based on the function of the VM. Note: same as ESX 3.x.

Monitoring Individual Virtual Machines
You can specify the following behavior for individual virtual machines:
- VM Restart Priority: indicates the relative priority for restarting the virtual machine in case of host failure.
- Host Isolation Response: specifies what an ESX/ESXi host that has lost connection with its cluster should do with its running virtual machines.
- Monitoring Sensitivity: specifies how quickly failures are detected.
Settings can be changed so that certain virtual machines are monitored more or less aggressively. Specific custom values can also be set using advanced options. When you add a host to a cluster, all virtual machines in the cluster default to the cluster’s default VM restart priority (Medium, if unspecified), default host isolation response (Leave Powered Off, if unspecified), and default monitoring sensitivity (High, if unspecified).
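
A per-VM override combined with a cluster default can be sketched as a dictionary lookup with a fallback. The function name, the VM names, and the High/Medium/Low ordering used here are illustrative assumptions; the sketch only shows how overrides change the order in which virtual machines would be restarted.

```python
# Lower rank restarts first. The set of priority names is assumed for the example.
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


def restart_order(vms, overrides, cluster_default="Medium"):
    """Return VM names ordered by effective restart priority.

    A VM without an override falls back to the cluster default.
    sorted() is stable, so VMs with equal priority keep their input order.
    """
    return sorted(vms, key=lambda vm: PRIORITY_RANK[overrides.get(vm, cluster_default)])


vms = ["web", "dns", "batch", "dhcp"]
overrides = {"dns": "High", "dhcp": "High", "batch": "Low"}
print(restart_order(vms, overrides))
```

Here the infrastructure VMs (`dns`, `dhcp`) come first, `web` keeps the cluster default, and `batch` is restarted last.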

Customize VMware HA Behavior for Individual VMs
You can select specific behavior for each virtual machine.
1. Edit the cluster settings.
2. Choose VM Monitoring under VMware HA. For each virtual machine in the Virtual Machine Settings pane, select a VM Monitoring setting to customize its settings.
3. Choose VM Options under VMware HA. For each virtual machine, select from the VM Restart Priority or Host Isolation Response menu to customize its settings.
Note: same as ESX 3.x.

Cluster Validity
Cluster validity is based on whether there are sufficient resources for hosts to continue to provide the level of service expected for virtual machines. You can review the validity of a cluster using the VI Client, which indicates whether a cluster is valid (green) or invalid (red). A valid cluster is one in which the admission control policy is not violated. A cluster is valid unless something happens that makes it over-committed or invalid. If a cluster becomes invalid, a message on the Summary page indicates the issue. A cluster can become invalid when, for example:
- A host failure prevents the required level of spare capacity from being preserved.
- None of the primary hosts in the cluster are responding.
Note: same as ESX 3.x.

Cluster Validity (continued)
A cluster enabled for VMware HA becomes red when the number of virtual machines powered on exceeds the failover requirements, that is, when the current failover capacity is less than the configured failover capacity. If admission control is disabled, clusters do not become red, regardless of whether the hosts can guarantee failover. Inadequate failover capacity can occur if VMware HA is set up for two-host failure in a four-host cluster and one host fails: the remaining three hosts may no longer be able to satisfy a two-host failure. Note that DRS behavior is not affected if a cluster is red because of a VMware HA issue. Note: same as ESX 3.x.
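
The four-host example can be worked through numerically. The sketch below assumes a simplified slot model (each host offers a fixed number of slots and each powered-on virtual machine uses one); the slot counts, VM count, and function names are invented for illustration and are not taken from the slides.

```python
def current_failover_capacity(host_slots, used_slots):
    """Number of host failures the cluster can absorb in this simplified model.

    Hosts are removed largest-first (the worst case) while at least one host
    remains and the survivors still have enough slots for the running VMs.
    """
    remaining = sorted(host_slots)      # ascending, largest host at the end
    tolerated = 0
    while len(remaining) > 1:
        remaining.pop()                 # assume the largest remaining host fails
        if sum(remaining) < used_slots:
            break
        tolerated += 1
    return tolerated


def cluster_is_red(configured, host_slots, used_slots, admission_control=True):
    # With admission control disabled, the cluster never turns red.
    if not admission_control:
        return False
    return current_failover_capacity(host_slots, used_slots) < configured


# Four hosts with 10 slots each, 18 VMs running, configured for two host failures.
print(cluster_is_red(2, [10, 10, 10, 10], 18))   # capacity is 2: still valid
# One host has failed; three hosts remain.
print(cluster_is_red(2, [10, 10, 10], 18))       # capacity drops to 1: red
```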

Using VMware HA and DRS Together
VMware HA performs more efficiently when used in conjunction with DRS. Using VMware HA and DRS together combines automatic failover with load balancing. In a cluster using DRS and VMware HA with admission control turned on, virtual machines might not be evacuated from hosts entering maintenance mode because of the resources reserved to maintain the failover level. In that case, you must manually migrate the virtual machines off the hosts using VMotion. Note: same as ESX 3.x.

Troubleshooting VMware HA
If no hosts in a cluster are responding when you attempt to add a new host, VMware HA configuration fails because the new host cannot communicate with any of the primary hosts. Disconnect all hosts that are not responding before adding the new host. After you disconnect all other hosts and add a new host, that host becomes the first primary host. When other hosts become available again, their VMware HA service is reconfigured and they become primary or secondary hosts, depending on the existing number of primary hosts.

Customizing VMware HA
After you have established a cluster, you may need to modify its settings. Specific attributes affect how VMware HA behaves. Note: the das.defaultfailoverhost attribute can be set in the advanced options if the display is greyed out, as seen in the earlier slide.

Customizing VMware HA (continued)
Note: an upcoming screenshot shows where to enter these attributes in the UI.

Set Advanced VMware HA Options
To precisely customize VMware HA behavior, set advanced VMware HA options.
Prerequisites: You must have a VMware HA cluster for which to modify settings, and you must have cluster administrator privileges to modify advanced VMware HA settings.
1. In the cluster’s Settings dialog box, select VMware HA.
2. Click the Advanced Options button to open the dialog box.
3. Enter each advanced attribute you want to change in a text box.
4. Click OK.
Note: you must click Advanced Options to enter these values; the dialog box that opens is where you enter your advanced HA values.

Lesson 2-1 Summary
- Learn how to create an HA cluster.
- Learn how to monitor an HA cluster.
- Learn how to modify HA cluster settings.
- Learn how to troubleshoot HA clusters.

Lab - VMware High Availability
- Lab 1 Part 1 - Creating a vCenter High Availability (HA) Cluster
- Lab 1 Part 2 - Adding Hosts to a High Availability (HA) Cluster
- Lab 1 Part 3 - Viewing High Availability (HA) Cluster Settings
- Lab 1 Part 4 - Modifying High Availability (HA) Cluster Settings