Cluster Extension for XP and EVA


1 Cluster Extension for XP and EVA
2007 Dankwart Medger – Trident Consulting S.L.

2 CLX/Cluster overview

3 Disaster Tolerant Design Considerations
Protection level (distance): wide variety of interconnect options; regional or wide-area protection; supports local to global disaster-tolerant solutions.
Data currency (Recovery Point Objective): synchronous or asynchronous options available; data consistency is always assured.
Failover time (Recovery Time Objective): manual failover to the secondary site, or fully automated failover with geographically dispersed clusters on HP-UX, Solaris, AIX, Linux and Windows.
Performance requirements: asynchronous Continuous Access provides minimum latency across extended distances; performance depends on the bandwidth to the remote data center.

4 Server Clustering – Purpose and Limits
Purpose: protect against failures at the host level – server failure and some infrastructure failures; automated failover including the necessary arbitration; local distances.
Limits: does not protect against a site disaster, a storage failure or a core infrastructure failure. A major disaster can mean a full restore from tape, so tapes should be stored off-site.

5 Array based replication
Purpose: keep a copy of your data at a remote site. In case of a major disaster at the primary site, no tape restore is necessary – the data is still available at the remote site and operation can be resumed there. Long distances are possible through FC extension technologies and asynchronous replication technologies.
Limits: human intervention is required to resume operation at the remote site, and the standby system is difficult to maintain.

6 The solution: Cluster Extension/Metrocluster combines the automated failover capabilities of a standard server cluster with the remote replication capabilities of the EVA and XP to build a failover cluster spanning two data centers.
Benefits:
Fully automated application failover even in case of a site or storage failure – no manual intervention, no server reboots, no presentation changes, no SAN changes.
Intelligent failover decision based on status checking and user settings – not a simple failover script.
Integrated into the standard OS cluster solution – no change to how you manage your cluster today.
Host I/O limited to the local array – reducing inter-site traffic and enabling long-distance, low-bandwidth setups.

7 Cluster Extension – the goal
(Animated diagram: applications App A and App B run across two data centers; failover between the sites is automated by CLX over Continuous Access EVA/XP, with an arbitrator node* at a third location. *The type of arbitrator depends on the cluster.)

8 Automated failover solutions availability for all major platforms
HP – XP | HP – EVA
HP-UX: MC/SG Metrocluster – CA sync & async, journaling | CA sync (future: async)
Windows: MSCS Cluster Extension
Solaris: VCS (future)
AIX: HACMP
Linux
VMware

9 Cluster extension for Windows
Cluster Extension XP | Cluster Extension EVA
Array support: XP48/128/512/1024/10000/12000 | EVA 3000/4000/5000/6000/8000
OS support: Windows 2000 Advanced Server and Datacenter Edition; Windows 2003 Server Standard/Enterprise (32/64-bit) and Datacenter Edition (64-bit); Windows 2003 Server Standard/Enterprise x64 Edition
Cluster support: MS Cluster (MS-certified as a geographically dispersed cluster solution)
Replication technology: Continuous Access XP (synchronous, asynchronous, journaling) | Continuous Access EVA (synchronous; asynchronous planned)
Distance, inter-site technology: no CLX-specific limits; must stay within cluster and replication limitations | 500 km with no more than 20 ms round-trip latency
Arbitration: CLX Quorum Filter Service (Windows 2000/2003) or MS Majority Node Set (Windows 2003, including file share witness) | MS Majority Node Set (including file share witness)
Licensing: licensed per cluster node

10 Cluster integration example: CLX for Windows
All Physical Disk resources of one resource group depend on a CLX resource – a very smooth integration. (Screenshot of a resource group containing File Share, Physical Disk, Network Name, IP Address and CLX resources; example taken from CLX EVA.)

11 CLX EVA Resource Parameters
DR Group for which the CLX resource is responsible – all dependent disk resources (Vdisks) must belong to that DR Group. This field must contain the full DR Group name, including the "\Data Replication\" folder, and is case sensitive.
EVA – data center location
SMA – data center location
Data concurrence settings
Cluster node – data center location
SMI-S communication settings
Failover behavior setting
Pre/post exec scripts
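For illustration only (the group name itself is hypothetical, not taken from the slides), a valid entry for this field could look like: \Data Replication\App_A_DR_Group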

12 CLX XP Resource Parameters
Device Group managed by this CLX resource – all dependent disk resources must belong to that Device Group.
XP arrays and RAID Manager Library instances
Fence level settings
Cluster node – data center location
Some details need to be provided for fast checking. (Most of that information could be discovered at run time, but it is preferred that the customer specifies which disk groups and arrays are really to be used.)
Pair/Resync Monitor – sends messages to the EventLog when the CA link fails or somebody suspends the pair, in order to alert the customer that the replication process has stopped.
Failover behavior setting
CA resync setting
Pre/post exec scripts
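As an illustration of the kind of pair status check this involves, the CA state of a device group can be queried with RAID Manager XP from a cluster node; the device group name and instance number below are placeholders, not taken from the slides:
C:\>pairdisplay -g APP_A_DG -I0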

13 Cluster Arbitration and CLX for Windows

14 Local Microsoft Cluster – Shared Quorum disk
Traditional MSCS uses a shared quorum disk. Shared application disks store the application data. The shared quorum disk is used to: keep the quorum log, keep a copy of the cluster configuration, propagate registry checkpoints, and provide arbitration if LAN connectivity is lost.

15 Challenges with dispersed MSCS
HP’s task list for CLX:
Managing data disks – check data disk pairs on failover; allow data disk failover only if the data is current and consistent.
Managing the quorum disk (for a traditional shared-quorum cluster) – mirror the quorum disk to the remote disk array; implement the quorum disk pair and keep the challenge/defense protocol working as if it were a single shared resource; filter SCSI Reserve/Release/Reset and any necessary IO commands without performance impact.
Preventing split-brain phenomena.

16 Majority Node Set Quorum (1)
A new quorum mechanism introduced with Windows 2003. Shared application disks store the application data. Quorum data on a local disk of each node is used to keep a copy of the cluster configuration, synchronized by the Cluster Service. There is no common quorum log and no common cluster configuration available => changes to the cluster configuration are only allowed when a majority of nodes is online and can communicate.

17 Majority Node Set Quorum (2)
MNS arbitration rule: in case of a failure, the cluster survives if a majority of nodes is still available; in a split-site situation, the site with the majority survives. Only nodes that belong to the majority are allowed to keep the cluster service up and run applications; all others shut down their cluster service. The majority is defined as: (<number of nodes configured in the cluster>/2) + 1, using integer division.
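For example, in a 4-node cluster the majority is 4/2 + 1 = 3 nodes, so at most one node may fail; in a 5-node cluster the majority is 5/2 + 1 = 3 (integer division), so two node failures can be survived.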

18 Majority Node Set Quorum (3)
(Animated diagram: failover example with App A and App B under MNS quorum.)

19 Majority Node Set Quorum (3)
(Animated diagram: App A and App B under MNS quorum.) Node failures tolerated by an MNS cluster, following the majority rule: 2 nodes – 0; 3 nodes – 1; 4 nodes – 1; 5 nodes – 2; 6 nodes – 2; 7 nodes – 3; 8 nodes – 3.

20 Majority Node Set Quorum (4) - File Share Witness
What is it? A patch for Windows 2003 SP1 clusters provided by Microsoft (KB921181).
What does it do? It allows the use of a simple file share to provide a vote for an MNS quorum-based 2-node cluster. In addition to introducing the file share witness concept, this patch also introduces a configurable cluster heartbeat.
What are the benefits? The "arbitrator" node is no longer a full cluster member – a simple file share can be used to provide this vote. There is no single-subnet requirement for the network connection to the arbitrator. One arbitrator can serve multiple clusters; however, you have to set up a separate share for each cluster. The arbitrator exposing the share can be a standalone server and can run a different OS architecture (e.g. a 32-bit Windows server providing a vote for an IA64 cluster).

21 Majority Node Set Quorum (5) - File Share Witness
(Animated diagram: a 2-node cluster running App A and App B, where the surviving node gets the additional vote from the file share \\arbitrator\share.) With the MNS file share witness, a 2-node cluster tolerates 1 node failure instead of 0.

22 Majority Node Set Quorum (6) - File Share Witness
(Diagram: one arbitrator serves two clusters through separate shares – Cluster 1's MNS private property MNSFileShare = \\arbitrator\share1, Cluster 2's MNSFileShare = \\arbitrator\share2.)

23 File Share Witness - Installation & Configuration
Installation: download the update (KB921181) and install it on each cluster node -> a reboot is required! This adds a new private property to the MNS resource.
Configuration: set the MNSFileShare property to the share you created on the arbitrator.
Command: cluster <clustername> resource <MNSresource> /priv MNSFileShare=\\servername\sharename
Important: the account under which the cluster service is running must have read and write permission to the share. After setting the property, the MNS resource has to be moved to activate the new setting.
C:\>cluster . resource MNS /priv
Listing private properties for 'MNS':
T  Resource  Name                       Value
S  MNS       MNSFileShare
D  MNS       MNSFileShareCheckInterval  240 (0xf0)
D  MNS       MNSFileShareDelay          4 (0x4)
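A minimal configuration sequence might look like the following; the cluster name, node name and share path are placeholders, and moving the group that holds the MNS resource is just one way to trigger the move mentioned above:
C:\>cluster MYCLUSTER resource MNS /priv MNSFileShare=\\arbitrator\share1
C:\>cluster MYCLUSTER group "Cluster Group" /moveto:NODE2
C:\>cluster MYCLUSTER resource MNS /priv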

24 File Share Witness - Prerequisites
Cluster: Windows 2003 SP1 & R2 (x86, x64, IA64*, EE and DC); 2-node MNS quorum-based cluster – the property is ignored for clusters with more than 2 nodes.
Arbitrator OS requirements: Windows 2003 SP1 or later. MS did not test earlier/other OS versions, even though they should work. A server OS is recommended for availability and security.
File share requirements: one file share for each cluster for which the arbitrator provides a vote; 5 MB per share is sufficient. The external share does not store the full state of the cluster configuration; it contains only data sufficient to help prevent split-brain syndrome and to help detect a partition-in-time. The Cluster Service account requires read/write permission. For highest availability, you might want to create a clustered file share/file server.
* There is no Windows Server 2003 R2 release for IA64 (Itanium).

25 File Share Witness - additional parameters
MNSFileShareCheckInterval – the interval at which the cluster service checks whether it can write to the file share. If this verification fails, a warning event is logged in the system event log. min: 4 sec, default: 240 sec, max: … sec.
MNSFileShareDelay – delay in seconds that a cluster node which does not currently own the MNS quorum resource waits before it tries to get the vote from the witness. This lets the current owner of the MNS quorum resource be preferred when trying to win the vote. min: 0 sec, default: 4 sec, max: 60 sec.
C:\>cluster . resource MNS /priv
Listing private properties for 'MNS':
T  Resource  Name                       Value
S  MNS       MNSFileShare               \\arbitrator\share
D  MNS       MNSFileShareCheckInterval  240 (0xf0)
D  MNS       MNSFileShareDelay          4 (0x4)
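If the defaults need adjusting, the same /priv mechanism shown above should apply; the values below are only an illustration, and depending on the cluster.exe version a type suffix such as :DWORD may be required:
C:\>cluster . resource MNS /priv MNSFileShareCheckInterval=120
C:\>cluster . resource MNS /priv MNSFileShareDelay=10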

26 File Share Witness/Arbitrator - What does it mean for CLX?
Remember: the file share witness only works with 2-node clusters.
Arbitrator node requirement – CLX with traditional MNS vs. CLX with MNS using the file share witness:
Cluster membership: the arbitrator is a full additional cluster member and has full cluster configuration information | the arbitrator is external to the cluster and has only minimal cluster configuration information.
Operating system: same Windows version as the other cluster nodes, e.g. for an IA64 cluster the arbitrator has to be an IA64 server as well | can be a different Windows version, e.g. a 32-bit file share witness (arbitrator) can serve a 64-bit cluster.
Hardware: determined by the OS; the arbitrator could be a smaller, less powerful machine than the main nodes | the file share server could be a smaller, less powerful machine than the main nodes, and due to the less strict OS requirements the hardware selection is also more flexible.
Multiple clusters: one arbitrator node per cluster | one arbitrator can serve multiple clusters.
Location: 3rd site.
Network requirements: single subnet, and should NOT depend on a network route that physically runs through one DC in order to reach the other DC | can be a routed network (different subnets).

27 CLX XP Quorum Filter Service (QFS)
Component of CLX XP for Windows. Required for Windows 2000; optional for Windows 2003, which can also use MNS (QFS provides some benefits over MNS).
Functionality: allows use of a Microsoft shared-quorum cluster across two data centers and two XPs. Implements filter drivers that intercept quorum arbitration commands and uses additional CA pairs to make the cross-site decision. An "external arbitrator" enables automated failover even in case of a full site failure or a site split.

28 CLX XP Quorum Filter Service Components
(Architecture diagram: kernel components on the I/O path – the clxqflt filter driver on the quorum disk path alongside clusdisk, and the clxspflt filter driver at the scsi/stor port level; user components – the clxqsvc Windows service with its service dependencies, kernel-to-user-mode communication, and quorum disk pair control (create, recover, swap) via the RM API; intercepted commands include RESERVE, RELEASE, RESET and TUR.)
Details: the filter driver over the quorum disk filters only the 3Rs (Reserve/Release/Reset) and reports them to the CLX quorum service. The filter driver over ScsiPort filters all resets and reports them to the CLX quorum service if any of them were for the quorum disk. The CLX quorum service issues the appropriate RM API calls for each of the 3Rs (and other commands such as TUR and read/write before Reserve).

29 CLX XP on Windows – Server failure
(Animated diagram: server failure scenario – the quorum and application disk pairs and their reservations move from the failed node to the surviving node, which takes over App A alongside App B.)

30 CLX XP on Windows – LAN split
(Animated diagram: LAN split scenario – quorum arbitration decides which node keeps the quorum reservation and continues running App A and App B.)

31 CLX XP on Windows – site failure
(Animated diagram: site failure scenario – the external arbitrator assists quorum arbitration so that the surviving site takes over the quorum and application disk pairs and restarts App A.)

32 Majority Node Set vs CLX XP Quorum Filter Service
Majority Node Set – pros: solution owned by Microsoft; works with both CLX EVA and XP; MS-preferred solution for geographically dispersed clusters; most likely the CLX solution going forward. Cons: requires a symmetric node setup; for clusters with more than 2 nodes one additional node per cluster is required; will only survive with a majority of nodes; a forced majority requires another downtime to re-form the original cluster; Windows 2003 only. Recommended quorum mechanism for new installs.
CLX XP Quorum Filter Service – pros: shared quorum, hence can survive node failures down to a single remaining node; allows asymmetric cluster setups; Windows 2000 and Windows 2003. Cons: more intrusive; more difficult to maintain across Service Packs and other updates; a full site failure or split will first result in a cluster service shutdown before the external arbitrator kicks in (if the quorum disk was in the remote data center).

33 Manual vs Automated failover – failover times

34 Automated failover
Question: "How long does a CLX failover take?" Answer: "It depends!" A CLX failover is first of all still a cluster failover. Several components influencing the total application failover time are outside the control of CLX: failure recognition, cluster arbitration and application startup. The CLX component of the total failover time also depends on many factors.

35 Factors affecting the automated failover times in a CLX for Windows cluster
Phases: recognize failure -> cluster arbitration -> CLX failover -> start application on the other node. Factors influencing each phase:
Time to recognize the failure: for resource failures – resource timeouts (e.g. disk timeouts) and potentially resource restarts until the cluster moves the group to another node; for node, network or whole-DC failures – cluster heartbeat timeouts to decide that communication to the node(s) is lost.
Time for cluster arbitration: only happens in case of a node or network failure; depends on network latency, heartbeat settings and cluster size.
Time for CLX to fail over the replication: depends on the type of component that fails – node failure, storage failure, management server failure, inter-site communication.
Time to start the application on the surviving node: depends on application type and size and on the required recovery steps, e.g. log replay (5 sec – 5 min).

36 CLX failover times (applies to Metrocluster, as well)
CLX XP: up to 5 RAID Manager XP commands (for typical configs only 3). The default RAID Manager XP command timeout is 30 sec. Max. CLX failover time: 5 x 30 sec x (number of remote hosts + 1). The Quorum Filter Service controls the quorum disk pair failover in less than 3 seconds. If arbitrator assistance is needed, the cluster restart might take up to 6 minutes. (XP notes: depends on fence level, auto recovery, etc.; the typical fence level never needs 3 commands; the RM XP timeout can be tuned down to seconds.)
CLX EVA: theoretically up to 12 SMI-S calls (7 if the CA link is up), with a default SMI-S timeout of 180 sec. Realistically, 12 (or 7) x 180 sec will never happen: if CLX times out, it tries the next management server, where it either succeeds or also fails, which then results in a CLX resource failure. Realistically, you can expect up to 5 minutes.
In case of a simple node failure, an application move with CLX failover takes just a few (5-20) seconds longer than without remote replication.
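Worked example with the numbers above: with one remote host, the CLX XP worst case is 5 commands x 30 sec x (1 + 1) = 300 sec, i.e. about 5 minutes; a typical configuration needing only 3 commands comes to 3 x 30 sec x 2 = 180 sec.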

37 Manual vs. automated failover times
A typical example of a manual failover is a stretched cluster: the cluster is stretched across two sites, but all nodes access the same array, which replicates the data to a remote partner. Even in case of a node failure, the primary storage array is still used (across a remote link if the node is in the remote data center). A storage or site failure will bring down the cluster, requiring manual intervention to start the cluster from the remote array.
Steps involved in case of a storage failure: notification of the operator (15 min*); evaluation of the situation and the necessary recovery steps (30 min*); shutdown of the surviving cluster nodes (5 min*); replication failover (2 min*); start-up of the surviving cluster nodes (10 min*). The effort multiplies with the number of applications/clusters/arrays affected.
*Times are just examples and will vary depending on the situation and setup. A full site disaster, for instance, might involve much more troubleshooting and evaluation time.
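Adding up the example times: 15 + 30 + 5 + 2 + 10 = 62 minutes for a single storage failure, which is the manual-failover total shown on the next slide.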

38 Manual vs. automated failover times - single cluster
(Timeline comparison, single cluster.) Manual: notification of the operator, evaluation of the situation and necessary recovery steps, shutdown, failover, start servers – total = 62 min*. Automated (CLX): recognize failure, cluster arbitration, CLX failover, application startup – total = 10 min*.
*Times are just examples and will vary depending on the situation and setup.

39 Manual vs. automated failover times - multiple clusters
(Timeline comparison across three clusters.) Manual (cluster 1, cluster 2, cluster 3): total = 96 min*. Automated with CLX (cluster 1, cluster 2, cluster 3): total = 10 min*.
*Times are just examples and will vary depending on the situation and setup.

40 Other advantage: CLX helps to avoid human mistakes
Manual failover operations introduce the risk of making mistakes – failing over the wrong DR Group, etc.
CLX simplifies planned failover for maintenance: it works like a disaster failover, just faster. A manual failover still requires the same steps, apart from notification and evaluation.
Failback is as simple as a failover: once the primary site is restored, it's just another cluster failover. A manual failback is as complex and intrusive as the maintenance failover.
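With CLX in place, such a planned failover could be little more than an ordinary cluster group move; the cluster, group and node names below are placeholders:
C:\>cluster MYCLUSTER group "App A" /moveto:NODE3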

41 External information
Cluster Extension EVA, Cluster Extension XP, Metrocluster EVA, Metrocluster XP, Disaster-Tolerant Solutions Library (Solutions for Serviceguard), Continental Cluster, CA EVA, CA XP, CLX EVA migration whitepaper, Exchange replication whitepaper (incl. CLX).

42 Internal information
CLX EVA for Windows SharePoint (maintained by Till Stimberg), CLX XP for Windows SharePoint (maintained by Anton Vogel), CLX/Metrocluster streams on SPOCK.


