
1 High Availability and Disaster Recovery in a Multi-Site Virtual Environment Using Virtualization. Henk Den Baes, Technology Advisor, Microsoft BeLux

2 HA & DR with Multi-Site Clustering. Agenda: Introduction, Networking, Storage, Quorum

3 Session Objectives:
– Clustering is neither too expensive nor too complex
– Understanding the need for, and benefits of, multi-site clusters
– What to consider as you plan, design, and deploy your first multi-site cluster
Clustering your Hyper-V servers is a great solution not only for high availability, but also for disaster recovery

4 Hyper-V Virtualization Scenarios: Business Continuity, Dynamic Datacenter, Server Consolidation, Test and Dev

5 Business Continuity: Keeping the Business Running
– Business Continuity: resumption of full operations, combining people, processes, and platforms
– Disaster Recovery: site-level crisis; resumption of data and IT operations
– Backup and Restore: presumes the infrastructure is whole; 97% is file/small-unit related
– High Availability: presumes the rest of the environment is active

6 Business Continuity with Virtualization
Virtualization reduces BC costs and minimizes business downtime by:
– increasing the availability of infrastructure
– extending protection to more applications
– simplifying backups, recovery, and DR testing
(Diagram: primary site with shared storage/CSV, clustering, and Quick/Live Migration for high availability; secondary site with a storage array for disaster recovery; backup/recovery spanning both.)

7 Differences Between Single-Site & Multi-Site Clusters
– Single-site cluster: two nodes on one SAN; VMs move between nodes on the same SAN and share common storage
– Multi-site cluster: VMs move between physical nodes on different SANs, without true shared storage between the sites; the primary and secondary storage arrays are kept in sync by SAN replication over the WAN connectivity

8 Microsoft Stretch Clustering & Storage Continuity
– Multi-site stretch configurations can provide automatic failover: stretch clustering automatically fails VMs over to a geographically different site
– Geographically distributed clusters are extended to different physical locations; stretch clustering uses the same concept as local-site clustering
– The storage array or third-party software provides SAN data replication: primary-site data is replicated to the secondary site

9 Benefits of a Multi-Site Cluster
– Protects against loss of an entire datacenter
– Automates failover: reduced downtime and a lower-complexity disaster recovery plan
– Reduces administrative overhead: application and cluster changes are synchronized automatically, making nodes easier to keep consistent than standalone servers
The primary reason DR solutions fail is dependence on people

10 DR: NMBS VDI Use Case (NOC)
The Windows 7 master for the NOC is installed at a single site:
– The DR plan is costly and has to be tested yearly
– There is no automatic application sync
– Dedicated master, manual upgrades
– No persistent image, and a need for admin rights
Solution: Remote Desktop Services, Hyper-V R2, Remote Desktop Connection

11 HA & DR with Multi-Site Clustering. Agenda: Introduction, Networking, Storage, Quorum

12 Network Considerations
Network deployment options:
1. Stretch VLANs across sites
2. Cluster nodes can reside in different subnets
(Diagram: Site A and Site B joined by public and redundant networks, with example subnets 10.10.10.x, 20.20.20.x, 30.30.30.x, and 40.40.40.x.)

13 Stretching the Network
Longer distance traditionally means greater network latency, and missed inter-node health checks can cause false failover. Cluster inter-node heartbeating is fully configurable:
– SameSubnetDelay (default = 1 second): frequency at which heartbeats are sent
– SameSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down
– CrossSubnetDelay (default = 1 second): frequency at which heartbeats are sent to nodes on different subnets
– CrossSubnetThreshold (default = 5 heartbeats): missed heartbeats before an interface is considered down for nodes on different subnets
Command line: Cluster.exe /prop
PowerShell (R2): Get-Cluster | fl *
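A minimal sketch of tuning these from PowerShell on R2 (the values are illustrative, not recommendations from the deck; the delay properties are stored in milliseconds):
    Import-Module FailoverClusters
    # Inspect the current heartbeat settings
    Get-Cluster | Format-List *SubnetDelay, *SubnetThreshold
    # Relax cross-subnet heartbeats for a higher-latency WAN link
    (Get-Cluster).CrossSubnetDelay = 2000      # send a heartbeat every 2 seconds
    (Get-Cluster).CrossSubnetThreshold = 10    # tolerate 10 missed heartbeats before marking the interface down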

14 Updating a VM's IP on Cross-Subnet Failover
On cross-subnet failover, if the guest uses:
– DHCP: the IP is updated automatically
– Static IP: an admin needs to configure the new IP (can be scripted)
Best to use DHCP in the guest OS for cross-subnet failover
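The slide notes the static-IP case can be scripted; a hedged sketch of what such a script might run inside the guest after failover (the interface name, mask, and gateway are placeholders; the address reuses the example from the next slide):
    # Re-point the guest NIC at the secondary site's subnet (run inside the guest)
    netsh interface ip set address name="Local Area Connection" static 20.20.20.222 255.255.255.0 20.20.20.1
    # Re-register the new address in DNS so clients can find the VM
    ipconfig /registerdns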

15 Client Reconnect Considerations
With nodes in dissimilar subnets, the VM obtains a new IP address on failover, and clients need that new IP address from DNS to reconnect.
(Diagram: the VM moves from Site A at 10.10.10.111 to Site B at 20.20.20.222; the DNS record is created and updated on DNS Server 1, then propagates to DNS Server 2 via DNS replication.)
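For clustered applications that register a cluster network name (a VM guest registers its own record), the window during which clients hold a stale address can be shortened by lowering the name's DNS TTL; a sketch (the resource name is a placeholder, 300 seconds is illustrative, and the R2 default is 1200):
    Import-Module FailoverClusters
    # Lower the DNS TTL on a clustered network name so clients re-resolve sooner
    Get-ClusterResource "App Network Name" | Set-ClusterParameter HostRecordTTL 300
    # The resource must be recycled for the new TTL to take effect
    Stop-ClusterResource "App Network Name"
    Start-ClusterResource "App Network Name"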

16 Solutions
– Solution #1: Prefer local failover. Scale up for local failover for higher availability: no change in IP addresses, and no trip over the WAN; cross-site failover is reserved for disaster recovery. This is usually the preferred approach.
– Solution #2: Stretch VLANs. Deploying a stretched VLAN minimizes client reconnection times because the IP of the VM never changes.
– Solution #3: Abstraction in a network device. The network device uses a third IP address, which is the one registered in DNS and used by clients.
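Solution #1 maps onto the cluster's preferred-owner lists; a minimal sketch (the group and node names are placeholders) that makes a group fail over to its local partner before crossing the WAN:
    Import-Module FailoverClusters
    # List the local Site A nodes before the remote Site B node
    Get-ClusterGroup "SQL-VM" | Set-ClusterOwnerNode -Owners "SiteA-Node1","SiteA-Node2","SiteB-Node1"
    # Verify the preferred-owner ordering
    (Get-ClusterGroup "SQL-VM" | Get-ClusterOwnerNode).OwnerNodes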

17 HA & DR with Multi-Site Clustering. Agenda: Introduction, Networking, Storage, Quorum

18 Storage in Multi-Site Clusters
Different from local clusters:
– Multiple storage arrays, independent per site
– Nodes commonly access their own site's storage
– No true shared disk visible to all nodes

19 Storage Considerations
A data replication mechanism is required between the sites: changes are made on Site A and replicated to Site B.
(Diagram: Site A storage replicating to a replica in Site B.)

20 Synchronous Replication
The host receives the write-complete response from the storage only after the data has been successfully written on both storage devices.
(Diagram: write request → primary storage → replication to secondary storage → acknowledgement → write complete returned to the host.)

21 Asynchronous Replication
The host receives the write-complete response as soon as the data is written to the primary storage device; replication to the secondary storage happens afterwards.
(Diagram: write request → primary storage → write complete returned to the host, with replication to secondary storage following.)

22 Synchronous vs. Asynchronous
– Data loss: synchronous, none; asynchronous, potential data loss on hard failures
– Bandwidth: synchronous requires a high-bandwidth/low-latency connection; asynchronous needs enough bandwidth to keep up with data replication
– Distance: synchronous stretches over shorter distances; asynchronous over longer distances
– Performance: synchronous write latencies impact application performance; asynchronous has no significant impact
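As a rough, illustrative calculation (not from the deck): light in fiber covers about 200 km per millisecond, so a 100 km inter-site link adds roughly 1 ms of round-trip time to every synchronous write, before any switch or array latency. This is why synchronous replication is limited to shorter distances.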

23 Hardware Replication Partners: hardware storage-based replication

24 Software Replication Partners: software host-based replication
– Double-Take Availability
– SteelEye DataKeeper Cluster Edition
– Symantec Storage Foundation for Windows

25 Storage Virtualization Abstraction
– Some replication solutions provide complete abstraction in the storage array: virtualized storage presents a logical LUN, and servers are unaware of the accessible disk's location
– Fully compatible with Cluster Shared Volumes (CSV)

26 Focus on Double-Take for Hyper-V
Product features:
– Host-level filter-driver replication; block-based, not a file-level protection product
– Simplified management: auto discovery, guest-level policies, and a guest protection schema
– One-click failover and failover management
– WAN support (bandwidth throttling, compression…)
– Integration with SCOM and SCVMM, all managed via one familiar console
Licensed per Hyper-V host, with an unlimited number of VMs

27 Basic Double-Take Configuration

28 Hyper-V's File-Based Data
– *.VHD: files that hold all the data for a particular volume associated with a virtual machine
– *.AVHD: when you take a snapshot of a virtual machine, its .VHD files are frozen and subsequent disk writes within the VM are instead stored in an .AVHD file
– *.XML: Hyper-V stores a virtual machine's configuration information in an industry-standard XML file
– *.BIN and *.VSV: when you pause or take a snapshot of a running virtual machine, Hyper-V stores the contents of the virtual machine's RAM in a .BIN file and information about the current state of the running virtual machine in a .VSV file
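A quick hedged sketch (the path is a placeholder) to enumerate all of these file types under a VM storage folder, i.e., everything a host-level replication product must carry to the other site:
    # List every Hyper-V file type described above beneath a storage path
    Get-ChildItem "C:\ClusterStorage\Volume1" -Recurse -Include *.vhd, *.avhd, *.xml, *.bin, *.vsv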

29 How Double-Take Replication Works
The Double-Take filter driver sits between the file system and the hardware layer on the source host, takes an initial mirror of the data, and then replicates changes to the target over any IP network. WAN-optimized: three levels of data compression and scheduled bandwidth-limiting capabilities.

30 Host-Level Protection for Hyper-V
(Diagram: VHDs replicated from one Hyper-V host to another.)

31 Double-Take GeoCluster
– Integrates with Microsoft Failover Clustering
– Uses Double-Take's patented replication
– Extends clusters across geographical distances
– Eliminates the single point of disk failure

32 How Double-Take GeoCluster Works
GeoCluster nodes use separate disks, kept synchronized by real-time replication: data is replicated to all passive nodes, and only the active node accesses its disks. At failover, the new active node resumes with current, replicated data.

33 GeoCluster for Hyper-V Workloads
Product features:
– Provides redundancy of storage
– Allows cluster nodes to be geographically distributed
– Utilizes GeoCluster technology to extend Hyper-V clustering across virtual hosts without the use of shared disk
– Replicates cluster data to a secondary node, eliminating the single point of failure
– Allows manual and automatic moves of cluster resources between virtual hosts

34 CSV with Replicated Storage
Traditional architectural assumptions may collide:
– Traditional replication solutions typically assume only one array is accessed at a time
– Cluster Shared Volumes assumes all nodes can concurrently access a LUN
(Diagram: a VM in Site B attempts to access the read-only replica while the Site A copy is read/write.)
Talk to your storage vendor for their support story

35 HA & DR with Multi-Site Clustering. Agenda: Introduction, Networking, Storage, Quorum

36 Quorum Overview
4 quorum types:
– Node Majority
– Node and Disk Majority
– Node and File Share Majority
– Disk Only (not recommended)
A majority is greater than 50% of the votes. Possible voters: nodes (1 vote each) + 1 witness (disk or file share).
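A quick way to check which model a given cluster is using (a sketch with the R2 FailoverClusters module):
    Import-Module FailoverClusters
    # Show the current quorum type and the witness resource, if any
    Get-ClusterQuorum | Format-List Cluster, QuorumType, QuorumResource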

37 Replicated Disk Witness
A witness acts as a tie-breaker when nodes lose network connectivity. With a replicated disk witness, each site can see its own copy of the disk, so the witness is no longer a single decision maker, and problems occur. Do not use a replicated disk witness in multi-site clusters unless directed by your vendor.

38 Node Majority
Cross-site network connectivity is broken! Each node asks: can I communicate with a majority of the nodes in the cluster?
– Nodes in the primary site (which holds the majority): yes, so they stay up
– Nodes in the secondary site: no, so they drop out of cluster membership
(5-node cluster: majority = 3, with the majority in the primary site.)

39 Node Majority: Disaster at the Primary Site
Site A, which holds the majority, is down. The surviving Site B nodes ask: can I communicate with a majority of the nodes in the cluster? No, so they drop out of cluster membership.
(5-node cluster: majority = 3, with the majority in the primary site.)
Quorum must be forced manually to bring the cluster online.

40 Forcing Quorum
Forcing quorum is a way to manually override and start a node even though it has not achieved quorum:
– Always understand why quorum was lost
– Used to bring the cluster online without quorum
– The cluster starts in a special forced state
– Once a majority is achieved, it drops out of the forced state
Command line: net start clussvc /fixquorum (or /fq)
PowerShell (R2): Start-ClusterNode -FixQuorum (or -fq)
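A hedged usage sketch of the PowerShell form above (the node name is a placeholder); once a true majority is achieved again, the cluster drops out of the forced state on its own:
    Import-Module FailoverClusters
    # On a surviving node, force the cluster service to start without quorum
    Start-ClusterNode -Name "SiteB-Node1" -FixQuorum
    # Confirm node state afterwards
    Get-ClusterNode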

41 Multi-Site with File Share Witness
Scenario: complete resiliency and automatic recovery from the loss of any one site. The file share witness (\\Foo\Share) sits in Site C (a branch office), reachable from Sites A and B over the WAN.

42 Multi-Site with File Share Witness
Scenario: complete resiliency and automatic recovery from the loss of the connection between sites. Each site asks: can I communicate with a majority of the nodes (+ FSW) in the cluster? The site that obtains the lock on the witness share answers yes and stays up; the other site fails the lock, answers no, and drops out of cluster membership.

43 File Share Witness (FSW) Considerations
– A simple Windows file server; a single file server can serve as a witness for multiple clusters, but each cluster requires its own share
– Can be made highly available on a separate cluster
– Recommended to be at a third, separate site to enable automatic site failover
– The FSW cannot be on a node in the same cluster, and should not be in a VM running on the same cluster
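Configuring the recommended model is a one-liner; a sketch reusing the share name from the earlier slides:
    Import-Module FailoverClusters
    # Switch to Node and File Share Majority with the witness share in Site C
    Set-ClusterQuorum -NodeAndFileShareMajority \\Foo\Share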

44 Quorum Model Recap
– Node and File Share Majority: even number of nodes; the best availability solution, with the FSW in a third site
– Node Majority: odd number of nodes, with more nodes in the primary site
– Node and Disk Majority: use as directed by your vendor
– No Majority: Disk Only: not recommended; use as directed by your vendor

45 Datacenter Recovery Partners
Citrix Essentials for Hyper-V augments Hyper-V DR by automating disaster recovery configuration:
– StorageLink Site Recovery manages storage automation
– Workflow orchestration for VM site failover
– Non-disruptive testing & staging of VMs prior to failover
– Single-click failback
– Recovery plans
– Integrates with SCVMM
– Plus more…

46 Microsoft Site Recovery Solution Stack
End-to-end disaster recovery:
– Server and application availability: Hyper-V, clustering, Quick and Live Migration
– Storage and data availability: storage-partner data replication, synchronous & asynchronous replication
– Automation: array state and application restart, workflow automation, DR run-book, simplified configuration & testing
– Management: physical and virtual, performance and resource optimization

47 Microsoft Private Cloud – Server Platform
– Simplify with integrated physical, virtual, and cloud management
– Improve agility with a private cloud computing infrastructure
– Optimize service delivery across datacenter infrastructure and business-critical services
– Lower costs through automation
"We don't have to manage our infrastructure with multiple tools… we have one central monitoring and management console from which we can care for every aspect of our environment." – Doug Miller, Practice Architect, Microsoft Practice Group, CDW
Design, Configure & Deploy; Data Protection & Recovery; Virtualize, Deploy & Manage

48 © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

