Disaster Recovery with VMware Infrastructure


1 Disaster Recovery with VMware Infrastructure
VMware Infrastructure for Rapid, Reliable, and Cost-Effective Disaster Recovery

2 Agenda
Challenges of Traditional DR
Properties of Virtualization for DR
Using VMware Virtualization in DR
SRM Technical Overview

As you know, DR is one part of business continuity planning, which is a broad topic. This presentation cannot cover every aspect of business continuity; it focuses on an overview of what VMware technology brings to disaster recovery solutions. Specifically, we’ll be discussing the following: (read slide bullets)

3 What We Hear…Is This Familiar?
“We don’t have a DR plan for mission critical x86 systems – it would be too expensive and complex”
“It is very difficult to test our DR plan because of all the extra hardware, configuration and special processes”
“In our last disaster recovery test we missed our recovery objectives by days”
Only 31% of CIOs surveyed rate their plans as extremely or very effective (IDG)
40% of all companies that experience a major disaster will go out of business if they cannot gain access to their data within 24 hours (Gartner)

[Note: slide is animated with clicks. The left-side boxes are customer comments describing their pain. The right-side boxes are analyst opinions based on studies.] This is a sampling of customer comments and survey results that are very telling about the state of DR. If it is an expensive mainframe-based application or a high-end Unix-based application, you have those mostly covered. FIGURE - The figure highlights that x86 servers are proliferating across data centers and that many mission critical applications now run on x86 platforms. This trend will continue, and with Service Oriented Architectures (SOA) it means x86 systems will require the same level of protection previously reserved for high-end (Unix, mainframe) applications. This is a challenging situation, and a better, lower-cost approach to DR needs to evolve. VMware software radically changes the dynamics of DR.

4 DR Pain Points
Lack of a reliable disaster recovery plan
  27-30% of businesses have no disaster recovery plan (VMworld, Imation)
Inability to meet RTO and RPO requirements with current plan
  Business needs and/or regulatory needs
  Need to improve RTO from days to minutes or hours
  Need to improve RPO from 24 hours to 1 hour or less
Idle hardware at recovery site
  Unable to instantly repurpose machines at the secondary site
Management effort required to maintain recovery site
  Need to maintain system and application images at secondary site
  Usually only data is regularly and cleanly updated

5 Challenges of Traditional DR: Infrastructure
DR Challenges Today
[Diagram: production (Prod) and DR sites connected over a WAN. Each site shows an application and OS on x86 hardware with OS files on local storage, plus separate storage. Labels: “Application Bound to HW”, “5-10% utilized”.]
Expensive and Complex
Multiple slow processes to transfer data to DR site for OS, application installation, configuration, data files
Requires 1:1 duplication of servers and infrastructure at DR site
Makes x86 physical DR strategies complex and expensive
In the traditional physical-world paradigm, the OS is bound to the hardware configuration and the application is bound to the OS

Note: the two bullets near the production site are about the constraints of the physical world, where the OS gets tied to the hardware and apps get tied to the OS. The requirement for identical production and DR sites can be very restrictive. You’ll need the same server model, same hardware configuration, same firmware, same storage configuration, and so on, because of the dependencies between hardware and software mentioned earlier. I’d also like to mention briefly that the servers at the DR site are often idle but still consume power, network, storage, and space. That is definitely not a good way to spend valuable resources.

6 Challenges of Traditional DR: Recovery
DR Challenges Today
[Diagram: production (Prod) and DR sites connected over a WAN. Each site shows an application and OS on x86 hardware with OS files on local storage, plus separate storage. The DR side is rebuilt from CD, tape or Ghost image and labeled “Boot & Pray”.]
Slow and Unreliable Process
Complex to physically recover OS, applications & data
Separate processes for system and application data
OS & applications have dependencies on hardware configuration
Tier 2 & 3 applications left unprotected, adding to Tier 1 RTO risk

So let’s take the physical-to-physical discussion one step further. For your OS and application binary files, you’ll typically use CD, tape or images such as Ghost images. For your application data, you’ll probably use replication technology (SAN-to-SAN or server-to-server replication) or backup software. SAN-based replication can provide high-performance replication, but at the cost of expensive SAN array licenses. Given all this complexity, expense and uncertainty, it’s no wonder that DR is quite often provided only for a few select Tier-1 applications, while other important Tier-2 and Tier-3 apps are left without full protection. However, the Tier-1 applications may depend on those Tier-2 and Tier-3 servers, so this approach exposes your business-critical IT to significant downtime risk.
Identical and specific hardware constrains your choices and impacts your budget decisions.
There is too much dependency on hardware configurations.
Recovery requires a complex and lengthy setup process (Ghost, tape-based, or bare-metal recovery at the DR site) and separate hardware. Tape recovery can take a very long time, and errors happen often. Finding the right tape can also be a nightmare if cataloging is not up to date.
The DR site is idle or under-utilized because it is reserved for failover, and running other applications there would require a long and comprehensive setup process.
At VMware we believe there is a better way for IT, one that leads to Reliable, Rapid and Cost-Effective Disaster Recovery.

7 Agenda
Challenges of Traditional DR
Properties of Virtualization for DR
Using VMware Virtualization in DR
SRM Technical Overview

Now we will look at the virtualization benefits that make it a key enabler for better DR.

8 DR : The Killer App for Virtualization!
Press: “Best Disaster Recovery Product of 2006” (TechTarget)
Customers: 55% of customers using virtualization for BC/DR*
2006 Customer Survey (n=2265): 85% use VMware in production; 43% set as a default policy for production servers*
*Source: VMware customer survey, 9/ N=2265

So VMware is just about server consolidation, right? Let me share some recent findings and opinions from our customers and the press about VMware and disaster recovery. Right box (CUSTOMER): 55% of the 2265 customers we recently surveyed across the US and Canada are actually using VMware for BC/DR. Left box (PRESS, TechTarget): here is an award we received from a leading technology media company, TechTarget. We were voted by their panel as the #1 Disaster Recovery product for 2006. (Side note: we do not even have a fully packaged DR solution or a SKU, but TechTarget’s readership voted it the GOLD award winner!) Bottom box (CUSTOMER): read box. VMware is being used for critical applications that need DR protection: 85% of VMware customers (from a blind sample of 2265 VMware customers) say they use VMware in production environments. Even more telling, 43% use it as the default in their production environments.

9 What is Server Virtualization
VMware server virtualization packages hardware, OS, and applications into a portable virtual machine package Before Virtualization After Virtualization Before we move ahead let’s take a few minutes to review the basics of virtualization. By way of comparison, let’s think about the typical x86 (Intel or AMD processor) server as shown on the left. Each server has a hardware configuration, an operating system installed on the hardware, and applications installed in the operating system. Each layer of this architecture is pretty tightly tied to the layer below and ultimately to the hardware—the OS is customized to the hardware in the drivers it uses, the way you tune its parameters, etc. Applications are in turn tied to the OS that they’re installed on. What are the implications? One, you have only one OS and pretty much one workload per physical machine. It’s very difficult to put more than one major application on these servers because you risk running into conflicts and performance problems if you do. The result: utilization is very low—you’re paying for computing power that’s going to waste. On top of that, this architecture is pretty inflexible. How long would it take to repurpose an idle server for something else? Well, you’d need to archive the current OS and application, reinstall the OS, install the application, and so on. Virtualization changes all of that. It takes a physical system along with the operating system and anything installed in the operating system and packages them into what we call a “virtual machine”. A virtual machine contains a virtual hardware configuration as well as the operating system, applications, and data. To the OS and applications in a virtual machine, it’s no different than running on real physical hardware. Each virtual machine runs on top of this thin virtualization layer that VMware software places on a server. This virtualization layer takes care of transparently allocating physical resources to each virtual machine. 
Before Virtualization: Software tied to hardware; Single OS image per machine; One application workload per OS
After Virtualization: Multiple workloads per machine; Software independent of hardware; System, data, apps are files

10 VMware Virtualization Enablers for DR
Hardware Independence Eliminate need for 1:1 hardware duplication for DR Eliminate risk of hardware “configuration drift” Re-use older servers for DR Unlike many solutions that require specific vendor hardware say for clustering or replication, VM’s can run on any x86 hardware without requiring any changes or modifications. This significantly minimizes the system configuration issues for the DR site that we mentioned earlier. Hardware independence also translates to significant cost savings. You can eliminate the risk of “configuration drift”, i.e. the risk that two sites become non-identical over time when changes at one site aren’t made to the other site in lock step. You don’t have to worry about forgetting to update firmware at both sites at the same time, of making sure you add memory to systems at both sites at the same time, etc. You can use your older server assets as a part of your DR site so that you may be able to provide hardware for recovery without needing to buy new servers Run a virtual machine on any server without modification Copyright © 2006 VMware, Inc. All rights reserved. 10

11 VMware Virtualization Enablers for DR
Encapsulation
Simplify backup and replication
Simplify copying and cloning of systems
Simplify provisioning
[Diagram: Physical Server (System, Apps, Data); System = files in VMFS.]
Encapsulate entire systems in simple files

Encapsulation: The entire server (OS, apps, data, devices, and state) is now simply a set of files. This significantly simplifies tasks such as server migration, which can be treated as a simple data migration. Server provisioning is similar to copying a file. Implications: Backup and replication are easier because you only need to back up or replicate a small set of files to ensure that you’ve completely protected a server. Server cloning/copying is as simple as copying a few files. Because moving system state and application data is as simple as moving a file, provisioning of DR machines is easy and fast. There is no need to build an image from scratch or to use multiple tools to recover system state and configuration.

12 VMware Virtualization Enablers for DR
Isolation
Provide easier testing of DR plan
Utilize DR hardware for other tasks
Leverage resource pools to separate workload groups
[Diagram: DR Test and Batch Job virtual machines (OS, App) running side by side on VMware Infrastructure.]
Each virtual machine is isolated from other virtual machines

Isolation: Isolation is a core property of a VM. Any change or instability in one VM is completely isolated from the other VMs on the host. This is ideal for test, dev and production environments that require various application and OS levels running on the same host. Because systems are isolated, security is also significantly enhanced when, for example, VMs are used to deliver a virtual desktop on an unmanaged PC.
Implications:
Simplify DR testing: because virtual machines are isolated from each other, you can run your DR tests on the actual DR recovery hardware without impacting your ability to recover production virtual machines should your production site fail during a DR test.
Better utilization of DR hardware: because of the isolation properties and the ability to use virtual LANs (more later), the hardware designated for recovery can be used for other workloads while the production site is functioning normally. For example, a DR test can run at the DR site alongside a simultaneous test-dev or batch workload.
Resource pooling: resource pools isolate the performance impact of virtual machines on each other by allocating resources to a group of virtual machines based on user-specified limits. Using resource pools, you can ensure that different workloads (DR testing workloads, batch jobs, actual recovery virtual machines, etc.) do not interfere with each other.

13 VMware Virtualization Enablers for DR
Partitioning
Consolidate servers
Boost utilization
Provide significant cost savings
[Chart: % utilization.]
Safely run multiple virtual machines simultaneously on a single physical server

Partitioning/Consolidation: Another key property of virtual machines is that multiple VMs can safely run simultaneously on a host, with the virtualization layer transparently allocating system resources to each. This significantly drives up server utilization. Implications: Significantly reduce spending on hardware for disaster recovery, since fewer servers are needed both for production and for recovery. Increase server utilization. Provide significant cost savings that can be used for other critical projects (e.g. for DR).

14 Agenda
Challenges of Traditional DR
Properties of Virtualization for DR
Using VMware Virtualization in DR
  Data and system protection
  Replication
  DR testing
  Protecting physical servers with virtual machines
SRM Technical Overview

This upcoming section covers various architectures for backup and recovery, replication, and DR testing, and shows how VMware can protect your physical server environment from failures. We will also show you how easy it is to test DR.

15 VMware Availability Products And Features
Avoid planned outages Quick recovery from unplanned outages Component Server Storage Data N/A Site NIC Teaming, Multipathing VMotion, DRS + Maintenance Mode VMware HA Storage VMotion Encapsulation, VCB Encapsulation, VCB Encapsulation, boot from shared storage, instant reprovisioning, HW independence, resource pools, snapshots, VLANs VMware Site Recovery Manager

16 Data and System Protection – Physical vs. Virtual
Data and system protection with physical infrastructure Separate processes for protecting data and system disks Require identical hardware for guaranteed restore Complex processes to ensure protection System System configuration Data Data and system protection with VMware Infrastructure Same process for data and system disks Entire system stored as data Hardware-independent virtual machines are easy to restore to any hardware Before we move into details let’s spend a minute talking about how virtualization impacts the backup and recovery process; backup and recovery will be a part of the DR plan for many environments using virtual machines In a physical environment some of the challenges of backup and recovery include: You typically use separate processes to protect your OS (system) disks and your application data disks because they typically reside on different storage; this adds complexity and time to your overall backup process To guarantee successful restore, you would need to have identical hardware due to hardware dependencies. The entire process is extremely complex—you need to spend time rebuilding the OS, validating that your application works after recovery, etc. Virtualization can eliminate many of those challenges: Because VMware virtualization encapsulates both system and data disks as files, it’s easier to use the same process to backup and restore both system and data disks; because it makes it easy to put both system and data disks on shared storage, it’s also easier to use the same process to protect both Because of encapsulation, the entire system is stored as data, making it possible to restore not only data but also entire systems just by restoring a few files Because of the hardware independence of virtual machines, you can be certain that you’ll be able to successfully boot and use a system that’s been backed up—no more failures due to driver or other hardware dependencies System, data, system config

17 Backup Options with VMware – Reduce Backup Windows
[Diagram: In-VM, In-Console and VCB backup architectures, showing where the backup agent runs (in each VM, in the Service Console, or on a backup server writing to tape).]
In-VM
  Agent in each VM
  Same architecture as physical system backup
  File-level incremental backup possible
  Any storage
In-Console
  Agent in Service Console
  Simplified backup of full-disk images
  Any storage
VCB
  Consolidated Backup - Agent on Proxy Server
  Move backup out of VM
  Provide LAN-free backup
  Eliminate backup windows
  Pre-integrated with 3rd party backup products

There are three ways that our customers commonly do backup and recovery in virtualized environments:
In-VM: put a backup agent in the virtual machine. This uses the same configuration and procedure as with physical machines and provides file-level backup and restore.
In-Console: back up from the service console. This backs up the entire virtual machine by backing up the small number of files that encapsulate it. It is simpler and doesn’t load individual VMs, but restore is less granular since you can only restore full disk images, not individual files within virtual machines.
Many customers use in-VM and in-Service Console backup as complementary options: in-VM for file-level backup of data and in-Service Console for image-level backup of system disks.
VMware Consolidated Backup (the next page shows how it works): Consolidated Backup, available in VI3, provides another option that has significant benefits.
Key points:
What is it? Centralized agent-less backup for virtual machines, pre-integrated with major 3rd-party backup products. VCB does not do backups itself; it is an enabler that makes it easier to do backups using standard 3rd-party backup tools.
Customer impact: Reduce the load on ESX Server, thereby allowing it to run more efficiently and to run more virtual machines. Perform backups at any time, even in the middle of the day. Improve manageability of IT resources by using a single agent running on the proxy server rather than an agent on every virtual machine. Eliminate backup traffic on the local area network.

18 VMware Consolidated Backup – How it Works
Centralized file and image level backup
1 Take VM Snapshot
2 Mount SAN Snapshot
3 Backup files or disk images with leading backup tools
Move backup out of the virtual machine
Run midday backups – LAN Free
Integrated with 3rd party backup

How does VMware Consolidated Backup work? Consolidated Backup provides a set of drivers and scripts that enable LAN-free backup of virtual machines from a centralized Microsoft® Windows 2003 proxy server using industry-standard backup software. Consolidated Backup includes pre-backup and post-backup scripts for integration with most major backup providers. A backup job is created for each virtual machine, and that job is dispatched on a Consolidated Backup proxy. For virtual machines running a Microsoft® Windows operating system, the pre-backup script quiesces NTFS inside the virtual machine (ensuring that any application activity is paused before taking a point-in-time snapshot), takes a virtual machine snapshot, and mounts the snapshot to the proxy server directly from the SAN. The backup client then backs up the contents of the virtual machine, either as a set of files and directories or as a virtual disk image. Finally, the post-backup script tears down the mount and takes the virtual disk out of snapshot mode.
Copyright © 2005 VMware, Inc. All rights reserved.
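The ordering of the pre-backup script, the backup client, and the post-backup script can be sketched as follows. This is only an illustration of the sequence described in the notes, not the actual Consolidated Backup scripts: the `BackupJob` class, the step names, and the `backup_client` callable are all hypothetical, and the steps are merely recorded in a log rather than executed against a real ESX host.

```python
from dataclasses import dataclass, field


@dataclass
class BackupJob:
    """Hypothetical stand-in for one per-VM backup job run on the proxy."""
    vm_name: str
    mode: str = "file"  # "file" (files and directories) or "image" (virtual disk)
    log: list = field(default_factory=list)

    def step(self, name):
        # A real script would act on the VM here; the sketch only records the step.
        self.log.append(name)


def run_backup(job, backup_client):
    """Run the pre-backup steps, the 3rd-party client, then the post-backup steps."""
    # Pre-backup script: quiesce, snapshot, mount the snapshot on the proxy.
    job.step("quiesce_filesystem")
    job.step("take_vm_snapshot")
    job.step("mount_snapshot_on_proxy")
    try:
        # The backup itself is done by the 3rd-party tool (assumed callable).
        backup_client(job.vm_name, job.mode)
        job.step("backup_" + job.mode)
    finally:
        # Post-backup script: always tear down, even if the backup failed.
        job.step("unmount_snapshot")
        job.step("remove_vm_snapshot")
    return job.log
```

Putting the teardown in a `finally` block is a design choice of this sketch: it keeps a failed backup from leaving the virtual disk in snapshot mode.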

19 Replication with VMware: Array-Based Replication
DR SITE PRIMARY Site Failure WAN or Dark Fiber Storage array Storage array Array-Based Replication [Note to presenter : slide is automatically animated to illustrate key concepts.] For demanding performance requirements and lowest RTO/RPO, consider off-host replication via array-based or storage network-based replication. We will be discussing array based replication technology here – where the replication load is now being handled by the storage processors within the storage arrays on the primary and DR sites (connected by dark-fiber or a WAN) Array-based replication allows you to easily replicate entire virtual machines simply by replicating the storage disks on which they reside—virtual machines are just like any other files residing on a disk Most major array vendors (including EMC, NetApp, IBM, HP, Hitachi, Equallogic, LeftHand, etc.) provide some form of array-based replication for use with their arrays. Target VMFS Source VMFS

20 Simpler Disaster Recovery Testing with Virtualization
DR Site
Snapshot and clone replicated data to create testing VMs
Connect test VMs to an isolated network
Power up testing VMs to validate recovery
Delete VM clones used for testing
[Diagram: DR site with powered-on DR VMs (Test DR and Live DR, labeled 15 GHz and 9 GHz) running from a SAN snapshot of the replicated data on the target VMFS, which holds OS, application and data images.]
- Rapid DR setup and removal
- Dual-use of DR site for batch, test and other workloads

Virtualization helps to make disaster recovery testing significantly easier than it is in a physical environment. In a physical environment you usually need to spend a lot of time and effort allocating additional hardware for testing, setting up and configuring that hardware, and making sure that the testing environment won’t impact the actual recovery hardware. Once the test is completed, you need to spend time cleaning up after it. With VMware Infrastructure you can perform DR tests without additional hardware and with significantly less time and effort needed to set up and configure the DR test.
For testing, a snapshot of the replicated data is taken and then cloned to create a new virtual machine.
Virtual switches and VLANs can be used to create an isolated testing network that doesn’t interfere with other networks at the DR site.
You can now power up the testing VMs and start validating your DR plan.
Once testing is over, cleanup only requires deleting the VM clones, and you’re ready for the next test.
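The four test steps above map naturally onto a setup/teardown pattern. The sketch below models them in memory only; the `dr_test` function, the `dr-test-vlan` network name, and the dictionary layout are assumptions made for illustration and do not correspond to any VMware API.

```python
from contextlib import contextmanager


@contextmanager
def dr_test(replicated_vms, isolated_network="dr-test-vlan"):
    """Yield powered-on test clones on an isolated network, then delete them."""
    # Step 1: snapshot the replicated data (here: copy the list) and clone it.
    snapshot = list(replicated_vms)
    clones = [
        # Step 2: every clone is attached to the isolated test network only.
        {"name": f"test-{vm}", "network": isolated_network, "powered_on": False}
        for vm in snapshot
    ]
    # Step 3: power up the test VMs so the plan can be validated.
    for clone in clones:
        clone["powered_on"] = True
    try:
        yield clones
    finally:
        # Step 4: delete the clones; the replicated data itself is untouched.
        clones.clear()
```

A test run would wrap its validation in `with dr_test(["db01", "web01"]) as clones:`; when the block exits, the clones are gone and the site is ready for the next test.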

21 Recovery Process in a Virtualized Environment
Example recovery process comparison
P-P (40+ hrs): Configure hardware, Install OS, Configure OS, Install backup agent, Start “Single-step automatic recovery”
V-V (< 4+ hrs): Restore VM, Power on VM
RTO of minutes to a few hours, not days to weeks!

Comparing the steps needed to recover from a disaster in the physical-to-physical scenario with those in the physical-to-virtual scenario illustrates the dramatic difference that virtual infrastructure makes in simplifying and shortening the recovery process. With a physical recovery target, time to recovery is much longer: 40+ hours in one customer example. In the same customer example, time to recovery using a virtual target was reduced to 3+ hours, with several fewer steps. Since the hardware is now virtualized, we no longer need to spend time verifying that we have the proper hardware configuration (or potentially even locating adequate hardware) for recovery; any hardware is a potential recovery target.

22 VMware Site Recovery Manager: Technical Overview
July 2008 VMware

23 Agenda
Introduction and Key Concepts
Site Recovery Manager 1.0 Prerequisites and SAN Integration
Site Recovery Manager Workflows
Site Recovery Manager Roles and Privileges
Alarms and Site Status Monitoring
Summary

Agenda for the presentation today:
1. DR and SRM Introduction and Concepts
2. SRM 1.0 Prerequisites and SAN Integration
3. SRM Workflows (Protected and Recovery Site)
4. SRM Roles and Privileges
5. SRM Alarms and Site Status Monitoring
6. SRM Core Benefits and Summary

24 What is a Disaster?
Complete loss of a data center for an extended period of time
Declaration of a disaster usually requires consensus from multiple parts of the organization (at the C*O level)
What is not a disaster?
Failure of an individual host
A temporary service interruption

This is what we mean when we talk about disasters. There is of course a gray area: does failure of a storage array constitute a disaster? How long is “an extended period”? Can disaster recovery tools assist with planned outages?

25 The Current State of Physical Disaster Recovery
Tier  RPO       RTO       Cost
I     Immediate           $$$
II    24+ hrs.  48+ hrs.  $$
III   7+ days   5+ days   $

DR services tiered according to business needs
Physical DR is challenging
  Maintain identical hardware at both locations
  Apply upgrades and patches in parallel
  Little automation
  Error-prone and difficult to test

In our discussions with customers like you, we found that disaster protection for services tends to be tiered. In the first tier are services for which no downtime at all can be tolerated; those services tend to be deployed from the start in active-active configurations. For those sites that maintain identical idle hardware at the secondary site, just keeping up with OS patches can be a full-time job. For the remaining services, however, DR plans are in the three-ring binder (“read this in case of disaster”): to recover this server, confiscate that hardware, re-install the OS, recover from tape. Of course, these steps are very manual and tend to be very difficult to test.

26 Advantages of Virtual Disaster Recovery
Virtual machines are portable
Virtual hardware can be automatically configured
Test and failover can be automated (minimizes human error)
The need for idle hardware is reduced
Costs are lowered, and the quality of service is raised
[Graphic: “Press In Case of Disaster” button.]

Virtual machines have many advantages that make them ideal vehicles for BC/DR:
Virtual machines can be transmitted over a wire (portable).
Virtual machines can be programmatically powered on and off, and virtual networks can be programmatically reconfigured (automatically configured).
By including the boot disk, virtual DR eliminates the need to apply OS patches in parallel at the primary and secondary sites (minimizes human error).
Lower cost makes it possible to make high-quality DR protection ubiquitous, not just for the first tier of service.

27 Introducing VMware Site Recovery Manager
Site Recovery Manager leverages VMware Infrastructure to deliver advanced disaster recovery management and automation Simplifies and automates disaster recovery workflows: Setup, testing, failover Turns manual recovery runbooks into automated recovery plans Provides central management of recovery plans from VirtualCenter VMware is building on the core properties of VMware Infrastructure that make it so useful for disaster recovery with a new product—VMware Site Recovery Manager Site Recovery Manager is a product that simplifies and automates disaster recovery Site Recovery Manager helps organizations to directly address the challenges of disaster recovery that were mentioned earlier: meeting RTO requirements, reducing cost, and reducing risk Site Recovery Manager is a separate product from VMware Infrastructure VMware has been working to leverage the disaster recovery features and capabilities of the VMware Infrastructure platform with a new product developed specifically for disaster recovery. This new product will simplify and automate the key elements of disaster recovery: setting up disaster recovery plans, testing those plans, executing failover when a datacenter disaster occurs, and failing back to the primary datacenter This new product, VMware Site Recovery Manager, will make it possible for customers to provide faster, more reliable, and more affordable disaster recovery protection than previously possible. Although not a part of VMware Infrastructure, Site Recovery Manager works closely with VMware Infrastructure to manage and automate disaster recovery for virtual environments Works with VMware Infrastructure to make disaster recovery rapid, reliable, manageable, affordable

28 Site Recovery Manager at a Glance
Site A Site B X Protected Site Recovery Site Protected Site Recovery Site Supports bi-directional site protection VirtualCenter Site Recovery Manager VirtualCenter Site Recovery Manager Protected VMs offline powered on Protected VMs online in Protected Site become unavailable NOTE: This slide has animation to simulate at a high level what SRM does High-level view of SRM SRM protects the VMs you select at the protected site, SRM starts up protected VMs at time of test or disaster in the recovery site Click 1: SRM Protects VMs on ESX Prod1, shadow VMs created in the Recovery Site on ESX DR1 Click 2: SRM Protects VMs on ESX Prod2, shadow VMs created in the Recovery Site on ESX DR2 Click 3 : Disaster occurs Click 4 : Business declares disaster. Initiate Recovery via SRM .Press the big red button and your protected VMs are restarted at the secondary site Array Replication Datastore Groups Datastore Groups

29 Server Side Components *
[Diagram: Site 1 and Site 2, each with a VC Server and its VCMS DB, an SRM Server and its SRM DB, a Storage Replication Adapter, and an array running block replication software.]
* Note: Conceptual drawing only. Site Recovery Manager Server may run on another system than VCMS

NOTE: This slide has animation
SRM is designed as a plug-in to VirtualCenter so that DR tasks can be executed inside the same management tool as other VM administration tasks such as creation, migration, deletion, etc. However, SRM is not “in” VC. It is a separate server process with its own separate database. The server processes for SRM and VC can run on the same or different servers, and the databases for VC and SRM can reside on the same or different database servers.
The most interesting piece of the install is the storage replication adapters. As you know, SRM does not actually do the replication for DR, only the setup, test, and recovery workflows. SRM relies on block-based replication (Fibre Channel or iSCSI) from our storage partners. The storage replication adapters tie together the SRM product and the replication products. These adapters are developed, qualified, and supported by the storage partner for optimal reliability and the best customer experience. They sit on the SRM server and, once installed, are invisible for the duration of their use.
To summarize:
2 VC servers (one per site)
2 SRM servers (one per site)
4 databases (two per site, one for VC and one for SRM)
Pre-configured array-based replication

30 Site Recovery Manager Concept Relationship “Cheat Sheet”
Protected
  LUN: Indivisible unit of storage that can be replicated
  Datastore: Contains one or more LUNs (i.e. VMFS)
  Datastore Groups: Auto-generated collection of one or more datastores. Indivisible unit of storage failover.
  Protection Group: Collection of all VMs stored in a datastore group
Recovery
  Recovery Plan: Contains one or more protection groups

There are five “moving parts” that must be understood for SRM to be used. The first four relate to the protection tasks and the last relates to the recovery tasks.
LUNs are block devices presented from the storage arrays. LUNs are the unit of replication for the arrays and represent the smallest possible granularity for failover. It is never possible to fail over the contents of part of a LUN without failing over the entire LUN, so group VMs on LUNs accordingly.
VMware formats LUNs with VMFS, our filesystem, and uses them to store VMs. These VMFS-formatted LUNs are referred to as datastores. Datastores commonly contain only a single LUN, but they can span LUNs.
Datastore groups are the smallest groups of datastores (and therefore LUNs) that can have their contents failed over with SRM. These groupings are calculated for you, so you don’t have to worry about figuring them out. Two things cause LUNs and datastores to be grouped together rather than managed separately:
A datastore spanning multiple LUNs causes those LUNs to be grouped together in the datastore group. Failing over part of a datastore is not possible.
A VM can have multiple virtual disks, and those virtual disks may sit on different datastores. In that case, those datastores are forced together into a datastore group so that you don’t try to fail over only part of a VM.
Protection groups are created with a one-to-one mapping to datastore groups. A protection group is simply the group of VMs that reside on a single datastore group. This is the actual unit of VM protection and recovery.
Recovery plans. Once you have created protection groups, you can create recovery plans containing one or more of them. A recovery plan is simply a list of VMs from the protection groups, a startup order for those VMs, and any custom steps added before or after VM startup. This is the “virtual run book” that is executed during DR tests and actual DR failovers.

31 Key Concepts And Their Relationships
[Diagram. Protected Site: LUN 1 / VMFS 1 form Datastore Group 1; LUN 2 and LUN 3 / VMFS 2 form Datastore Group 2; LUN 4 / VMFS 3 and LUN 5 / VMFS 4 form Datastore Group 3. Datastore Groups 1, 2 and 3 map to Protection Groups 1, 2 and 3. Recovery Site: Recovery Plan 1 (Whole Site) contains Protection Groups 1, 2 and 3; Recovery Plan 2 (Subset) contains Protection Group 1.]

NOTE: This slide has animation

Here is a graphical representation of what was just described.

LUN 1 is formatted with VMFS 1 and has 3 VMs. It has no dependencies on anything and is in its own datastore group.

LUNs 2 and 3 have VMFS 2 spanned across them, so they are in the same datastore. Since all 6 VMs on that datastore sit only on VMFS 2 and touch no others, VMFS 2 (and therefore LUNs 2 and 3) is alone in the second datastore group.

LUN 4 is formatted with VMFS 3 and LUN 5 is formatted with VMFS 4. They would be in separate datastore groups were it not for the VM with a virtual disk in each of them. VMFS 3 and VMFS 4 (and therefore LUNs 4 and 5) are grouped together in a third datastore group.

Protection groups 1, 2, and 3 are created corresponding to datastore groups 1, 2, and 3.

At the recovery site for all these VMs, recovery plan 1 is created containing all three protection groups and therefore all 10 of their VMs. This recovery plan would be used if the entire site was lost.

Recovery plan 2 is also created with only protection group 1 and its 3 VMs. This is for some partial failure, perhaps corresponding to a server rack, an array or a business unit. It would be run to recover that particular set of systems.
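The grouping rules described on this slide can be sketched in a few lines of Python. This is only an illustration of the logic, not SRM's actual implementation: the function name `datastore_groups`, the dictionary layout, and the VM names `vm01` to `vm10` are assumptions made for the example. It uses a union-find structure to merge any datastores that share a VM, and carries each datastore's LUNs along with it.

```python
from collections import defaultdict

def datastore_groups(datastores, vm_disks):
    """Group datastores that must fail over together (illustrative only).

    datastores: dict of datastore name -> list of LUNs it spans
    vm_disks:   dict of VM name -> set of datastores holding its virtual disks
    """
    parent = {ds: ds for ds in datastores}

    def find(ds):
        # Follow parent links to the group representative.
        while parent[ds] != ds:
            parent[ds] = parent[parent[ds]]
            ds = parent[ds]
        return ds

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # A VM with disks on several datastores forces them into one group.
    for used in vm_disks.values():
        used = sorted(used)
        for other in used[1:]:
            union(used[0], other)

    groups = defaultdict(lambda: {"datastores": [], "luns": [], "vms": []})
    for ds in sorted(datastores):
        root = find(ds)
        groups[root]["datastores"].append(ds)
        # A datastore spanning several LUNs brings all of them along.
        groups[root]["luns"].extend(datastores[ds])
    for vm in sorted(vm_disks):
        groups[find(next(iter(vm_disks[vm])))]["vms"].append(vm)
    return [groups[root] for root in sorted(groups)]

# The layout from the slide: 3 VMs on VMFS1, 6 on VMFS2, 1 VM across VMFS3 and VMFS4.
datastores = {
    "VMFS1": ["LUN1"],
    "VMFS2": ["LUN2", "LUN3"],
    "VMFS3": ["LUN4"],
    "VMFS4": ["LUN5"],
}
vm_disks = {f"vm{n:02d}": {"VMFS1"} for n in range(1, 4)}
vm_disks.update({f"vm{n:02d}": {"VMFS2"} for n in range(4, 10)})
vm_disks["vm10"] = {"VMFS3", "VMFS4"}

groups = datastore_groups(datastores, vm_disks)
for number, group in enumerate(groups, start=1):
    print(number, group["datastores"], group["luns"], len(group["vms"]), "VMs")
```

Each returned group corresponds to one protection group, so a whole-site recovery plan in this example would cover all three groups and 10 VMs, while a subset plan covering only the first group would cover 3.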

32 Array Integration with Site Recovery Manager
Vendor-specific scripts support:
Array discovery
Replicated LUN discovery
Test initiation (simulated failover in an isolated environment)
Failover initiation (actual failover of services to the recovery site)
In cooperation with VMware, and with the full support of VMware, the storage vendors create the storage replication adapters for their respective storage arrays.

NOTE: This slide has animation

SRM will leverage Storage Replication Adapters (SRAs) that have been written by the storage vendors to ensure tight integration with SRM. The SRAs will perform the following tasks: array discovery, replicated LUN discovery, and test and failover initiation.

33 VMware Site Recovery Manager Licensing
[Diagram. Protected Site and Recovery Site, each with VirtualCenter and Site Recovery Manager. Callouts: “SRM Protected VMs”; “VMs not protected by Site Recovery Manager”; “SRM licensed per CPU socket on the ESX server that hosts the protected virtual machines in the Protected Site”.]

NOTE: This slide has animation to illustrate how SRM licensing is associated with ESX servers that host protected virtual machines.

High-level overview of SRM licensing. Site 1 and Site 2 will have two categories of virtual machines (unprotected VMs and protected VMs).
Click 1: Virtual machines that are not protected by SRM (unprotected VMs)
Click 2: Virtual machines that will be protected by SRM (protected VMs) and can be restarted in Site 2 (recovery site)
Click 3: SRM is licensed per CPU socket on the ESX server that hosts the protected virtual machines

The SRM license file will be applied to the licensing server in the protected site.

Note: If SRM is going to be used to apply cross-site protection (Site 1 to Site 2 and Site 2 to Site 1), SRM licenses will also be required for the ESX servers in Site 2 that host the designated protected virtual machines that could be failed over to Site 1 at the time of a disaster or a scheduled BC/DR test.

SRM can be purchased as an individual component (per CPU socket on the ESX server that hosts the protected VMs) or as part of a management and automation bundle comprising SRM plus the IT Service Delivery bundle (Lifecycle Manager, Stage Manager and Lab Manager).
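The per-socket counting rule above can be illustrated with a short Python sketch. It is not a licensing tool: the host names, socket counts and the function `srm_sockets_required` are hypothetical, and the only rule it encodes is the one stated on the slide, namely that every CPU socket of an ESX host counts if that host runs at least one protected VM.

```python
def srm_sockets_required(hosts, protected_vms):
    """Count CPU sockets on hosts that run at least one protected VM.

    hosts: dict of host name -> {"sockets": int, "vms": set of VM names}
    protected_vms: set of VM names protected by SRM
    """
    licensed = {
        name: host["sockets"]
        for name, host in hosts.items()
        if host["vms"] & protected_vms  # host runs at least one protected VM
    }
    return licensed, sum(licensed.values())

# Hypothetical protected-site inventory.
hosts = {
    "esx1": {"sockets": 2, "vms": {"app_vm1", "test_vm1"}},
    "esx2": {"sockets": 4, "vms": {"test_vm2"}},
    "esx3": {"sockets": 2, "vms": {"app_vm2"}},
}
protected = {"app_vm1", "app_vm2"}

licensed_hosts, total_sockets = srm_sockets_required(hosts, protected)
print(licensed_hosts, total_sockets)
```

For cross-site protection, the same count would be repeated for the Site 2 hosts that run VMs protected in the other direction.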

34 Safety Tip: DNS Validation – The Rule of ‘Four’
Validate DNS is working as expected by performing the following DNS lookups for the VC, SRM and ESX servers:
Short name
Long name
Reverse
Forward

NOTE: This slide has animation

It is highly recommended to validate that DNS is working as expected and that DNS lookups in the protected and recovery sites return the correct results. DNS should be validated from:
the VC server
the SRM server
each of the ESX servers

NOTE: Complete the DNS checks from both the protected and the recovery sites.
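One way to script these checks is sketched below in Python. The slide does not define the four lookups precisely, so the interpretation here is an assumption: the short name resolves, the long (fully qualified) name resolves, both forward lookups return the expected address, and the reverse lookup of that address returns the long name. The host names and addresses are hypothetical, and the example runs against fake lookup tables so it works offline; dropping the `forward` and `reverse` arguments makes it use the system resolver through Python's standard `socket` module.

```python
import socket

def check_dns(short_name, long_name, expected_ip,
              forward=socket.gethostbyname,
              reverse=lambda ip: socket.gethostbyaddr(ip)[0]):
    """Run four DNS checks for one server and return a dict of results."""
    def safe(lookup, arg):
        try:
            return lookup(arg)
        except OSError:  # socket.gaierror and socket.herror are OSError subclasses
            return None

    short_ip = safe(forward, short_name)
    long_ip = safe(forward, long_name)
    ptr = safe(reverse, expected_ip)
    return {
        "short name resolves": short_ip is not None,
        "long name resolves": long_ip is not None,
        "forward matches expected IP": short_ip == long_ip == expected_ip,
        "reverse returns long name":
            ptr is not None and ptr.lower().rstrip(".") == long_name.lower(),
    }

# Fake resolver tables (hypothetical names) so the example runs without a network.
A_RECORDS = {"vc-a": "10.1.0.10", "vc-a.example.com": "10.1.0.10",
             "srm-a": "10.1.0.11", "srm-a.example.com": "10.1.0.11"}
PTR_RECORDS = {"10.1.0.10": "vc-a.example.com"}  # srm-a has no PTR record

def fake_forward(name):
    if name not in A_RECORDS:
        raise OSError("name not found")
    return A_RECORDS[name]

def fake_reverse(ip):
    if ip not in PTR_RECORDS:
        raise OSError("address not found")
    return PTR_RECORDS[ip]

vc_result = check_dns("vc-a", "vc-a.example.com", "10.1.0.10",
                      forward=fake_forward, reverse=fake_reverse)
srm_result = check_dns("srm-a", "srm-a.example.com", "10.1.0.11",
                       forward=fake_forward, reverse=fake_reverse)
print(vc_result)
print(srm_result)
```

Following the slide's advice, a script like this would be run from the VC server, the SRM server and each ESX server, at both sites.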

35 Site Recovery Manager 1.0 Prerequisites
ESX 3.0.2, ESX 3.5
VirtualCenter (VC) server version 2.5 installed at the protected site and at the recovery site
Site Recovery Manager server installed at the protected site and at the recovery site
Site Recovery Manager plug-in installed on the VMware Infrastructure Clients that will access the protected and recovery sites
Network configuration that allows TCP connectivity between VC servers and SRM servers
An Oracle or SQL Server database that uses ODBC for connectivity in the protected site and in the recovery site
A Site Recovery Manager license file installed on the VC license server at the protected site and at the recovery site
Pre-configured array-based replication between the protected site and the recovery site

NOTE: This slide has animation

SRM prerequisites (make mention of the below):
Point out the need for separate VC servers in the protected and recovery sites, with separate databases
Point out the need for separate SRM servers in the protected and recovery sites, with separate databases
Pre-configured array-based replication between the protected and recovery sites
2 VC servers (one per site)
2 SRM servers (one per site)
4 databases (two per site, one for VC and one for SRM)

Supported databases for SRM 1.0:
1. SQL Server 2005 Enterprise/Standard/Express - SQL Native Client. (Note: If SQL Server is installed locally, you may need to disable the "Shared Memory" network setting on the DB server.)
2. Oracle 9i - Oracle driver version x.x (Note: manually disable Bulk Insert in the config file)
3. Oracle 10gR1, 10gR2 - Oracle driver version x.x

36 Site Recovery Manager Installation Workflow
At the protected site the following activities are completed:
Installation of the SRM server
Installation of the SRM Plugin into the VI Client
Installation of the Storage Replication Adapter (SRA)
At the recovery site the following activities are completed:
Installation of the SRM Plugin into the VI Client *
It is important to complete the workflows in the order detailed in this presentation.

NOTE: This slide has animation

The SRM installation workflow involves installing the SRM server, which can be done at the protected site first and then at the recovery site, or vice versa. Once the installation workflow has been completed, it is very important to complete the remaining SRM configuration workflows in the order detailed in this presentation.

* Note: Optional step, only required if a different instance of the VI Client is used to access the recovery site.

37 Protected and Recovery Site Datacenters
[Diagram labels: PROTECTED SITE, RECOVERY SITE]

NOTE: This slide has animation.

Before moving into the SRM configuration workflows, let's review the two datacenters depicted on the slide to help frame the rest of the presentation:
Production datacenter we wish to protect: vim22dc (Protected Site) and VMs (app_vm1 to app_vm12)
BC/DR datacenter we will fail over to: vim23dc (Recovery Site)

38 Site Recovery Manager User Interface
SRM UI Access Local and Paired Site Protection Setup NOTE: This slide has animation SRM is accessed via the VI Client. An SRM Plugin is installed onto your VI Client resulting in the Site Recovery Icon highlighted in the slide With the exception of the Recovery Plans all SRM setup workflow will be completed from the VI Client that is connected into the protected site (Connection, Array Managers, Inventory Preferences and Protection Groups) The Recovery Plan for the VMs in the protected site is created from the VI client that is connected into the recovery site Recovery Setup

39 Setup Workflow – Protection Site
At the protection site the following setup activities are completed:
The user pairs the SRM servers at the protected and recovery sites
Security certificates are established between the SRM servers and the VC servers

NOTE: This slide has animation

Step 1: Pairing of the recovery site (vim23) to the protected site (vim22), which involves:
Connecting the VC server in the protected site to the VC server in the recovery site
Certificate validation between the VC servers in the protected and recovery sites
Connecting the SRM server in the protected site to the SRM server in the recovery site
Certificate validation between the SRM servers in the protected and recovery sites
Reciprocity is established

PKCS12 (Personal Information Exchange Syntax Standard) certificates can be used for purposes such as file signing. They differ from other certificate files in that, rather than holding only the public certificate or only the private key, they combine both, plus the root certificate, in a single file. This means the person they are issued to only has to manage one file.

Certificates that are not properly signed will result in yellow warning signs. Reciprocity will still be established, allowing you to continue to the next step in the workflow.

40 Setup Workflow – Protection Site (continued)
Array Managers Configuration
Select the correct Manager Type from the Manager type drop-down box

Storage Partner Participation
VMware provides the SRA specification
Storage partners create the SRA
Storage partners test the SRA
VMware reviews the SRA test results
SRA support with SRM is granted if all tests are passed

Step 2: After the pairing of the sites is completed via the SRM Connection wizard, the next step is to configure the array managers. During the installation workflow you installed an SRA for the array you will be using to replicate the datastores (datastore groups) between Site 1 and Site 2. During the Add Array Manager configuration workflow you will be presented with a window similar to this one. You need to select the correct manager type to enable SRM to integrate with the SAN that is replicating the datastores (datastore groups) between Site 1 and Site 2.

41 Setup Workflow – Protection Site (continued)
SRM identifies the available arrays at the protection and recovery sites and the replicated datastores, and determines the datastore groups.
[Screenshot labels: Protection Side Array Discovery; Recovery Side Array Discovery; Replicated Datastores and Datastore Groups]

Step 2 continued: SRM will identify which LUNs are being replicated and present you with the list. The Array Manager wizard involves the following steps:
Protection site array setup, pairing the array in the protected site to the array in the recovery site
Recovery site array setup
Review of the mirrored LUNs

42 Setup Workflow – Protection Site (continued)
Using the Inventory Preferences Mapper, the user maps resources in the protected site to their counterparts in the recovery site.

Step 3: After the Array Managers setup is completed via the SRM Array Managers wizard, the next step is to configure the Inventory Preferences via the SRM Inventory Mapper wizard. Using the Inventory Mapper wizard, the protected VMs now need to be mapped to the networks, compute resources, and virtual machine folders that are available at the recovery site.

Note: These are global preferences that will be applied to all the protected VMs when they are restarted at the recovery site. In addition to the global preferences, individual per-VM customization can also be applied to the protected VMs, for example network configuration information (IP, mask, gateway, DNS and WINS servers), to allow the protected VMs to start up correctly on the network at the recovery site.
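The relationship between the global inventory preferences and per-VM customization can be pictured as a lookup with overrides. The Python sketch below is only a model of that idea: the function `placement_for`, the resource names and the IP settings are hypothetical and do not reflect how SRM stores its mappings.

```python
def placement_for(vm, global_map, per_vm_settings):
    """Resolve where a protected VM lands at the recovery site.

    vm: dict with "name", "network", "compute", "folder" (protected-site values)
    global_map: dict of kind -> {protected-site resource: recovery-site resource}
    per_vm_settings: dict of VM name -> extra settings applied to that VM only
    """
    # Global preferences: applied to every protected VM.
    placed = {kind: global_map[kind][vm[kind]]
              for kind in ("network", "compute", "folder")}
    # Per-VM customization: layered on top for this VM only.
    placed.update(per_vm_settings.get(vm["name"], {}))
    return placed

# Hypothetical global mappings from protected-site to recovery-site resources.
global_map = {
    "network": {"Prod Network": "DR Network"},
    "compute": {"Prod Cluster": "DR Cluster"},
    "folder": {"Apps": "Recovered Apps"},
}
# Hypothetical per-VM network settings for the recovery site.
per_vm_settings = {
    "app_vm1": {"ip": "10.2.0.21", "mask": "255.255.255.0", "gateway": "10.2.0.1"},
}

vm1 = {"name": "app_vm1", "network": "Prod Network",
       "compute": "Prod Cluster", "folder": "Apps"}
vm2 = dict(vm1, name="app_vm2")

print(placement_for(vm1, global_map, per_vm_settings))
print(placement_for(vm2, global_map, per_vm_settings))
```

A VM without its own entry, such as `app_vm2` here, receives only the global mappings.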

43 Setup Workflow – Protection Site (continued)
A protection group is a group of VMs that will be failed over together to the recovery site.
Working through the Protection Group wizard, you will need to select a temporary location for placeholder VM configuration files for the protected VMs at the recovery site.

Step 4: After the Inventory Preferences setup is completed, the next step is to configure protection groups via the SRM Protection Group wizard. During the creation of the protection groups, SRM requires a location to store some temporary VirtualCenter inventory files for the protected VMs. SRM will present the available datastores at the recovery site that could be selected for storing these temporary files. It is recommended that you select a non-replicated datastore at the recovery site for these temporary files.

44 Setup Workflow – Protection Site (continued)
Working through the Protection Group wizard, a user selects which VMs need to be protected and assigns them to a protection group.
The creation of a protection group results in VC inventory updates in the recovery site.

NOTE: This slide has animation

Step 4 continued: A protection group has a 1:1 mapping to a datastore group in SRM. A protection group contains the virtual machines you wish to protect in Site 1 (protected site) and allows them to be failed over to Site 2 (recovery site). This screen provides a summary of the protected virtual machines (app_vm1 to app_vm12) and also shows which folders and resource pools (RPs) they will be mapped to in the recovery site.
Click 1: Show the shadow VM metadata being written to the temporary storage location that was selected when working through the Protection Group wizard
Click 2: Show the automatic update of the VC inventory in the recovery site as a result of the protection group being created

45 Setup Workflow – Recovery Site
At the recovery site the following setup activity is completed: The user creates a recovery plan which is associated to a single or multiple protection groups Step 5: Working through the Recovery Plan wizard the user completes the setup of a recovery plan that is associated with a single protection group or multiple protection groups Recovery Plan is a preprogrammed BC/DR run book that will ensure your tests and failovers are executed in a repeatable and reliable manner

46 Site Recovery Manager Recovery Plan
VM Shutdown High Priority VM Shutdown Prepare Storage High Priority VM Recovery NOTE: This slide has animation Step 5: Working through the Recovery Plan wizard the user completes the setup of a recovery plan that is associated with a single protection group or multiple protection groups SRM recovery plan called ‘Recovery Plan 2 – Protection Group 2’ required to complete a partial site failover for the local data center vim22dc which is protected by SRM The protected VMs that will be failed over are app_vm7 through to app_vm12 from Protection Group 2 which is associated to the datastore group shared-san-2 Low and Normal VM shut down High Priority VM shutdown Datastore group preparation at the Recovery Site Recovery of VMs Normal Priority VM Recovery

47 Site Recovery Manager Recovery Plan (continued)
Low Priority VM Recovery Post Test Cleanup Storage Reset Site Recovery Manager Recovery Plan Benefits: Turn manual BC/DR run books into an automated process Specify the steps of the recovery process in VirtualCenter Provide a way to test your BC/DR plan in an isolated environment at the recovery site without impacting the protected VMs in the protected site NOTE: This slide has animation SRM recovery plan called ‘Recovery Plan 2 – Protection Group 2’ required to complete a partial site failover for the local data center vim22dc which is protected by SRM. The protected VMs that will be failed over are app_vm7 through to app_vm12 from Protection Group 2 which is associated to the datastore group shared-san-2
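The ordering shown on the recovery plan slides (shutdowns first, then storage preparation, then recovery by priority) can be sketched as a simple list builder. This Python example is a model of the sequence only: the function `build_plan`, the step wording and the priority assigned to each VM are assumptions for illustration, and the real plan in the SRM UI contains additional steps.

```python
PRIORITIES = ("high", "normal", "low")

def build_plan(vms):
    """Return an ordered list of (step name, VM names) for a recovery plan.

    vms: dict of VM name -> priority ("high", "normal" or "low")
    """
    by_priority = {
        p: sorted(name for name, prio in vms.items() if prio == p)
        for p in PRIORITIES
    }
    steps = [
        # Shutdowns run lowest priority first, as on the slide.
        ("Shut down low and normal priority VMs",
         by_priority["low"] + by_priority["normal"]),
        ("Shut down high priority VMs", by_priority["high"]),
        ("Prepare storage", []),
    ]
    # Recovery runs highest priority first.
    steps += [(f"Recover {p} priority VMs", by_priority[p]) for p in PRIORITIES]
    return steps

# The six VMs of Protection Group 2; the priorities are made up for the example.
pg2_vms = {
    "app_vm7": "high", "app_vm8": "high",
    "app_vm9": "normal", "app_vm10": "normal",
    "app_vm11": "low", "app_vm12": "low",
}

plan = build_plan(pg2_vms)
for number, (step, names) in enumerate(plan, start=1):
    print(number, step, names)
```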

48 Testing a Recovery Plan
SRM enables you to ‘Test’ a recovery plan by simulating a failover with zero downtime to the protected VMs in the protected site SRM enables you to ‘Test’ a recovery plan by simulating a failover of virtual machines from the protected site to the recovery site. The benefit of using SRM to run a failover simulation against a recovery plan is that it allows you to confirm that the recovery plan has been setup correctly for the protected VMs. You will be able to confirm that the protected VMs startup in the correct order, taking into account the various application service dependencies for the protected VMs in your environment It is worth pointing out that when you select the option to ‘Test’ a recovery plan via SRM, the simulated failover is executed in an isolated environment that includes network and storage infrastructure at the recovery site that is isolated from the protected site (production environment) which ensures the protected VMs at the protected site are not subject to any kind of service interruption during the testing of the recovery plan SRM will also create a test report that can be used to demonstrate your level of preparedness to the business or individual business units whose services are being protected by SRM as well as to the auditors and compliance officers if required The simulated failover completes by resetting the environment to be ready for the next event which could be another simulated failover, or an actual failover for a scheduled BC/DR test or in response to an event which resulted in the business declaring a disaster

49 Testing a Recovery Plan (continued)
[Screenshot callouts: Mode column showing “Recovery Only” and “Test Only”; Status column showing Success, Errors, Waiting for Input]

NOTE: This slide has animation

Testing of an SRM recovery plan can be completed without impacting the protected VMs (app_vm7 to app_vm12) at the protected site. The SRM recovery plan called ‘Recovery Plan 2 – Protection Group 2’ is required to complete a partial site failover for the local data center vim22dc, which is protected by SRM. The protected VMs that will be failed over are app_vm7 through app_vm12 from Protection Group 2, which is associated with the datastore group shared-san-2.

While the simulated failover test is running, the status of each step that makes up the recovery plan can be monitored on the Recovery Steps tab in the VI Client, which shows which steps are currently Running as well as which steps completed with a Success status. Some steps in a recovery plan are only executed during a simulated test; these are identified by ‘Test Only’ under the Mode column. Other steps are only executed during an actual failover; these are identified by ‘Recovery Only’ under the Mode column.

Once the simulated failover test completes, a report of the test run can be viewed from the History tab by clicking on the ‘view’ link. The report lists all the steps in the recovery plan along with a status of success or error and the duration of each step.
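The Mode column and the test report can be modelled in a few lines. The Python sketch below is an illustration under stated assumptions: the `Step` class, the two helper functions, the choice of which steps are marked ‘Test Only’ or ‘Recovery Only’, and the durations are all hypothetical, and only the filtering rule itself comes from the slide.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    mode: str = ""  # "", "Test Only" or "Recovery Only"

def steps_for(plan, run_type):
    """Select the steps that execute for a 'test' or a 'recovery' run."""
    if run_type not in ("test", "recovery"):
        raise ValueError("run_type must be 'test' or 'recovery'")
    skipped = "Recovery Only" if run_type == "test" else "Test Only"
    return [step for step in plan if step.mode != skipped]

def summarize(results):
    """Summarize a run from (step name, status, duration in seconds) tuples."""
    return {
        "steps": len(results),
        "failed": [name for name, status, _ in results if status != "Success"],
        "total_seconds": sum(seconds for _, _, seconds in results),
    }

# Which steps carry which mode is assumed here for the sake of the example.
plan = [
    Step("Shut down protected VMs at protected site", "Recovery Only"),
    Step("Prepare storage"),
    Step("Recover high priority VMs"),
    Step("Recover normal priority VMs"),
    Step("Recover low priority VMs"),
    Step("Post test cleanup", "Test Only"),
    Step("Reset storage", "Test Only"),
]

test_steps = [s.name for s in steps_for(plan, "test")]
recovery_steps = [s.name for s in steps_for(plan, "recovery")]

# Made-up durations for a test run in which one step reported an error.
report = summarize([
    (name, "Error" if name == "Recover low priority VMs" else "Success", 30)
    for name in test_steps
])
print(test_steps)
print(recovery_steps)
print(report)
```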

50 Executing an Actual Failover
WARNING - Executing an actual failover will permanently alter virtual machines and infrastructure of both the protected and recovery sites NOTE: This slide has animation Click 1: Will take you to a storage configuration view after a SRM Recovery. SRM enables you to ‘Run’ a recovery plan which will result in the actual failover of virtual machines from the protected site The failover process via SRM is rapid, repeatable, reliable, manageable and auditable There are two ways to initiate the actual failover, you can either click on the ‘Run’ button or click on the ‘Execute Recovery Plan’ link under the Commands section If there is still connectivity back to the protected site at the time the disaster is declared by the business, SRM will first initiate the power down of the protected VMs at the protected site

51 Executing an Actual Failover (continued)
WARNING - Executing an actual failover will permanently alter virtual machines and infrastructure of both the protected and recovery sites

The Run Recovery Plan dialog box warns you that you are about to run a recovery plan, which will result in changes to the protected virtual machines and the infrastructure of both the protected and recovery site datacenters. Click the radio button to confirm you understand the implications of running your recovery plan, and then click the Run Recovery Plan button to start the failover of protected VMs from the protected site to the recovery site.

The Run Recovery Plan dialog box also provides a summary of the recovery plan information. This includes the recovery plan that is going to be run, the names of the protected and recovery sites, the number of protected VMs that will be failed over, and the connectivity status from the recovery site back to the protected site.

While the failover is being executed, the status of each step that makes up the recovery plan can be monitored on the Recovery Steps tab of the recovery site’s VI Client, which shows which steps are currently Running as well as which steps completed with a Success status.

WARNING - Failback to the protected site is not an automated process in SRM 1.0

52 SRM performs a Datastore re-signature
SRM will automatically perform a re-signature on the datastores in the recovery site that were replicated from the SRM protected site (LVM.EnableResignature=1).
With a re-signature, datastore names will change to snapxxxx_datastorename, for example snap shared-san-1 snap shared-san-2

SRM VMware topology map after executing an actual failover from the protected site to the recovery site. Once all the recovery plan steps have been executed as part of the actual failover, the VI3 topology maps show that the protected VMs (app_vm7 to app_vm12) are associated with an ESX host in the recovery site vim23 and are served from a recovery site datastore.

WARNING - The re-signature of the target datastore has implications during a failback (resync) of data back to the SRM protected site

53 Failback Options with Site Recovery Manager 1.0
SRM 1.0 does not provide a push-button automated failback process

Failback Options
Without SRM (no Recovery Plan, no testing capabilities, no audit trail):
Unregister the protected virtual machines in the Protected Site VC
Work with your storage team to reverse data replication
VM re-inventory in Protected Site VC, restart and re-IP (manual or scripted)
With SRM (Recovery Plan, test before recovery, built-in audit trail):
Delete the protection groups in the Protected Site VC
Leverage SRM: complete the SRM workflows in the reverse direction, from the Recovery Site back to the Protected Site
Repeat the above steps from the Protected Site back to the Recovery Site to complete the re-protection of the virtual machines in the Protected Site

NOTE: This slide has animation

SRM 1.0 does not support automated failback via the SRM UI. Should there be a need to fail back data to the designated protected site (original or new) after an actual DR event or a scheduled BC/DR test, there are two approaches: manual, or with SRM.

Failback with SRM (step, site, failback task):
Step 1 (Site B): Protected VMs recovered to Site B are no longer being used and can be powered down
Step 2 (Site B): Power down the protected VMs in Site B
Step 3 (Site B): Create a list of all the protected VMs that were recovered to Site B
Step 4 (Site B): Perform a cleanup of the directory in Site B that contained the VM configuration files created during protection group creation in Site A
Step 5 (Site A): Connect to the VC instance in Site A and delete PG 1
Step 6 (Site A): Connect to the VC instance in Site A and perform a remove from inventory operation on all the protected VMs in Site A that were recovered to Site B
Step 7 (Storage work): Work with your storage team to complete a storage configuration change (‘personality swap’) whereby the Source LUN is now associated with Site B and the Target LUN is associated with Site A. Refer to Figure 6.5.
Step 8 (Site B): Complete the Array Manager configuration wizard in Site B, which now has the Source LUN configured in Site B and the Target LUN configured in Site A
Step 9 (Site B): Configure the Inventory Preferences in Site B; these inventory preferences will be assigned to the protected VMs when they are restarted in Site A after the failback
Step 10 (Site B): Connect to the VC instance in Site B and configure PG 2
Step 11 (Site A): Connect to the VC instance in Site A and configure RP 2 in Site A
Step 12 (Site A): Using SRM, complete the failback of the original protected VMs back to Site A. This is accomplished by performing a Recovery against RP 2. Figure depicts the storage configuration after the Recovery completes
Step 13 (Site A): Shut down all of the protected VMs in Site A that were failed back from Site B during the SRM Recovery operation performed in step 12
Step 14 (Site A): Perform a cleanup of the directory in Site A that contained the VM configuration files created during protection group creation in Site B
Step 15 (Site B): Connect to the VC instance in Site B and delete PG 2 that was created in Site B in step 10
Step 16 (Site B): Connect to the VC instance in Site B and perform a remove from inventory operation on all the protected VMs in Site B that were recovered to Site A
Step 17 (Storage work): Work with your storage team to complete a second storage configuration change (‘personality swap’) whereby the Source LUN is re-associated with Site A and the Target LUN is re-associated with Site B, along with the Clone LUN, as depicted in Figure 6.11
Step 18 (Site A): Create PG 3 in Site A for the protected VMs
Step 19 (Site B): Re-associate PG 3 from step 18 in Site A with RP 1 in Site B
Step 20 (Site B): Complete a final ‘Test’ (a simulated failover) against RP 1 to ensure that Site A is protected and ready for any event that may necessitate a Recovery via SRM to Site B should a disaster be declared

54 Default Roles and Privileges in Site Recovery Manager
NOTE: This slide has animation

To make it easy to apply specific sets of privileges that enable a coherent set of operations, roles specific to SRM will be defined on the VC server during installation. These roles are described here.

There are two sets of roles. The first set contains the roles required for the primary site user to administer protection. The second set contains the roles required for the secondary site user to administer recovery. Note that the second set of roles also includes the privileges required to perform the necessary actions on the secondary site when protection is administered from the primary site. This means that when the primary site user is required to log in to the remote (secondary) site in order to complete protection configuration, she can use the account privileged to administer recovery there.

The following is the list of roles and the inventory objects on which the roles need to be assigned:

Protection Virtual Machine Administrator: This role should be assigned on the protected Virtual Machine object in the VC inventory. It grants the associated user the ability to set up and modify the protection characteristics of the protected virtual machine.

Protection SRM Administrator: This role should be assigned on the Service Instance object in the primary SRM inventory. It grants the associated user the ability to pair two sites and to configure inventory mappings and SAN arrays.

Protection Groups Administrator: This role should be assigned on the Primary Configuration/Protection Service object in the SRM inventory. It grants the associated user the ability to create and modify protection profiles.

Recovery Inventory Administrator: This role should be assigned on the root of the VC inventory. It grants the associated user the ability to view customization specifications existing on the secondary site.

Recovery Datacenter Administrator: This role should be assigned on the Datacenter object in the VC inventory where the VMs will be recovered. It grants the associated user the ability to view available datastores and perform recovery (shadow) VM customizations.

Recovery Host Administrator: This role should be assigned on the Host or DRS cluster object in the VC inventory where the VM will be recovered. It grants the associated user the ability to configure VM components during recovery.

Recovery Virtual Machine Administrator: This role should be assigned on the Folder and Resource Pool objects in the VC inventory where the recovery (shadow) VMs are to be placed. It grants the associated user the ability to create shadow VMs and add them to the resource pool and the folder, as well as the ability to reconfigure and customize the shadow VMs at runtime and during the process of recovery.

Recovery SRM Administrator: This role should be assigned on the Service Instance object in the secondary SRM inventory. It grants the associated user the ability to configure SAN arrays and create protection profiles.

Recovery Plans Administrator: This role should be assigned on the Secondary Configuration/Recovery Service object in the SRM inventory. It grants the associated user the ability to reconfigure protection and shadow VMs and to set up and run recovery.

Note that VC already defines a Read-Only system role, which can be used to grant users the ability to view the Disaster Recovery service. In addition, the Administrator role can be used to grant a user complete control over both the protection and recovery components.

55 Alarms and Site Status Monitoring
SRM will support the following alarm notification actions: Send to specified address Send SNMP trap to VC trap receivers Execute specified command on VC host We recommend you complete setup of alarm notifications for: Remote Site Down Remote Site Ping Failed Replication Group Removed Recovery Plan Destroyed License Server Unreachable SRM will support the configuration of event-triggered alarms so that you can associate a notification action with any given SRM Alarm Event. These alarms are configured via the SRM UI. To get familiar with SRM Alarms and how they work we recommend you enable the 5 listed here. Remote site failure is reflected in the SRM Alarm Events and will not automatically trigger a recovery. This must be initiated manually.

56 Site Recovery Manager Server Monitoring
SRM will raise VC events for the following conditions: Disk Space Low CPU use exceeded limit Memory low Remote Site not responding Remote Site heartbeat failed Recovery Plan Test started, ended, succeeded, failed, or cancelled Virtual Machine Recovery started, ended, succeeded, failed, or reports a warning Each SRM server monitors the CPU utilization, disk space, and memory consumption of the guest on which it is running, and also maintains a heartbeat with its peer SRM server. VC events are sent if any of these measures falls outside of configured bounds.
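The idea of raising an event when a measure falls outside its configured bounds can be sketched as a simple threshold check. In this Python example the event names are taken from the slide, while the function `check_server`, the measure names and the threshold values are hypothetical; the slide does not say how SRM configures or evaluates these bounds.

```python
def check_server(measures, limits, peer_heartbeat_ok):
    """Return the events to raise for one SRM server sample.

    measures: current readings for the guest the SRM server runs on
    limits:   configured bounds (hypothetical names and units)
    peer_heartbeat_ok: whether the peer SRM server answered the heartbeat
    """
    events = []
    if measures["disk_free_mb"] < limits["min_disk_free_mb"]:
        events.append("Disk Space Low")
    if measures["cpu_percent"] > limits["max_cpu_percent"]:
        events.append("CPU use exceeded limit")
    if measures["memory_free_mb"] < limits["min_memory_free_mb"]:
        events.append("Memory low")
    if not peer_heartbeat_ok:
        events.append("Remote Site heartbeat failed")
    return events

# Hypothetical bounds and two sample readings.
limits = {"min_disk_free_mb": 1024, "max_cpu_percent": 90, "min_memory_free_mb": 256}
healthy = {"disk_free_mb": 20480, "cpu_percent": 35, "memory_free_mb": 2048}
stressed = {"disk_free_mb": 512, "cpu_percent": 97, "memory_free_mb": 1024}

healthy_events = check_server(healthy, limits, peer_heartbeat_ok=True)
stressed_events = check_server(stressed, limits, peer_heartbeat_ok=False)
print(healthy_events)
print(stressed_events)
```

A heartbeat failure here only produces an event. It does not start a recovery, which must still be initiated manually.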

57 Site Recovery Manager Core Benefits
Expand disaster recovery protection Now any workload in a VM can be protected with minimal incremental effort and cost Reduce time to recovery As soon as disaster is declared, a single button kicks off recovery sequence for hundreds of VMs Increase reliability of recovery Replication of system state ensures a VM has all it needs to startup Hardware independence eliminates failures due to different hardware Easier testing based off of actual failover sequence allows more frequent and more realistic tests NOTE: This slide has animation Site Recovery Manager makes it possible for organizations to expand the scope of better disaster recovery protection to more systems. It reduces time to recovery by automating recovery process. It increases reliability of recovery by eliminating several causes of failure encountered by traditional recovery and enabling easier, more frequent testing. Using Site Recovery Manager, you will be able to significantly improve your disaster recovery solution. For one, it makes it easy to expand the number of workloads being protected. By making optimal use of the recovery site resources, we allow a lower barrier to entry than with physical disaster recovery. By automating the difficult and/or manual parts of DR planning, failover, and test (e.g. mapping VMs to storage, booting in the right sequence, taking care of IP changes, etc.), Site Recovery Manager makes the incremental cost of protecting a VM very low from an operational perspective. The only real costs are the disk space at the destination site and enough bandwidth to handle the data change rate of that VM. Site Recovery Manager also significantly reduces time to recovery through automation of the recovery process. And Site Recovery Manager makes the recovery plan far more reliable. It makes hardware dependencies irrelevant by leveraging the hardware independence provided by virtualization. 
It helps you ensure that the right storage, and only the right storage, is replicated, including the VM's system state, which is always completely up to date and patched. It takes care of the networking changes you need at recovery time to get everything working properly. And most importantly, it makes it easy for you to run frequent non-disruptive tests to ensure that the recovery plan is correct and that your staff are practiced in executing it successfully.

58 Summary Site Recovery Manager leverages VMware Infrastructure to make disaster recovery: Rapid (automate the disaster recovery process; eliminate complexities of traditional recovery). Reliable (ensure proper execution of the recovery plan; enable easier, more frequent tests). Manageable (centrally manage recovery plans; make plans dynamic to match the environment). Affordable (utilize recovery site infrastructure; reduce management costs). NOTE: This slide has animation. In short, Site Recovery Manager is designed to attack the key challenges of traditional disaster recovery, ensuring rapid, reliable, manageable, and affordable disaster recovery. It leverages VMware Infrastructure to address the key challenges we hear customers talking about regarding disaster recovery. Rapid recovery: by automating the recovery process and eliminating complexities like hardware dependencies. Reliable recovery: by taking out failures due to human error or outdated run books and by enabling easier and more frequent testing. Manageable recovery: by providing a central console for managing recovery plans in the same place as you manage your infrastructure. Affordable recovery: by leveraging the cost benefits of VMware Infrastructure, making it easy to utilize recovery site hardware for other workloads without impacting your recovery time, and by reducing the operational costs of training and of continued management of your disaster recovery plans.

59 Backup Slides Backup slides

60 Protected Site Topology Map
To frame the technical SRM presentation that follows, we will use the two datacenters depicted in the slide as a reference: the production datacenter we wish to protect, vim22dc (Protected Site), with its VMs (app_vm1 to app_vm12), and the BC/DR datacenter we will fail over to, vim23dc (Recovery Site).

61 Setup Workflow – Recovery Site VC Updates
The creation of a protection group results in VC inventory updates in the recovery site. Protected VMs app_vm1 to app_vm12 are created in the VC inventory in the recovery site when their respective protection groups are created in the protected site. The slide shows the SRM Recovery Site VC Inventory view after the creation of the protection groups, Protection Group 1 and Protection Group 2.
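As a rough illustration of the relationship described above, the following Python sketch derives the recovery-site inventory entries from the protection groups defined at the protected site. It is a toy model, not an SRM or VirtualCenter API: the function `recovery_inventory` is hypothetical, and the split of app_vm1 to app_vm12 between the two groups is an assumption, since the slide notes do not say which VMs belong to which group.

```python
from typing import Dict, List


def recovery_inventory(protection_groups: Dict[str, List[str]]) -> List[str]:
    """List the VM entries that would appear in the recovery-site inventory.

    Each protected VM yields one entry, in group order, without duplicates.
    """
    seen = set()
    inventory: List[str] = []
    for vms in protection_groups.values():
        for vm in vms:
            if vm not in seen:
                seen.add(vm)
                inventory.append(vm)
    return inventory


# Assumed membership: the six/six split is illustrative only.
groups = {
    "Protection Group 1": [f"app_vm{i}" for i in range(1, 7)],
    "Protection Group 2": [f"app_vm{i}" for i in range(7, 13)],
}
```

Calling `recovery_inventory(groups)` with the assumed membership yields one entry for each of the twelve protected VMs, mirroring the inventory view the slide describes.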

62 Questions? Questions? Q&A Session if time permits.

