VMware Site Recovery Manager: Technical Overview

February 2008 VMware

Agenda
Agenda for the presentation today:
1. Disaster recovery (DR) and Site Recovery Manager (SRM) Introduction and Key Concepts
2. SRM 1.0 Prerequisites and SAN Integration
3. SRM Workflows (Protected and Recovery Site)
4. SRM Roles and Privileges
5. SRM Alarms and Site Status Monitoring
6. SRM Core Benefits and Summary

What is a Disaster?
- Complete loss of a data center for an extended period of time
- Declaration of a disaster usually requires consensus from multiple parts of the organization (at the C*O level)
What is not a disaster?
- Failure of an individual host
- A temporary service interruption
This is what we mean when we talk about disasters. There is, of course, a gray area: does failure of a storage array constitute a disaster? How long is “an extended period”? Can disaster recovery tools assist with planned outages?

The Current State of Traditional Disaster Recovery
DR services are tiered according to business needs:
Tier   RPO / RTO              Cost
I      Immediate              $$$
II     24+ hrs. / 48+ hrs.    $$
III    7+ days / 5+ days      $
Physical DR is challenging:
- Maintain identical hardware at both locations
- Apply upgrades and patches in parallel
- Little automation
- Error-prone and difficult to test
In our discussions with customers like you, we found that disaster protection for services tends to be tiered. In the first tier are services for which no downtime at all can be tolerated; those services tend to be deployed from the start in active-active configurations. For those sites that maintain identical idle hardware at the secondary site, just keeping up with OS patches can be a full-time job. For the remaining services, however, DR plans are in the three-ring binder (“read this in case of disaster”): to recover this server, confiscate that hardware, re-install the OS, and recover from tape. These steps are very manual and tend to be very difficult to test.

Advantages of Virtual Disaster Recovery
- Virtual machines are portable
- Virtual hardware can be automatically configured
- Test and failover can be automated (minimizes human error)
- The need for idle hardware is reduced
- Costs are lowered, and the quality of service is raised
[Graphic: a button labeled “Press In Case of Disaster”]
Several advantages of virtual machines make them ideal vehicles for BC/DR:
- Portable: virtual machines can be transmitted over a wire.
- Automatically configured: virtual machines can be programmatically powered on and off, and virtual networks can be programmatically reconfigured.
- Minimizes human error: by including the boot disk, virtual DR eliminates the need to apply OS patches in parallel at the primary and secondary sites.
- Lower cost: it becomes possible to make high-quality DR protection ubiquitous, not just for the first tier of service.

Introducing VMware Site Recovery Manager
- Site Recovery Manager leverages VMware Infrastructure to deliver advanced disaster recovery management and automation
- Simplifies and automates disaster recovery workflows: setup, testing, failover
- Turns manual recovery runbooks into automated recovery plans
- Provides central management of recovery plans from VirtualCenter
- Works with VMware Infrastructure to make disaster recovery rapid, reliable, manageable, and affordable
VMware is building on the core properties of VMware Infrastructure that make it so useful for disaster recovery with a new product developed specifically for disaster recovery: VMware Site Recovery Manager. It simplifies and automates the key elements of disaster recovery: setting up disaster recovery plans, testing those plans, executing failover when a datacenter disaster occurs, and failing back to the primary datacenter (in SRM 1.0, failback is not a push-button operation; see the failback options later in this presentation).
Site Recovery Manager helps organizations directly address the challenges of disaster recovery mentioned earlier: meeting RTO requirements, reducing cost, and reducing risk. It makes it possible for customers to provide faster, more reliable, and more affordable disaster recovery protection than previously possible.
Site Recovery Manager is a separate product from VMware Infrastructure. Although not a part of VMware Infrastructure, it works closely with it to manage and automate disaster recovery for virtual environments.

Site Recovery Manager at a Glance
[Diagram: a Protected Site and a Recovery Site, each with VirtualCenter, Site Recovery Manager, and datastore groups, linked by array replication. The slide animation simulates at a high level what SRM does.]
High-level view of SRM: SRM protects the VMs you select at the protected site and starts up the protected VMs at the recovery site at the time of a test or disaster.
Animation sequence:
1. SRM protects the VMs; shadow VMs are created in the recovery site.
2. A disaster occurs.
3. Press the big red button, and your protected VMs are restarted at the secondary site.

Server Side Components *
[Diagram: Site 1 and Site 2. Each site has a VC server with its VCMS database, an SRM server with its SRM database, a Storage Replication Adapter, and an array (Array 1, Array 2) running block replication software.]
SRM is designed as a plug-in to VirtualCenter so that DR tasks can be executed inside the same management tool as other VM administration tasks such as creation, migration, and deletion. However, SRM is not “in” VC. It is a separate server process with its own separate database. The server processes for SRM and VC can run on the same or different servers, and the databases for VC and SRM can reside on the same or different database servers.
The most interesting piece of the install is the storage replication adapters. SRM does not actually do the replication for DR, only the setup, test, and recovery workflows. SRM relies on block-based replication (Fibre Channel or iSCSI) from our storage partners. The storage replication adapters tie together the SRM product and the replication products. These adapters are developed, qualified, and supported by the storage partner for optimal reliability and the best customer experience. They sit on the SRM server and, once installed, are invisible for the duration of their use.
To summarize:
- 2 VC servers (one per site)
- 2 SRM servers (one per site)
- 4 databases (two per site, one for VC and one for SRM)
- Pre-configured array-based replication
* Note: Conceptual drawing only. The SRM server may run on a different system than VCMS.

Site Recovery Manager Concept Relationship “Cheat Sheet”
Protection side:
- LUN: indivisible unit of storage that can be replicated
- Datastore: contains one or more LUNs (i.e., a VMFS volume)
- Datastore group: auto-generated collection of one or more datastores; the indivisible unit of storage failover
- Protection group: collection of all VMs stored in a datastore group
Recovery side:
- Recovery plan: contains one or more protection groups
There are five “moving parts” that must be understood before SRM can be used. The first four relate to the protection tasks and the last relates to the recovery tasks.
LUNs are block devices presented from the storage arrays. LUNs are the unit of replication for the arrays and represent the smallest possible granularity for failover. It is never possible to fail over the contents of part of a LUN without failing over the entire LUN, so group VMs on LUNs accordingly.
Datastores: VMware formats LUNs with VMFS, our filesystem, and uses them to store VMs. These VMFS-formatted LUNs are referred to as datastores. A datastore commonly contains only a single LUN, but it can span multiple LUNs.
Datastore groups are the smallest groups of datastores (and therefore LUNs) whose contents can be failed over with SRM. These groupings are calculated for you, so you don’t have to work them out. Two things cause LUNs and datastores to be grouped together rather than managed separately:
- A datastore spanning multiple LUNs causes those LUNs to be grouped together in the datastore group. Failing over part of a datastore is not possible.
- A VM can have multiple virtual disks, and those virtual disks may sit on different datastores. In that case, those datastores are forced together into one datastore group so that you don’t try to fail over only part of a VM.
Protection groups are created with a one-to-one mapping to datastore groups. A protection group is simply the group of VMs that reside on a single datastore group. This is the actual unit of VM protection and recovery.
Recovery plans: once you have created protection groups, you can create recovery plans containing one or more of them. A recovery plan is simply a list of VMs from the protection groups, a startup order for those VMs, and any custom steps added before or after VM startup. This is the “virtual run book” that is executed during DR tests and actual DR failovers.

Key Concepts And Their Relationships
[Diagram: at the protected site, Datastore Group 1 (LUN 1, VMFS 1), Datastore Group 2 (LUNs 2 and 3, VMFS 2), and Datastore Group 3 (LUN 4 with VMFS 3, LUN 5 with VMFS 4) map to Protection Groups 1, 2, and 3. At the recovery site, Recovery Plan 1 (Whole Site) contains Protection Groups 1, 2, and 3; Recovery Plan 2 (Subset) contains Protection Group 1.]
Here is a graphical representation of what was just described:
- LUN 1 is formatted with VMFS 1 and holds 3 VMs. It has no dependencies on anything else and is in its own datastore group.
- LUNs 2 and 3 have VMFS 2 spanned across them, so they are in the same datastore. Since all 6 VMs on that datastore sit only on VMFS 2 and touch no others, VMFS 2 (and therefore LUNs 2 and 3) is alone in the second datastore group.
- LUN 4 is formatted with VMFS 3 and LUN 5 is formatted with VMFS 4. They would be in separate datastore groups were it not for the VM with a virtual disk in each of them. VMFS 3 and VMFS 4 (and therefore LUNs 4 and 5) are grouped together in a third datastore group.
- Protection groups 1, 2, and 3 are created, corresponding to datastore groups 1, 2, and 3.
- At the recovery site for all these VMs, recovery plan 1 is created containing all three protection groups and therefore all 10 of their VMs. This recovery plan would be used if the entire site were lost.
- Recovery plan 2 is also created, containing only protection group 1 and its 3 VMs. This is for some partial failure, perhaps corresponding to a server rack, an array, or a business unit. It would be run to recover that particular set of systems.
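The grouping rule described above (datastores that share a VM are forced into one datastore group) can be sketched as a small union-find computation. This is only an illustration of the rule, not SRM's actual implementation: the function names, the VM names vm1 to vm10, and the dictionaries are hypothetical and simply mirror the example on this slide.

```python
from collections import defaultdict


def compute_datastore_groups(vm_disks):
    """Group datastores that must fail over together.

    vm_disks maps a VM name to the set of datastores holding its virtual disks.
    Two datastores end up in the same group when some VM has disks on both.
    Returns a list of (datastores, vms) tuples, sorted by datastore name.
    """
    parent = {}

    def find(ds):
        # Union-find lookup with path halving.
        parent.setdefault(ds, ds)
        while parent[ds] != ds:
            parent[ds] = parent[parent[ds]]
            ds = parent[ds]
        return ds

    # Merge all datastores used by the same VM.
    for datastores in vm_disks.values():
        ordered = sorted(datastores)
        for other in ordered[1:]:
            root_a, root_b = find(ordered[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    # Collect datastores and VMs per group (one protection group per datastore group).
    group_datastores = defaultdict(set)
    group_vms = defaultdict(set)
    for vm, datastores in vm_disks.items():
        if not datastores:
            continue
        root = find(next(iter(datastores)))
        group_datastores[root].update(datastores)
        group_vms[root].add(vm)

    groups = [(group_datastores[r], group_vms[r]) for r in group_datastores]
    return sorted(groups, key=lambda g: min(g[0]))


def luns_for(datastores, datastore_luns):
    """List the LUNs behind a set of datastores (a datastore may span LUNs)."""
    return sorted(lun for ds in datastores for lun in datastore_luns[ds])


# Hypothetical inventory mirroring the slide: 3 VMs on VMFS1, 6 on VMFS2,
# and one VM with a virtual disk on both VMFS3 and VMFS4.
DATASTORE_LUNS = {
    "VMFS1": {"LUN1"},
    "VMFS2": {"LUN2", "LUN3"},
    "VMFS3": {"LUN4"},
    "VMFS4": {"LUN5"},
}
VM_DISKS = {f"vm{i}": {"VMFS1"} for i in range(1, 4)}
VM_DISKS.update({f"vm{i}": {"VMFS2"} for i in range(4, 10)})
VM_DISKS["vm10"] = {"VMFS3", "VMFS4"}

for number, (datastores, vms) in enumerate(compute_datastore_groups(VM_DISKS), 1):
    print(f"Datastore group {number}: {sorted(datastores)} "
          f"on {luns_for(datastores, DATASTORE_LUNS)} with {len(vms)} VM(s)")
```

A union-find structure is used here because grouping is transitive: if one VM links datastores A and B and another links B and C, all three must fail over together.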

Array Integration with SRM
Vendor-specific scripts support:
- Array discovery
- Replicated LUN discovery
- SRM test initiation (simulated failover in an isolated environment)
- SRM failover initiation (actual failover of services to the recovery site)
Array vendors will be responsible for creating the scripts for their arrays to enable the integration with Site Recovery Manager. SRM will leverage these Storage Replication Adapters (SRAs), written by the array vendors, to ensure tight integration with SRM. The SRAs perform array discovery, replicated LUN discovery, and test and failover initiation.

Safety Tip: DNS Validation – The Rule of ‘Four’
Validate that DNS is working as expected by performing the following DNS lookups for the VC, SRM, and ESX servers:
- Short name
- Long name
- Reverse
- Forward
It is highly recommended to validate that DNS is working as expected and that DNS lookups in the protected and recovery sites return the correct results. DNS should be validated from:
- the VC server
- the SRM server
- each of the ESX servers
NOTE: Complete the DNS checks from both the protected and the recovery sites.
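The four lookups could be scripted along the following lines. This is a minimal sketch using Python's standard `socket` module, not a VMware tool. The slide does not define the four checks precisely, so the interpretation here is an assumption: the short name resolves, the long (fully qualified) name resolves, the reverse lookup of the address returns the long name, and a forward lookup of that returned name yields the same address. The host names and the stub resolver are hypothetical and exist only so the example runs without a network.

```python
import socket


def dns_rule_of_four(short_name, long_name, resolver=socket):
    """Run four DNS checks for one host and return a dict of check -> bool.

    resolver is any object offering gethostbyname() and gethostbyaddr();
    the socket module itself is the default.
    """
    checks = {"short": False, "long": False, "reverse": False, "forward": False}

    try:
        short_ip = resolver.gethostbyname(short_name)
        checks["short"] = True
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        short_ip = None

    try:
        long_ip = resolver.gethostbyname(long_name)
        checks["long"] = True
    except OSError:
        long_ip = None

    ip = long_ip or short_ip
    if ip is None:
        return checks

    try:
        reverse_name, _aliases, _addresses = resolver.gethostbyaddr(ip)
    except OSError:
        return checks
    checks["reverse"] = reverse_name.rstrip(".").lower() == long_name.rstrip(".").lower()

    try:
        checks["forward"] = resolver.gethostbyname(reverse_name) == ip
    except OSError:
        pass
    return checks


class StubResolver:
    """Offline stand-in for DNS with made-up records (hypothetical names)."""

    forward = {"vc1": "10.0.0.5", "vc1.example.com": "10.0.0.5", "esx9": "10.0.0.9"}
    reverse = {"10.0.0.5": "vc1.example.com"}

    def gethostbyname(self, name):
        if name not in self.forward:
            raise OSError(f"unknown host {name}")
        return self.forward[name]

    def gethostbyaddr(self, ip):
        if ip not in self.reverse:
            raise OSError(f"no PTR record for {ip}")
        return (self.reverse[ip], [], [ip])


print(dns_rule_of_four("vc1", "vc1.example.com", resolver=StubResolver()))
print(dns_rule_of_four("esx9", "esx9.example.com", resolver=StubResolver()))
```

To check real hosts, call `dns_rule_of_four` without the `resolver` argument on the VC server, the SRM server, and each ESX server, at both sites.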

Site Recovery Manager 1.0 Prerequisites
- ESX Server 3.0.2, ESX Server 3.5, or ESX Server 3i
- VirtualCenter (VC) server version 2.5 installed at the protected site and at the recovery site
- SRM server installed at the protected site and at the recovery site
- SRM plug-in installed on the VI Clients that will access the protected and recovery sites
- Network configuration that allows TCP connectivity between VC servers and SRM servers
- An Oracle or SQL Server database that uses ODBC for connectivity in the protected site and in the recovery site
- An SRM license installed on the VC license server at the protected site and at the recovery site
- Pre-configured array-based replication between the protected site and the recovery site
Points to mention when presenting the prerequisites:
- The need for separate VC servers in the protected and recovery sites, with separate databases
- The need for separate SRM servers in the protected and recovery sites, with separate databases
- Pre-configured array-based replication between the protected and recovery sites
In total: 2 VC servers (one per site), 2 SRM servers (one per site), and 4 databases (two per site, one for VC and one for SRM).

Installation Workflow
At the protected site, the following activities are completed:
- Installation of the SRM server
- Installation of the SRM plug-in into the VI Client
- Installation of the Storage Replication Adapter (SRA)
At the recovery site, the following activities are completed:
- Installation of the SRM plug-in into the VI Client *
It is important to complete the Site Recovery Manager workflows in the order detailed in this presentation.
The SRM installation workflow involves installing the SRM server, which can be completed at the protected site and then the recovery site, or vice versa. Once the installation workflow has been completed, it is very important to complete the remaining SRM configuration workflows in the order detailed in this presentation.
* Note: Optional step, only required if a different instance of the VI Client is used to access the recovery site.

Protected and Recovery Site Datacenters
[Slide graphic: the two datacenters, labeled PROTECTED SITE and RECOVERY SITE.]
Before moving into the SRM configuration workflows, let’s review the two datacenters depicted here to help frame the rest of the presentation:
- Production datacenter we wish to protect: vim22dc (protected site), with VMs app_vm1 to app_vm12
- BC/DR datacenter we will fail over to: vim23dc (recovery site)

Site Recovery Manager User Interface
SRM is accessed via the VI Client. An SRM plug-in is installed into your VI Client, resulting in the Site Recovery icon highlighted in the slide.
With the exception of recovery plans, all SRM setup workflows (Connection, Array Managers, Inventory Preferences, and Protection Groups) are completed from the VI Client that is connected to the protected site.
The recovery plan for the VMs in the protected site is created from the VI Client that is connected to the recovery site.

Setup Workflow – Protection Site
At the protected site, the following setup activities are completed:
- The user pairs the SRM servers at the protected and recovery sites
- Security certificates are established between the SRM servers and the VC servers
Step 1: Pairing of the recovery site (vim23) to the protected site (vim22), which involves:
1. Connecting the VC server in the protected site to the VC server in the recovery site
2. Certificate validation between the VC servers in the protected and recovery sites
3. Connecting the SRM server in the protected site to the SRM server in the recovery site
4. Certificate validation between the SRM servers in the protected and recovery sites
5. Reciprocity is established
PKCS#12 (Personal Information Exchange Syntax Standard) certificates can be used for purposes such as signing, including file signing. They differ from other certificates in that, rather than holding only the public or the private part, they combine both along with the root certificate in one file. This means the person they are issued to only has to manage a single file.
Certificates that are not properly signed result in yellow warning signs. Reciprocity will still be established, allowing you to continue to the next step in the workflow.

Array Managers Configuration
Select the correct manager type from the Manager type drop-down box.
Step 2: After the pairing of the sites is completed via the SRM Connection wizard, the next step is to configure the array managers. During the installation workflow you installed an SRA for the array that replicates the datastores (datastore groups) between Site 1 and Site 2. During the Add Array Manager workflow you are presented with a window similar to this one, where you need to select the correct manager type to enable SRM to integrate with the SAN that is replicating those datastores.

SRM identifies available arrays and replicated datastores and determines the datastore groups.
Step 2 continued: SRM identifies which LUNs are being replicated and presents you with the list. The Array Manager wizard involves the following steps:
1. Protection site array setup, pairing the array in the protected site to the array in the recovery site
2. Recovery site array setup
3. Review of the mirrored LUNs

Using the Inventory Preferences Mapper, the user maps resources in the protected site to their counterparts in the recovery site.
Step 3: After the array managers setup is completed via the SRM Array Managers wizard, the next step is to configure the inventory preferences via the SRM Inventory Mapper wizard. Using this wizard, the protected VMs now need to be mapped to the following resources that are available at the recovery site:
- Networks
- Compute resources
- Virtual machine folders
Note: These are global preferences that are applied to all the protected VMs when they are restarted at the recovery site. In addition to the global preferences, individual per-VM customization can also be applied to the protected VMs, for example network configuration information (IP, mask, gateway, DNS, and WINS servers), to allow the protected VMs to start up correctly on the network at the recovery site.
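The relationship between global inventory preferences and per-VM customization can be sketched as a lookup with overrides. This is an illustrative data-structure sketch only, not the SRM API: the function name `place_vm`, the dictionary layout, and every resource name and address below are hypothetical.

```python
# Global preferences: protected-site resource -> recovery-site counterpart.
# All resource names below are made up for illustration.
GLOBAL_MAPPINGS = {
    "network": {"Prod-Net": "DR-Net"},
    "compute_resource": {"Prod-Cluster": "DR-Cluster"},
    "folder": {"Apps": "Recovered-Apps"},
}

# Optional per-VM customization, e.g. network settings for the recovery site.
PER_VM_CUSTOMIZATION = {
    "app_vm1": {"ip": "192.168.50.11", "mask": "255.255.255.0",
                "gateway": "192.168.50.1", "dns": ["192.168.50.2"]},
}


def place_vm(vm, mappings, per_vm=None):
    """Return the recovery-site placement for one protected VM.

    Global mappings are applied first; per-VM settings are layered on top.
    Raises LookupError when a protected-site resource has no mapping.
    """
    placement = {}
    for kind in ("network", "compute_resource", "folder"):
        source = vm[kind]
        if source not in mappings[kind]:
            raise LookupError(f"no {kind} mapping for {source!r} (VM {vm['name']})")
        placement[kind] = mappings[kind][source]
    placement.update((per_vm or {}).get(vm["name"], {}))
    return placement


vm = {"name": "app_vm1", "network": "Prod-Net",
      "compute_resource": "Prod-Cluster", "folder": "Apps"}
print(place_vm(vm, GLOBAL_MAPPINGS, PER_VM_CUSTOMIZATION))
```

Failing loudly on an unmapped resource is a design choice for this sketch: a missing mapping is easier to fix at setup time than during a failover.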

A protection group is a group of VMs that will be failed over together to the recovery site. Working through the Protection Group wizard, you will need to select a location for temporary VirtualCenter inventory files for the protected VMs at the recovery site.
Step 4: After the inventory preferences setup is completed, the next step is to configure protection groups via the Protection Group wizard. During the creation of the protection groups, SRM requires a location to store some temporary VirtualCenter inventory files for the protected VMs. SRM presents the datastores at the recovery site that are available for storing these temporary files. It is preferable to select a non-replicated datastore at the recovery site for them.

Working through the Protection Group wizard, a user selects which VMs need to be protected and assigns them to a protection group. The creation of a protection group results in VC inventory updates in the recovery site.
Step 4 continued:
- A protection group has a 1:1 mapping to a datastore group in SRM.
- A protection group contains the virtual machines you wish to protect in Site 1 (protected site) and allows them to be failed over to Site 2 (recovery site).
- This screen provides a summary of the protected virtual machines (app_vm1 to app_vm12) and also shows which folders and resource pools they will be mapped to in the recovery site.
Animation sequence:
1. The shadow VM metadata is written to the temporary storage location that was selected when working through the Protection Group wizard.
2. The VC inventory in the recovery site is updated automatically as a result of the protection group being created.

Setup Workflow – Recovery Site
At the recovery site, the following setup activity is completed:
- The user creates a recovery plan, which is associated with one or more protection groups
Step 5: Working through the Recovery Plan wizard, the user completes the setup of a recovery plan that is associated with a single protection group or multiple protection groups. A recovery plan is a preprogrammed BC/DR run book that ensures your tests and failovers are executed in a repeatable and reliable manner.

Site Recovery Manager Recovery Plan
[Slide graphic: a recovery plan with callouts for VM Shutdown, High Priority VM Shutdown, Attach Virtual Disks, High Priority VM Recovery, and Normal Priority VM Recovery.]
Step 5 continued: the SRM recovery plan shown, ‘Recovery Plan 2 – Protection Group 2’, is required to complete a partial site failover for the local data center vim22dc, which is protected by SRM. The protected VMs that will be failed over are app_vm7 through app_vm12 from Protection Group 2, which is associated with the datastore group shared-san-2.
The steps called out on the slide are:
- Low and normal priority VM shutdown
- High priority VM shutdown
- Datastore group preparation at the recovery site
- Recovery of VMs

Site Recovery Manager Recovery Plan
[Slide graphic: the same recovery plan with callouts for Low Priority VM Recovery, Post Test Cleanup, and Virtual Disk Reset.]
Site Recovery Manager recovery plans:
- Turn manual BC/DR run books into an automated process
- Specify the steps of the recovery process in VirtualCenter
- Provide a way to test your BC/DR plan in an isolated environment at the recovery site without impacting the protected VMs in the protected site
The example is again ‘Recovery Plan 2 – Protection Group 2’, the partial site failover plan for the data center vim22dc, which fails over app_vm7 through app_vm12 from Protection Group 2 (datastore group shared-san-2).

Testing a Recovery Plan
‘Test’ a recovery plan by simulating a failover of protected VMs with zero downtime to the protected VMs in the protected site.
SRM enables you to ‘Test’ a recovery plan by simulating a failover of virtual machines from the protected site to the recovery site. Running a failover simulation lets you confirm that the recovery plan has been set up correctly for the protected VMs. In particular, you can confirm that the protected VMs start up in the correct order, taking into account the application service dependencies in your environment.
When you select the option to ‘Test’ a recovery plan, the simulated failover is executed in an isolated environment. It uses network and storage infrastructure at the recovery site that is isolated from the protected site (the production environment), which ensures that the protected VMs at the protected site are not subject to any service interruption during the test.
SRM also creates a test report that can be used to demonstrate your level of preparedness to the business, to the individual business units whose services are being protected by SRM, and to auditors and compliance officers if required.
The simulated failover completes by resetting the environment so that it is ready for the next event. That event could be another simulated failover, an actual failover for a scheduled BC/DR test, or an actual failover in response to an event that led the business to declare a disaster.

Testing a Recovery Plan
Testing of an SRM recovery plan can be completed without impacting the protected VMs (app_vm7 to app_vm12) at the protected site.
The example is again ‘Recovery Plan 2 – Protection Group 2’, the partial site failover plan for the data center vim22dc, which fails over app_vm7 through app_vm12 from Protection Group 2 (datastore group shared-san-2).
While the simulated failover test is running, the status of each step in the recovery plan can be monitored on the Recovery Steps tab in the VI Client, which shows which steps are currently Running and which steps completed with a Success status.
Some steps in a recovery plan are executed only during a simulated test; these are identified by ‘Test Only’ in the Mode column. Other steps are executed only during an actual failover; these are identified by ‘Recovery only’ in the Mode column.
Once the simulated failover test completes, a report of the test run can be viewed from the History tab by clicking the ‘view’ link. The report lists all the steps in the recovery plan along with a status of success or error and the duration of each step.
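The effect of the Mode column can be sketched as a filter over an ordered list of steps. This is an illustration only, not how SRM stores or executes plans. The step names are taken from the slide callouts, but which mode each step carries is an assumption made for this example (for instance, shutdown of protected VMs is marked recovery-only because a test causes no downtime at the protected site), and all identifiers are hypothetical.

```python
from enum import Enum
from typing import NamedTuple


class Mode(Enum):
    BOTH = "Both"
    TEST_ONLY = "Test Only"
    RECOVERY_ONLY = "Recovery Only"


class Step(NamedTuple):
    name: str
    mode: Mode


# Ordered steps; the mode assignments are assumptions for illustration.
RECOVERY_PLAN = [
    Step("Shut down low and normal priority VMs at protected site", Mode.RECOVERY_ONLY),
    Step("Shut down high priority VMs at protected site", Mode.RECOVERY_ONLY),
    Step("Prepare datastore groups at recovery site", Mode.BOTH),
    Step("Recover high priority VMs", Mode.BOTH),
    Step("Recover normal priority VMs", Mode.BOTH),
    Step("Recover low priority VMs", Mode.BOTH),
    Step("Post test cleanup", Mode.TEST_ONLY),
    Step("Reset virtual disks", Mode.TEST_ONLY),
]


def steps_for(plan, run_type):
    """Return the steps, in plan order, that apply to a 'test' or 'recovery' run."""
    if run_type not in ("test", "recovery"):
        raise ValueError("run_type must be 'test' or 'recovery'")
    skipped = Mode.RECOVERY_ONLY if run_type == "test" else Mode.TEST_ONLY
    return [step for step in plan if step.mode is not skipped]


for run_type in ("test", "recovery"):
    print(f"{run_type} run:")
    for step in steps_for(RECOVERY_PLAN, run_type):
        print(f"  {step.name} [{step.mode.value}]")
```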

Executing Failover
SRM enables you to ‘Run’ a recovery plan, which results in the actual failover of virtual machines from the protected site. The failover process via SRM is rapid, repeatable, reliable, manageable, and auditable.
There are two ways to initiate the actual failover: click the ‘Run’ button, or click the ‘Execute Recovery Plan’ link under the Commands section.
If there is still connectivity back to the protected site at the time the business declares the disaster, SRM first initiates the power-down of the protected VMs at the protected site.
WARNING: Executing an actual failover will permanently alter virtual machines and infrastructure of both the protected and recovery sites.

Executing Failover
The Run Recovery Plan dialog box warns you that you are about to run a recovery plan, which will result in permanent changes to the protected virtual machines and to the infrastructure of both the protected and recovery site datacenters. Click the radio button to confirm that you understand the implications of running your recovery plan, and then click the Run Recovery Plan button to start the failover of protected VMs from the protected site to the recovery site.
The dialog box also provides a summary of the recovery plan information: the recovery plan that is going to be run, the names of the protected and recovery sites, the number of protected VMs that will be failed over, and the connectivity status from the recovery site back to the protected site.
While the failover is being executed, the status of each step in the recovery plan can be monitored on the Recovery Steps tab of the recovery site’s VI Client, which shows which steps are currently Running and which steps completed with a Success status.

Failback Options in Site Recovery Manager 1.0
Site Recovery Manager 1.0 does not provide a push-button automated failback process; automated failback via the SRM UI is not supported. Should there be a need to fail back to the designated protected site (original or new) after an actual DR event or a scheduled BC/DR test, there are two approaches.
Note: Failback requires downtime and, depending on the approach, could involve multiple steps and reversal of data replication between sites.
Option 1: Without SRM (no startup order, no failback history reports)
1. Power down all protected VMs that were failed over to the recovery site.
2. Work closely with your storage team to have them configure replication from the recovery site back to the protected site. Once they have confirmed that all data has been replicated back to the protected site, move on to the next step.
3. Re-inventory*, restart, and re-IP the VMs (manually or scripted). Re-IP applies when disparate networks are in use between the protected and recovery sites.
Option 2: With SRM (startup order in the recovery plan, with failback history)
Phase 1: Failing back services from the recovery site to the protected site
1. Work with your storage team to reverse data replication (recovery site back to the original protected site).
2. Leverage SRM and complete all SRM workflows in the reverse direction, from the recovery site back to the original designated protected site. This involves:
   a. Delete the original recovery plan (RP) in the recovery site.
   b. Delete the original protection groups (PG) in the protected site.
   c. Reverse the SRM workflow so that the original recovery site becomes the new designated protected site and the original protected site becomes the new designated recovery site (pair the sites, Array Manager Configuration, Inventory Preferences, Protection Groups at the protected site, and Recovery Plan at the recovery site).
   d. Initiate an SRM simulated test to confirm the startup sequence.
   e. Initiate an actual failover (failback) with SRM from the recovery site to the protected site.
Phase 2: Re-protecting the original protected site (the same two steps, repeated from the protected site back to the recovery site)
1. Work with your storage team to reverse data replication (protected site back to the recovery site).
2. Leverage SRM and complete all SRM workflows from the protected site back to the original designated recovery site. This involves:
   a. Delete the recovery plan (RP) in the protected site.
   b. Delete the protection groups (PG) in the recovery site.
   c. Set up the SRM workflow that protects the virtual machines in the protected site and allows them to be failed over to the recovery site (pair the sites, Array Manager Configuration, Inventory Preferences, Protection Groups at the protected site, and Recovery Plan at the recovery site).
   d. Initiate an SRM simulated test to confirm the startup sequence. Your protected virtual machines are now ready and prepared for a disaster event.
* Note: VM re-inventory in VC may not be necessary in the protected site if the original VC is intact.

Default Roles and Privileges
NOTE: This slide has animation. To make it easier to apply the specific sets of privileges needed to perform a coherent set of operations, roles specific to SRM will be defined on the VC server during installation. These roles are described here. There are two sets of roles. The first set contains the roles required for the primary site user to administer protection. The second set contains the roles required for the secondary site user to administer recovery. Note that the second set of roles also includes the privileges required to perform the necessary actions on the secondary site when the protection is administered from the primary site. This means that when the primary site user is required to log in to the remote (secondary) site in order to complete protection configuration, she can use the account privileged to administer recovery there. The following is the list of roles and the inventory objects where the roles need to be assigned:
Protection Virtual Machine Administrator: This role should be assigned on the protected Virtual Machine object in the VC inventory. It grants the associated user the ability to set up and modify the protection characteristics of the protected virtual machine.
Protection SRM Administrator: This role should be assigned on the Service Instance object in the primary SRM inventory. It grants the associated user the ability to pair two sites and to configure inventory mappings and SAN arrays.
Protection Groups Administrator: This role should be assigned on the Primary Configuration/Protection Service object in the SRM inventory. It grants the associated user the ability to create and modify protection profiles.
Recovery Inventory Administrator: This role should be assigned on the root of the VC inventory. It grants the associated user the ability to view customization specifications existing on the secondary site.
Recovery Datacenter Administrator: This role should be assigned on the Datacenter object in the VC inventory where the VMs will be recovered. It grants the associated user the ability to view available datastores and perform recovery (shadow) VM customizations.
Recovery Host Administrator: This role should be assigned on the Host or DRS cluster object in the VC inventory where the VM will be recovered. It grants the associated user the ability to configure VM components during recovery.
Recovery Virtual Machine Administrator: This role should be assigned on the Folder and Resource Pool objects in the VC inventory where the recovery (shadow) VMs are to be placed. It grants the associated user the ability to create and add shadow VMs to the resource pool and the folder, as well as the ability to reconfigure and customize the shadow VMs at runtime and during the process of recovery.
Recovery SRM Administrator: This role should be assigned on the Service Instance object in the secondary SRM inventory. It grants the associated user the ability to configure SAN arrays and create protection profiles.
Recovery Plans Administrator: This role should be assigned on the Secondary Configuration/Recovery Service object in the SRM inventory. It grants the associated user the ability to reconfigure protection and shadow VMs and to set up and run recovery.
Note that VC already defines a Read-Only system role which can be used to grant users the ability to view the Disaster Recovery service. In addition, the Administrator role can be used to grant a user complete control over both the protection and recovery components.
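As a quick reference, the role-to-object assignments described above can be written down as a lookup table. This is an illustrative sketch only: the role names and target objects are taken from the notes, while `ROLE_ASSIGNMENTS` and `roles_for` are hypothetical names and nothing here queries VC or SRM.

```python
# Role name -> (side, inventory object the role is assigned on).
# Contents transcribed from the speaker notes; the structure is an assumption.
ROLE_ASSIGNMENTS = {
    "Protection Virtual Machine Administrator":
        ("protection", "protected Virtual Machine object (VC inventory)"),
    "Protection SRM Administrator":
        ("protection", "Service Instance object (primary SRM inventory)"),
    "Protection Groups Administrator":
        ("protection", "Primary Configuration/Protection Service object (SRM inventory)"),
    "Recovery Inventory Administrator":
        ("recovery", "root of the VC inventory"),
    "Recovery Datacenter Administrator":
        ("recovery", "Datacenter object (VC inventory)"),
    "Recovery Host Administrator":
        ("recovery", "Host or DRS cluster object (VC inventory)"),
    "Recovery Virtual Machine Administrator":
        ("recovery", "Folder and Resource Pool objects (VC inventory)"),
    "Recovery SRM Administrator":
        ("recovery", "Service Instance object (secondary SRM inventory)"),
    "Recovery Plans Administrator":
        ("recovery", "Secondary Configuration/Recovery Service object (SRM inventory)"),
}


def roles_for(side):
    """Return the role names for one side ('protection' or 'recovery'), sorted."""
    return sorted(name for name, (s, _) in ROLE_ASSIGNMENTS.items() if s == side)


if __name__ == "__main__":
    for side in ("protection", "recovery"):
        print(side.upper())
        for role in roles_for(side):
            print(f"  {role} -> {ROLE_ASSIGNMENTS[role][1]}")
```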

Alarms and Site Status Monitoring
Site Recovery Manager will support the following alarm notification actions: send to a specified address; send an SNMP trap to VC trap receivers; execute a specified command on the VC host. We recommend you complete setup of alarm notifications for: Remote Site Down, Remote Site Ping Failed, Replication Group Removed, Recovery Plan Destroyed, and License Server Unreachable. SRM will support the configuration of event-triggered alarms so that you can associate a notification action with any given SRM Alarm Event. These alarms are configured via the SRM UI. To get familiar with SRM Alarms and how they work, we recommend you enable the five listed here. Remote site failure is reflected in the SRM Alarm Events but will not automatically trigger a recovery; recovery must be initiated manually.
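The association between alarm events and notification actions can be pictured as a simple mapping. The sketch below is a hypothetical model for illustration: the alarm names and the three action types come from the slide, but `Action`, `build_alarm_config`, and `actions_for` are invented names, and in the product these alarms are configured through the SRM UI rather than in code.

```python
from enum import Enum


class Action(Enum):
    """The three notification actions named on the slide."""
    SEND_TO_ADDRESS = "send to specified address"
    SNMP_TRAP = "send SNMP trap to VC trap receivers"
    RUN_COMMAND = "execute specified command on VC host"


# The five alarms the slide recommends enabling first.
RECOMMENDED_ALARMS = (
    "Remote Site Down",
    "Remote Site Ping Failed",
    "Replication Group Removed",
    "Recovery Plan Destroyed",
    "License Server Unreachable",
)


def build_alarm_config(default_action=Action.SEND_TO_ADDRESS):
    """Associate every recommended alarm with one default notification action."""
    return {name: [default_action] for name in RECOMMENDED_ALARMS}


def actions_for(config, event):
    """Return the notification actions configured for an event (empty if none)."""
    return list(config.get(event, []))


if __name__ == "__main__":
    config = build_alarm_config()
    config["Remote Site Down"].append(Action.SNMP_TRAP)
    for action in actions_for(config, "Remote Site Down"):
        print("Remote Site Down ->", action.value)
```

Note that `Action` deliberately has no "start recovery" member: as the notes say, a remote site failure only raises an alarm, and failover must be started by a person.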

Site Recovery Manager Server Monitoring
Site Recovery Manager will raise VirtualCenter events for the following conditions: Disk Space Low; CPU use exceeded limit; Memory low; Remote Site not responding; Remote Site heartbeat failed; Recovery Plan Test started, ended, succeeded, failed, or cancelled; Virtual Machine Recovery started, ended, succeeded, failed, or reports a warning. Each SRM server monitors the CPU utilization, disk space, and memory consumption of the guest on which it is running, and also maintains a heartbeat with its peer SRM server. VC events are sent if any of these measures falls outside of configured bounds.
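The "falls outside of configured bounds" behavior amounts to comparing each sampled measure against a threshold. A minimal sketch follows, assuming made-up field names, units, and threshold values; only the event names are taken from the slide, and none of this reflects SRM's actual implementation or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Hypothetical configured limits; names and units are assumptions."""
    max_cpu_percent: float
    min_free_disk_mb: float
    min_free_memory_mb: float
    heartbeat_timeout_s: float


def check(sample, bounds):
    """Return the event names (as worded on the slide) that a sample would raise."""
    events = []
    if sample["free_disk_mb"] < bounds.min_free_disk_mb:
        events.append("Disk Space Low")
    if sample["cpu_percent"] > bounds.max_cpu_percent:
        events.append("CPU use exceeded limit")
    if sample["free_memory_mb"] < bounds.min_free_memory_mb:
        events.append("Memory low")
    if sample["seconds_since_peer_heartbeat"] > bounds.heartbeat_timeout_s:
        events.append("Remote Site heartbeat failed")
    return events


if __name__ == "__main__":
    # Illustrative values only.
    bounds = Bounds(max_cpu_percent=90, min_free_disk_mb=500,
                    min_free_memory_mb=256, heartbeat_timeout_s=60)
    sample = {"cpu_percent": 97, "free_disk_mb": 2048,
              "free_memory_mb": 128, "seconds_since_peer_heartbeat": 5}
    print(check(sample, bounds))
```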

Site Recovery Manager Core Benefits
Expand disaster recovery protection: now any workload in a VM can be protected with minimal incremental effort and cost. Reduce time to recovery: as soon as disaster is declared, a single button kicks off the recovery sequence for hundreds of VMs. Increase reliability of recovery: replication of system state ensures a VM has all it needs to start up; hardware independence eliminates failures due to different hardware; easier testing based on the actual failover sequence allows more frequent and more realistic tests. NOTE: This slide has animation. Site Recovery Manager makes it possible for organizations to extend better disaster recovery protection to more systems. It reduces time to recovery by automating the recovery process. It increases reliability of recovery by eliminating several causes of failure encountered by traditional recovery and by enabling easier, more frequent testing. Using Site Recovery Manager, you will be able to significantly improve your disaster recovery solution. For one, it makes it easy to expand the number of workloads being protected. By making optimal use of the recovery site resources, we allow a lower barrier to entry than with physical disaster recovery. By automating the difficult and/or manual parts of DR planning, failover, and test (e.g. mapping VMs to storage, booting in the right sequence, taking care of IP changes, etc.), Site Recovery Manager makes the incremental cost of protecting a VM very low from an operational perspective. The only real costs are the disk space at the destination site and enough bandwidth to handle the data change rate of that VM. Site Recovery Manager also significantly reduces time to recovery through automation of the recovery process. And Site Recovery Manager makes the recovery plan far more reliable. It makes hardware dependencies irrelevant by leveraging the hardware independence provided by virtualization. It helps you ensure that the right storage, and only the right storage, is replicated, including the VM’s system state, which is always completely up to date and patched. It takes care of the networking changes you need when you recover to get everything to work properly. And most importantly, it makes it easy for you to do frequent non-disruptive tests to ensure that the recovery plan is correct and that your staff are practiced in executing it successfully.

Summary
Site Recovery Manager Leverages VMware Infrastructure to Make Disaster Recovery:
Rapid: automate the disaster recovery process; eliminate complexities of traditional recovery.
Reliable: ensure proper execution of the recovery plan; enable easier, more frequent tests.
Manageable: centrally manage recovery plans; make plans dynamic to match the environment.
Affordable: utilize recovery site infrastructure; reduce management costs.
NOTE: This slide has animation. In short, Site Recovery Manager is designed to attack the key challenges of traditional disaster recovery, ensuring rapid, reliable, manageable, and affordable disaster recovery. Site Recovery Manager is designed to leverage VMware Infrastructure to address the key challenges we hear customers talking about regarding disaster recovery:
Rapid recovery through automating the recovery process and eliminating complexities like hardware dependencies.
Reliable recovery by taking out failures due to human error or outdated run books and by enabling easier and more frequent testing.
Manageable recovery by providing a central console for managing recovery plans in the same place as you manage your infrastructure.
Affordable recovery by leveraging the cost benefits of VMware Infrastructure, making it easy to utilize recovery site hardware for other workloads without impacting your recovery time, and by reducing the operational costs of training and of continued management of your disaster recovery plans.

Questions?
Q&A session if time permits.
