Presentation on theme: "VMware Site Recovery Manager: Technical Overview"— Presentation transcript:
1 VMware Site Recovery Manager: Technical Overview. February 2008. VMware
2 Agenda
- Introduction and Key Concepts
- Site Recovery Manager 1.0 Prerequisites and SAN Integration
- Site Recovery Manager Workflows
- Site Recovery Manager Roles and Privileges
- Alarms and Site Status Monitoring
- Summary
Agenda for the presentation today:
1. DR and SRM Introduction and Concepts
2. SRM 1.0 Prerequisites and SAN Integration
3. SRM Workflows (Protected and Recovery Site)
4. SRM Roles and Privileges
5. SRM Alarms and Site Status Monitoring
6. SRM Core Benefits and Summary
3 What is a Disaster?
- Complete loss of a data center for an extended period of time
- Declaration of a disaster usually requires consensus from multiple parts of the organization (at the C*O level)
What is not a disaster?
- Failure of an individual host
- A temporary service interruption
This is what we mean when we talk about disasters. There is of course a gray area: does failure of a storage array constitute a disaster? How long is “an extended period”? Can disaster recovery tools assist with planned outages?
4 The Current State of Traditional Disaster Recovery
Tier I: Immediate; Cost $$$
Tier II: RPO 24+ hrs.; RTO 48+ hrs.; Cost $$
Tier III: RPO 7+ days; RTO 5+ days; Cost $
- DR services tiered according to business needs
- Physical DR is challenging
- Maintain identical hardware at both locations
- Apply upgrades and patches in parallel
- Little automation
- Error-prone and difficult to test
In our discussions with customers like you, we found that disaster protection for services tends to be tiered. In the first tier are services for which no downtime at all can be tolerated; those services tend to be deployed from the start in active-active configurations. For those sites that maintain identical idle hardware at the secondary site, just keeping up with OS patches can be a full-time job. For the remaining services, however, DR plans are in the three-ring binder (“read this in case of disaster”): to recover this server, confiscate that hardware, re-install the OS, recover from tape. Of course, these steps are very manual and tend to be very difficult to test.
5 Advantages of Virtual Disaster Recovery
- Virtual machines are portable
- Virtual hardware can be automatically configured
- Test and failover can be automated (minimizes human error)
- The need for idle hardware is reduced
- Costs are lowered, and the quality of service is raised
(Graphic: “Press In Case of Disaster” button)
Virtual machines have many advantages that make them ideal vehicles for BC/DR:
- Virtual machines can be transmitted over a wire (portable)
- Virtual machines can be programmatically powered on and off, and virtual networks can be programmatically reconfigured (automatically configured)
- By including the boot disk, virtual DR eliminates the need to apply OS patches in parallel at the primary and secondary sites (minimizes human error)
- Lower cost makes it possible to make high-quality DR protection ubiquitous, not just for the first tier of service
6 Introducing VMware Site Recovery Manager
Site Recovery Manager leverages VMware Infrastructure to deliver advanced disaster recovery management and automation.
- Simplifies and automates disaster recovery workflows: setup, testing, failover
- Turns manual recovery runbooks into automated recovery plans
- Provides central management of recovery plans from VirtualCenter
- Works with VMware Infrastructure to make disaster recovery rapid, reliable, manageable, affordable
VMware is building on the core properties of VMware Infrastructure that make it so useful for disaster recovery with a new product, VMware Site Recovery Manager. Site Recovery Manager is a product that simplifies and automates disaster recovery. It helps organizations directly address the challenges of disaster recovery that were mentioned earlier: meeting RTO requirements, reducing cost, and reducing risk. Site Recovery Manager is a separate product from VMware Infrastructure.
VMware has been working to leverage the disaster recovery features and capabilities of the VMware Infrastructure platform with a new product developed specifically for disaster recovery. This new product will simplify and automate the key elements of disaster recovery: setting up disaster recovery plans, testing those plans, executing failover when a datacenter disaster occurs, and failing back to the primary datacenter.
This new product, VMware Site Recovery Manager, will make it possible for customers to provide faster, more reliable, and more affordable disaster recovery protection than previously possible. Although not a part of VMware Infrastructure, Site Recovery Manager works closely with VMware Infrastructure to manage and automate disaster recovery for virtual environments.
7 Site Recovery Manager at a Glance
(Diagram: Protected Site and Recovery Site, each with VirtualCenter, Site Recovery Manager, and Datastore Groups, connected by Array Replication.)
NOTE: This slide has animation to simulate at a high level what SRM does.
High-level view of SRM: SRM protects the VMs you select at the protected site, and SRM starts up the protected VMs in the recovery site at time of test or disaster.
Click 1: SRM protects VMs; shadow VMs are created in the recovery site.
Click 2: Disaster occurs.
Click 3: Press the big red button and your protected VMs are restarted at the secondary site.
8 Server Side Components *
(Diagram: Site 1 and Site 2, each with a VC Server, VCMS DB, SRM Server, SRM DB, Storage Replication Adapter, Array, and Block Replication SW.)
NOTE: This slide has animation.
SRM is designed as a plug-in to VirtualCenter so that DR tasks can be executed inside the same management tool as other VM administration tasks such as creation, migration, and deletion. However, SRM is not “in” VC. It is a separate server process with its own separate database. The server processes for SRM and VC can run on the same or different servers, and the databases for VC and SRM can reside on the same or different database servers.
The most interesting piece of the install is the storage replication adapters. As you know, SRM does not actually do the replication for DR, only the setup, test, and recovery workflows. SRM relies on block-based replication (Fibre Channel or iSCSI) from our storage partners. The storage replication adapters tie together the SRM product and the replication products. These adapters are developed, qualified, and supported by the storage partner for optimal reliability and the best customer experience. They sit on the SRM server and, once installed, are invisible for the duration of their use.
To summarize:
- 2 VC servers (one per site)
- 2 SRM servers (one per site)
- 4 databases (two per site, one for VC and one for SRM)
- Pre-configured array-based replication
* Note: Conceptual drawing only. SRM Server may run on a different system than VCMS.
9 Site Recovery Manager Concept Relationship “Cheat Sheet”
Protected:
- LUN: Indivisible unit of storage that can be replicated
- Datastore: Contains one or more LUNs (i.e. VMFS)
- Datastore Groups: Auto-generated collection of one or more datastores. Indivisible unit of storage failover.
- Protection Group: Collection of all VMs stored in a datastore group
Recovery:
- Recovery Plan: Contains one or more protection groups
There are five “moving parts” that must be understood for SRM to be used. The first four relate to the protection tasks and the last relates to the recovery tasks.
LUNs are block devices presented from the storage arrays. LUNs are the unit of replication for the arrays and represent the smallest possible granularity for failover. It is never possible to fail over the contents of part of a LUN without failing over the entire LUN, so group VMs on LUNs accordingly.
VMware formats LUNs with VMFS, our filesystem, and uses them to store VMs. These VMFS-formatted LUNs are referred to as datastores. Datastores commonly contain only a single LUN, but do have the ability to span LUNs.
Datastore groups are the smallest groups of datastores (and therefore LUNs) that can have their contents failed over with SRM. These groupings are calculated for you, so you don’t have to worry about figuring them out. Two things cause LUNs and datastores to be grouped together rather than managed separately:
- A datastore spanning multiple LUNs causes those LUNs to be grouped together in the datastore group. Failing over part of a datastore is not possible.
- A VM can have multiple virtual disks, and those virtual disks may sit on different datastores. In that case, those datastores are forced together into a datastore group so that you don’t try to fail over only part of a VM.
Protection groups are created with a one-to-one mapping to datastore groups. A protection group is simply the group of VMs that reside on a single datastore group. This is the actual unit of VM protection and recovery.
Recovery plans. Once you have created protection groups, you can create recovery plans containing one or more of them. A recovery plan is simply a list of VMs from the protection groups, a startup order for those VMs, and any custom steps added before or after VM startup. This is the “virtual run book” that is executed during DR tests and actual DR failovers.
10 Key Concepts And Their Relationships
(Diagram, Protected Site: Datastore Group 1 = LUN 1 / VMFS 1; Datastore Group 2 = LUN 2 and LUN 3 / VMFS 2; Datastore Group 3 = LUN 4 / VMFS 3 and LUN 5 / VMFS 4; Protection Groups 1, 2, and 3. Recovery Site: Recovery Plan 1 (Whole Site) with Protection Groups 1, 2, and 3; Recovery Plan 2 (Subset) with Protection Group 1.)
NOTE: This slide has animation.
Here is a graphical representation of what was just described.
LUN 1 is formatted with VMFS 1 and has 3 VMs. It has no dependencies on anything and is in its own datastore group.
LUNs 2 and 3 have VMFS 2 spanned across them, so they are in the same datastore. Since all 6 VMs on that datastore sit only on VMFS 2 and touch no others, VMFS 2 (and therefore LUNs 2 and 3) is alone in the second datastore group.
LUN 4 is formatted with VMFS 3 and LUN 5 is formatted with VMFS 4. They would be in separate datastore groups were it not for the VM with a virtual disk in each of them. VMFS 3 and VMFS 4 (and therefore LUNs 4 and 5) are grouped together in a third datastore group.
Protection groups 1, 2, and 3 are created corresponding to datastore groups 1, 2, and 3.
At the recovery site for all these VMs, recovery plan 1 is created containing all three protection groups and therefore all 10 of their VMs. This recovery plan would be used if the entire site was lost.
Recovery plan 2 is also created with only protection group 1 and its 3 VMs. This is for some partial failure, perhaps corresponding to a server rack, an array, or a business unit. It would be run to recover that particular set of systems.
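The grouping rule described on this slide can be sketched in a few lines of Python. This is only an illustration of the rule, not SRM's actual algorithm: the function name `datastore_groups`, the input dictionaries, and the VM names (`vm1` to `vm10`) are assumptions made up for the example. It merges datastores that share a VM using a small union-find, and a datastore that spans LUNs simply carries all of its LUNs into its group.

```python
from collections import defaultdict


def datastore_groups(datastore_luns, vm_datastores):
    """Return the smallest sets of datastores that must fail over together.

    datastore_luns: datastore name -> list of LUNs backing it (hypothetical input)
    vm_datastores:  VM name -> non-empty set of datastores holding its virtual disks
    """
    parent = {ds: ds for ds in datastore_luns}

    def find(ds):
        # Follow parent links to the representative datastore of the group.
        while parent[ds] != ds:
            parent[ds] = parent[parent[ds]]
            ds = parent[ds]
        return ds

    # A VM with disks on several datastores forces them into one group.
    for stores in vm_datastores.values():
        ordered = sorted(stores)
        for other in ordered[1:]:
            root_a, root_b = find(ordered[0]), find(other)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = defaultdict(lambda: {"datastores": set(), "luns": set(), "vms": set()})
    for ds, luns in datastore_luns.items():
        group = groups[find(ds)]
        group["datastores"].add(ds)
        group["luns"].update(luns)  # a spanned datastore brings all of its LUNs
    for vm, stores in vm_datastores.items():
        groups[find(next(iter(stores)))]["vms"].add(vm)
    return sorted(groups.values(), key=lambda g: sorted(g["datastores"]))


# The layout from the slide: 3 VMs on VMFS1, 6 on VMFS2, 1 VM across VMFS3 and VMFS4.
DATASTORE_LUNS = {
    "VMFS1": ["LUN1"],
    "VMFS2": ["LUN2", "LUN3"],
    "VMFS3": ["LUN4"],
    "VMFS4": ["LUN5"],
}
VM_DATASTORES = {f"vm{i}": {"VMFS1"} for i in range(1, 4)}
VM_DATASTORES.update({f"vm{i}": {"VMFS2"} for i in range(4, 10)})
VM_DATASTORES["vm10"] = {"VMFS3", "VMFS4"}

GROUPS = datastore_groups(DATASTORE_LUNS, VM_DATASTORES)
for number, group in enumerate(GROUPS, start=1):
    print(number, sorted(group["datastores"]), sorted(group["luns"]), len(group["vms"]))
```

Each resulting group corresponds to one protection group, which is why the slide ends up with three protection groups for five LUNs.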
11 Array Integration with SRM
Vendor-specific scripts support:
- Array discovery
- Replicated LUN discovery
- SRM Test initiation (simulated failover in an isolated environment)
- SRM Failover initiation (actual failover of services to the recovery site)
Array vendors will be responsible for creating the scripts for their arrays to enable the integration with Site Recovery Manager.
NOTE: This slide has animation.
SRM will leverage Storage Replication Adapters (SRAs) that have been written by the array vendors to ensure tight integration with SRM. The SRAs will perform the following tasks:
- Array discovery
- Replicated LUN discovery
- Test and failover initiation
12 Safety Tip: DNS Validation – The Rule of ‘Four’
Validate that DNS is working as expected by performing the following DNS lookups for the VC, SRM, and ESX servers:
- Short name
- Long name
- Reverse
- Forward
NOTE: This slide has animation.
It is highly recommended to validate that DNS is working as expected and that DNS lookups in the protected and recovery sites return the correct results. DNS should be validated from:
- the VC server
- the SRM server
- each of the ESX servers
NOTE: Complete the DNS checks from both the protected and the recovery sites.
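The four lookups can be scripted so they are easy to repeat from each server at each site. The sketch below is one possible reading of the rule, not a VMware tool: the function name `check_dns`, the result keys, and the host names in the usage note are assumptions. It uses the standard-library `socket.gethostbyname` and `socket.gethostbyaddr` calls, and the resolvers are passed in as parameters so the check can be exercised without a live DNS server.

```python
import socket


def check_dns(short_name, fqdn,
              forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Run the four lookups for one server and report which ones pass.

    The pass criteria are assumptions for illustration:
      short name / long name: the name resolves at all
      forward: both names resolve to the same address
      reverse: the address maps back to the long name
    """
    results = {}
    addresses = {}
    for label, name in (("short name", short_name), ("long name", fqdn)):
        try:
            addresses[label] = forward(name)
            results[label] = True
        except OSError:  # socket.gaierror and socket.herror derive from OSError
            addresses[label] = None
            results[label] = False

    results["forward"] = (
        results["short name"]
        and results["long name"]
        and addresses["short name"] == addresses["long name"]
    )

    results["reverse"] = False
    if addresses["long name"] is not None:
        try:
            resolved_name = reverse(addresses["long name"])[0]
            results["reverse"] = resolved_name.lower() == fqdn.lower()
        except OSError:
            pass
    return results
```

A run from the SRM server might look like `check_dns("vc01", "vc01.example.com")`, repeated for every VC, SRM, and ESX server name; both names here are placeholders.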
13 Site Recovery Manager 1.0 Prerequisites
- ESX Server 3.0.2, ESX Server 3.5, or ESX Server 3i
- VirtualCenter (VC) server version 2.5 installed at the protected site and at the recovery site
- SRM server installed at the protected site and at the recovery site
- SRM plug-in installed on the VI Clients that will access the protected and recovery sites
- Network configuration that allows TCP connectivity between VC servers and SRM servers
- An Oracle or SQL Server database that uses ODBC for connectivity in the protected site and in the recovery site
- An SRM license installed on the VC license server at the protected site and at the recovery site
- Pre-configured array-based replication between the protected site and the recovery site
NOTE: This slide has animation.
SRM prerequisites (make mention of the below):
- Point out the need for separate VC servers in the protected and recovery sites, with separate databases
- Point out the need for separate SRM servers in the protected and recovery sites, with separate databases
- Pre-configured array-based replication between the protected and recovery sites
- 2 VC servers (one per site)
- 2 SRM servers (one per site)
- 4 databases (two per site, one for VC and one for SRM)
14 Installation Workflow
At the protected site the following activities are completed:
- Installation of the SRM server
- Installation of the SRM Plugin into the VI Client
- Installation of the Storage Replication Adapter (SRA)
At the recovery site the following activities are completed:
- Installation of the SRM Plugin into the VI Client *
It is important to complete the Site Recovery Manager workflows in the order detailed in this presentation.
NOTE: This slide has animation.
The SRM installation workflow involves installing the SRM server, which can be completed at the protected site and then at the recovery site, or vice versa. Once the installation workflow has been completed, it is very important to complete the remaining SRM configuration workflows in the order detailed in the presentation.
* Note: Optional step, only required if a different instance of the VI Client is used to access the recovery site.
15 Protected and Recovery Site Datacenters
(Screenshots: PROTECTED SITE and RECOVERY SITE.)
NOTE: This slide has animation.
Before moving into the SRM configuration workflows, let's review the two datacenters that are depicted to help frame the rest of the presentation:
- Production datacenter we wish to protect: vim22dc (Protected Site) and its VMs (app_vm1 to app_vm12)
- BC/DR datacenter we will fail over to: vim23dc (Recovery Site)
16 Site Recovery Manager User Interface
NOTE: This slide has animation.
SRM is accessed via the VI Client. An SRM Plugin is installed onto your VI Client, resulting in the Site Recovery icon highlighted in the slide.
With the exception of the Recovery Plans, all SRM setup workflows are completed from the VI Client that is connected to the protected site (Connection, Array Managers, Inventory Preferences, and Protection Groups).
The Recovery Plan for the VMs in the protected site is created from the VI Client that is connected to the recovery site.
17 Setup Workflow – Protection Site
At the protection site the following setup activities are completed:
- The user pairs the SRM servers at the protected and recovery sites
- Security certificates are established between the SRM servers and the VC servers
NOTE: This slide has animation.
Step 1: Pairing of the recovery site (vim23) to the protected site (vim22), which involves:
- Connecting the VC server in the protected site to the VC server in the recovery site
- Certificate validation between the VC servers in the protected and recovery sites
- Connecting the SRM server in the protected site to the SRM server in the recovery site
- Certificate validation between the SRM servers in the protected and recovery sites
- Reciprocity is established
PKCS12 (Personal Information Exchange Syntax Standard) certificates can be used for things such as signing and file signing. They differ from other certificates in that, rather than holding only the public certificate or only the private key, they combine both together with the root certificate. This means the person they are made for only has to deal with one file.
Certificates that are not properly signed will result in yellow warning signs. Reciprocity will still be established, allowing you to continue to the next step in the workflow.
18 Setup Workflow – Protection Site
Array Managers Configuration
- Select the correct manager type from the Manager type drop-down box
Step 2: After the pairing of the sites is completed via the SRM Connection wizard, the next step is to configure the array managers.
During the installation workflow you installed an SRA for the array you will be using to replicate the datastores (datastore groups) between Site 1 and Site 2.
During the Add Array Manager configuration workflow you will be presented with a window similar to this one. You need to select the correct manager type to enable SRM to integrate with the SAN that is replicating the datastores (datastore groups) between Site 1 and Site 2.
19 Setup Workflow – Protection Site
SRM identifies available arrays and replicated datastores and determines the datastore groups.
Step 2 continued: SRM will identify which LUNs are being replicated and present you with the list.
The Array Manager wizard involves the following steps:
- Protection Site array setup, pairing the array in the protected site to the array in the recovery site
- Recovery Site array setup
- Review the mirrored LUNs
20 Setup Workflow – Protection Site
Using the Inventory Preferences Mapper, the user maps resources in the protected site to their counterparts in the recovery site.
Step 3: After the Array Managers setup is completed via the SRM Array Managers wizard, the next step is to configure the Inventory Preferences via the SRM Inventory Mapper wizard.
Using the Inventory Mapper wizard, the protected VMs now need to be mapped to the following resources that are available at the recovery site:
- Networks
- Compute Resources
- Virtual Machine Folders
Note: These are global preferences that will be applied to all the protected VMs when they are restarted at the recovery site. In addition to the global preferences, individual per-VM customization can also be applied to the protected VMs, for example network configuration information (IP, mask, gateway, DNS and WINS servers), to allow the protected VMs to start up correctly on the network at the recovery site.
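The relationship between the global inventory preferences and per-VM customization can be pictured as a lookup followed by an overlay. The following is a conceptual sketch only; SRM stores these mappings itself, and the class names, field names, and resource names here (`InventoryMappings`, `ProtectedVM`, `recovery_settings`, "Prod-Net", and so on) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InventoryMappings:
    """Global preferences: protected-site resource -> recovery-site counterpart."""
    networks: Dict[str, str]
    compute_resources: Dict[str, str]
    folders: Dict[str, str]


@dataclass
class ProtectedVM:
    name: str
    network: str
    compute_resource: str
    folder: str
    # Optional per-VM customization, e.g. IP settings for the recovery network.
    customization: Dict[str, str] = field(default_factory=dict)


def recovery_settings(vm: ProtectedVM, mappings: InventoryMappings) -> Dict[str, str]:
    """Apply the global mappings first, then overlay the VM's own customization."""
    settings = {
        "network": mappings.networks[vm.network],
        "compute_resource": mappings.compute_resources[vm.compute_resource],
        "folder": mappings.folders[vm.folder],
    }
    settings.update(vm.customization)
    return settings


MAPPINGS = InventoryMappings(
    networks={"Prod-Net": "DR-Net"},
    compute_resources={"Prod-Cluster": "DR-Cluster"},
    folders={"Apps": "Recovered-Apps"},
)
VM = ProtectedVM(
    name="app_vm1",
    network="Prod-Net",
    compute_resource="Prod-Cluster",
    folder="Apps",
    customization={"ip": "192.0.2.10", "mask": "255.255.255.0", "gateway": "192.0.2.1"},
)
print(recovery_settings(VM, MAPPINGS))
```

A missing mapping raises a `KeyError` in this sketch, which mirrors the point of the step: every resource used by a protected VM needs a counterpart at the recovery site before protection groups are created.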
21 Setup Workflow – Protection Site
A protection group is a group of VMs that will be failed over together to the recovery site.
Working through the Protection Group wizard, you will need to select a location for temporary VirtualCenter inventory files for the protected VMs at the recovery site.
Step 4: After the Inventory Preferences setup is completed, the next step is to configure Protection Groups via the Protection Group wizard.
During the creation of the Protection Groups, SRM requires a location to store some temporary VirtualCenter inventory files for the protected VMs. SRM will present the available datastores at the recovery site that could be selected for storing these temporary files. It is recommended that you select a non-replicated datastore at the recovery site for these temporary files.
22 Setup Workflow – Protection Site
Working through the Protection Group wizard, a user selects which VMs need to be protected and assigns them to a protection group. The creation of a protection group results in VC inventory updates in the recovery site.
NOTE: This slide has animation.
Step 4 continued: After the Inventory Preferences setup is completed, the next step is to configure Protection Groups via the Protection Group wizard.
A Protection Group has a 1:1 mapping to a datastore group in SRM. A Protection Group contains the virtual machines you wish to protect in Site 1 (protected site) and allows them to be failed over to Site 2 (recovery site).
This screen provides a summary of the protected virtual machines (app_vm1 to app_vm12) and also shows which folders and resource pools they will be mapped to in the recovery site.
Click 1: Show the shadow VM metadata being written to the temporary storage location that was selected when working through the Protection Group wizard.
Click 2: Show the automatic update of the VC inventory in the recovery site as a result of the protection group being created.
23 Setup Workflow – Recovery Site
At the recovery site the following setup activity is completed:
- The user creates a recovery plan which is associated with a single or multiple protection groups
Step 5: Working through the Recovery Plan wizard, the user completes the setup of a recovery plan that is associated with a single protection group or multiple protection groups.
A Recovery Plan is a preprogrammed BC/DR run book that will ensure your tests and failovers are executed in a repeatable and reliable manner.
24 Site Recovery Manager Recovery Plan
(Screenshot callouts: VM Shutdown; High Priority VM Shutdown; Attach Virtual Disks; High Priority VM Recovery; Normal Priority VM Recovery.)
NOTE: This slide has animation.
Step 5: Working through the Recovery Plan wizard, the user completes the setup of a recovery plan that is associated with a single protection group or multiple protection groups.
The SRM recovery plan called ‘Recovery Plan 2 – Protection Group 2’ is used to complete a partial site failover for the local data center vim22dc, which is protected by SRM. The protected VMs that will be failed over are app_vm7 through app_vm12 from Protection Group 2, which is associated with the datastore group shared-san-2.
The plan steps shown are:
- Low and Normal priority VM shutdown
- High Priority VM shutdown
- Datastore group preparation at the Recovery Site
- Recovery of VMs
25 Site Recovery Manager Recovery Plan
(Screenshot callouts: Low Priority VM Recovery; Post Test Cleanup; Virtual Disk Reset.)
Site Recovery Manager Recovery Plans:
- Turn manual BC/DR run books into an automated process
- Specify the steps of the recovery process in VirtualCenter
- Provide a way to test your BC/DR plan in an isolated environment at the recovery site without impacting the protected VMs in the protected site
NOTE: This slide has animation.
The SRM recovery plan called ‘Recovery Plan 2 – Protection Group 2’ is used to complete a partial site failover for the local data center vim22dc, which is protected by SRM. The protected VMs that will be failed over are app_vm7 through app_vm12 from Protection Group 2, which is associated with the datastore group shared-san-2.
26 Testing a Recovery Plan
‘Test’ a recovery plan by simulating a failover of protected VMs with zero downtime to the protected VMs in the protected site.
SRM enables you to ‘Test’ a recovery plan by simulating a failover of virtual machines from the protected site to the recovery site. The benefit of using SRM to run a failover simulation against a recovery plan is that it allows you to confirm that the recovery plan has been set up correctly for the protected VMs. You will be able to confirm that the protected VMs start up in the correct order, taking into account the various application service dependencies for the protected VMs in your environment.
It is worth pointing out that when you select the option to ‘Test’ a recovery plan via SRM, the simulated failover is executed in an isolated environment. This environment includes network and storage infrastructure at the recovery site that is isolated from the protected site (production environment), which ensures the protected VMs at the protected site are not subject to any kind of service interruption during the testing of the recovery plan.
SRM will also create a test report that can be used to demonstrate your level of preparedness to the business or to the individual business units whose services are being protected by SRM, as well as to auditors and compliance officers if required.
The simulated failover completes by resetting the environment to be ready for the next event, which could be another simulated failover, or an actual failover for a scheduled BC/DR test or in response to an event which resulted in the business declaring a disaster.
27 Testing a Recovery Plan
NOTE: This slide has animation.
Testing of an SRM Recovery Plan can be completed without impacting the protected VMs (app_vm7 to app_vm12) at the protected site.
The SRM recovery plan called ‘Recovery Plan 2 – Protection Group 2’ is used to complete a partial site failover for the local data center vim22dc, which is protected by SRM. The protected VMs that will be failed over are app_vm7 through app_vm12 from Protection Group 2, which is associated with the datastore group shared-san-2.
While the simulated failover test is running, the status of each step that makes up the recovery plan can be monitored on the Recovery Steps tab in the VI Client, which shows which steps are currently Running and which steps were completed with a Success status. Some steps in a recovery plan are only executed during a simulated test; these steps are identified by ‘Test Only’ under the Mode column. Other steps are only executed during an actual failover; these steps are identified by ‘Recovery only’ under the Mode column.
Once the simulated failover test completes, a report of the test run can be viewed from the History tab by clicking on the ‘view’ link. The report contains a list of all the steps in the recovery plan along with a status of ‘success’ or ‘error’ and the duration of each step.
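The Mode column can be thought of as a filter over one shared list of steps. The sketch below models that idea using the step names from the recovery plan slides. Which steps are ‘Test Only’ or ‘Recovery only’ is an assumption made for illustration (the slides do not list the mode of each step), and the `Mode.ALWAYS` label, `Step` class, and `steps_for_run` function are hypothetical names, not part of SRM.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Mode(Enum):
    ALWAYS = "Always"                # hypothetical label for steps run in both cases
    TEST_ONLY = "Test Only"
    RECOVERY_ONLY = "Recovery only"


@dataclass(frozen=True)
class Step:
    name: str
    mode: Mode = Mode.ALWAYS


# Step order follows the slides; the mode assignments are assumptions.
PLAN = [
    Step("Low and Normal priority VM shutdown", Mode.RECOVERY_ONLY),
    Step("High Priority VM shutdown", Mode.RECOVERY_ONLY),
    Step("Datastore group preparation at the Recovery Site"),
    Step("High Priority VM recovery"),
    Step("Normal Priority VM recovery"),
    Step("Low Priority VM recovery"),
    Step("Post test cleanup", Mode.TEST_ONLY),
    Step("Virtual disk reset", Mode.TEST_ONLY),
]


def steps_for_run(plan: List[Step], is_test: bool) -> List[str]:
    """Return the step names executed for a simulated test or an actual failover."""
    skipped = Mode.RECOVERY_ONLY if is_test else Mode.TEST_ONLY
    return [step.name for step in plan if step.mode is not skipped]


print("Test:    ", steps_for_run(PLAN, is_test=True))
print("Failover:", steps_for_run(PLAN, is_test=False))
```

Keeping a single ordered plan and filtering by mode is what lets a test exercise the same startup order that an actual failover would use.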
28 Executing Failover
SRM enables you to ‘Run’ a recovery plan, which will result in the actual failover of virtual machines from the protected site.
The failover process via SRM is rapid, repeatable, reliable, manageable, and auditable.
There are two ways to initiate the actual failover: you can either click on the ‘Run’ button or click on the ‘Execute Recovery Plan’ link under the Commands section.
If there is still connectivity back to the protected site at the time the disaster is declared by the business, SRM will first initiate the power down of the protected VMs at the protected site.
WARNING - Executing an actual failover will permanently alter virtual machines and infrastructure of both the protected and recovery sites.
29 Executing Failover
WARNING - Executing an actual failover will permanently alter virtual machines and infrastructure of both the protected and recovery sites.
The Run Recovery Plan dialog box warns you that you are about to run a recovery plan which will result in changes to the protected virtual machines and the infrastructure of both the protected and recovery site datacenters. Click the radio button to confirm you understand the implications of running your recovery plan, and then click on the Run Recovery Plan button to start the failover of protected VMs from the protected site to the recovery site.
The Run Recovery Plan dialog box also provides a summary of the Recovery Plan Information. This includes the Recovery Plan that is going to be run, the names of the protected and recovery sites, the number of protected VMs that will be failed over, and the connectivity status from the recovery site back to the protected site.
While the failover is being executed, the status of each step that makes up the recovery plan can be monitored on the Recovery Steps tab of the recovery site’s VI Client, which shows which steps are currently Running and which steps were completed with a Success status.
30 Failback Options in Site Recovery Manager 1.0
Site Recovery Manager 1.0 does not provide a push-button automated failback process.
Failback Options:
Without SRM (no startup order, no failback history reports)
- Work with your storage team, reverse data replication
- VM re-inventory*, restart and re-IP (manual or scripted)
With SRM (startup order in recovery plan with failback history)
- Leverage SRM, complete all SRM workflows in the reverse direction from Recovery Site back to the Protected Site
- Repeat the above two steps from the Protected Site back to the Recovery Site
NOTE: This slide has animation.
SRM 1.0 does not support automated failback via the SRM UI. Should there be a need to fail back data to the designated protected site (original or new) after an actual DR event or scheduled BC/DR test, there are two approaches.
Note: Failback will require downtime and, depending on the approach, could involve multiple steps and reversal of data replication between sites.
Without SRM (no startup order, no failback history reports):
- Power down all protected VMs that were failed over to the recovery site.
- Working closely with your storage team, have them configure replication from the recovery site back to the protected site. Once they have confirmed all data has been replicated back to the protected site, move on to the next step.
- VM re-inventory, restart, and re-IP where disparate networks are in use between the protected and recovery sites. Note: VM re-inventory may not be necessary if the original VC is intact.
With SRM (startup order in recovery plan with failback history):
NOW FAILING BACK SERVICES TO THE PROTECTED SITE
- Work with your storage team, reverse data replication (Recovery Site back to the original Protected Site).
- Leverage SRM, complete all SRM workflows in the reverse direction from the Recovery Site back to the Protected Site. This involves (recovery site back to original designated protected site):
- Delete the original recovery plan (RP) in the recovery site.
- Delete the original protection groups (PG) in the protected site.
- SRM workflow reversal, where the original recovery site becomes the new designated protected site and the original protected site becomes the new designated recovery site (pair the sites, Array Manager configuration, Inventory Preferences, Protection Groups at the protected site, and Recovery Plan at the recovery site).
- Initiate an SRM simulated test to confirm the startup sequence.
- Initiate an actual failover (failback) with SRM from the Recovery Site to the Protected Site.
NOW RE-PROTECTING THE ORIGINAL PROTECTED SITE
- Work with your storage team, reverse data replication (Protected Site back to the Recovery Site).
- Leverage SRM, complete all SRM workflows from the Protected Site back to the Recovery Site (protected site back to original designated recovery site):
- Delete the recovery plan (RP) in the protected site.
- Delete the protection groups (PG) in the recovery site.
- SRM workflow setup protecting virtual machines in the Protected Site, allowing them to be failed over to the Recovery Site (pair the sites, Array Manager configuration, Inventory Preferences, Protection Groups at the protected site, and Recovery Plan at the recovery site).
- Initiate an SRM simulated test to confirm the startup sequence. Your protected virtual machines are now ready and prepared for a disaster event.
* Note: VM re-inventory in VC may not be necessary in the Protected Site.
31 Default Roles and Privileges
NOTE: This slide has animation.
To make it easier to apply the specific sets of privileges needed to perform a coherent set of operations, roles specific to SRM are defined on the VC server during installation. These roles are described here.
There are two sets of roles. The first set contains the roles required for the primary site user to administer protection. The second set contains the roles required for the secondary site user to administer recovery. Note that the second set of roles also includes the privileges required to perform the necessary actions on the secondary site when protection is administered from the primary site. This means that when the primary site user is required to log in to the remote (secondary) site in order to complete protection configuration, she can use the account privileged to administer recovery there.
The following is the list of roles and the inventory objects where the roles need to be assigned:
Protection Virtual Machine Administrator: This role should be assigned on the protected Virtual Machine object in the VC inventory. It grants the associated user the ability to set up and modify the protection characteristics of the protected virtual machine.
Protection SRM Administrator: This role should be assigned on the Service Instance object in the primary SRM inventory. It grants the associated user the ability to pair two sites and to configure inventory mappings and SAN arrays.
Protection Groups Administrator: This role should be assigned on the Primary Configuration/Protection Service object in the SRM inventory. It grants the associated user the ability to create and modify protection profiles.
Recovery Inventory Administrator: This role should be assigned on the root of the VC inventory. It grants the associated user the ability to view customization specifications existing on the secondary site.
Recovery Datacenter Administrator: This role should be assigned on the Datacenter object in the VC inventory where the VMs will be recovered. It grants the associated user the ability to view available datastores and perform recovery (shadow) VM customizations.
Recovery Host Administrator: This role should be assigned on the Host or DRS cluster object in the VC inventory where the VM will be recovered. It grants the associated user the ability to configure VM components during recovery.
Recovery Virtual Machine Administrator: This role should be assigned on the Folder and Resource Pool objects in the VC inventory where the recovery (shadow) VMs are to be placed. It grants the associated user the ability to create shadow VMs and add them to the resource pool and the folder, as well as the ability to reconfigure and customize the shadow VMs at runtime and during the process of recovery.
Recovery SRM Administrator: This role should be assigned on the Service Instance object in the secondary SRM inventory. It grants the associated user the ability to configure SAN arrays and create protection profiles.
Recovery Plans Administrator: This role should be assigned on the Secondary Configuration/Recovery Service object in the SRM inventory. It grants the associated user the ability to reconfigure protection and shadow VMs and to set up and run recovery.
Note that VC already defines a Read-Only system role, which can be used to grant users the ability to view the Disaster Recovery service. In addition, the Administrator role can be used to grant a user complete control over both the protection and recovery components.
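The role-to-object assignments listed above can be kept as a small lookup table, for example to check which roles an account still lacks before it is used for protection or recovery. This is a sketch for planning purposes only: the table layout and the helper names `roles_for` and `missing_roles` are hypothetical and do not correspond to any SRM or VC API.

```python
# (side, role, inventory, object the role is assigned on), as listed above.
ROLE_ASSIGNMENTS = [
    ("protection", "Protection Virtual Machine Administrator", "VC",
     "protected Virtual Machine"),
    ("protection", "Protection SRM Administrator", "primary SRM",
     "Service Instance"),
    ("protection", "Protection Groups Administrator", "SRM",
     "Primary Configuration/Protection Service"),
    ("recovery", "Recovery Inventory Administrator", "VC",
     "inventory root"),
    ("recovery", "Recovery Datacenter Administrator", "VC",
     "Datacenter"),
    ("recovery", "Recovery Host Administrator", "VC",
     "Host or DRS cluster"),
    ("recovery", "Recovery Virtual Machine Administrator", "VC",
     "Folder and Resource Pool"),
    ("recovery", "Recovery SRM Administrator", "secondary SRM",
     "Service Instance"),
    ("recovery", "Recovery Plans Administrator", "SRM",
     "Secondary Configuration/Recovery Service"),
]


def roles_for(side):
    """Return the role names for "protection" or "recovery", in listed order."""
    return [role for s, role, _, _ in ROLE_ASSIGNMENTS if s == side]


def missing_roles(side, granted):
    """Return the roles of `side` that are not in the `granted` collection."""
    return [role for role in roles_for(side) if role not in granted]


if __name__ == "__main__":
    for side, role, inventory, obj in ROLE_ASSIGNMENTS:
        print(f"{side:10s} {role}: assign on {obj} ({inventory} inventory)")
```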
32 Alarms and Site Status Monitoring
Site Recovery Manager will support the following alarm notification actions:
Send to specified address
Send SNMP trap to VC trap receivers
Execute specified command on VC host
We recommend you complete setup of alarm notifications for:
Remote Site Down
Remote Site Ping Failed
Replication Group Removed
Recovery Plan Destroyed
License Server Unreachable
SRM will support the configuration of event-triggered alarms so that you can associate a notification action with any given SRM Alarm Event. These alarms are configured via the SRM UI.
To get familiar with SRM alarms and how they work, we recommend you enable the five listed here.
Remote site failure is reflected in the SRM Alarm Events but will not automatically trigger a recovery. Recovery must be initiated manually.
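The association between alarm events and notification actions can be pictured as a simple mapping. In SRM the alarms are configured through the UI, so everything in this sketch (the `AlarmConfig` class, the action helpers, the example address) is hypothetical, and the actions only return a description instead of sending anything.

```python
from collections import defaultdict

# The five alarm events recommended above.
RECOMMENDED_EVENTS = (
    "Remote Site Down",
    "Remote Site Ping Failed",
    "Replication Group Removed",
    "Recovery Plan Destroyed",
    "License Server Unreachable",
)


def send_to(address):
    """Notification action: send to a specified address (description only)."""
    return lambda event: f"send '{event}' to {address}"


def snmp_trap():
    """Notification action: SNMP trap to VC trap receivers (description only)."""
    return lambda event: f"send SNMP trap for '{event}' to VC trap receivers"


def run_command(command):
    """Notification action: run a command on the VC host (description only)."""
    return lambda event: f"run '{command}' on VC host for '{event}'"


class AlarmConfig:
    """Hypothetical event-to-actions table."""

    def __init__(self):
        self._actions = defaultdict(list)

    def on(self, event, action):
        self._actions[event].append(action)

    def notify(self, event):
        # Notification only: no recovery is started from here.
        return [action(event) for action in self._actions.get(event, [])]

    def unconfigured(self):
        """Recommended events that have no notification action yet."""
        return [e for e in RECOMMENDED_EVENTS if not self._actions.get(e)]


if __name__ == "__main__":
    config = AlarmConfig()
    config.on("Remote Site Down", send_to("dr-team@example.com"))
    config.on("Remote Site Down", snmp_trap())
    print(config.notify("Remote Site Down"))
    print(config.unconfigured())
```

Note that `notify` only reports; consistent with the text above, a remote site failure raises an alarm but recovery still has to be started by an operator.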
33 Site Recovery Manager Server Monitoring
Site Recovery Manager will raise VirtualCenter events for the following conditions:
Disk Space Low
CPU use exceeded limit
Memory low
Remote Site not responding
Remote Site heartbeat failed
Recovery Plan Test started, ended, succeeded, failed, or cancelled
Virtual Machine Recovery started, ended, succeeded, failed, or reports a warning
Each SRM server monitors the CPU utilization, disk space, and memory consumption of the guest on which it is running, and also maintains a heartbeat with its peer SRM server. VC events are sent if any of these measures falls outside of configured bounds.
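The bounds check described above can be sketched as a comparison of one resource sample against configured limits. The field names, units, and threshold values here are assumptions chosen for illustration; the source does not state how SRM represents or configures these bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    # Hypothetical configured limits; names and units are assumptions.
    min_free_disk_mb: float
    max_cpu_percent: float
    min_free_memory_mb: float
    max_heartbeat_age_s: float


@dataclass(frozen=True)
class Sample:
    free_disk_mb: float
    cpu_percent: float
    free_memory_mb: float
    heartbeat_age_s: float   # seconds since the peer's last heartbeat


def events_for(sample, bounds):
    """Return the event names for every measure that is out of bounds."""
    events = []
    if sample.free_disk_mb < bounds.min_free_disk_mb:
        events.append("Disk Space Low")
    if sample.cpu_percent > bounds.max_cpu_percent:
        events.append("CPU use exceeded limit")
    if sample.free_memory_mb < bounds.min_free_memory_mb:
        events.append("Memory low")
    if sample.heartbeat_age_s > bounds.max_heartbeat_age_s:
        events.append("Remote Site heartbeat failed")
    return events


if __name__ == "__main__":
    bounds = Bounds(min_free_disk_mb=1024, max_cpu_percent=90,
                    min_free_memory_mb=256, max_heartbeat_age_s=30)
    sample = Sample(free_disk_mb=512, cpu_percent=40,
                    free_memory_mb=2048, heartbeat_age_s=45)
    print(events_for(sample, bounds))
```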
34 Site Recovery Manager Core Benefits
Expand disaster recovery protection
  Now any workload in a VM can be protected with minimal incremental effort and cost
Reduce time to recovery
  As soon as disaster is declared, a single button kicks off the recovery sequence for hundreds of VMs
Increase reliability of recovery
  Replication of system state ensures a VM has all it needs to start up
  Hardware independence eliminates failures due to different hardware
  Easier testing based on the actual failover sequence allows more frequent and more realistic tests
NOTE: This slide has animation.
Site Recovery Manager makes it possible for organizations to expand the scope of better disaster recovery protection to more systems. It reduces time to recovery by automating the recovery process. It increases reliability of recovery by eliminating several causes of failure encountered by traditional recovery and by enabling easier, more frequent testing.
Using Site Recovery Manager, you will be able to significantly improve your disaster recovery solution.
For one, it makes it easy to expand the number of workloads being protected. By making optimal use of the recovery site resources, we allow a lower barrier to entry than with physical disaster recovery. By automating the difficult and/or manual parts of DR planning, failover, and testing (e.g. mapping VMs to storage, booting in the right sequence, taking care of IP changes), Site Recovery Manager makes the incremental cost of protecting a VM very low from an operational perspective. The only real costs are the disk space at the destination site and enough bandwidth to handle the data change rate of that VM.
Site Recovery Manager also significantly reduces time to recovery through automation of the recovery process.
And Site Recovery Manager makes the recovery plan far more reliable. It makes hardware dependencies irrelevant by leveraging the hardware independence provided by virtualization. It helps you ensure that the right storage is replicated, and only the right storage, including the VM's system state, which is always completely up to date and patched. It takes care of the networking changes you need when you recover to get everything to work properly. And most importantly, it makes it easy for you to do frequent non-disruptive tests to ensure that the recovery plan is correct and that your staff are practiced in executing it successfully.
35 Summary
Site Recovery Manager Leverages VMware Infrastructure to Make Disaster Recovery
Rapid
  Automate disaster recovery process
  Eliminate complexities of traditional recovery
Reliable
  Ensure proper execution of recovery plan
  Enable easier, more frequent tests
Manageable
  Centrally manage recovery plans
  Make plans dynamic to match environment
Affordable
  Utilize recovery site infrastructure
  Reduce management costs
NOTE: This slide has animation.
In short, Site Recovery Manager is designed to attack the key challenges of traditional disaster recovery, ensuring rapid, reliable, manageable, and affordable disaster recovery.
Site Recovery Manager is designed to leverage VMware Infrastructure to address the key challenges we hear customers talking about regarding disaster recovery:
Rapid recovery through automating the recovery process and eliminating complexities like hardware dependencies.
Reliable recovery by taking out failures due to human error or outdated run books and by enabling easier and more frequent testing.
Manageable recovery by providing a central console for managing recovery plans in the same place as you manage your infrastructure.
Affordable recovery by leveraging the cost benefits of VMware Infrastructure, making it easy to utilize recovery site hardware for other workloads without impacting your recovery time, and by reducing the operational costs of training and of continued management of your disaster recovery plans.
36 Questions?
Q&A session if time permits.