EMC CONFIDENTIAL—INTERNAL USE ONLY VMware vCenter Site Recovery Manager 5.x with EMC VNX Arrays & MirrorView By Dave O’Sullivan


2 Intended Audience:
- VNX Block
- CLARiiON Block
- This training gives an overview of SRM and explains how the relevant software plugins and hardware interact with each other.
- This will cover:
  - Pre-requisites
  - Design
  - Test Failover / Failover / Recovery [DEMO]
  - Required Logs
  - Troubleshooting

3 Assumptions:
- You are familiar [not expert] with:
  - VMs!
  - vCenter
  - MirrorView A and S
  - VNX / CLARiiON arrays
  - SRA(s) ???
- Before the customer does any work with SRM:
  - MirrorView is working (check zoning)
  - All the appropriate software is installed [including enablers]

4 What is SRM?
- SRM is designed to provide simple and reliable disaster protection for all virtualized applications.
- Site recovery plans can be tested non-disruptively as often as required to ensure that they meet business objectives.
- At the time of a site failover or migration, Site Recovery Manager automates both the failover and failback processes, helping to deliver fast and predictable recovery point objectives (RPOs) and recovery time objectives (RTOs).

5 Pre-requisites
- SRM relies heavily on DNS, so it is assumed that DNS is fully set up and that all hosts can be resolved in both directions.
- IP connectivity between all SPs / VC / ESX on both sites.
- SRM also relies on databases; in this setup there are 4 in total:
  - 1 DB for VC
  - 1 DB for SRM
  - This applies to both sites.
- This doc covers the DB setup in full detail: "How to Install and Configure SQL Express 2005 For use with Site Recovery Manager V4", Rob Nourse, Sr. Consultant, VMware Consulting Services (/Install%20%26%20Configure%20SQL%20Express%20for%20use%20with%20S RM4%20v1.3.pdf)
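
The DNS prerequisite above can be spot-checked from the SRM host before any SRM work starts. The following is a minimal Python sketch, not part of the original deck. It assumes that "both directions" means a forward lookup (name to address) followed by a reverse lookup (address back to name). The function name `check_dns_both_ways` is made up for illustration.

```python
import socket


def check_dns_both_ways(hostname,
                        forward=socket.gethostbyname,
                        reverse=socket.gethostbyaddr):
    """Resolve hostname to an address, then resolve that address back to a name.

    forward and reverse are injectable so the check can be exercised without
    a live DNS server; by default they are the standard library resolvers.
    """
    result = {"hostname": hostname, "address": None,
              "reverse_name": None, "ok": False}
    try:
        result["address"] = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result["error"] = f"forward lookup failed: {exc}"
        return result
    try:
        # gethostbyaddr returns (primary_name, alias_list, address_list)
        result["reverse_name"] = reverse(result["address"])[0]
    except OSError as exc:  # socket.herror is a subclass of OSError
        result["error"] = f"reverse lookup failed: {exc}"
        return result
    # Compare case-insensitively and ignore a trailing dot on the FQDN.
    wanted = hostname.lower().rstrip(".")
    found = result["reverse_name"].lower().rstrip(".")
    result["ok"] = wanted == found
    return result
```

Calling `check_dns_both_ways("some-host.example.test")` for each SP, vCenter and ESX host name (the name shown is a placeholder) on both sites returns a small dictionary. An `ok` value of `False` points at the lookup that needs attention. If the hosts are registered by short name rather than FQDN, the final comparison would need to be relaxed.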

6 Design: ….

7 Design considerations: IP / DNS

8 Design considerations: Software / Plugins

9 Design considerations: MirrorView
- For SRM failover to work, the “protected” VMs must be located on a LUN that is replicated from the production site to the DR site.
- This is handled by MirrorView A/S (we are using A in this setup).
- Below is the LUN info for my setup:

10 Design considerations: MirrorView Zoning
- For MirrorView to work, the appropriate ports must be zoned together.
- In this setup, the FC ports used for MirrorView are zoned to the opposite array.

11 Design considerations: MirrorView

12 Design considerations: MirrorView
- The LUN is created first on the Prod (Athena) side.
- Then use the MirrorView option “create secondary mirror” and follow the wizard through.
- I used the Mirror Wizard to complete this task.

13 Design considerations: MirrorView
- When it's working, it should look like this:

14 Design considerations: Reserved LUN Pool
- You must add LUNs with adequate capacity to the Reserved LUN Pool before proceeding.
- The pool is used when the SRA requests a snapshot for the SRM failover test (only!).

15 Design considerations: DEMO

16 Design considerations:
- That’s pretty much it on the VNX side.
- Once MirrorView is up and running, you should be good to go with the SRM Windows / VMware side of the setup.
- Next:
  - What is the SRA?

17 SRA [Storage Replication Adapters]
- What is the SRA?
- The SRA is a Windows .exe installed on the same Windows box as SRM, as part of the SRM setup.
- vCenter “talks” to the SRA -> the SRA sends navi commands to the array.
- This is why naviseccli must be installed on the same box as SRM (check path!).
- Each array vendor has its own SRAs.
- The SRAs are EMC code, so we support them!
SRAs for SRM 5.x: for the full list of storage replication adapters supported by SRM 5.x, see http://www.vmware.com/resources/compatibility/search.php?deviceCategory=sra.
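
The "check path!" reminder above can be turned into a quick test on the SRM host. This is a small Python sketch, not something shipped with the SRA. It only confirms whether an executable called `naviseccli` is visible on the search path of the process that runs the script. That path may differ from the one seen by the SRM service account, so treat the result as a first hint. The helper names are hypothetical.

```python
import shutil


def find_naviseccli(search_path=None):
    """Return the full path of naviseccli if it is on the search path, else None.

    search_path=None means "use the PATH environment variable" of the
    current process.
    """
    return shutil.which("naviseccli", path=search_path)


def describe_naviseccli(search_path=None):
    """Produce a one-line, human-readable verdict for a case note."""
    location = find_naviseccli(search_path)
    if location is None:
        return "naviseccli not found on PATH; check the Navisphere CLI install and PATH"
    return f"naviseccli found at {location}"
```

Running `print(describe_naviseccli())` in a command prompt on the SRM server gives a one-line answer. A "not found" result matches the `'naviseccli' is not recognized as an internal or external command` error that the SRA logs later in this deck show.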

18 SRA [Storage Replication Adapters]
- These are the most current supported EMC SRAs:

19 SRA [Storage Replication Adapters]
- On both sites, the following should be installed:
- Note that there are 2 SRAs:
  - VNX SRA for vCenter
  - MirrorView enabler for VNX SRA
- As we are only doing block replication, we only need the MirrorView enabler.
- NFS replication is also possible using the EMC_VNX_Replicator_Enabler_for_VNX_SRA_v5.0.xx

20 SRA [Storage Replication Adapters] DEMO

21 Test Failover – sequence of events
We will look at what happens on both sites concerning:
- SRM
- VNX
- ESX

22 Test Failover – sequence of events [PROD]

23 Test Failover – sequence of events [PROD]
- Be sure to check the output from: Recovery Plan History Report, VMware Site Recovery Manager 5.0

24 Test Failover – sequence of events [PROD]
- This is a cosmetic issue which *should* be fixed in later versions of SRM:
Warning:
Failed to update embedded paths in virtual machine file '/vmfs/volumes/507432e1-3a92a2a4-027e-b8ac6f866cc6/2008-1/2008-1_1.vmdk'. A general system error occurred: No such device
Failed to update embedded paths in virtual machine file '/vmfs/volumes/507432e1-3a92a2a4-027e-b8ac6f866cc6/2008-2/2008-2_1.vmdk'. A general system error occurred: No such device

25 Test Failover – sequence of events [PROD ESX]
- Some fairly serious-looking errors appear in the vmkernel log on the prod ESX host; these can be ignored.
)WARNING: VMW_SATP_LIB_CX: satp_lib_cx_otherSPIsHung:338:Path "vmhba2:C0:T1:L3" Peer SP is hung.
)WARNING: VMW_SATP_LIB_CX: satp_lib_cx_otherSPIsHung:338:Path "vmhba3:C0:T0:L3" Peer SP is hung.
)ALERT: NMP: vmk_NmpVerifyPathUID:1166:The physical media represented by device naa da02e0012ce6a8f930de211 (path vmhba3:C0:T1:L9) has changed. If this is a data LUN, this is a critical error. Detecte[0$
)ALERT: NMP: vmk_NmpVerifyPathUID:1166:The physical media represented by device naa da02e0012ce6a8f930de211 (path vmhba2:C0:T0:L9) has changed. If this is a data LUN, this is a critical error. Detecte[0$
)NMP: nmp_DeviceUpdatePathStates:547: Activated path "vmhba2:C0:T1:L9" for NMP device "naa da02e0012ce6a8f930de211".
- Watch out for messages like this; customers could open cases based on these errors alone…

26 Test Failover – sequence of events [DR]

27 Test Failover – sequence of events [DR ESX]
LVM:8445:00:00Device naa e005e32bf721a12e211:1 detected to be a snapshot:
LVM:8452:00:00queried disk ID:
LVM:8459:00:00on-disk disk ID:
LVM:8445:00:00Device naa e005e32bf721a12e211:1 detected to be a snapshot:
LVM:8452:00:00queried disk ID:
LVM:8459:00:00on-disk disk ID:
LVM:8445:00:00Device naa e005e32bf721a12e211:1 detected to be a snapshot:
LVM:8452:00:00queried disk ID:
LVM:8459:00:00on-disk disk ID:
LVM:8445:00:00Device naa e005e32bf721a12e211:1 detected to be a snapshot:
LVM:8452:00:00queried disk ID:
LVM:8459:00:00on-disk disk ID:
LVM:8825:00:00Device naa e005e32bf721a12e211:1 unsnapped
LVM:5510:00:00Snapshot LV successfully resignatured
LVM:13188: One or more LVM devices have been discovered.

28 Test Failover – sequence of events [DR VNX]
A 10/09/12 14:20:23 SnapCopy a Snapshot Logical Unit device CopyDisk0000 has been created.
B 10/09/12 14:20:23 SnapCopy a Snapshot Logical Unit device CopyDisk0000 has been created.
A 10/09/12 14:20: 'Create a SnapShot LU' called by 'admin' ( ) on 'Navi_SnapCopyFeature' with result: Success (Successfully created SnapShot LU.)
A 10/09/12 14:20: '' called by 'admin' ( ) on 'Navi_SnapCopyFeature' with result: Success (Started SnapView session successfully. Session name - async-25_SRM-TEST-FAILOVER_session)
A 10/09/12 14:20:27 SnapCopy SnapView persistent session async-25_SRM-TEST-FAILOVER_session has been started on LUN 25.
B 10/09/12 14:20:27 SnapCopy SnapView persistent session async-25_SRM-TEST-FAILOVER_session has been started on LUN 25.
A 10/09/12 14:20: 'Activate' called by 'admin' ( ) on 'SnapShot WWN: 60:06:01:60:54:50:2E:00:92:7D:58:76:1C:12:E2:11' with result: Success (Successfully activated snapshot LU: 60:06:01:60:54:50:2E:00:92:7D:58:76:1C:12:E2:11 (async-25_SRM-TEST-FAILOVER_session))
A 10/09/12 14:20: 'ExecuteClientRequest' called by ' Navi User admin' ( ) on 'CLIFeature' (Result: Success). snapview - storagegroup -addsnapshot -gname SG_dellpr710-g.emcvmw.ctc -hlu 9 -snapshotname async-25_SRM-TEST-FAILOVER -compatibilitymode called by 'admin'
A 10/09/12 14:20:39 RemoteMirror MirrorView quiesce LU request.
A 10/09/12 14:20:39 RemoteMirror RM_ADMIN_INFO_WILL_REBIND the object.
A 10/09/12 14:20:39 RemoteMirror MirrorView rebind request for LUN b9502e00:a4cc7fc81507e211.
A 10/09/12 14:20:39 RemoteMirror MirrorView quiesce LU request.
A 10/09/12 14:20:39 SnapCopy SnapView has been bound to device
B 10/09/12 14:20:39 RemoteMirror Quiesce request from peer SP.
B 10/09/12 14:20:39 RemoteMirror Rebind request from peer SP for LUN b9502e00:a4cc7fc81507e211.
B 10/09/12 14:20:39 SnapCopy SnapView has been bound to device Disk0001.
B 10/09/12 14:20:39 RemoteMirror Rebind request from peer SP for LUN b9502e00:a4cc7fc81507e211.
A 10/09/12 14:20: 'Create a SnapShot LU' called by 'admin' ( ) on 'Navi_SnapCopyFeature' with result: Success (Successfully created SnapShot LU.)
A 10/09/12 14:20:40 SnapCopy a Snapshot Logical Unit device CopyDisk0001 has been created.
B 10/09/12 14:20:40 SnapCopy a Snapshot Logical Unit device CopyDisk0001 has been created.
A 10/09/12 14:20: '' called by 'admin' ( ) on 'Navi_SnapCopyFeature' with result: Success (Started SnapView session successfully. Session name - sync-24_SRM-TEST-FAILOVER_session)
A 10/09/12 14:20:43 Bus1 Enc0 DskE 60a A logical unit has been enabled [ALU 2] 0 ffff e
A 10/09/12 14:20:43 SnapCopy SnapView persistent session sync-24_SRM-TEST-FAILOVER_session has been started on LUN 24.

29 Test Failover – sequence of events [DR VNX]
B 10/09/12 14:20:43 Bus1 Enc0 DskE 606 Unit Shutdown for Trespass [ALU 2] 0 ffff e
B 10/09/12 14:20:43 SnapCopy SnapView persistent session sync-24_SRM-TEST-FAILOVER_session has been started on LUN 24.
A 10/09/12 14:20: 'Activate' called by 'admin' ( ) on 'SnapShot WWN: 60:06:01:60:54:50:2E:00:32:0B:08:80:1C:12:E2:11' with result: Success (Successfully activated snapshot LU: 60:06:01:60:54:50:2E:00:32:0B:08:80:1C:12:E2:11 (sync-24_SRM-TEST-FAILOVER_session))
A 10/09/12 14:20: 'ExecuteClientRequest' called by ' Navi User admin' ( ) on 'CLIFeature' (Result: Success). snapview - storagegroup -addsnapshot -gname SG_dellpr710-g.emcvmw.ctc -hlu 10 -snapshotname sync-24_SRM-TEST-FAILOVER -compatibilitymode called by 'admin'
A 10/09/12 14:20:55 RemoteMirror MirrorView quiesce LU request.
A 10/09/12 14:20:55 RemoteMirror MirrorView rebind request for LUN b9502e00:fe606cb30d03e211.
A 10/09/12 14:20:55 RemoteMirror MirrorView quiesce LU request.
A 10/09/12 14:20:55 SnapCopy SnapView has been bound to device f.
A 10/09/12 14:20:55 RemoteMirror RM_ADMIN_INFO_WILL_REBIND the object.
B 10/09/12 14:20:55 RemoteMirror Quiesce request from peer SP.
B 10/09/12 14:20:55 RemoteMirror Rebind request from peer SP for LUN b9502e00:fe606cb30d03e211.
B 10/09/12 14:20:55 RemoteMirror Quiesce request from peer SP.
B 10/09/12 14:20:55 SnapCopy SnapView has been bound to device Disk0002.
B 10/09/12 14:20:55 RemoteMirror Rebind request from peer SP for LUN b9502e00:fe606cb30d03e211.
A 10/09/12 14:20: 'Create a SnapShot LU' called by 'admin' ( ) on 'Navi_SnapCopyFeature' with result: Success (Successfully created SnapShot LU.)
A 10/09/12 14:20:56 SnapCopy a Snapshot Logical Unit device CopyDisk0002 has been created.
B 10/09/12 14:20:56 SnapCopy a Snapshot Logical Unit device CopyDisk0002 has been created.
A 10/09/12 14:20: '' called by 'admin' ( ) on 'Navi_SnapCopyFeature' with result: Success (Started SnapView session successfully. Session name - sync-0_SRM-TEST-FAILOVER_session)
A 10/09/12 14:20:59 SnapCopy SnapView persistent session sync-0_SRM-TEST-FAILOVER_session has been started on LUN 0.
A 10/09/12 14:20:59 Bus1 Enc0 DskE 60a A logical unit has been enabled [ALU 3] 0 ffff e
B 10/09/12 14:20:59 Bus1 Enc0 DskE 606 Unit Shutdown for Trespass [ALU 3] 0 ffff e
B 10/09/12 14:20:59 SnapCopy SnapView persistent session sync-0_SRM-TEST-FAILOVER_session has been started on LUN 0.
A 10/09/12 14:21: 'Activate' called by 'admin' ( ) on 'SnapShot WWN: 60:06:01:60:54:50:2E:00:54:39:77:89:1C:12:E2:11' with result: Success (Successfully activated snapshot LU: 60:06:01:60:54:50:2E:00:54:39:77:89:1C:12:E2:11 (sync-0_SRM-TEST-FAILOVER_session))
A 10/09/12 14:21: 'ExecuteClientRequest' called by ' Navi User admin' ( ) on 'CLIFeature' (Result: Success). snapview - storagegroup -addsnapshot -gname SG_dellpr710-g.emcvmw.ctc -hlu 11 -snapshotname sync-0_SRM-TEST-FAILOVER -compatibilitymode called by 'admin'

30 Test Failover – sequence of events [DEMO]

31 Cleanup – sequence of events [PROD]
Columns: Recovery Step, Result, Step Started, Step Completed, Execution Time
1. Power Off Test VMs at Recovery Site Success :46:24 (UTC 0) :46:30 (UTC 0)
Success :46:24 (UTC 0) :46:30 (UTC 0)
Power Off Success :46:24 (UTC 0) :46:25 (UTC 0)
Reset Storage Success :46:30 (UTC 0)
Success :46:24 (UTC 0) :46:30 (UTC 0)
Power Off Success :46:24 (UTC 0) :46:27 (UTC 0)
Reset Storage Success :46:30 (UTC 0)
2. Resume Non-critical VMs at Recovery Site Inactive
3. Discard Test Data and Reset Storage Success :46:30 (UTC 0) :47:25 (UTC 0)
3.1. Protection Group test7 Success :46:30 (UTC 0) :47:25 (UTC 0)
Device "Mirror of dellpr710-c.w2k8.emcvm...": Success
Device "Mirror of dellpr710-c.w2k8.emcvmw.ctc RDM-23": Success
Device "Mirror of dellpr710-c.w2k8.emcvmw.ctc SRM_VMs": Success

32 Cleanup – sequence of events [PROD ESX]
- Some fairly serious-looking errors appear in the vmkernel log on the prod ESX host; these can be ignored.
)WARNING: VMW_SATP_LIB_CX: satp_lib_cx_otherSPIsHung:338:Path "vmhba2:C0:T1:L3" Peer SP is hung.
)WARNING: VMW_SATP_LIB_CX: satp_lib_cx_otherSPIsHung:338:Path "vmhba3:C0:T0:L3" Peer SP is hung.
)ALERT: NMP: vmk_NmpVerifyPathUID:1166:The physical media represented by device naa da02e0012ce6a8f930de211 (path vmhba3:C0:T1:L9) has changed. If this is a data LUN, this is a critical error. Detecte[0$
)ALERT: NMP: vmk_NmpVerifyPathUID:1166:The physical media represented by device naa da02e0012ce6a8f930de211 (path vmhba2:C0:T0:L9) has changed. If this is a data LUN, this is a critical error. Detecte[0$
)NMP: nmp_DeviceUpdatePathStates:547: Activated path "vmhba2:C0:T1:L9" for NMP device "naa da02e0012ce6a8f930de211".
- Watch out for messages like this; customers could open cases based on these errors alone…

33 Cleanup – sequence of events [DR ESX]

34 Cleanup – sequence of events [DR VNX]
A 10/09/12 14:46: 'storagegroup' called by ' Navi User admin' ( ) with result: Success (Navisphere CLI command: ' storagegroup -removesnapshot -o -gname SG_dellpr710-g.emcvmw.ctc -snapshotname async-25_SRM-TEST-FAILOVER ')
A 10/09/12 14:46: 'Stop' called by 'admin' ( ) on 'Session Name: async-25_SRM-TEST-FAILOVER_session' with result: Success (Deactivated snapshot LU successfully: 60:06:01:60:54:50:2E:00:92:7D:58:76:1C:12:E2:11 (async-25_SRM-TEST-FAILOVER_session)Stopped session su
A 10/09/12 14:46:50 SnapCopy SnapView session async-25_SRM-TEST-FAILOVER_session has been stopped on LUN 25 with status of 0.
B 10/09/12 14:46:50 SnapCopy SnapView session async-25_SRM-TEST-FAILOVER_session has been stopped on LUN 25 with status of 0.
A 10/09/12 14:46:52 SnapCopy b Snapshot Logical Unit device CopyDisk0000 has been removed.
A 10/09/12 14:46:52 NaviCimom Failing Command: Set LUN.
B 10/09/12 14:46:52 SnapCopy b Snapshot Logical Unit device CopyDisk0000 has been removed.
A 10/09/12 14:46: 'Destroy a SnapShot' called by 'admin' ( ) on 'SnapShot WWN: 60:06:01:60:54:50:2E:00:92:7D:58:76:1C:12:E2:11' with result: Success (Destroy snapshot successfully: 60:06:01:60:54:50:2E:00:92:7D:58:76:1C:12:E2:11)
B 10/09/12 14:46:53 NaviCimom Failing Command: Set LUN.
A 10/09/12 14:47: 'storagegroup' called by ' Navi User admin' ( ) with result: Success (Navisphere CLI command: ' storagegroup -removesnapshot -o -gname SG_dellpr710-g.emcvmw.ctc -snapshotname sync-24_SRM-TEST-FAILOVER ')
A 10/09/12 14:47: 'Stop' called by 'admin' ( ) on 'Session Name: sync-24_SRM-TEST-FAILOVER_session' with result: Success (Deactivated snapshot LU successfully: 60:06:01:60:54:50:2E:00:32:0B:08:80:1C:12:E2:11 (sync-24_SRM-TEST-FAILOVER_session)Stopped session succ
A 10/09/12 14:47:01 Bus1 Enc0 DskE 606 Unit Shutdown for Trespass [ALU 2] 0 ffff e
A 10/09/12 14:47:01 SnapCopy SnapView session sync-24_SRM-TEST-FAILOVER_session has been stopped on LUN 24 with status of 0.
B 10/09/12 14:47:01 SnapCopy SnapView session sync-24_SRM-TEST-FAILOVER_session has been stopped on LUN 24 with status of 0.
B 10/09/12 14:47:01 Bus1 Enc0 DskE 60a A logical unit has been enabled [ALU 2] 0 ffff e
A 10/09/12 14:47:02 SnapCopy b Snapshot Logical Unit device CopyDisk0001 has been removed.
B 10/09/12 14:47:02 SnapCopy b Snapshot Logical Unit device CopyDisk0001 has been removed.
A 10/09/12 14:47: 'Destroy a SnapShot' called by 'admin' ( ) on 'SnapShot WWN: 60:06:01:60:54:50:2E:00:32:0B:08:80:1C:12:E2:11' with result: Success (Destroy snapshot successfully: 60:06:01:60:54:50:2E:00:32:0B:08:80:1C:12:E2:11)
A 10/09/12 14:47:03 RemoteMirror MirrorView quiesce LU request.
A 10/09/12 14:47:03 RemoteMirror RM_ADMIN_INFO_WILL_REBIND the object.
A 10/09/12 14:47:03 RemoteMirror MirrorView rebind request for LUN b9502e00:a4cc7fc81507e211.

35 Cleanup – sequence of events [DR VNX]
B 10/09/12 14:47:03 RemoteMirror Quiesce request from peer SP.
B 10/09/12 14:47:03 RemoteMirror Rebind request from peer SP for LUN b9502e00:a4cc7fc81507e211.
B 10/09/12 14:47:03 RemoteMirror Quiesce request from peer SP.
B 10/09/12 14:47:03 SnapCopy SnapView has been unbound from device Disk0001.
B 10/09/12 14:47:03 RemoteMirror Rebind request from peer SP for LUN b9502e00:a4cc7fc81507e211.
A 10/09/12 14:47: 'storagegroup' called by ' Navi User admin' ( ) with result: Success (Navisphere CLI command: ' storagegroup -removesnapshot -o -gname SG_dellpr710-g.emcvmw.ctc -snapshotname sync-0_SRM-TEST-FAILOVER ')
A 10/09/12 14:47: 'Stop' called by 'admin' ( ) on 'Session Name: sync-0_SRM-TEST-FAILOVER_session' with result: Success (Deactivated snapshot LU successfully: 60:06:01:60:54:50:2E:00:54:39:77:89:1C:12:E2:11 (sync-0_SRM-TEST-FAILOVER_session)Stopped session succes
A 10/09/12 14:47:10 Bus1 Enc0 DskE 606 Unit Shutdown for Trespass [ALU 3] 0 ffff e
A 10/09/12 14:47:10 SnapCopy SnapView session sync-0_SRM-TEST-FAILOVER_session has been stopped on LUN 0 with status of 0.
B 10/09/12 14:47:10 SnapCopy SnapView session sync-0_SRM-TEST-FAILOVER_session has been stopped on LUN 0 with status of 0.
B 10/09/12 14:47:10 Bus1 Enc0 DskE 60a A logical unit has been enabled [ALU 3] 0 ffff e
A 10/09/12 14:47: 'Destroy a SnapShot' called by 'admin' ( ) on 'SnapShot WWN: 60:06:01:60:54:50:2E:00:54:39:77:89:1C:12:E2:11' with result: Success (Destroy snapshot successfully: 60:06:01:60:54:50:2E:00:54:39:77:89:1C:12:E2:11)
A 10/09/12 14:47:12 RemoteMirror MirrorView quiesce LU request.
A 10/09/12 14:47:12 RemoteMirror RM_ADMIN_INFO_WILL_REBIND the object.
A 10/09/12 14:47:12 RemoteMirror MirrorView rebind request for LUN b9502e00:fe606cb30d03e211.
A 10/09/12 14:47:12 RemoteMirror MirrorView quiesce LU request.
A 10/09/12 14:47:12 SnapCopy b Snapshot Logical Unit device CopyDisk0002 has been removed.
B 10/09/12 14:47:12 SnapCopy b Snapshot Logical Unit device CopyDisk0002 has been removed.
B 10/09/12 14:47:12 RemoteMirror Rebind request from peer SP for LUN b9502e00:fe606cb30d03e211.
B 10/09/12 14:47:12 RemoteMirror Quiesce request from peer SP.
B 10/09/12 14:47:12 SnapCopy SnapView has been unbound from device Disk0002.
B 10/09/12 14:47:12 RemoteMirror Rebind request from peer SP for LUN b9502e00:fe606cb30d03e211.
B 10/09/12 14:47:12 RemoteMirror Quiesce request from peer SP.

36 Cleanup – sequence of events [PROD]

37 Failover - DEMO

38 Troubleshooting: Obtaining the correct logs
- Be sure to capture the SRM and SRA logs.
- Please use the VMware KB “Export system Logs”.
- Please complete these actions on both sites!
- http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=

39 Troubleshooting: Obtaining the correct logs
If the issue is related to a Test Failover or an actual Failover, the export log of the failed Recovery Plan will also be invaluable in troubleshooting the issue. To generate the export log for the failed Recovery Plan:
- In the left pane, click Recovery Plans and select the Recovery Plan which had the issue.
- Select the Plan Name which is showing an Error in the Result column.
- On the Plan Name with the error, click the Export action to generate the report for the failed Test Failover or actual Failover.
- Save the file to your desktop and upload it with the SRM system logs.

40 Troubleshooting: Obtaining the correct logs
- The exported information will look like this. Take note of the timestamps, as these are what we will use to search through the SPCOLLECT.
- The errors listed here are extremely useful in the actual diagnosis of the issue.

41 Troubleshooting: log files of interest:
- There are 2 main folders of interest within the exported log bundles:
- The Logs folder, surprisingly enough:
  - This contains all the activity from the SRM application on that particular site.
  - Extract all .gz archives in case the errors you are searching for occurred a while back.
  - The main file of interest in this folder is called “vmware-dr-XX”.
  - Sort by date and review the most recent, or search for the timestamp obtained from the html page described on slide 20.

42 Troubleshooting: log files of interest:
- Please note the dates in the SRM logs are in the following format: T09:10: :00
- The dates in the exported .html page are in: :10:29 (UTC 0)
- So adjust accordingly when searching for errors across logs.
- In my example I’ll search through the SRM logs with: T09
  - This will be a good starting point.
- For the Linux heads, this is what I’m using to make the logs more human-readable:
  - grep " T09" vmware-dr-3*|grep -v "<" |less
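
For anyone without grep to hand (for example on the Windows SRM host itself), the same two-stage filter can be sketched in Python. This is an illustrative equivalent of the pipeline above, not a tool from the deck. It keeps lines that contain a timestamp fragment and drops lines that contain "<", which is what `grep -v "<"` does. The function names are hypothetical, and the glob pattern passed to `iter_log_lines` is whatever matches your extracted vmware-dr files.

```python
import glob


def filter_srm_log(lines, stamp_fragment):
    """Keep lines containing stamp_fragment and drop any line containing '<'."""
    kept = []
    for line in lines:
        if stamp_fragment in line and "<" not in line:
            kept.append(line.rstrip("\n"))
    return kept


def iter_log_lines(pattern):
    """Yield every line from every file matching the glob pattern, in name order."""
    for name in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes in it
        with open(name, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line
```

A call such as `filter_srm_log(iter_log_lines("vmware-dr-3*"), "T09")` returns the matching lines as a list, which can then be printed or written to a file for review.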

43 Troubleshooting: log files of interest:
- Most of the information in these vmware-dr-XX logs is really of more interest to VMware than EMC, as it's just very verbose logging of the SRM application and database interaction.
- No harm in having a peek to see if there is anything jumping out, though.
- The log files we want to focus on next are the SRA logs, and there are a few different logs.
- Location = srm-support\Logs\SRAs\EMC VNX SRA
  - sra_discoverArrays_ _
  - sra_discoverDevices_ _
  - sra_failover_ _
  - sra_prepareFailover_ _
  - sra_prepareReverseReplication_ _
  - sra_queryCapabilities_ _
  - sra_queryConnectionParameters_ _
  - sra_queryErrorDefinitions_ _
  - sra_queryInfo_ _
  - sra_queryReplicationSettings_ _
  - sra_queryStrings_ _
  - sra_querySyncStatus_ _
  - sra_reverseReplication_ _
  - sra_syncOnce_ _
  - sra_testFailoverStart_ _
  - sra_testFailoverStop_ _

44 Troubleshooting: log files of interest:
- The timestamp format is similar here again, so just take note of it.
- Get the timestamp from the .html page as before:
- The SRA log folder will be full of logs, so I’m just going to focus on the logs from Sept
- The above error was received when trying to do a Test Failover.
- We need to check the following log:
  - sra_testFailoverStart_ _ log
- Note the UTC time adjustment.

45 Troubleshooting: log files of interest:
- Looking at the log file, we can pull some very useful information:
  - [sra_testFailoverStart_ _ log]
- grep / search for:
  - com.emc.mirrorview.platform.naviseccli.NaviseccliConnection
- This will show the actual navi commands that are being issued by the SRA to the SP.

46 Troubleshooting: log files of interest:
- Within the same log file, search / grep for:
  - Command result:
- This should give a clear indication of where the error lies.
- In this case, it looks like we have an issue with SnapView.
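
The two searches above (the NaviseccliConnection logger, then "Command result:") can be combined so that each navi command is shown next to what it returned. The Python below is a rough sketch under one stated assumption: every log entry sits on a single line. Multi-line command output would need extra handling. The function name `pair_commands` is made up for this example.

```python
CONNECTION_TAG = "[com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]:"
COMMAND_PREFIX = "Executing command:"
RESULT_PREFIX = "Command result:"


def pair_commands(lines):
    """Return a list of (command, result) tuples from SRA log lines.

    result is None when a command has no "Command result:" entry after it.
    """
    pairs = []
    pending = None
    for line in lines:
        if CONNECTION_TAG not in line:
            continue  # entry written by some other logger
        message = line.partition(CONNECTION_TAG)[2].strip()
        if message.startswith(COMMAND_PREFIX):
            if pending is not None:
                pairs.append((pending, None))  # previous command never got a result
            pending = message[len(COMMAND_PREFIX):].strip()
        elif message.startswith(RESULT_PREFIX):
            pairs.append((pending, message[len(RESULT_PREFIX):].strip()))
            pending = None
    if pending is not None:
        pairs.append((pending, None))
    return pairs
```

Fed the lines of an sra_testFailoverStart log, this gives a compact command-by-command view. Any result that starts with `stdout(Error:` or has a non-empty `stderr(...)` is the place to start reading.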

47 Troubleshooting: log files of interest:
- Switch over to the SPCOLLECTs for both sites, and grep out any messages related to SnapCopy.
- On the DR site, we can see:
/cygdrive/c/Users/Dave/Documents.backup/Logs/SRM_LOGS_PPTX/Pandora
$ grep SnapCopy "TRiiAGE_full_SPlogs.txt"
B 09/02/12 07:54:35 NaviCimom b Failing Command: K10SnapCopyAdmin DBid 0 Op
A 09/02/12 07:54:43 NaviCimom b Failing Command: K10SnapCopyAdmin DBid 0 Op
A 09/02/12 07:54: 'Create a SnapShot LU' called by 'admin' ( ) on 'Navi_SnapCopyFeature' with result: Failure (Could not create SnapShot LU.. [0x B] A SnapView snapshot already exists with the specified name (0x b))
A 09/02/12 07:54:45 SnapCopy You must add LUNs with adequate capacity to the Reserved LUN Pool before you can use this feature.
A 09/02/12 07:54: '' called by 'admin' ( ) on 'Navi_SnapCopyFeature' with result: Failure (Could not start SnapView session. Session name - sync-0_SRM-TEST-FAILOVER_session. [0x ] You must add LUNs with adequate capacity to the Reserved LUN Pool before you
A 09/02/12 07:54:46 NaviCimom Failing Command: K10SnapCopyAdmin DBid 0 Op
This indicates that no Reserved LUN Pool has been set up, as described on slide 13.

48 Troubleshooting: Logs needed - Recap.
So for all SRM / SRA cases you will need the following logs:
- SPCOLLECT from both sites
- SRM logs from both sites
- SRA logs from both sites
- “Recovery Plan Export Log.html” as explained on slide 19
Seriously, don’t proceed until you have everything listed above.

49 Troubleshooting: Workflow
OK, so for every SRM / SRA case that comes in, the following workflow should be applied towards resolution of the case:
1. Collect logs.
2. Check MirrorView and confirm it is actually working.
3. Have the customer reconfirm that DNS and IP connectivity are OK:
   1. All hosts should be DNS-resolvable on both sites.
   2. All hosts / SPs should have IP connectivity on the same VLAN; all hosts / SPs should be able to ping each other.
4. Confirm the software requirements listed on Slide 7.
5. Check the error that is reported in Recovery Plan Export Log.html.
6. Search in TRiiAGE_full_SPlogs for any errors at the time reported in the Recovery Plan Export Log.html.
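
Step 6 of the workflow, searching the SP logs around the time reported in the Recovery Plan export, can be sketched as a small time-window filter. The Python below is an illustration only and rests on three assumptions:

- SP log entries start with the SP letter, then a date and a time, as in the excerpts in this deck.
- The date is month/day/year.
- The caller has already applied any UTC-to-array-time adjustment to `when`.

Entries whose time field is incomplete are skipped. `entries_near` is a hypothetical name.

```python
import re
from datetime import datetime, timedelta

# Example of a matching line: "A 09/02/12 07:54:45 SnapCopy You must add LUNs ..."
SP_LINE = re.compile(
    r"^(?P<sp>[AB]) (?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<rest>.*)$"
)


def entries_near(lines, when, window_seconds=60):
    """Return SP log lines stamped within window_seconds of the datetime `when`."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for raw in lines:
        line = raw.strip()
        match = SP_LINE.match(line)
        if match is None:
            continue  # not an SP entry, or its time field is incomplete
        stamp = datetime.strptime(
            match["date"] + " " + match["time"], "%m/%d/%y %H:%M:%S"
        )
        if abs(stamp - when) <= window:
            hits.append(line)
    return hits
```

For example, `entries_near(open("TRiiAGE_full_SPlogs.txt"), datetime(2012, 9, 2, 7, 54, 40), 30)` would list everything both SPs logged within 30 seconds of that moment. The file name is the one used in the grep example earlier in the deck.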

50 Log error / message examples
In this section we will provide examples of errors and informative messages that may assist in troubleshooting your issue.

51 Log error / message examples
Search for errors that are returned from the navi commands within the SRA logs. In particular, look for: Command result: stdout(Error:
Search the whole folder of SRA logs, as you will get hits in different files depending on the issue you are having. Some examples:
[sra_testFailoverStart_ _ log]
:40:31,693 [com.emc.mirrorview.platform.snapshot.HashMapSnapviewSnapshotRepository]: Caching SnapView snapshot with name sync-0_SRM-TEST-FAILOVER
:40:31,693 [com.emc.mirrorview.platform.snapshot.session.SnapviewSessionServiceImpl]: Starting SnapView session with name: sync-0_SRM-TEST-FAILOVER_session, for snapshot: sync-0_SRM-TEST-FAILOVER
:40:31,693 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Executing command: snapview -startsession "sync-0_SRM-TEST-FAILOVER_session" -snapshotname "sync-0_SRM-TEST-FAILOVER" -persistence
:40:33,003 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Command result: stdout(Error: snapview command failed You must add LUNs with adequate capacity to the Reserved LUN Pool before you can use this feature. (0x )), stderr()
:40:33,003 [com.emc.mirrorview.platform.snapshot.session.SnapviewSessionServiceImpl]: Retrieving info for SnapView session with name sync-0_SRM-TEST-FAILOVER_session
:40:33,003 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Executing command: snapview -listsessions -name "sync-0_SRM-TEST-FAILOVER_session"
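
Because the advice above is to search the whole SRA log folder rather than a single file, a short script can report which files contain the error marker. This Python sketch assumes the SRA log file names begin with `sra_`, as in the file list earlier in the deck. The folder path and the function name `find_sra_errors` are placeholders, not part of any EMC or VMware tooling.

```python
from pathlib import Path

ERROR_MARKER = "Command result: stdout(Error:"


def find_sra_errors(folder):
    """Map each SRA log file name in folder to its lines containing ERROR_MARKER."""
    hits = {}
    for path in sorted(Path(folder).glob("sra_*")):
        if not path.is_file():
            continue
        # errors="replace" avoids aborting the scan on unexpected bytes
        with path.open(encoding="utf-8", errors="replace") as handle:
            matching = [line.rstrip("\n") for line in handle if ERROR_MARKER in line]
        if matching:
            hits[path.name] = matching
    return hits
```

Pointing `find_sra_errors` at the extracted "EMC VNX SRA" log folder returns only the files with failing navi commands, which narrows down where to read first.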

52 Log error / message examples
sra_discoverArrays_ _ log
:15:51,457 [com.emc.sra.SraController]: Building SRM command response
:15:51,457 [com.emc.sra.ResponseBuilder]: Building discoverArrays response
:15:51,473 [com.emc.sra.mirrorview.MirrorviewCommands]: MirrorView Enabler Version:
:15:51,473 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Executing command: arrayname
:15:51,504 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Command result: stdout(), stderr('naviseccli' is not recognized as an internal or external command, operable program or batch file.)
:15:51,504 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Unknown error occurred while opening naviseccli connection
:15:51,504 [com.emc.sra.mirrorview.MirrorviewCommands]: Unable to connect using SPA, trying SPB
:15:51,504 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Executing command: arrayname
:15:51,520 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Command result: stdout(), stderr('naviseccli' is not recognized as an internal or external command, operable program or batch file.)
:15:51,520 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Unknown error occurred while opening naviseccli connection
:15:51,520 [com.emc.sra.ResponseBuilder]: Unable to get SRA Enabler for this connection info Unable to get SRA Enabler for this connection infocom.emc.sra.ResponseBuilder.getEnabler(ResponseBuilder.java:346)
It would seem from the above that naviseccli is not installed properly on the SRM host; check path!

53 Log error / message examples
If MirrorView is working, it would look like this in sra_discoverDevices_ _ log
:20:32,823 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Executing command: mirror -sync -info -systems
:20:34,102 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Command result: stdout(Remote systems that can be enabled for mirroring: Remote systems that are enabled for mirroring: Array UID: 50:06:01:60:C7:20:0A:2D Status: Enabled on both SPs), stderr()
:20:34,102 [com.emc.sra.response.ReplicatedDevicesBuilder]: Attempted discovery of peer array:50:06:01:60:C7:20:0A:2Dfailed
:20:34,102 [com.emc.mirrorview.platform.mirror.MirrorServiceImpl]: ************* SYNC MIRRORS ***************
:20:34,102 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Executing command: mirror -sync -list
:20:35,381 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Command result: stdout(
MirrorView Name: Mirror of dellpr710-c.w2k8.emcvmw.ctc Datastore_1
MirrorView Description:
MirrorView UID: 50:06:01:60:BE:A0:39:93:03:00:00:00:00:00:00:00
Logical Unit Numbers: 0
Remote Mirror Status: Mirrored
MirrorView State: Active
MirrorView Faulted: NO
MirrorView Transitioning: NO
Quiesce Threshold: 60
Minimum number of images required: 0
Image Size:
Image Count: 2
Write Intent Log Used: YES
Images:
Image UID: 50:06:01:60:BE:A0:39:93
Is Image Primary: YES
Logical Unit UID: 60:06:01:60:9D:A0:2E:00:52:49:31:9F:C9:D7:E1:11
Image Condition: Primary Image
Preferred SP: A
Image UID: 50:06:01:60:C7:20:0A:2D
Is Image Primary: NO
Logical Unit UID: 60:06:01:60:B9:50:2E:00:FA:DE:7D:16:AB:F1:E1:11
Image State: Synchronized
Image Condition: Normal
Recovery Policy: Manual
Preferred SP: A
Synchronization Rate: Medium
Image Faulted: NO
Image Transitioning: NO
Synchronizing Progress(%): 100
), stderr()

54 Log error / message examples
sra_discoverDevices_ _ log
:55:13,486 [com.emc.mirrorview.platform.snapshot.SnapviewSnapshotServiceImpl]: Retrieving SnapView snapshot information
:55:13,486 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Executing command: snapview -listsnapshots
:55:13,861 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Command result: stdout(This version of Core Software does not support Snapview), stderr()
:55:13,861 [com.emc.mirrorview.platform.snapshot.HashMapSnapviewSnapshotRepository]: Caching SnapView snapshot with name null
:55:13,861 [com.emc.sra.SraController]: Writing XML response...
In the above example, I did not have the correct SnapView enabler installed on the VNX; there is a new one for INYO. See slide 7.

55 Log error / message examples
sra_testFailoverStart_ _ log
:43:58,722 [com.emc.mirrorview.platform.snapshot.SnapviewSnapshotServiceImpl]: Retrieving SnapView snapshot information
:43:58,722 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Executing command: snapview -listsnapshots
:43:59,923 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Command result: stdout(SnapView logical unit name: sync-0_SRM-TEST-FAILOVER SnapView logical unit ID: 60:06:01:60:54:50:2E:00:78:4E:71:B9:BA:F2:E1:11 Target Logical Unit: 0 State: Inactive), stderr()
:43:59,923 [com.emc.mirrorview.platform.snapshot.HashMapSnapviewSnapshotRepository]: Caching SnapView snapshot with name sync-0_SRM-TEST-FAILOVER
:43:59,923 [com.emc.mirrorview.platform.snapshot.SnapviewSnapshotServiceImpl]: Creating SnapView snapshot with name: sync-0_SRM-TEST-FAILOVER, of LUN:
:43:59,923 [com.emc.mirrorview.platform.snapshot.SnapviewSnapshotServiceImpl]: Searching for current SP owner of lun:
:43:59,923 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Executing command: snapview -createsnapshot 0 -snapshotname "sync-0_SRM-TEST-FAILOVER"
:44:02,029 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Command result: stdout(Error: snapview command failed A SnapView snapshot already exists with the specified name (0x b)), stderr()
:44:02,029 [com.emc.mirrorview.platform.snapshot.session.SnapviewSessionServiceImpl]: Starting SnapView session with name: sync-0_SRM-TEST-FAILOVER_session, for snapshot: sync-0_SRM-TEST-FAILOVER
:44:02,029 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Executing command: snapview -startsession "sync-0_SRM-TEST-FAILOVER_session" -snapshotname "sync-0_SRM-TEST-FAILOVER" -persistence
:44:03,355 [com.emc.mirrorview.platform.naviseccli.NaviseccliConnection]: Command result: stdout(Error: snapview command failed You must add LUNs with adequate capacity to the Reserved LUN Pool before you can use this feature. (0x )), stderr()
Both of the above errors occurred when I did not have the Reserved LUN Pool set up.
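In all of these SRA logs the useful evidence sits on the "Command result:" lines, so a first pass through a log can simply pick out the results that look like failures. A minimal sketch, assuming one log entry per line as in the examples on these slides. `find_failed_commands` is an illustrative helper, and the marker strings are taken only from the failures shown in this deck, so they are not an exhaustive list:

```python
import re

# Matches the "Command result: stdout(...), stderr(...)" tail of an SRA log line.
RESULT_PATTERN = re.compile(r"Command result: stdout\((.*)\), stderr\((.*)\)\s*$")

# Failure hints seen in this deck's examples (an assumption, not a complete list).
FAILURE_MARKERS = ("Error:", "does not support")


def find_failed_commands(lines):
    """Return (stdout, stderr) pairs for command results that look like failures."""
    failures = []
    for line in lines:
        match = RESULT_PATTERN.search(line)
        if match is None:
            continue  # not a command-result line
        stdout, stderr = match.group(1), match.group(2)
        # Any stderr text, or a known failure marker in stdout, counts as a failure.
        if stderr or any(marker in stdout for marker in FAILURE_MARKERS):
            failures.append((stdout, stderr))
    return failures


if __name__ == "__main__":
    example = [
        "[NaviseccliConnection]: Command result: stdout(Error: snapview command "
        "failed A SnapView snapshot already exists with the specified name (0x b)), stderr()",
        "[NaviseccliConnection]: Command result: stdout(Status: Enabled on both SPs), stderr()",
    ]
    for stdout, stderr in find_failed_commands(example):
        print("stdout:", stdout, "| stderr:", stderr)
```

To run it against a real log, pass the open file object as `lines`; the preceding "Executing command:" entry in the log then shows which naviseccli call produced each failure.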

56 Log error / message examples [ESX]
From the ESX side, we will see errors in the vmkernel log during an HBA rescan of the MirrorView devices. This is expected behaviour.
T13:11:19.187Z cpu9:2057)NMP: nmp_ThrottleLogForDevice:2318: Cmd 0x28 (0x ba440) to dev "naa b9502e00a4cc7fc81507e211" on path "vmhba5:C0:T0:L5" Failed: H:0x0 D:0x2 P:0x2 Possible sense data: 0x5 0x25 0x1. Act:NONE
T13:11:19.187Z cpu9:2057)ScsiDeviceIO: 2316: Cmd(0x ba440) 0x28, CmdSN 0x5c5e6 to dev "naa b9502e00a4cc7fc81507e211" failed H:0x0 D:0x2 P:0x2 Possible sense data: 0x5 0x25 0x
T13:11:19.187Z cpu14:2833)Partition: 484: Read of GPT header failed on "naa b9502e00a4cc7fc81507e211": I/O error
T13:11:19.188Z cpu9:2057)ScsiDeviceIO: 2316: Cmd(0x ba440) 0x28, CmdSN 0x5c5e7 to dev "naa b9502e00a4cc7fc81507e211" failed H:0x0 D:0x2 P:0x2 Possible sense data: 0x5 0x25 0x
T13:11:19.188Z cpu14:2833)WARNING: Partition: 944: Partition table read from device naa b9502e00a4cc7fc81507e211 failed: I/O error

57 Log error / message examples [ESX]
Taking a closer look at those SCSI sense codes:
H:0x0 D:0x2 P:0x2 Possible sense data: 0x5 0x25 0x1
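The fields in that string can be decoded mechanically. A small sketch follows, using only standard SCSI values: device status 0x2 is CHECK CONDITION, sense key 0x5 is ILLEGAL REQUEST, and additional sense code 0x25 with qualifier 0x00 is LOGICAL UNIT NOT SUPPORTED. The qualifier 0x1 and the plugin status (P:) are left as raw numbers because this sketch does not assume a meaning for them. `decode_sense` and the lookup tables are illustrative and deliberately incomplete:

```python
import re

HOST_STATUS = {0x0: "OK"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION", 0x8: "BUSY",
                 0x18: "RESERVATION CONFLICT"}
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR", 0x5: "ILLEGAL REQUEST",
              0x6: "UNIT ATTENTION", 0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND"}
# Meaning of each additional sense code when its qualifier (ASCQ) is 0x00.
ASC_BASE_MEANING = {0x25: "LOGICAL UNIT NOT SUPPORTED"}

HEX = r"(0x[0-9a-fA-F]+)"
STATUS_PATTERN = re.compile(
    rf"H:{HEX} D:{HEX} P:{HEX} \w+ sense data: {HEX} {HEX} {HEX}"
)


def decode_sense(text):
    """Decode the H/D/P status and sense bytes from a vmkernel log fragment."""
    match = STATUS_PATTERN.search(text)
    if match is None:
        return None  # fragment is missing or truncated
    host, device, plugin, key, asc, ascq = (int(v, 16) for v in match.groups())
    return {
        "host": HOST_STATUS.get(host, hex(host)),
        "device": DEVICE_STATUS.get(device, hex(device)),
        "plugin": plugin,  # left raw: not decoded in this sketch
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "asc": asc,
        "ascq": ascq,      # left raw: only the ASCQ 0x00 meaning is tabulated
        "asc_base_meaning": ASC_BASE_MEANING.get(asc),
    }


if __name__ == "__main__":
    print(decode_sense("H:0x0 D:0x2 P:0x2 Possible sense data: 0x5 0x25 0x1"))
```

Read this way, the host adapter reported no transport problem and the array itself rejected the command. Cmd 0x28 is READ(10), so a read being refused on these devices would be consistent with the earlier statement that the errors are expected during a rescan of MirrorView devices.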

58 Questions
