Peter Mattei HP Storage Consultant 16. May 2013


1 Peer Persistence
Peter Mattei, HP Storage Consultant, 16 May 2013

2 Peer Persistence Overview
Peer Persistence is a high-availability (HA) configuration between two sites/data centers where the hosts are configured in a metro-cluster configuration. This is an active/passive configuration, meaning that host paths to the primary and secondary volumes are managed. The configuration supports VMware vSphere 5.0 hosts, which must be connected to both the primary and secondary arrays, and the vSphere hosts must be configured using 3PAR host persona 11. The WWNs of the replicated volumes have to be the same on both arrays. During a transparent failover, IO from the host to the primary array is managed so as to allow a switchover operation from the primary to the secondary array. On completion of the switchover, host IO is rejected by the primary array with a sense error indicating a change in the active path. Today the switchover is a manual process executed via the 3PAR CLI; the automated process is coming later this year.

3 Peer Persistence v1 – 3PAR Storage & VMware
High Availability Enhancements

[Diagram: HP 3PAR arrays in Data Center 1 and Data Center 2, up to 260 km apart over FC SAN, with a stretched vSphere cluster; HP storage-based replication; active and passive LUN presentations; the red LUN is transparently swapped and the red VM transparently moved between data centers.]

What does it provide?
- High availability of VMware environments across data centers
- Manual transparent LUN swap between data centers
- Transparent VM vMotion between data centers

How does it work?
- Based on 3PAR Remote Copy and vSphere ALUA host mode persona 11
- Presents the primary LUN as active and the secondary as standby
- Remote Copy group migration initiated by the "setrcopygroup switchover <group>" command

Supported environments:
- ESX vSphere 5.x
- Synchronous Remote Copy up to the RC-supported maximum of 2.6 ms RTT (~260 km)

Requirements:
- 3PAR disk arrays
- 3PAR Remote Copy license
- 3PAR Peer Persistence license

4 Peer Persistence Setup
[Diagram: VMware cluster spanning Site 1 and Site 2; 3PAR Array A holds Vol A (primary) and Vol B (secondary), 3PAR Array B holds Vol B (primary) and Vol A (secondary); LUNs A.123 and B.456 exported through redundant Fabric A and Fabric B; synchronous Remote Copy with up to 2.6 ms RTT latency; active and passive/standby paths shown.]

- Each host is connected to each array on both sites via redundant fabrics.
- Each volume is exported in read/write mode with the same WWN from both arrays on both sites.
- Volume paths for a given volume are "Active" only on the array where the "Primary" copy of the volume resides; the other volume paths are marked "Standby".
- A synchronous copy of the volume is kept on the partner array/site.
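The path rule above can be sketched as a tiny model (illustrative Python; the function and site names are assumptions for illustration, not 3PAR code):

```python
def path_state(primary_site, array_site):
    """Return the ALUA-style state of a volume path on a given array.

    A path is "active" only on the array holding the Primary copy of
    the volume; paths to the partner array are "standby".
    """
    return "active" if array_site == primary_site else "standby"

# Vol A is primary on Site 1, so its path via Array A (Site 1) is
# active, while its path via Array B (Site 2) is standby.
print(path_state("site1", "site1"))  # -> active
print(path_state("site1", "site2"))  # -> standby
```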

5 Various failure cases handled by this setup.
- Automated server recovery with VMware HA
- Single component failure at the array level
- Network link failure
- Storage system failure
- Site failure

6 Non-disruptive vMotion across sites
[Diagram: two-site setup as in slide 4, with a VM being moved between sites.]

- VMs can be vMotioned across sites non-disruptively.
- Volume I/O continues to be served from "Active" paths, which may mean I/O is served from the remote array.
- If all VMs within a datastore are moved to the other site, the storage administrator can choose to make the associated volume "Primary" on the other site.

7 Automated Server Recovery with VMware HA
[Diagram: two-site setup as in slide 4, with one ESX server failed.]

- Upon failure of a VMware ESX server, VMware HA automatically restarts its VMs on other ESX servers in the cluster (including remote ESX servers).
- No intervention is required on the storage systems.

8 Transparent Storage Component Failure
[Diagram: two-site setup as in slide 4, with a single component failure highlighted inside one array.]

Single component failures (controller nodes or drive chassis) within a 3PAR array are handled locally and transparently.

9 Transparent Storage System Failure
[Diagram: two-site setup as in slide 4; 3PAR Array A has failed and "# setrcopygroup failover" is issued on Array B, making Vol A primary on Array B.]

- Upon failure of an entire storage system, the volumes that were 'primary' on the failed array have to be failed over to the other site by the administrator, i.e. they become 'primary' on the other site.
- The 'standby' paths to these volumes become 'active' paths; the previously 'active' paths on the failed array go into 'failed' status.
- Host I/O continues to be served and VMs remain online.
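The failover behaviour described above can be sketched as a state transition (illustrative Python; the "array:volume" path keys are an assumption for illustration, not how 3PAR represents paths):

```python
def failover_path_states(states, failed_array):
    """Recompute path states after `setrcopygroup failover` is run on
    the surviving array: previously active paths on the failed array
    become "failed", and standby paths on the surviving array become
    "active"."""
    new_states = {}
    for path, state in states.items():
        array = path.split(":")[0]  # path key format: "<array>:<volume>"
        if array == failed_array:
            new_states[path] = "failed" if state == "active" else state
        else:
            new_states[path] = "active" if state == "standby" else state
    return new_states

before = {"arrayA:volA": "active", "arrayB:volA": "standby"}
after = failover_path_states(before, failed_array="arrayA")
print(after)  # arrayA path becomes failed, arrayB path becomes active
```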

10 Network Partition Handling
[Diagram: two-site setup as in slide 4, with the inter-site network links failed.]

- If a network partition occurs, volumes on both arrays remain available to local hosts; volume paths to remote hosts are lost.
- Passive volume paths remain passive, and volume synchronization goes into the stopped state.
- VMs that reside on local datastores stay online; VMs that reside on remote datastores shut down.
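The partition behaviour reduces to a locality check: a VM keeps serving IO only if the primary copy of its datastore's volume sits on the array at its own site (illustrative Python; names are assumptions):

```python
def vm_stays_online(vm_site, volume_primary_site):
    """During a network partition only local paths survive, so a VM
    stays online iff the primary copy of its datastore volume is on
    the array at the VM's own site."""
    return vm_site == volume_primary_site

# A Site 1 VM whose datastore volume is primary on Site 1 stays up;
# one whose volume is primary on Site 2 loses its (remote) active path.
print(vm_stays_online("site1", "site1"))  # -> True
print(vm_stays_online("site1", "site2"))  # -> False
```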

11 Site Failure
[Diagram: two-site setup as in slide 4; Site 1 has failed entirely and "# setrcopygroup failover" is issued on Array B, making Vol A primary on Array B.]

- Upon failure of an entire site, the administrator has to fail over the volumes that were 'primary' on the failed site to the other site, i.e. they become 'primary' on the other site.
- The 'standby' paths to these volumes become 'active' paths; the previously 'active' paths from the failed site go into 'failed' status.
- VMs on Site 2 remain online; VMs previously running on Site 1 can be restarted on Site 2.

12 Peer Persistence Operation in Detail
[Diagram: host cluster with an active path to the primary HP 3PAR storage array and a standby path to the secondary array; synchronous Remote Copy between the arrays.]

- Transparent failover configuration with a VMware ESX 5.0 host cluster connected to both the primary and secondary arrays.
- The primary volume is replicated to the secondary array using synchronous Remote Copy.
- The same WWN is used for both the primary and secondary volumes.
- The primary and secondary volumes are exported using different Target Port Groups, as supported by persona 11 (VMware).

13 Peer Persistence Operation in Detail
[Diagram: synchronous Remote Copy stopped; active path to the primary array, standby path to the secondary array.]

- A remote copy group switchover command is executed on the primary array: setrcopygroup switchover <groupname>
- IO from the host to the primary array is blocked and in-flight IO is allowed to drain.
- The remote copy group is stopped and snapshots are taken on the primary array.

14 Peer Persistence Operation in Detail
[Diagram: synchronous Remote Copy stopped; both arrays' target port groups in the transitioning state.]

- The primary array target port group is changed to the transition state.
- The primary array sends a remote failover request to the secondary array.
- The secondary array target port group is changed to the transition state.
- The secondary array takes a recovery-point snapshot.

15 Peer Persistence Operation in Detail
[Diagram: synchronous Remote Copy stopped; the secondary is now primary-reversed (pri-rev) with an active path, while the old primary is still transitioning.]

- The secondary remote copy group changes state to become primary-reversed (pri-rev); at this point the pri-rev volume becomes read/write.
- The pri-rev target port group is changed to the active state.
- The pri-rev array returns a failover-complete message to the primary array.

16 Peer Persistence Operation in Detail
[Diagram: synchronous Remote Copy stopped; standby path to the old primary, active path to the pri-rev array.]

- The primary array target port group is changed to the standby state and any blocked IO is returned to the host with a sense error: NOT READY, LOGICAL UNIT NOT ACCESSIBLE, TARGET PORT IN STANDBY STATE
- The host performs SCSI inquiry requests to detect which target port groups have changed and which paths are now active.
- Host IO is now serviced on the active path to the pri-rev array.
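Conceptually, the host-side reaction is: on the standby sense error, re-probe the target port groups and retry the IO on the newly active path. A minimal sketch (illustrative Python; the callback names are assumptions, not real multipath-driver APIs):

```python
STANDBY_SENSE = ("NOT READY",
                 "LOGICAL UNIT NOT ACCESSIBLE",
                 "TARGET PORT IN STANDBY STATE")

def handle_io_error(sense, probe_target_port_groups, retry_io):
    """On the standby sense error, re-discover which target port
    groups are active (via SCSI inquiry) and retry the IO there;
    any other sense error is propagated."""
    if sense == STANDBY_SENSE:
        active_path = probe_target_port_groups()
        return retry_io(active_path)
    raise IOError(sense)

# After the switchover, probing reports the pri-rev array as active:
result = handle_io_error(
    STANDBY_SENSE,
    probe_target_port_groups=lambda: "pri-rev array",
    retry_io=lambda path: "io completed via " + path,
)
print(result)  # -> io completed via pri-rev array
```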

17 Peer Persistence Operation in Detail
[Diagram: synchronous Remote Copy reversed, now running from pri-rev to sec-rev; standby path to the sec-rev array, active path to the pri-rev array.]

- The primary array then sends a remote recover request to the pri-rev array.
- The pri-rev array performs a recover operation on the remote copy group.
- The old primary array becomes secondary-reversed (sec-rev).
- The remote copy group is restarted from pri-rev to sec-rev.

18 Peer Persistence Operation in Detail
[Diagram: synchronous Remote Copy reversed; the arrays return to natural primary and secondary roles.]

- When the pri-rev and sec-rev volumes are back in sync, the snapshots that were taken on both sides are removed.
- The remote copy group then undergoes a reverse (-natural) operation.
- This reverse request updates the remote copy group so that the pri-rev group becomes a primary group and the sec-rev group becomes a secondary.

19 Peer Persistence Operation in Detail
[Diagram: roles fully reversed; active path to the new primary, standby path to the new secondary.]

- The system is now fully reversed and ready for another switchover request.
- The goal of the switchover is to perform the operations outlined above whilst managing the host traffic such that it does not time out and the volume migration happens transparently to the applications on the host.
- The target time to ensure the host traffic does not time out is 30 seconds; this does not include the subsequent recover and reverse operations.
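The net effect of the sequence in slides 13-19 is a role swap carried out in three phases (failover, recover, reverse). A condensed sketch (illustrative Python; the role labels follow the slides, but the function itself is an assumption, not 3PAR code):

```python
def switchover(roles):
    """Swap remote copy roles in the three phases described above."""
    # Phase 1 (failover): the secondary becomes primary-reversed and
    # starts serving read/write IO on its now-active paths.
    roles = {a: "pri-rev" if r == "secondary" else r for a, r in roles.items()}
    # Phase 2 (recover): the old primary becomes secondary-reversed
    # and replication restarts from pri-rev to sec-rev.
    roles = {a: "sec-rev" if r == "primary" else r for a, r in roles.items()}
    # Phase 3 (reverse -natural): once resynced and snapshots are
    # removed, the reversed roles become natural primary/secondary.
    natural = {"pri-rev": "primary", "sec-rev": "secondary"}
    return {a: natural[r] for a, r in roles.items()}

print(switchover({"arrayA": "primary", "arrayB": "secondary"}))
# -> {'arrayA': 'secondary', 'arrayB': 'primary'}
```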

20 Peer Persistence v2 – 3PAR Storage & VMware
To be released June 2013 with MU2

[Diagram: stretched vSphere Metro Storage Cluster across Data Center 1 and Data Center 2, up to 260 km apart over FC SAN, with HP 3PAR arrays and HP storage-based replication; active and passive LUN presentations; Quorum Witness in DC 3. Host mode persona 11 must be configured on the 3PARs for these vSphere servers.]

What does it provide?
- High availability across data centers
- Automatic or manual transparent LUN swap
- Transparent VM vMotion between data centers

How does it work?
- Based on 3PAR Remote Copy and vSphere ALUA
- Presents the primary LUN as active and the secondary as standby
- Automated LUN swap arbitrated by a Quorum Witness (QW Linux ESX VM on a 3rd site)

Supported environments:
- ESX vSphere 5.0 and 5.1, incl. HA, Failsafe and vSphere Metro Storage Cluster
- Synchronous Remote Copy up to the RC-supported maximum of 2.6 ms RTT (~260 km)

Requirements:
- Two 3PAR disk arrays
- Two RC sync links (RCFC or RCIP)
- 3PAR Remote Copy license
- 3PAR Peer Persistence license

21 Peer Persistence – 3PAR Storage & VMware
Never lose access to your volumes: VMware vSphere 5.x Metro Storage Cluster (single subnet)

[Diagram: 3PAR Array Site A and 3PAR Array Site B connected via Fabric A and Fabric B with up to 2.6 ms RTT latency; Vol A primary on Site A, Vol B primary on Site B; Quorum Witness (QW) at Site C.]

What does it provide?
- High availability of VMware environments across metro data centers
- Automated and manual transparent LUN swap
- Transparent VM vMotion between data centers

How does it work?
- Based on 3PAR Remote Copy and vSphere ALUA host mode persona 11
- Presents the primary LUN as active and the secondary as standby
- Automated LUN swap arbitrated by a Quorum Witness (QW VM on a 3rd site)

Supported environments:
- ESX vSphere 5.x, incl. HA, Failsafe and vSphere Metro Storage Cluster
- Synchronous Remote Copy up to the RC-supported maximum of 2.6 ms RTT (~260 km)

Requirements:
- 3PAR disk arrays
- 3PAR Remote Copy license
- 3PAR Peer Persistence license

Peer Persistence v2 introduces an arbitration instance (preferably located in a 3rd location) providing manual and fully automated failover capabilities. The solution will be certified by VMware as a vSphere Metro Storage Cluster.

22 Thank you

