Stretching an Oracle DB Across Sites with EMC VPLEX


1 Stretching an Oracle DB Across Sites with EMC VPLEX
Matthew Kaberlein, Application Practice - Oracle Solutions

2 Discussion Flow
A Few Points To Remember From Discussion
VPLEX Overview & Use Cases
Stretched Oracle RAC With EMC VPLEX Blueprint
RAC-on-VPLEX I/O Types and Flow
VPLEX Configuration Options To Note
RAC-on-VPLEX Guidelines & Considerations
VPLEX & RAC/VPLEX Component Failure Scenarios

3 Access the Community to get the white papers (WPs) on today's topic

4 A Few Points To Remember
Using VPLEX with an Oracle DB you can…
Live Migrate a running DB from Site 1 to Site 2 (and vice versa) using OVM or VMware
Non-disruptively migrate a running DB from one storage array to another
Deploy an Active-Active stretched RAC implementation across 2 sites
RAC Interconnect & FC storage connectivity must be within synchronous distance & less than 5ms RTD (round-trip delay)
Stretched RAC with VPLEX is a normal Oracle RAC install
No special Pre-Install Tasks, Grid Infrastructure Install, or RAC & DB Install steps
No ASM Failure Group setup & no host I/O mirroring
No special network configuration for the DBA
No Voting disk in a 3rd site

5 VPLEX Overview: What is VPLEX? What does it do?

6 What is VPLEX?
Enterprise-class storage virtualization solution
Hardware & software cluster → enables scalability & availability
Aggregates & manages heterogeneous pools of SAN FC-attached storage arrays, within & across data centers
Virtualizes HP, Hitachi, IBM, Oracle, NetApp, EMC, … storage arrays
Hosts see the storage arrays as a single storage array
The VPLEX cluster sits in the SAN between the hosts & the FC storage arrays
VPLEX is the storage array to the DB servers
Designed to handle a massive amount of I/O → IOPS or MB/s
Hardware has no single point of failure → redundant everything
VPLEX AccessAnywhere clustering software → allows read/write access to distributed volumes within & across data centers

7 VPLEX Hardware – Engine / Director
1 Engine = 2 Directors
Non-disruptive hardware/software upgrades
8000 LUNs (increasing in the next release, Q1 2014)
64GB cache per Engine
Over 10TB of “data licensing” per Engine
8Gb/s FC connectivity

8 VPLEX Software AccessAnywhere
Enables R/W storage virtualization over distance
Directory-based distributed cache coherence → efficiently maintains cache-state consistency across all Engines & Sites
[Diagram: per-Engine cache coherency directory mapping block addresses to the owning Engine cache; the example walks through a new host write of Block 3 followed by a host read of Block 3.]
All Engines in a cluster have access to all LUNs, so a DB block could be in the cache of any of the Engines in a single site. The directory knows which Engine holds the block and routes the I/O there to service it.
If a VPLEX site goes “down”, the surviving VPLEX goes into marking mode to track changes. When the “down” VPLEX comes back up, the directory knows where the most current version of a block is, and routes I/O requests to the local or remote VPLEX accordingly.
VPLEX adds about 300 microseconds to an I/O.

9 VPLEX Deployment Information
Over 800 PB deployed
Over 2800 clusters deployed
Largest deployment to date: 16 PB
20 million+ run hours
40+ supported platforms
~Six 9s system uptime

10 What Are Some Use Cases With An Oracle DB?

11 Some VPLEX Use Cases
Mobility
Non-disruptive storage infrastructure upgrades → to a new storage array, or to a Tier-1 storage array
Migrate live & running VMs between data centers
Continuous Availability
Stretching a RAC DB deployment across 2 sites

12 Oracle DB Storage Array Migration
No application downtime on storage array refreshes
No ASM host-based migration (i.e., add/drop/rebalance disks) — a sketch of the ASM approach this replaces appears below
Enables storage optimization by allowing data to be restructured across storage arrays
Data center migrations can be done as an online event
AOL: migrated 48 arrays in 52 weeks, May 2010 – May 2011 – an average of 1 array every 5.4 business days
Zero impact to host applications when moving data
[Diagram: data mobility of a distributed virtual volume across Array 1/2/3 for storage migration, and across Data Center 1/2 for a datacenter migration]
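For contrast, here is a minimal sketch of the host-based ASM migration that a VPLEX-backed volume lets you skip: adding the new array's disks, dropping the old ones, and rebalancing. The disk paths and disk names are hypothetical placeholders.

```sql
-- Hypothetical paths/names: the traditional ASM add/drop/rebalance
-- array migration that VPLEX performs underneath the virtual volume
-- instead, with no changes visible to ASM.
ALTER DISKGROUP data
  ADD  DISK '/dev/mapper/new_array_lun01',
            '/dev/mapper/new_array_lun02'
  DROP DISK data_0000, data_0001
  REBALANCE POWER 8;
```

With VPLEX in the data path, the same cutover happens beneath the distributed virtual volume, so the host and ASM never see the disks change.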

13 OVM/VMW Live Migration between 2 sites
Relocates the DB instance VM to another server in the cluster & the DB data to another storage array, both in another data center
OVM/VMware migrates the DB instance; VPLEX migrates the DB data
Enabled by: synchronous distance, 5ms RTD for the storage & VM IP networks, shared virtual disks presented to both servers, and a stretched Layer 2 network
Supported on OVM v2.2.0+, RH Linux, OEL and Windows
To deploy, must be: a VPLEX Metro config, 5ms RTD, with the Layer 2 network extended
Storage is presented from both storage arrays (site 1 & site 2) to VPLEX; a single set of VPLEX devices is presented to both the source & target VM servers in the cluster

14 Stretching a RAC over Metro distance
RAC nodes dispersed across 2 sites, reading & writing to a single logical DB
VPLEX virtualized storage is used to simulate a single, local DB and RAC configuration
[Diagram: Site 1 (Oracle RAC nodes 1 & 2) and Site 2 (Oracle RAC nodes 3 & 4) joined by the RAC Interconnect, over distributed virtual volumes]
Oracle Homes on VPLEX volumes
Oracle ASM disks on VPLEX volumes → +DATA, +FRA, +REDO

15 Stretch RAC-VPLEX What does it look like to us DBAs?

16 RAC/VPLEX Metro Reference Architecture – DBA View “a typical CRS-RAC install/config”
Still must use a single subnet for the Interconnect & Public networks → requires Layer 2 of the network to be extended across the 2 sites (Brocade VCS/LAG, Cisco OTV)
Look & feel of a local RAC
Oracle Universal Installer is not aware of different subnets
DB Services still function properly
SCAN VIP still functions properly; both sites are used & load balanced
Still use a non-routable IP address for the Interconnect (a quick cross-node check is sketched below)
[Diagram: Site 1 (Oracle RAC nodes 1 & 2) and Site 2 (Oracle RAC nodes 3 & 4), RAC Interconnect, distributed virtual volumes (like RAID-1)]
Layer 2 (the data link layer) is responsible for controlling errors between network nodes and transferring “frames” to other computers, and connects hosts within the same network via the physical layer (Layer 1). Layer 2 is used by hubs & switches for their operations; an example data link protocol is Ethernet.
Layer 3 (the network layer) transfers data sequences from a source host on one network to a destination host on a different network. Routers operate at this layer, sending data through the extended network (i.e., the Internet or a corporate network). A subnet is a logically visible and accessible subdivision of an IP network; one of its benefits is enhanced routing efficiency for a RAC deployment.
+GRID ASM disk group: when using ASM to house the Clusterware files (OCR & Voting disk files), starting with 11gR2, EMC recommends creating a +GRID disk group. This enables one to use storage-array-based cloning and storage-array remote replication to create copies of just the DB and mount the DB on another configured cluster.
One set of Oracle Clusterware files on VPLEX volumes → (+GRID)
One set of Oracle ASM DB disks on VPLEX volumes → +DATA, +FRA, +REDO
4 sets of Oracle S/W & config files on VPLEX volumes
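As a sanity check that the stretched cluster really does look like a local RAC, a DBA can confirm from any instance that every node registered its private interconnect in the same non-routable subnet. A minimal sketch, assuming the standard gv$cluster_interconnects view:

```sql
-- Each instance should report an interconnect address in the same
-- private (non-routable) subnet, regardless of which site it runs in.
SELECT inst_id, name, ip_address, is_public
FROM   gv$cluster_interconnects
ORDER  BY inst_id;
```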

17 RAC-VPLEX What is a more detailed view?

18 RAC/VPLEX Metro Reference Architecture
[Diagram: Site 1 (Oracle RAC nodes 1 & 2) and Site 2 (Oracle RAC nodes 3 & 4) connected by the RAC Interconnect. Each site has a VPLEX cluster (VPLEX Cluster-1, VPLEX Cluster-2) in front of its physical volumes on Storage Array 1 / Storage Array 2. Distributed virtual volumes (i.e., ASM disks) present “identical” virtual volumes — actually the same volumes — to both sites: Oracle S/W & config files, Oracle Clusterware (+GRID), and Oracle ASM DB volumes. The VPLEX interconnect runs over dark fibre; the storage network between sites is FC-based; the management network between sites is IP-based. A VPLEX Cluster Witness at Site 3 connects to each VPLEX cluster over a dedicated IP link.]
ASM disks are carved on both storage arrays and pulled into a distributed virtual volume, so each is seen as one ASM disk/LUN (a quick query to confirm what ASM sees is sketched below).
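Because each distributed virtual volume is presented as a single device, ASM on any node should show one disk per distributed virtual volume rather than one per physical array leg. A minimal check, assuming the standard v$asm_disk view:

```sql
-- Run from the ASM instance: each distributed virtual volume should
-- appear exactly once, with the same path and size from either site.
SELECT group_number, name, path, total_mb, header_status
FROM   v$asm_disk
ORDER  BY group_number, name;
```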

19 Oracle DB to VPLEX to Storage Array I/O Types
Assumes the storage array is an intelligent, cache-based platform

20 VPLEX Cache Used - Read Hit
Data found in VPLEX cache:
1) Read request sent to VPLEX
2) Read instantly acknowledged by VPLEX & data sent back to the host
[Diagram: DB server with Oracle instance, VPLEX cache, storage array cache, mirrored devices M1/M2]
For example, direct path writes could be cached. When a read request comes in, VPLEX automatically checks the directory for an owner. Once the owner is located, the read request goes directly to that Engine. Once a write is done and the directory is updated, if another read request comes in from another Engine, it checks the directory and pulls the read directly from the owning Engine's cache. If the block is still in cache, there is no need to go to disk to satisfy the read.

21 VPLEX Cache Not Used – Short Read Miss
Data not found in VPLEX cache:
1) Read request sent to VPLEX
2) Data not found in VPLEX cache
3) Read request sent to Storage
4) Data found in Storage cache & piped into VPLEX cache
5) Data in VPLEX cache sent to the host
[Diagram: DB server with Oracle instance, VPLEX cache, storage array cache, mirrored devices M1/M2]

22 VPLEX Cache Not Used – Long Read Miss
Data not found in VPLEX & Storage cache:
1) Read request sent to VPLEX
2) Data not found in VPLEX cache
3) Read request sent to Storage
4) Data not found in Storage cache
5) Data read from disk & piped into Storage cache
6) Data in Storage cache sent to VPLEX cache
7) Data in VPLEX cache sent to the host
[Diagram: DB server with Oracle instance, VPLEX cache, storage array cache & disks, mirrored devices M1/M2]

23 VPLEX - Write-Through Cache
1) Write request sent to VPLEX-1
2) VPLEX-1 sends the write request to Storage-1 cache & to VPLEX-2
3) VPLEX-2 sends the write request to Storage-2 cache
4) Write acknowledgement sent back to VPLEX-1 from Storage-1
5) Write acknowledgement sent back to VPLEX-2 from Storage-2
6) Write acknowledgement sent back to VPLEX-1 from VPLEX-2
7) When VPLEX-1 has received acknowledgements from both Storage-1 and VPLEX-2, the write acknowledgement is sent back to the host
8) Data is de-staged to disk later, from the caches on both Storage-1 & Storage-2
[Diagram: DB server with Oracle instance, VPLEX-1 and VPLEX-2 caches, Storage-1 and Storage-2 caches, mirrored devices M1/M2 at each site]

24 Why implement a Stretched RAC with VPLEX?
Besides having a Business Case…

25 Value of VPLEX with Oracle RAC
Availability
Failover and failback eliminated
No single point of failure
Add Engines & RAC nodes non-disruptively
Data is accessed locally
VPLEX Witness is the 3rd-site arbiter
Scalability
VPLEX devices upgraded dynamically
Scale RAC as you normally would
Cache & performance attributes scale with the number of Engines
Storage capacities in excess of 1 PB fully supported
Performance
All reads are local & leverage the caches in VPLEX & Storage
VPLEX scales to meet performance requirements
No host CPU cycles used to mirror data to both locations
DBA Simplicity
No ASM I/O mirroring administration
No 3rd-site Voting disk needed to execute arbitrated failover
DB sessions can still use TAF (a hedged service-definition sketch follows this slide)
DR validation is natural to the architecture
Non-disruptive data mobility across storage frames

Speaker notes:
AVAILABILITY
-VPLEX is a fully redundant architecture, eliminating failover and failback in the event of a storage director or other component failure
-A VPLEX-based Oracle RAC has no single point of failure
-VPLEX Engines & RAC nodes can be added non-disruptively, which increases availability
-Data is accessed locally with VPLEX, so a remote-site failure does not cause an application loss/restart
-VPLEX Witness is an integrated 3rd-site arbiter which provides intelligent guidance to each cluster regarding link and site failures
PERFORMANCE
-Hosts connect to the local VPLEX cluster; all reads are local reads and leverage the caches of VPLEX & Storage
-No read latency due to inter-site access or distance
-All writes are synchronously mirrored between sites by VPLEX
-VPLEX scales to meet performance requirements by adding Engines (1x → 2x → 4x) non-disruptively; scaling adds IOPS and 2x incremental cache
-No host CPU cycles used to mirror data to both locations → reduced strain on CPU, RAM, HBA, SAN
-No ASM failure groups needed & no I/O consistency issues
SCALABILITY
-VPLEX devices can be upgraded dynamically → no application downtime
-Scale your RAC like you would in a single-site deployment; still need to consider the application's ability to scale across additional RAC nodes
-Cache and performance attributes scale with the number of Engines, like growing the SGA to meet workload needs
-Storage capacities in excess of 1 PB are fully supported, so VLDBs are no issue
DBA SIMPLICITY
-Less ASM (or equivalent) administration → no replication of data needs to be done by ASM or the host; it is all done by VPLEX in the background
-No scripting or 3rd-party tools are required to execute arbitrated failover; no need to deploy a Voting disk in a 3rd site
-DB sessions still have automatic failover using TAF
-Disaster recovery validation & testing is a natural extension of the implementation
-Non-disruptive mobility across storage frames to simplify tech refresh or enhance dynamic storage environments
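The TAF point above can be illustrated with a server-side service definition. This is only a sketch: the service name and retry values are made up, and on RAC the same attributes are more commonly set with srvctl; DBMS_SERVICE is used here simply to keep the example in SQL.

```sql
-- Hypothetical service 'oltp_taf' with BASIC/SELECT failover, so that
-- in-flight queries transparently resume on a surviving RAC instance.
BEGIN
  DBMS_SERVICE.CREATE_SERVICE(
    service_name     => 'oltp_taf',
    network_name     => 'oltp_taf',
    failover_method  => DBMS_SERVICE.FAILOVER_METHOD_BASIC,
    failover_type    => DBMS_SERVICE.FAILOVER_TYPE_SELECT,
    failover_retries => 30,
    failover_delay   => 5);
  DBMS_SERVICE.START_SERVICE('oltp_taf');
END;
/
```

Clients then connect through this service (via the SCAN), and the failover behaviour is inherited without any client-side tnsnames changes.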

26 EMC VPLEX Configuration Options

27 RAC/VPLEX Metro Reference Architecture
[Diagram (repeated from slide 18): Site 1 (Oracle RAC nodes 1 & 2) and Site 2 (Oracle RAC nodes 3 & 4) over the RAC Interconnect; VPLEX Cluster-1 and VPLEX Cluster-2 in front of the physical volumes on Storage Array 1 and Storage Array 2; distributed virtual volumes (i.e., ASM disks) presenting the same Oracle S/W & config files, Oracle Clusterware (+GRID) and Oracle ASM DB volumes to both sites as “identical” virtual volumes (actually the same volumes); VPLEX interconnect over dark fibre; storage network between sites; VPLEX Cluster Witness at Site 3 reached over dedicated IP links.]

28 VPLEX Metro Cluster Configuration Options…
VPLEX Witness → VPLEX cluster arbiter
Used to improve application availability in the event of a site failure, a VPLEX cluster failure and/or an inter-cluster communication failure
Same approach as a CRS Voting disk in a 3rd location
Connects to both VPLEX clusters over an IP network → up to 1 second RTD (round-trip delay)
Monitors each VPLEX cluster & the VPLEX cluster interconnect
Deployed as a VM on a host in a 3rd site, in a failure domain outside the VPLEX clusters
Use VMware DRS to fail the VM over on a physical host failure
Use VMware Fault Tolerance to ensure the VM is always up

29 …VPLEX Metro Cluster Configuration Options
VPLEX Detach Rules
Predefined rules, for the storage devices supporting a DB, that identify which VPLEX cluster should detach its mirror leg on a VPLEX network communication failure, so that the surviving VPLEX cluster can continue processing I/O requests
Effectively, this defines a Preferred Site for the storage devices
To avoid a VPLEX cluster split-brain situation, the non-preferred site is forced to suspend processing I/O requests, maintaining data integrity for the device
For 2 Apps/DBs: the devices that support each App/DB can have a different Preferred Site, so in a VPLEX split-brain scenario each DB could still be running, “only” on its own Preferred Site.

30 RAC-on-VPLEX -- Guidelines & Considerations

31 Guidelines & Considerations…
No need to deploy a Voting disk in a 3rd site → it is maintained on each storage array in each location
Use ASM to provide balanced I/O & capacity across the front-end LUNs (ASM disks), if using ASM today
Use ASM External Redundancy for +DATA, +REDO, +FRA, etc.
Create a +GRID ASM disk group for CRS in 11gR2+ → contains the OCR & Voting disk files
Use ASM Normal or High Redundancy (for +GRID) to create multiple copies (a hedged creation sketch follows this list)
Useful when using:
Storage array local replication to create DB clones to mount to another CRS stack
Storage array remote replication for DR, to mount the DB to an already-configured CRS stack at the DR site
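A minimal sketch of the disk group layout described above. The disk paths are hypothetical placeholders and would normally be multipath pseudo-devices backed by VPLEX distributed virtual volumes.

```sql
-- DB data on VPLEX volumes: external redundancy, since VPLEX mirrors
-- the volume between sites underneath ASM.
CREATE DISKGROUP data EXTERNAL REDUNDANCY
  DISK '/dev/mapper/vplex_data_01', '/dev/mapper/vplex_data_02'
  ATTRIBUTE 'compatible.asm' = '11.2';

-- Clusterware files (OCR & Voting) in their own +GRID disk group, per
-- the recommendation above; normal redundancy keeps extra copies so a
-- clone of the DB disk groups alone can be mounted on another cluster.
CREATE DISKGROUP grid NORMAL REDUNDANCY
  DISK '/dev/mapper/vplex_grid_01',
       '/dev/mapper/vplex_grid_02',
       '/dev/mapper/vplex_grid_03'
  ATTRIBUTE 'compatible.asm' = '11.2';
```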

32 …Guidelines & Considerations
Use Virtual Provisioning for the DB for wide striping and balanced capacity & I/O across the storage pool
Use 2 physical HBAs in each RAC node (or VM node) → each HBA port connected to front-end ports on different VPLEX Directors of an Engine
Use I/O multipathing software (such as PowerPath) to enable multiple & balanced I/O paths to the storage infrastructure (a quick balance check from the ASM side is sketched below)
Storage array clones & snaps still work, as does remote replication for DR
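One quick way to see whether I/O really is spread evenly across the VPLEX-backed ASM disks is to compare the per-disk counters ASM already keeps. A minimal sketch using the standard v$asm_disk view:

```sql
-- Roughly equal read/write counts per disk within a disk group suggest
-- that the striping and multipathing are balancing I/O as intended.
SELECT group_number, name, reads, writes, read_time, write_time
FROM   v$asm_disk
WHERE  group_number > 0
ORDER  BY group_number, name;
```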

33 Component Failure Scenarios

34 On the next slide, where 2 or more lightning bolts appear,
we assume simultaneous failures of multiple VPLEX components (rare). The VPLEX component failure sequence, and how VPLEX responds to failures, is critical for a VPLEX implementation.

35 VPLEX Component Failure Scenarios
[Diagram: failure scenarios drawn as the Witness (W), VPLEX Cluster 1 (C1) and VPLEX Cluster 2 (C2), with lightning bolts marking failed components or links. Outcomes shown include: “C1 and C2 continue I/O”; “C1 and C2 continue I/O & issue dial home”; “C1 and C2 continue I/O, C2 issues dial home”; “C1 continues I/O, C2 suspends & issues dial home” (assuming C1 has bias); “C1 continues I/O & C2 down & issues dial home”; “C1 continues I/O, C2 suspends I/O & issues dial home”.]
C1 = VPLEX Cluster 1. C2 = VPLEX Cluster 2. W = VPLEX Witness.
Failure scenarios #4 & #5: C1 senses isolation from the Witness and C2, so it goes into a Suspended state. If C2 went down first and the Witness went down second, C1 would survive and continue servicing I/Os, since it would have received “guidance” from the Witness on the failure status.
Failure scenario #12: C1 senses isolation from the Witness and C2, so C1 & C2 go into a Suspended state. If W went down first and C2 second, C1 & C2 would still go into a Suspended state, since there is no guidance from W. If C2 went down first and W went down second, C1 would continue servicing I/Os since C1 would have received guidance from W, as in scenario #6.
Failure scenario #13: C2 senses isolation from the Witness and C1, since they are down & it is not getting guidance from W, so it goes into a Suspended state. If C1 goes down & then 30 seconds later the link between W and C2 goes down, C2 will still service I/Os, since it received “guidance” from the Witness before the link went down.

36 RAC & VPLEX Component Failure Scenarios
Oracle Extended RAC with VPLEX Metro
Both RAC & VPLEX inter-connects failed
VPLEX will re-configure before RAC
Via the Detach Rules, the non-preferred-site VPLEX cluster (and therefore its storage) will come down
RAC nodes attached to the preferred-site VPLEX/storage will continue processing I/O; RAC nodes on the non-preferred VPLEX come down (a quick check of the surviving instances is sketched below)
Loss of connectivity to 1 storage array from VPLEX (i.e., the VPLEX cluster is still alive, the storage array has failed)
VPLEX continues to serve I/Os at both sites, even if one of the storage arrays is not available
Oracle Clusterware would NOT know about the storage unavailability, as the VPLEX cluster continues to service all I/Os
When access to the storage array is restored, the storage volumes from the failed array will automatically & incrementally resynchronize, and I/Os continue locally
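After a failure like the first one above, the simplest check from a surviving node is which instances are still open across the cluster. A minimal sketch using gv$instance:

```sql
-- Only the instances on the preferred-site nodes should remain OPEN
-- after the non-preferred VPLEX cluster suspends I/O.
SELECT inst_id, instance_name, host_name, status
FROM   gv$instance
ORDER  BY inst_id;
```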

37 In Summary Using VPLEX with an Oracle DB you can…
Live Migrate a running DB from Site 1 to Site 2 (and vice versa) using OVM or VMware
Non-disruptively migrate a running DB from one storage array to another
Deploy an Active-Active stretched RAC implementation across 2 sites
RAC Interconnect & FC storage connectivity must be within synchronous distance & less than 5ms RTD (round-trip delay)
Stretched RAC with VPLEX is a normal Oracle RAC install
No special Pre-Install Tasks, Grid Infrastructure Install, or RAC & DB Install steps
No ASM Failure Group setup & no host I/O mirroring
No special network configuration for the DBA
No Voting disk in a 3rd site

38 Appendix: Will not review in OUG meetings

39 Additional Failure Scenarios

40 …RAC & VPLEX Component Failure Scenarios
Oracle Extended RAC with VPLEX Metro
Lab/building/site failure
By installing the VPLEX clusters and the Witness in independent failure domains (such as another building or site), the deployment is resilient to lab, building, or site failures
The VPLEX cluster not affected by the failure will continue to serve I/Os to the App/DB
Use TAF to allow sessions to automatically fail over to a surviving RAC node (a quick session-level check is sketched below)
Database instance crash or public network disconnect
RAC provides database resiliency for a single server failure by performing automatic instance recovery and having other RAC nodes ready for user connections
Use TAF to allow sessions to automatically fail over to a surviving RAC node
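To confirm that sessions really did ride through such a failure via TAF, a DBA can look at the failover columns in gv$session. A minimal sketch:

```sql
-- FAILED_OVER = 'YES' marks sessions that TAF transparently moved to a
-- surviving instance after their original node went away.
SELECT inst_id, username, failover_type, failover_method, failed_over
FROM   gv$session
WHERE  username IS NOT NULL;
```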

41 …RAC & VPLEX Component Failure Scenarios
Oracle Extended RAC with VPLEX Metro
RAC inter-connect failure
RAC natively & automatically handles this, via cluster reconfiguration
Storage front-end port failure
Host-to-storage connectivity should include multiple storage front-end ports, ideally across Symmetrix directors
If using a Symmetrix with multiple engines, connect to ports on different engines as well to gain even higher protection
VPLEX front-end port failure
VPLEX will continue to provide access to the storage volumes via alternate paths to the VPLEX cluster
If using multiple VPLEX engines, connect to ports on different engines to gain higher protection
Use I/O multipathing software (such as PowerPath) for automatic path failover and load balancing

42 …RAC & VPLEX Component Failure Scenarios
Oracle Extended RAC with VPLEX Metro
VPLEX back-end port failure
VPLEX will continue to provide access to the storage volumes via alternate back-end paths to the storage array
Use multipathing software (such as PowerPath) for automatic path failover and load balancing
VPLEX hardware component failure
VPLEX components are fully redundant, including persistent cache (which uses vaulting in the case of an elongated power failure), redundant directors and power supplies
VPLEX cluster inter-connect failure
If both sites are still available, the VPLEX Preferred Cluster detach rules will determine which cluster resumes I/Os and which suspends, with no downtime for hosts connected to the surviving cluster
Hosts connected to the non-preferred site will have suspended I/Os and will hit disk timeouts

43 …RAC & VPLEX Component Failure Scenarios
Oracle Extended RAC with VPLEX Metro
Single VPLEX cluster failure
VPLEX Witness will allow I/O to resume at the surviving VPLEX cluster; RAC cluster nodes connected to that VPLEX cluster will resume operations without downtime
Use TAF to allow automatic user connection failover to the surviving RAC nodes
HBA port, host hardware, storage array component, or physical disk drive failure
No impact to the RAC/VPLEX config, since there are multiple active-active components

44 Some approaches to place host LUNs (ASM disks) into VPLEX

45 Encapsulation Method (Short outage to cutover)
1) The host is shut down
2) VPLEX “claims” the volumes that were presented to the host from the donor array
3) The claimed volumes are configured as devices on the VPLEX and presented back to the host
[Diagram: Hosts, VPLEX, Donor Array]

46 Host Migration Method (Online, no outage)
1) New VPLEX target volumes are presented to the hosts via the SAN
2) Host migration to VPLEX, using one of: VMware Storage vMotion, AIX LVM mirroring, Solaris/Veritas LVM mirroring, or ASM (add/drop disks)
3) Once the migration is complete, the donor mirror is broken and the migration to VPLEX is complete (for the ASM route, a rebalance-monitoring sketch follows)
[Diagram: Hosts, VPLEX, Donor Array]
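For the ASM route listed above, the online cutover is the same add/drop/rebalance pattern sketched earlier (new VPLEX-backed disks in, donor-array disks out), and the donor array should only be unpresented once the rebalance has drained. A minimal way to watch for that, assuming the standard v$asm_operation view:

```sql
-- A rebalance is still copying extents off the donor array while a row
-- with OPERATION = 'REBAL' is present; once no rows remain, the donor
-- mirror can safely be broken.
SELECT group_number, operation, state, power, est_minutes
FROM   v$asm_operation;
```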

