Copyright © 2009 Rolta International, Inc., All Rights Reserved Oracle High Availability - A Case Study Rama Balaji Senior Oracle Consultant.

This presentation is a case study of a customer whose storage array failed during heavy transaction processing period. Methods followed to quickly restore the database. How the best practice configuration was arrived while maintaining at least two copies of production databases on-line. How Oracle’s HA components were effectively explored to achieve the final configuration with only few minutes of production downtime. Overview

Agenda Background Approach Initial Configuration Disaster to Full recovery High Availability Features Conclusion and best practices Question and Answer

RMAN compressed backups Physical Standby using Data Guard ASM to non-ASM and vice versa Cascaded Destinations Standby 2 node RAC Failover and Switchover Flash Recovery Area and Flash Back Logs Oracle Features Used

Background E-commerce client Nature of business Initial cost effective business approach Business growth IT infrastructure

Approach 1. Storage array failure 2. Restored database from tape using RMAN 3. Set up the Physical Standby 4. Second Physical Standby using cascaded destination feature 5. Failover from production to standby EventDatabase Availability No Database One Two Three Two

Approach 6. 2-Node RAC with ASM on the new storage array as standby 7.Switched over from single node Db instance to 2 Node RAC as primary 8.Disconnected the second standby 9. 2 node RAC standby to 2 node RAC primary 10. Flashback Logs cleanup from FRA Event Database Availability Three Two No Downtime

Storage Array 2 node RAC + ASM Initial configuration Server - Linux Red Hat x86_64 Database – Oracle 10.2.0.3 ASM – two diskgroups DATA_DG – Database FLASH_DG – flash_recovery_area Daily full database backup to disk- RMAN, and to tape using third party software Hourly archivelog backups to disk – RMAN, and to tape using third party software No RMAN catalog

RMAN RMAN> show controlfile autobackup; RMAN configuration parameters are: CONFIGURE CONTROLFILE AUTOBACKUP ON; RMAN> show controlfile autobackup format; RMAN configuration parameters are: CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/opt/oracle/admin/PROD/backup/cntl/%F ';

RUN { CONFIGURE RETENTION POLICY TO REDUNDANCY 1; ALLOCATE CHANNEL ch1 DEVICE TYPE DISK MAXPIECESIZE 2G FORMAT '/opt/oracle/admin/PROD/backup/db/RCOMPRESSE D_%U'; BACKUP as compressed backupset DATABASE plus archivelog channel ch1; } RMAN Backup script

RUN { ALLOCATE CHANNEL ch1 DEVICE TYPE DISK FORMAT '/opt/oracle/admin/PROD/backup/archive/ARCH_%U' ; BACKUP as compressed backupset archivelog all; } RMAN Archivelog backup script

Storage Array Failure Storage Array 2 node RAC + ASM Storage admin determined that the SAN failure has caused loss of entire data including the database and backups. Validated that the RMAN backup on tape from previous night as well as archivelog files from previous hour were intact.

Restore using RMAN Storage Array 2 node RAC + ASMSingle DB Non- ASM Restored the tape backup and all archivelog file backups from tape to internal disks on one of the RAC node

Restore from ASM to non- ASM Restored the backup from tape to the original backup location. Restored spfile from autobackup. RMAN> restore spfile from autobackup; Created pfile from spfile;

Edited the following parameters in pfile  control_files - Changed +DATA_DG and +FLASH_DG to file system  db_file_name_convert - Changed +DATA_DG to file system  log_file_name_convert - Changed +DATA_DG and +FLASH_DG to file system startup nomount using pfile; Using RMAN nocatalog, restore controlfile from autobackup. RMAN> restore controlfile from ‘/opt/oracle/admin/PROD/backup/cntl/c-662196802- 20081230-0e ’ Restore from ASM to non-ASM (Continued…)

alter database mount; restore database; recover database; Created temporary tablespaces alter database open ; Restore from ASM to non- ASM (Continued…)

Physical Standby Standby Database Primary Database PRODPRODS

Physical Standby Steps On the primary database (PROD) generated pfile. Create pfile=’/tmp/initPROD.ora’ from spfile; and copy that file to the standby server. On the primary (PROD) backed up the current controlfile using RMAN. RMAN> backup current controlfile for standby; RMAN> copy current controlfile for standby to '/tmp/sby_control01.ctl';

Run a full RMAN backup on the primary (PROD). RMAN> RUN { ALLOCATE CHANNEL ch1 DEVICE TYPE DISK MAXPIECESIZE 2G FORMAT '/home/oracle/backup_standby/RCOMPRESSED_% U'; BACKUP as compressed backupset DATABASE plus archivelog channel ch1; } Copied all backup files as well as controlfile backup to the standby (PRODS). Backup files needs to be in the same location as the primary backup location or create a symbolic link. Physical Standby Steps (Continued…)

Edited the parameter file on the standby (PRODS).  Changed control_files parameter to point to the controlfile on the standby server.  Changed db_file_name_convert to use the new location on the standby server.  Changed log_file_name_convert to use the new location on the standby server. Start up the standby (PRODS)instance in nomount Physical Standby Steps (Continued…)

rman nocatalog RMAN> connect target username/password@PROD RMAN> connect auxiliary / RMAN> RUN { ALLOCATE auxiliary CHANNEL ch1 DEVICE TYPE DISK FORMAT '/home/oracle/backup_standby/RCOMPRESSED_% U'; duplicate target database for standby; } Changed the following parameter on the primary (PROD). SQL> ALTER SYSTEM SET log_archive_dest_2= 'SERVICE=PRODS_XPT REOPEN=300' Physical Standby Steps (Continued…)

Changed the standby (PRODS) to Managed Recovery alter database recover managed standby database disconnect from session; Issued the following command to make sure the archive log files are applied on the standby (PRODS) Select sequence#, applied from v$archived_log order by sequence#; Physical Standby Steps (Continued…)

Data Guard Cascaded Destinations LOG_ARCHIVE_DEST_2 ='SERVICE=PRODS_XPT REOPEN=300'; Primary Database Standby Database 1 Data Guard Data Guard Cascaded Destination Standby Database 2 LOG_ARCHIVE_DEST_2 ='SERVICE=PRODC_XPT REOPEN=300'; PRODPRODSPRODC

Data Guard Failover Primary Database Data Guard Failover Primary Database Data Guard Standby Database PRODPRODSPRODC

Data Guard Failover Steps On the primary (PROD)I did the following. After users were logged out, created a table called TEST. Create table test as select * from dba_users; This was a last operation on the primary database. On the primary (PROD) issued the following command couple of times. Alter system switch logfile ;

On the primary (PROD), noted down the sequence# SQL> archive log list Database log mode Archive Mode Automatic archival Enabled Archive destination /d01/oracle/flashback Oldest online log sequence 6503 Next log sequence to archive 6504 Current log sequence 6504 On the standby (PRODS) issued the following command to make sure standby is in maximum performance mode. SQL> alter database set standby database to maximize performance; Data Guard Failover Steps (Continued…)

On the standby(PRODS), issued the following command SQL> select thread#, low_sequence#, high_sequence# from v$archive_gap; no rows selected On the standby (PRODS), check the last sequence#, SQL> select max(sequence#) from v$archived_log; MAX(SEQUENCE#) ------------- 6504 Data Guard Failover Steps (Continued…)

On the standby (PRODS), SQL> alter database recover managed standby database finish force; Database altered On the standby (PRODS), SQL> alter database commit to switchover to primary; Database altered On the standby instance (PRODS), SQL> shutdown immediate ORA-01507: database not mounted ORACLE instance shut down. Data Guard Failover Steps (Continued…)

SQL> startup ORACLE instance started. Total System Global Area 2147483648 bytes Fixed Size 2074152 bytes Variable Size 1056967128 bytes Database Buffers 1073741824 bytes Redo Buffers 14700544 bytes Database mounted. Database opened. At this point standby is opened as primary. Data Guard Failover Steps (Continued…)

On the current primary (PRODS), issued the following query to make sure the table was brought over. Select * from test; then drop the table. On the second standby (PRODC) instance verified the cascaded standby destination by querying v$archived_log. I had to bounce the cascaded standby instance (PRODC), and put it back in a recovery mode. Data Guard Failover Steps (Continued…)

2-Node RAC as Standby PROD1 Storage Array 2 node RAC + ASM Standby Database 2 Primary Database Data Guard Standby Database 1 PRODS PRODC PROD2

Data Guard Switchover Steps Make sure only one instance on the RAC system is mounted(PROD1). Make sure the FAL_SERVER and FAL_CLIENT parameters are set correctly on the primary as well as standby. On the primary FAL_SERVER=standby instance FAL_CLIENT=primary instance On the Physical Standby FAL_SERVER=primary instance FAL_CLIENT=standby instance

On the primary (PRODS) issued the following SQL> select switchover_status from v$database; SWITCHOVER_STATUS -------------------------------- TO STANDBY On the standby (PROD1) instance issued the following command SQL> select switchover_status from v$database; SWITCHOVER_STATUS -------------------------------- TO PRIMARY Data Guard Switchover Steps (Continued…)

On the primary (PRODS) issued the following command SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY; SQL> ALTER SYSTEM SET log_archive_dest_state_2='DEFER'; Shutdown the primary instance (PRODS) SQL> shutdown immediate; SQL> Startup nomount; SQL> alter database mount standby database; Data Guard Switchover Steps (Continued…)

On the standby (PROD1) issued the following command SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY; On the standby (PROD1) SQL>shutdown immediate; SQL> startup open; SQL>alter system switch logfile; Make sure other instance(PROD2) comes up as well. Data Guard Switchover Steps (Continued…)

Data Guard Switchover Storage Array 2 node RAC + ASM Primary Database 1 Standby Database 1 Data Guard PRODPRODS Standby Database 2 PRODC

Storage Array 2 node RAC + ASM Primary Database Storage Array 2 node RAC + ASM Standby Database Final Configuration

Real Application Clusters RAC misconception. Always register the database and instances to the cluster. You have to use “netca” to register the listener to crs in 10g. Srvctl is “case sensitive”. Always use single parameter file for multiple RAC nodes. Client side load balancing. Server side load balancing.

ASM Configuration LUN 1LUN 2 LUN 3LUN 4 LUN 5LUN 6 LUN 7 LUN 8 Storage Group 1 Storage Group 2 DATAFLASH ASM Disk Groups Data Files Control Files Online log files Archive log files RMAN backups Mirrored cntl and log Flashback Logs ASM disks

Flash Recovery Area Show parameter db_recovery NAME TYPE VALUE ------------------------------------ ----------- --------------------------- db_recovery_file_dest string +FLASH_DG db_recovery_file_dest_size big integer 190000M SQL> select space_used/(1024*1024*1024),space_limit/(1024*1024 *1024) from v$recovery_file_dest; SPACE_USED (in GB) SPACE_LIMIT (in GB) --------------------------- ---------------------------- 55.6249833 185.546875

SQL> select * from v$flash_recovery_area_usage; CONTROLFILE 0 0 0 ONLINELOG 0 0 0 ARCHIVELOG 18.17 6.03 4691 BACKUPPIECE 0 0 0 IMAGECOPY 0 0 0 FLASHBACKLOG 11.81 11.73 1077 ORA-19815: WARNING: db_recovery_file_dest_size of 53687091200 bytes is 87.63% used, and has 6642196992 remaining bytes available Flash Recovery Area

SQL> alter system set db_recovery_file_dest_size=55G scope=memory; SQL> select * from v$flash_recovery_area_usage; CONTROLFILE 0 0 0 ONLINELOG 0 0 0 ARCHIVELOG 61.3 20.34 4692 BACKUPPIECE 0 0 0 IMAGECOPY 0 0 0 FLASHBACKLOG 38.7 38.44 1046 Flash Recovery Area

Issue the following RMAN commands RMAN> crosscheck archivelog all; RMAN> delete expired archivelog all; After “delete expired” command CONTROLFILE 0 0 0 ONLINELOG 0 0 0 ARCHIVELOG 3.83 0 1274 BACKUPPIECE 0 0 0 IMAGECOPY 0 0 0 FLASHBACKLOG 51.23 50.93 1007 Cleanup of Archivelog files from Flash Recovery Area

Flashback Logs SQL> show parameter db_flashback NAME TYPE VALUE ------------------------------------ ----------------------------------- ------- ------------------ db_flashback_retention_target integer 1440 SQL> show parameter log_archive_min_succeed_dest; NAME TYPE VALUE ------------------------------------ ----------- ------------------------ ------ log_archive_min_succeed_dest integer 1

How to cleanup Flashback Logs from Flash Recovery Area SQL> alter system set log_archive_dest_1='LOCATION=/home/oracle/temp_ archivelog' scope=memory; SQL> alter system set log_archive_dest_state_10=defer scope=memory; SQL> alter system set db_recovery_file_dest_size=55G scope=memory; Check the alert log for “deleted Oracle managed files” messages.

How to cleanup Flashback Logs from Flash Recovery Area SQL> select * from v$flash_recovery_area_usage; CONTROLFILE 0 0 0 ONLINELOG 0 0 0 ARCHIVELOG 92.19 0 1274 BACKUPPIECE 0 0 0 IMAGECOPY 0 0 0 FLASHBACKLOG 7.3 0 6 SQL> alter system set db_recovery_file_dest_size=190G scope=memory;

How to cleanup Flashback Logs from Flash Recovery Area SQL> alter system set log_archive_dest_1='' scope=memory; SQL> alter system set log_archive_dest_state_10=enable scope=memory

Conclusion and Best Practices All configuration changes were performed without any significant downtime outages to the production or standby databases. Various Oracle technologies, including RMAN, RAC, ASM, and Data Guard, were successfully utilized together to achieve the final high-availability solution. An awareness and understanding of the technologies available from Oracle, together with an innovative approach to implementation, were critical in building the environment while complying with the customer’s business needs.

Question Answers Contact Information Rama Balaji balajir@TUSC.com (303) 985-2213 www.tusc.com

World-wide Team of IT Professionals Contact Information Rama Balaji balajir@TUSC.com (303) 985-2213 www.tusc.com Core Services – Oracle E-Business Suite – Enterprise Performance Management (EPM) – Oracle DBA, Database, and Infrastructure – Business Intelligence and Data Warehousing – Managed Services – remote support & hosting – Oracle Fusion Middleware – Oracle Hyperion – Software – Training

Copyright © 2009 Rolta International, Inc., All Rights Reserved Oracle High Availability - A Case Study Rama Balaji Senior Oracle Consultant.

Similar presentations

Presentation on theme: "Copyright © 2009 Rolta International, Inc., All Rights Reserved Oracle High Availability - A Case Study Rama Balaji Senior Oracle Consultant."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Copyright © 2009 Rolta International, Inc., All Rights Reserved Oracle High Availability - A Case Study Rama Balaji Senior Oracle Consultant.

Similar presentations

Presentation on theme: "Copyright © 2009 Rolta International, Inc., All Rights Reserved Oracle High Availability - A Case Study Rama Balaji Senior Oracle Consultant."— Presentation transcript:

Similar presentations

About project

Feedback