Thomas E. Canty, ServerCare, Inc.: Session #126, Data Guard Best Practices & Tuning


Speaker Qualifications
Thomas E. Canty, Senior Oracle DBA, ServerCare, Inc.
19 years of Oracle experience, starting with version 5
Has presented at IOUG, OpenWorld, NoCOUG, IASA
Has been a DBA, Developer, Architect, and IT Manager
Has worked with Fortune 100 companies in Healthcare, Technology, Pharmaceuticals, and Telecom, as well as major universities

Outline
Overview
Network Optimization
ARCn & LGWR Redo Transport
Checkpoint, Redo Read/Apply & Recovery
Wait Events
10g R2 & 11g Improvements
Best Practices

Data Guard Modes
Maximum Performance Mode
  – Least performance impact
  – Default mode
Maximum Protection Mode
  – Emphasis on data safety
  – Requires at least one synchronized secondary; the primary shuts down if none is available
Maximum Availability Mode
  – Emphasis on uptime
  – Continues if the secondary is unavailable

Physical vs. Logical Standby
Feature | Physical Standby | Logical Standby
Disaster recovery & HA | Yes | Yes
Data protection | Yes | Yes
Performance | Most efficient: Redo Apply bypasses SQL-level layers | Redo converted to SQL before it is applied
Primary DB work reduction | Limited read-only reporting | Unrestricted read-only reporting
Efficient use of standby | Limited read-only reporting | Extra schemas are unrestricted read/write
Data type restrictions | No restrictions | Does not support LONG, LOB, etc.
Rolling upgrades | Not available | Yes

Outline: Network Optimization

Session Data Unit (SDU)
In the Oracle Net connect descriptor:
    sales.servercare.com=
      (DESCRIPTION=
        (SDU=32767)
        (ADDRESS=(PROTOCOL=tcp)(HOST=sales-server)(PORT=1521))
        (CONNECT_DATA=
          (SERVICE_NAME=sales.servercare.com)))
Globally in sqlnet.ora:
  – DEFAULT_SDU_SIZE=32767

Session Data Unit (SDU) (Cont.)
On the standby DB, set in listener.ora:
    SID_LIST_listener_name=
      (SID_LIST=
        (SID_DESC=
          (SDU=32767)
          (GLOBAL_DBNAME=sales.servercare.com)
          (SID_NAME=sales)
          (ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1)))

TCP Socket Buffer Size
Set TCP socket buffer size = 3 * BDP
  – Data Guard broker configuration: set in sqlnet.ora
  – Non-Data Guard broker configuration: set in connect descriptor
BDP = Bandwidth Delay Product (bandwidth * RTT)
RTT = Round Trip Time

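TCP Socket Buffer Size (Cont.)
Assume a gigabit network with an RTT of 25 ms:
$$\text{BDP} = 1{,}000\ \text{Mbps} \times 25\ \text{ms} = 1{,}000{,}000{,}000\ \text{bits/s} \times 0.025\ \text{s} = 25{,}000{,}000\ \text{bits}$$
$$\frac{25{,}000{,}000\ \text{bits}}{8} = 3{,}125{,}000\ \text{bytes}$$
In this example:
$$\text{socket buffer size} = 3 \times \text{bandwidth} \times \text{delay} = 3 \times 3{,}125{,}000 = 9{,}375{,}000\ \text{bytes}$$
sqlnet.ora:
    RECV_BUF_SIZE=9375000
    SEND_BUF_SIZE=9375000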
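
The buffer-size arithmetic can be checked with a small helper. This is a minimal sketch: the function name `socket_buffer_bytes` is hypothetical, and it assumes bandwidth is given in bits per second and RTT in seconds, with the slide's multiplier of 3.

```python
def socket_buffer_bytes(bandwidth_bps: float, rtt_seconds: float, multiplier: int = 3) -> int:
    """Return a TCP socket buffer size in bytes (hypothetical helper).

    BDP in bytes = bandwidth (bits/s) * RTT (s) / 8,
    and the slide recommends a buffer of 3 * BDP.
    """
    if bandwidth_bps <= 0 or rtt_seconds <= 0:
        raise ValueError("bandwidth and RTT must be positive")
    bdp_bytes = bandwidth_bps * rtt_seconds / 8
    # round() guards against tiny floating-point error before converting to int
    return int(round(multiplier * bdp_bytes))


if __name__ == "__main__":
    # Gigabit network with a 25 ms round trip, as in the slide
    size = socket_buffer_bytes(1_000_000_000, 0.025)
    print(f"RECV_BUF_SIZE={size}")
    print(f"SEND_BUF_SIZE={size}")
```

Running it prints RECV_BUF_SIZE=9375000 and SEND_BUF_SIZE=9375000, matching the hand calculation.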

Network Queue Sizes
Queues between the kernel network subsystems and the NIC driver:
  – txqueuelen: transmit queue size
  – netdev_max_backlog: receive queue size
Settings below assume a gigabit network with 100 ms latency
Set queues:
  – ifconfig eth0 txqueuelen
  – sysctl.conf: net.core.netdev_max_backlog=20000

Overall Network
Ensure sufficient bandwidth to the standby
Verify TCP_NODELAY is set to YES (default)
RHEL3: increase /proc/sys/fs/aio-max-size on the standby
  – From … (default) to …
Set RECV_BUF_SIZE & SEND_BUF_SIZE = 3 * Bandwidth Delay Product (BDP)
Use a Session Data Unit (SDU) size of 32767
Increase send & receive queue sizes
  – txqueuelen
  – netdev_max_backlog

Outline: ARCn & LGWR Redo Transport

ARCn Redo Transport
1) Read from the local archived redo log
2) Receive redo
3) Acknowledge

ASYNC LGWR Redo Transport
1) Write local redo
2) ASYNC send redo
3) Receive redo
4) Acknowledge
5) Write standby redo

SYNC LGWR Redo Transport
1) Write local redo
2) SYNC send redo
3) Receive redo
4) Acknowledge
5) Post receipt to LGWR

Optimize ARCn Transport
Increase MAX_CONNECTIONS (attribute of the LOG_ARCHIVE_DEST_n that points to the standby) to 5 if possible
  – default (1), maximum (5)
Increase LOG_ARCHIVE_MAX_PROCESSES
  – Set larger than MAX_CONNECTIONS
  – As high as network bandwidth allows
  – default (2), maximum (30)
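
The two ARCn rules above (value ranges, and LOG_ARCHIVE_MAX_PROCESSES larger than MAX_CONNECTIONS) can be encoded as a quick pre-change check. This is a sketch only: `check_arcn_settings` is a hypothetical helper, the upper limits come from the slide, and the lower bound of 1 for both parameters is an assumption.

```python
from typing import List

# Upper limits from the slide; the lower bound of 1 is an assumption.
MAX_CONNECTIONS_RANGE = (1, 5)
MAX_PROCESSES_RANGE = (1, 30)


def check_arcn_settings(max_connections: int, log_archive_max_processes: int) -> List[str]:
    """Return a list of problems with a proposed ARCn configuration (empty if none)."""
    problems: List[str] = []
    lo, hi = MAX_CONNECTIONS_RANGE
    if not lo <= max_connections <= hi:
        problems.append(f"MAX_CONNECTIONS={max_connections} is outside {lo}..{hi}")
    lo, hi = MAX_PROCESSES_RANGE
    if not lo <= log_archive_max_processes <= hi:
        problems.append(
            f"LOG_ARCHIVE_MAX_PROCESSES={log_archive_max_processes} is outside {lo}..{hi}"
        )
    # Slide rule: keep more archiver processes than parallel connections
    if log_archive_max_processes <= max_connections:
        problems.append("LOG_ARCHIVE_MAX_PROCESSES should be larger than MAX_CONNECTIONS")
    return problems


if __name__ == "__main__":
    print(check_arcn_settings(5, 10))  # []
    print(check_arcn_settings(5, 5))
```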

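Optimize LGWR Transport
Decrease NET_TIMEOUT (default 180 secs.)
  – Be careful! Do not set it too low
New COMMIT options (10g R2)
  – COMMIT WRITE IMMEDIATE WAIT (default)
  – COMMIT WRITE NOWAIT
  – COMMIT WRITE BATCH NOWAIT
  – NOWAIT returns before the redo is written to disk, so committed transactions can be lost if the instance fails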

All Redo Transport
Standby redo logs (SRLs)
  – Use the fastest disks
  – No RAID5
  – Don't multiplex
  – Use the recommended number of SRLs: (maximum # of online log files per thread + 1) * maximum # of threads
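
The SRL sizing rule is simple arithmetic; the sketch below applies it to two hypothetical configurations. The function name `recommended_srl_count` is illustrative, not an Oracle API.

```python
def recommended_srl_count(max_online_logs_per_thread: int, max_threads: int) -> int:
    """(maximum # of online log files per thread + 1) * maximum # of threads."""
    if max_online_logs_per_thread < 1 or max_threads < 1:
        raise ValueError("counts must be at least 1")
    return (max_online_logs_per_thread + 1) * max_threads


if __name__ == "__main__":
    # Hypothetical configurations: single instance and a 2-node RAC primary
    print(recommended_srl_count(3, 1))  # 4
    print(recommended_srl_count(3, 2))  # 8
```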

Outline: Checkpoint, Redo Read/Apply & Recovery

Checkpoint Phase
A checkpoint occurs
  – During a log switch
  – When LOG_CHECKPOINT_TIMEOUT expires
  – When LOG_CHECKPOINT_INTERVAL is reached
Reduce log switch frequency
  – Resize redo logs to 1GB on both primary and secondary
  – Recommended: a checkpoint every 15 minutes
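
To see whether a 1GB redo log gets close to the 15-minute target, divide the log size by the redo generation rate. This is a sketch assuming a steady redo rate and 1GB = 1024**3 bytes; both helper names and the example rate are hypothetical.

```python
import math

ONE_GB = 1024 ** 3  # assumes "1GB" means 1024**3 bytes


def minutes_between_log_switches(log_size_bytes: int, redo_bytes_per_sec: float) -> float:
    """Approximate minutes between log switches at a steady redo generation rate."""
    if redo_bytes_per_sec <= 0:
        raise ValueError("redo rate must be positive")
    return log_size_bytes / redo_bytes_per_sec / 60


def log_size_for_interval(redo_bytes_per_sec: float, target_minutes: float = 15.0) -> int:
    """Log size (bytes) that gives roughly one switch per target interval."""
    return math.ceil(redo_bytes_per_sec * target_minutes * 60)


if __name__ == "__main__":
    rate = 1_000_000  # hypothetical: 1,000,000 bytes of redo per second
    print(round(minutes_between_log_switches(ONE_GB, rate), 1))  # 17.9
    print(log_size_for_interval(rate))  # 900000000
```

Real redo rates vary by workload and time of day, so size for peak periods rather than the average.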

Checkpoint Phase (Cont.)
Determine checkpoint frequency:
    COL NAME FOR A35
    SELECT NAME, VALUE, TO_CHAR(SYSDATE, 'HH:MI:SS') TIME
      FROM V$SYSSTAT
     WHERE NAME = 'DBWR checkpoints';

    NAME                                VALUE TIME
    DBWR checkpoints                          :15:43

    SQL> /

    NAME                                VALUE TIME
    DBWR checkpoints                          :34:06
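
Two samples of the query give the average time between checkpoints: elapsed time divided by the increase in the counter. The counter values on the slide were not preserved, so the numbers below are hypothetical, and `minutes_per_checkpoint` is an illustrative helper. Because 'HH:MI:SS' is a 12-hour format in Oracle, the sketch assumes both samples fall in the same 12-hour period.

```python
from datetime import datetime


def minutes_per_checkpoint(value_1: int, time_1: str, value_2: int, time_2: str) -> float:
    """Average minutes between checkpoints from two 'DBWR checkpoints' samples.

    Times are the HH:MI:SS strings returned by the query; both samples are
    assumed to fall in the same 12-hour period (HH24 in the query avoids this).
    """
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(time_2, fmt) - datetime.strptime(time_1, fmt)).total_seconds()
    checkpoints = value_2 - value_1
    if elapsed <= 0 or checkpoints <= 0:
        raise ValueError("second sample must be later and show more checkpoints")
    return elapsed / 60 / checkpoints


if __name__ == "__main__":
    # Hypothetical counter values and hour; only the minutes:seconds match the slide
    print(round(minutes_per_checkpoint(100, "10:15:43", 102, "10:34:06"), 2))  # 9.19
```

A result well under the recommended 15 minutes suggests the redo logs are too small for the workload.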

Redo Read (Secondary)
Obtain the read rate for the standby redo log:
    SQL> ALTER SYSTEM DUMP LOGFILE '/u01/oradata/docprd/sredo01.log' VALIDATE;
    System altered.

    $ vi docprd_ora_3560.trc
    Mon Mar 12 08:59: ………………
    Redo read statistics for thread
    Read rate (ASYNC): 4527Kb in 0.58s => 6.90 Mb/sec
    Longest record: 19Kb, moves: 0/7586 (0%)
    Change moves: 4340/18026 (24%), moved: 2Mb
    Longest LWN: 92Kb, moves: 1/1365 (0%), moved: 0Mb
    Last redo scn: 0x ( )
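
When checking several standby redo logs, it can help to pull the "Read rate" line out of each trace file automatically. The sketch below is hypothetical: the helper name and regular expression are based only on the single sample line shown in the trace above, and other Oracle versions may format the trace differently.

```python
import re
from typing import Optional

# Pattern built from the one sample line in the slide; other versions may differ.
READ_RATE_RE = re.compile(
    r"Read rate \((?P<mode>\w+)\): (?P<kb>\d+)Kb in (?P<secs>[\d.]+)s => (?P<rate>[\d.]+) Mb/sec"
)


def parse_read_rate(trace_text: str) -> Optional[dict]:
    """Extract the redo read rate from ALTER SYSTEM DUMP LOGFILE ... VALIDATE trace output."""
    match = READ_RATE_RE.search(trace_text)
    if match is None:
        return None
    return {
        "mode": match["mode"],
        "kb": int(match["kb"]),
        "seconds": float(match["secs"]),
        "mb_per_sec": float(match["rate"]),
    }


if __name__ == "__main__":
    sample = "Read rate (ASYNC): 4527Kb in 0.58s => 6.90 Mb/sec"
    print(parse_read_rate(sample))
```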

Redo Apply (Secondary)
Goal
  – Redo apply rate (secondary) > redo generation rate (primary)
Carefully consider enabling DB_BLOCK_CHECKING
  – LOW, MEDIUM and FULL options
  – Possible performance impact

Redo Apply (Cont.)
Determine the log block size (LEBSZ):
    SELECT LEBSZ FROM X$KCCLE WHERE ROWNUM = 1;
Get recovery blocks: take at least two snapshots
  – Managed Recovery case:
    SELECT PROCESS, SEQUENCE#, THREAD#, BLOCK#, BLOCKS,
           TO_CHAR(SYSDATE, 'DD-MON-YYYY HH:MI:SS') TIME
      FROM V$MANAGED_STANDBY
     WHERE PROCESS = 'MRP0';
Determine the recovery rate (MB/sec) for a specific archive sequence number
  – Managed Recovery case (times in seconds):
$$\text{recovery rate (MB/s)} = \frac{(\text{BLOCK\#}_{\text{END}} - \text{BLOCK\#}_{\text{BEG}}) \times \text{LOG\_BLOCK\_SIZE}}{(\text{TIME}_{\text{END}} - \text{TIME}_{\text{BEG}}) \times 1024 \times 1024}$$
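
The recovery-rate formula can be applied to two rows from the V$MANAGED_STANDBY query. This is a minimal sketch: `MrpSnapshot` and `recovery_rate_mb_per_sec` are hypothetical names, and the example block numbers, timestamps, and 512-byte log block size are illustrative values, not measurements.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class MrpSnapshot:
    """One row of the V$MANAGED_STANDBY query above (MRP0 process)."""
    sequence: int
    block: int
    taken_at: datetime


def recovery_rate_mb_per_sec(beg: MrpSnapshot, end: MrpSnapshot, log_block_size: int) -> float:
    """((BLOCK#_END - BLOCK#_BEG) * LOG_BLOCK_SIZE) / ((TIME_END - TIME_BEG) * 1024 * 1024)."""
    if beg.sequence != end.sequence:
        raise ValueError("both snapshots must be for the same archive sequence number")
    elapsed = (end.taken_at - beg.taken_at).total_seconds()
    if elapsed <= 0:
        raise ValueError("end snapshot must be later than begin snapshot")
    return (end.block - beg.block) * log_block_size / (elapsed * 1024 * 1024)


if __name__ == "__main__":
    # Hypothetical values: 512-byte log blocks, 204,800 blocks applied in 10 seconds
    s1 = MrpSnapshot(sequence=100, block=1_000, taken_at=datetime(2000, 1, 1, 8, 0, 0))
    s2 = MrpSnapshot(sequence=100, block=205_800, taken_at=datetime(2000, 1, 1, 8, 0, 10))
    print(recovery_rate_mb_per_sec(s1, s2, log_block_size=512))  # 10.0
```

The sequence check reflects the slide's wording ("for a specific archive sequence number"): block numbers from different sequences are not comparable.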

Redo Apply (Cont.)
Oracle Recommends:
Redo Generation Rate vs. Redo Apply Rate | Recommendation
2 * Max Primary DB Redo Generation Rate < Redo Apply Rate | Excellent - No Tuning Required
Max Primary DB Redo Generation Rate < Redo Apply Rate < 2 * Max Primary DB Redo Generation Rate | Good - Tuning is Optional
Avg. Primary Redo Generation Rate < Redo Apply Rate | OK - Need Tuning
Avg. Primary Redo Generation Rate > Redo Apply Rate | Bad - Need Tuning
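
The recommendation table can be read as an ordered series of checks. This is a sketch: `classify_apply_rate` is a hypothetical helper, all rates must be in the same unit (for example MB/sec), and boundary cases the table leaves open (such as an apply rate exactly equal to twice the maximum generation rate) fall through to the next, more conservative row.

```python
def classify_apply_rate(apply_rate: float, max_gen_rate: float, avg_gen_rate: float) -> str:
    """Map redo apply and generation rates (same units) to the recommendation table.

    Rows are checked from best to worst; equality cases the table does not
    cover fall through to the next, more conservative row.
    """
    if apply_rate > 2 * max_gen_rate:
        return "Excellent - No Tuning Required"
    if max_gen_rate < apply_rate < 2 * max_gen_rate:
        return "Good - Tuning is Optional"
    if apply_rate > avg_gen_rate:
        return "OK - Need Tuning"
    return "Bad - Need Tuning"


if __name__ == "__main__":
    # Hypothetical primary: peak 10 MB/sec, average 5 MB/sec of redo
    for apply_rate in (25, 15, 8, 4):
        print(apply_rate, classify_apply_rate(apply_rate, max_gen_rate=10, avg_gen_rate=5))
```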

Recovery
Parallel Recovery (before …)
  – Set to the number of CPUs (n):
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE PARALLEL n;
PARALLEL_EXECUTION_MESSAGE_SIZE
  – Can increase to 4096 or 8192
  – Uses additional shared pool memory
  – Problems if set too high
DB_CACHE_SIZE
  – Can set secondary DB_CACHE_SIZE >= primary
  – Must reset it to the primary's value before changing roles

Outline: Wait Events

Arch Wait Events - Primary
ARCH wait on ATTACH
  – Time for all arch processes to spawn RFS connection
ARCH wait on SENDREQ
  – Time for all arch processes to write received redo to disk + open & close remote archived redo logs
ARCH wait on DETACH
  – Time for all arch processes to delete RFS connection

LGWR SYNC Wait Events - Primary
LGWR wait on ATTACH
  – Time for all log writer processes to spawn RFS connection
LGWR wait on SENDREQ
  – Time for all log writer processes to write received redo to disk + open & close the remote archived redo logs
LGWR wait on DETACH
  – Time for all log writer processes to delete RFS connection

LGWR ASYNC Wait Events - Primary
LNS wait on ATTACH
  – Time for all network servers to spawn RFS connection
LNS wait on SENDREQ
  – Time for all network servers to write received redo to disk + open & close the remote archived redo logs
LNS wait on DETACH
  – Time for all network servers to delete RFS connection
LGWR wait on full LNS buffer
  – Time the log writer (LGWR) process spends waiting for the network server (LNS) to free ASYNC buffer space

Wait Events on Secondary
RFS Write
  – Time to write to a standby redo log or archive log + non-I/O work such as redo block checksum validation
RFS Random I/O
  – Time for a write to a standby redo log to occur
RFS Sequential I/O
  – Time for a write to an archive log to occur

Outline: 10g R2 & 11g Improvements

10g R2 Improvements
Multiple archive processes can transmit a redo log in parallel to the standby database
  – The MAX_CONNECTIONS attribute of LOG_ARCHIVE_DEST_n controls the number of these processes
Parallel Recovery for Redo Apply is automatically set equal to the number of CPUs
Fast-Start Failover
  – Automatically fails over to a previously chosen physical standby database

10g R2 Improvements (Cont.)
LGWR ASYNC
  – Uses a new process (LNSn) to transmit redo data directly from the online redo log to the standby database
Physical standby database flashback
  – Can flash back temporarily for reporting
Logical standby database
  – Automatically deletes applied archived logs
RMAN
  – Automatically creates temp files after recovery

11g Improvements
Physical standby database can be opened read/write for testing or other purposes, with zero compromise in data protection, using the new Snapshot Standby
Automatic failover configurable for immediate response to designated events or errors
More flexibility in primary/standby configurations
  – e.g. Windows primary and Linux standby
Rolling upgrade options now available for physical standby with Transient Logical Standby
ASYNC transport enhanced to eliminate the impact of latency on network throughput

11g Improvements (Cont.)
Fast detection of corruptions caused by lost writes in the storage layer
SQL Apply supports XML data type (CLOB)
Many performance, manageability, and security enhancements
Support for new Oracle Database 11g Options
  – Oracle Active Data Guard and Oracle Advanced Compression
Fast-Start Failover now available for Maximum Performance mode

Outline: Best Practices

Best Practices
Geographically separate the primary & standby DBs
Configure standby hardware the same as the primary
  – Tune the standby for write-intensive operations
Test Data Guard before deploying in production
Set standard OS and DB parameters to recommended values
Perform switchover testing
  – Fully document a failover procedure
Use FORCE LOGGING mode

Best Practices (Cont.)
Use real-time apply
Use the Data Guard Broker
Enable Flashback Database on both primary and secondary databases
Evaluate using the AFFIRM attribute
  – Possible performance issues on primary
Verify that asynchronous I/O is enabled
Carefully consider DB_BLOCK_CHECKING

Best Practices (Cont.)
Don't multiplex standby redo logs (SRLs)
Correctly set the number of SRLs
Increase PARALLEL_EXECUTION_MESSAGE_SIZE
Place SRLs in a fast disk group or on fast disks
Use at least two standby DBs with Maximum Protection Mode
Utilize COMMIT NOWAIT if appropriate

Best Practices (Cont.)
Ensure appropriate bandwidth between primary and secondary
Increase default send & receive queue sizes
  – txqueuelen
  – netdev_max_backlog
Session Data Unit
  – Adjust value to 32767 for improvement during large data transmissions

Questions?
Lots of things we didn't cover
If we didn't cover something you wanted to hear, please contact me.

THANK YOU! Please fill out evaluations!
Tom Canty
Session #126: Data Guard Best Practices & Tuning