Marcin Blaszczyk, Zbigniew Baranowski – CERN Outline Overview & Architecture Use Cases for Our experience with ADG and lessons learned Conclusions.

Slides:



Advertisements
Similar presentations
Active Data Guard at CERN
Advertisements

ORACLE DATABASE HIGH AVAILABILITY & ORACLE 11GR2 DATA GUARD 1 Güneş EROL.
ITEC474 INTRODUCTION.
INTRODUCTION TO ORACLE Lynnwood Brown System Managers LLC Backup and Recovery Copyright System Managers LLC 2008 all rights reserved.
INTRODUCTION TO ORACLE Lynnwood Brown System Managers LLC Oracle High Availability Solutions RAC and Standby Database Copyright System Managers LLC 2008.
Introduction to DBA.
High Availability Group 08: Võ Đức Vĩnh Nguyễn Quang Vũ
AUSOUG National Conference Series g Active Data Guard More than just DR…. Gavin Soorma Senior Oracle DBA, Bankwest.
Oracle Data Guard Ensuring Disaster Recovery for Enterprise Data
© 2015 Dbvisit Software Limited | dbvisit.com An Introduction to Dbvisit Standby.
Data Guard SQL Apply Back to the Future! Larry M. Carpenter Senior Principal Consultant Server Technologies Oracle Corporation Session id:
Keith Burns Microsoft UK Mission Critical Database.
10 Copyright © 2009, Oracle. All rights reserved. Managing Undo Data.
EIM April 19, Robin Weaver 13 Years with IBM Prior to Assignment at UNC Charlotte Range of Database Development/Data Management Projects and Products.
Oracle Database 11g: First Experiences with Grid Computing
Database Backup and Recovery
Backup and Recovery Part 1.
Module 14: Scalability and High Availability. Overview Key high availability features available in Oracle and SQL Server Key scalability features available.
Backup Concepts. Introduction Backup and recovery procedures protect your database against data loss and reconstruct the data, should loss occur. The.
1© Copyright 2012 EMC Corporation. All rights reserved. November 2013 Oracle Continuous Availability – Technical Overview.
National Manager Database Services
Oracle backup and recovery strategy
Introduction to Oracle Backup and Recovery
Oracle Database High Availability Brandon Kuschel Jian Liu Source: Oracle Database 11g Release 2 High Availability An Oracle White Paper November 2010.
Database Upgrade/Migration Options & Tips Sreekanth Chintala Database Technology Strategist.
Backup & Recovery Concepts for Oracle Database
Proven Techniques for Maximizing Availability Maximum Availability Architecture Lawrence To, Shari Yamaguchi High Availability Systems Group Systems Technologies.
CERN IT Department CH-1211 Genève 23 Switzerland t Data Protection with Oracle Data Guard Jacek Wojcieszuk, CERN/IT-DM Distributed Database.
High Availability & Oracle RAC 18 Aug 2005 John Sheaffer Platform Solution Specialist
1 Data Guard Basics Julian Dyke Independent Consultant Web Version - February 2008 juliandyke.com © 2008 Julian Dyke.
13 Copyright © Oracle Corporation, All rights reserved. RMAN Complete Recovery.
PPOUG, 05-OCT-01 Agenda RMAN Architecture Why Use RMAN? Implementation Decisions RMAN Oracle9i New Features.
Database Services for Physics at CERN with Oracle 10g RAC HEPiX - April 4th 2006, Rome Luca Canali, CERN.
Oracle on Windows Server Introduction to Oracle10g on Microsoft Windows Server.
Lost Writes, a DBA’s Nightmare? UKOUG 2013 Technology Conference Luca Canali – CERN Marcin Blaszczyk - CERN.
5 Copyright © 2004, Oracle. All rights reserved. Using Recovery Manager.
Chapter 7 Making Backups with RMAN. Objectives Explain backup sets and image copies RMAN Backup modes’ Types of files backed up Backup destinations Specifying.
16 Copyright © 2007, Oracle. All rights reserved. Performing Database Recovery.
11g(R1/R2) Data guard Enhancements Suresh Gandhi
9 Copyright © 2004, Oracle. All rights reserved. Flashback Database.
Oracle Advanced Compression – Reduce Storage, Reduce Costs, Increase Performance Session: S Gregg Christman -- Senior Product Manager Vineet Marwah.
1 Data Guard. 2 Data Guard Reasons for Deployment  Site Failures  Power failure  Air conditioning failure  Flooding  Fire  Storm damage  Hurricane.
7 Copyright © 2005, Oracle. All rights reserved. Managing Undo Data.
CERN - IT Department CH-1211 Genève 23 Switzerland t Oracle Real Application Clusters (RAC) Techniques for implementing & running robust.
14 Copyright © 2005, Oracle. All rights reserved. Backup and Recovery Concepts.
3 Copyright © 2006, Oracle. All rights reserved. Using Recovery Manager.
Ashish Prabhu Douglas Utzig High Availability Systems Group Server Technologies Oracle Corporation.
CERN - IT Department CH-1211 Genève 23 Switzerland t High Availability Databases based on Oracle 10g RAC on Linux WLCG Tier2 Tutorials, CERN,
Overview of Oracle Backup and Recovery Darl Kuhn, Regis University.
18 Copyright © 2004, Oracle. All rights reserved. Backup and Recovery Concepts.
CERN IT Department CH-1211 Geneva 23 Switzerland t WLCG Operation Coordination Luca Canali (for IT-DB) Oracle Upgrades.
Emil Pilecki Credit: Luca Canali, Marcin Blaszczyk, Steffen Pade.
CERN IT Department CH-1211 Genève 23 Switzerland 1 Active Data Guard Svetozár Kapusta Distributed Database Operations Workshop November.
7 Copyright © Oracle Corporation, All rights reserved. Instance and Media Recovery Structures.
14 Copyright © 2005, Oracle. All rights reserved. Backup and Recovery Concepts.
Replicazione e QoS nella gestione di database grid-oriented Barbara Martelli INFN - CNAF.
13 Copyright © 2007, Oracle. All rights reserved. Using the Data Recovery Advisor.
© Puget Sound Oracle Users Group Education Is Our Passion PSOUG Education Education Is Our Passion Hands-on Workshop Series Oracle DataGuard 10gR2.
14 Copyright © 2007, Oracle. All rights reserved. Backup and Recovery Concepts.
10 Copyright © 2007, Oracle. All rights reserved. Managing Undo Data.
Oracle Database High Availability
Agenda Data Guard Architecture & Features
Database Services at CERN Status Update
Maximum Availability Architecture Enterprise Technology Centre.
Oracle Database High Availability
Oracle Database Monitoring and beyond
SQL Server High Availability Amit Vaid.
Performing Database Recovery
Introduction.
Oracle Data Guard Session-4
Presentation transcript:

Marcin Blaszczyk, Zbigniew Baranowski – CERN

Outline Overview & Architecture Use Cases for Our experience with ADG and lessons learned Conclusions

Outline Overview & Architecture Use Cases for Our experience with ADG and lessons learned Conclusions

5 Active Data Guard – the Basics ADG enables read only access to the physical standby

6 CERN’s Databases & DG service ~100 Oracle databases, most of them RAC Mostly NAS storage plus some SAN with ASM ~300 TB of data files for production DBs in total 16 Data Guard RAC clusters in Prod (and growing) Active Data Guards since upgrade to 11g Production RAC systems Number of nodes: NAS storage with SSD cache (Active) Data Guard RAC Number of nodes: Sized for average production load ASM on commodity HW, recycle ‘old production’ Redo Transport

7 Architecture We Use Primary Database Active Data Guard for disaster recovery Active Data Guard for users’ access 2. Busy & critical ADG 1. Low load ADG &less critical Active Data Guard for users’ access and for disaster recovery Primary Database Maximum performance Redo Transport LOG_ARCHIVE_DEST_X=‘SERVICE= OPTIONAL ASYNC NOAFFIRM VALID FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME= ’

Outline Overview & Architecture Use Cases for Our experience with ADG and lessons learned Conclusions

9 Active Data - Overview Replication Inside CERN from online (data acquisition) to offline (analysis) Replication to remote sites (being considered) Load Balancing Offloading queries Offloading backups Features available also with Data Guard Disaster recovery Duplication of databases Others

10 Replication Use Cases Number of replication use cases Inside CERN Across the Grid So far handled by Oracle Streams Logical replication with messages called LCRs (Logical Change Record) Service started with 10gR1, now 11gR2 Redo Transport PROPAGATION

11 Offloading Queries to ADG Workload distribution Transactional workload runs on primary Read-only workload can be moved to ADG Read-mostly workload DMLs can be redirected to remote database with a dblink Examples of workload on our ADGs: Ad-hoc queries, analytics and long-running reports, parallel queries, unpredictable workload and test queries

12 Offloading Backups to ADG Significantly reduces load on primary Removes sequential I/O of full backup ADG has great improvement for VLDBs allows usage of block change tracking for fast incremental backups

13 Disaster Recovery We have been using it since a few years Switchover/failover is our first line of defense Saved the day already for production services Disaster recovery site at ~10 km from our datacenter In the close future remote site in Hungary * Active Data Guard option not required

14 Duplicate for Upgrade Clusterware 10g + RDBMS 10g Clusterware 11g + RDBMS 10g Redo Transport RW Access RW Acess Clusterware 11g + RDBMS 11g RDBMS upgrade DATABASE downtime Upgrade complete!

15 Duplicate for Upgrade - Comments Risk is limited Fresh installation of the new clusterware Old system stays untouched Allows full upgrade test Allows stress testing of new system Downtime is limited ~ 1h for RDBMS upgrade Additional hardware is required  Only for the time of the upgrade * Active Data Guard option not required

16 More Use Cases Load testing Real application testing (RAT) Recover objects against logical corruption Human errors Leverage flashback logs Additional writes may have the negative impact on production database Data lifecycle activities

17 Snapshot Standby Facilitates opening standby in read-write mode Number of steps is limited to two Restore point created internally by Oracle Primary database is always protected Redo is being received when standby is opened SQL> ALTER DATABASE CONVERT TO SNAPSHOT STANDBY; SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;

18 SQL> ALTER TABLESPACE data_2011q1 READ ONLY; Data Lifecycle Activities Redo Transport Transportable Tablespaces Transport big data volumes from busy production DBs Transportable Tablespaces (TTS), Data Pump Challenge: TTS requires read only tablespaces Data Pump SQL> ALTER TABLESPACE data_2011q2 READ ONLY; SQL> ALTER TABLESPACE data_2011q3 READ ONLY; SQL> ALTER TABLESPACE data_2011q4 READ ONLY;

19 Custom Monitoring Monitoring Availability Latency + Automatic MRP restart Racmon

20 Custom Monitoring Strmmon

Outline Overview & Architecture Use Cases for Our experience with ADG and lessons learned Conclusions

22 Sharing Production Experience ADG in production since Q Oracle , OS: RHEL5 64 bit More robust than Streams Fewer incidents, less configuration, less manpower Users like the low latency Some more complex replication setups Still on Streams (or GG in future)

23 ADG vs. Streams for Our Use Cases Active Data Guard Block level replication More robust & Lower latency Full database replica Less maintenance effort More use cases than just replication Complex replication cases not covered  Read-only replica  Streams SQL based replication More flexible Data selectivity Replica is accessible for read-write load Unsupported data types  Throughput limitations  Can be complicated to recover 

24 Redo Apply Throughput ADG redo apply by MRP Noticeable improvement in speed in 11g vs. 10g We see up to ~100 MB/s of redo processed Much higher than typical redo generation in our DBs Note on our configuration: Standby logfiles, real time apply + force logging Archiving to ADG using ASYNC NOAFFIRM

25 Redo Apply Latency Latency to ADG normally no more than 1 sec Apply latency spikes: ~20 sec, depends on workload Redo apply slaves wait for checkpoint completed (11g) Latency builds up when ADG is creating big datafiles One redo apply slave creates file, rest of slaves wait (11g) MRP (recovery) can stop with errors or crashes Possible workarounds: Minimize redo switch by using large redo log files Create datafiles with small initial size and then (auto)extend

26 Latency and Applications Currently alarms to on-call DBAs When latency primary-ADG greater than programmable When MRP crashed (automatic restart) What if we had to guarantee latency Applications maybe would need to be modified to make them latency-aware Useful techniques to know ALTER SESSION SET STANDBY_MAX_DATA_DELAY= ; ALTER SESSION SYNC WITH PRIMARY; – SYNC only!

27 Offloading Backups to ADG Problem: OLTP production slow during backup Latency for random IO is critical to this system degraded by backup sequential IO It’s a storage issue: Midrange NAS storage

28 Details of Storage Issue on Primary Very slow reads appear Reads from SSD cache go to zero Number of waits

29 Custom Backup Implementation We have modified our backup implementation and adapted it for ADG It was worth the effort for us. Many details to deal with. Google “Szymon Skorupinski CERN openworld 2012” for details

30 Backups with RMAN We find backups with RMAN still a very good idea Data Guard is not a backup solution Automatic restore and recovery system Periodic checks that backups can be recovered Block change tracking on ADG Incremental backups only read changed blocks Highly beneficial for backup performance

31 Administration In most cases ADG doesn’t require much DBA time after initial setup Not much different than handling production DBs Although we see bugs and issues specific to ADG

32 Failing Queries on ADG Sporadic ORA-1555 (snapshot too old) on ADG Normal behaviour if queries need read consistent images outside undo retention threshold This case is anomalous: short queries Under investigation, seems a bug Example: ORA caused by SQL statement below (SQL ID: …, Query Duration=1 sec, SCN: …)

33 Features Under Investigation Data Guard Broker fast-start failover Partial standby configuration Cascading standby Synchronous redo transport ADG in 12c

Outline Overview & Architecture Use Cases for Our experience with ADG and lessons learned Conclusions

35 Conclusions Active Data Guard provides added value to our database services Almost 1.5 year of production experience In particular ADG has helped us Strengthening our replication deployments Offloading production workload, including backups We find ADG has low maintenance overhead Although requires some additional effort in monitoring Found a few bugs on the way We find ADG mature Although we also look forward to additional improvements in next releases We are planning to extend our usage of ADG