ATLAS LCG 3D Oracle cluster migration strategy at BNL

ATLAS LCG 3D Oracle cluster migration strategy at BNL
Carlos Fernando Gamboa, on behalf of the database group
Grid Group, RACF Facility, Brookhaven National Lab
WLCG Collaboration Workshop, CERN Geneva, April 2008

2 Table of Contents
- Motivation for upgrade from 32 to 64 bits
- General description of BNL RAC database architecture
- Plan executed
- Results
- Conclusions

3 Motivation for upgrade from 32 to 64 bits Oracle
Take advantage of the memory installed in the cluster:
- In the current 32-bit configuration the SGA cannot be set beyond 2.3 GB.
- On 32 bits this could be improved, though at high effort, with the hugemem kernel: it allows a 3.42 GB SGA with VLM, but comes with a performance overhead of probably 5-15% due to address space switching.
- 64-bit memory addressing is far larger: 64-bit words can address 2^64 bytes, about 18 billion GB, compared with 2^32 bytes = 4 GB for 32 bits.
- More data can be held in memory, reducing disk I/O and thereby increasing throughput.
- Better performance when carrying out 64-bit integer and floating-point arithmetic operations.
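The address-space figures quoted on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming byte-addressable memory, binary gigabytes (2^30 bytes) for the "4 GB" figure and decimal gigabytes (10^9 bytes) for the "18 billion GB" figure; the function and constant names are illustrative only.

```python
def addressable_bytes(word_bits: int) -> int:
    """Number of distinct byte addresses for a given pointer width."""
    return 2 ** word_bits

GIB = 2 ** 30   # binary gigabyte, the unit behind "2^32 = 4 GB"
GB = 10 ** 9    # decimal gigabyte, the unit behind "18 billion GB"

limit_32 = addressable_bytes(32)
limit_64 = addressable_bytes(64)

print(f"32-bit: {limit_32 // GIB} GiB")
print(f"64-bit: {limit_64 / GB / 1e9:.1f} billion GB")
```

The 32-bit figure is the theoretical ceiling for a whole process; the 2.3 GB SGA limit quoted above is lower because the address space is shared with code, stack and other memory.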

4 Motivation for upgrade from 32 to 64 bits Oracle
Increase cluster utilization:
- The upgrade would make it possible to host more services without interfering with the database services currently deployed.
- Streams apply requires 900 MB of memory.
- 64 bits allows the SGA memory allocation to extend beyond 2.3 GB, which would let us host databases that require a large buffer cache, such as the TAGS database. This database requires at least 1 GB of buffer cache to achieve optimal configuration and performance.

5 LCG-3D BNL Cluster Hardware Specification
Storage expansion: IBM DS, SAS disks, 300GB/disk
Storage: IBM DS, SAS disks, 300GB/disk
2 servers IBM:
- Memory 8GB
- Dual-core Intel Xeon CPU, 3GHz
- Dual-ported HBA
Hardware RAID controllers, Fibre Channel

6 Oracle RAC architecture
[Diagram: Node1 (Instance 1 ORCL1, Instance 1 ASM, Oracle Cluster Ready Services, OS) and Node2 (Instance 2 ORCL2, Instance 2 ASM, Oracle Cluster Ready Services, OS), reached through virtual IP addresses on the public network, linked by a private high-speed network, and sharing the disk storage (Oracle database, ASM volume manager and cluster file system; OCR, voting disk).]

Service              3D Conditions Database   TAGS test area        Backup/Recovery Area
Disk group / size    +DG_DATA1 / 1.4TB        +DG_DATA2 / 700GB     +Flash Recovery Area / 700GB
Current data size    168GB                    168TB                 188GB

7 Oracle RAC Configuration
- Homogeneous node configuration
- Oracle homes are installed on every node on a local ext3 file system
- ASMLib is used to label the partitions created on the LUNs presented to the system (persistent across reboots and storage reorganizations)
32 bits install: System RHEL4 U4 WS. Kernel ELsmp #1 SMP, oracleasmlib
64 bits install: System RHEL4 U4 ES. Kernel ELsmp #1 SMP, oracleasmlib x86_64

8 Migration process -preparing backups-
[Diagram: two-node 32-bit cluster (each node: database instance, ASM instance, Oracle Cluster Ready Services, OS; virtual IP addresses, public network, private high-speed network) with the 32-bit database on the data disk group and flash recovery area backup destinations 1 and 2.]
- VERIFY DATABASE RECOVERABILITY: the script can be found in the 3D twiki docs, LCG 3D DBA meeting at CNAF 2007, "Hands on exercises"
- ENABLE SECOND BACKUP AREA
Don't forget to back up to a secure place:
- DB and ASM admin OS directories
- DBS directory
- database pfile
- TNSNAMES.ora
- Listener.ora (two nodes)
- sqlnet.ora
Database information needed in case of database recovery:
- Spfile location
- DB recovery file destination
- DBID
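The "back up to a secure place" checklist above can be scripted so that a missing file is noticed before the nodes are reinstalled. The following is only a sketch: the helper name `backup_files`, the directory layout and the destination folder are hypothetical, and the demo uses placeholder files in a temporary directory rather than a real Oracle home.

```python
import shutil
import tempfile
from pathlib import Path

def backup_files(sources, dest_dir):
    """Copy each existing source file into dest_dir; return (copied, missing)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    for src in map(Path, sources):
        if src.is_file():
            shutil.copy2(src, dest / src.name)  # copy2 keeps timestamps
            copied.append(src.name)
        else:
            missing.append(str(src))
    return copied, missing

# Demo with placeholder files; real paths depend on the local installation.
with tempfile.TemporaryDirectory() as tmp:
    net_admin = Path(tmp) / "network" / "admin"   # hypothetical layout
    net_admin.mkdir(parents=True)
    for name in ("tnsnames.ora", "listener.ora"):
        (net_admin / name).write_text("# placeholder\n")
    wanted = [net_admin / n for n in ("tnsnames.ora", "listener.ora", "sqlnet.ora")]
    copied, missing = backup_files(wanted, Path(tmp) / "safe_place")
    print("copied:", copied)
    print("missing:", [Path(m).name for m in missing])
```

Because files are stored under their base name, files that exist on both nodes (such as Listener.ora) would need one destination directory per node to avoid overwriting each other.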

9 Migration strategy used -Conservative approach-
Since the OCR and voting disk are shared by both nodes, the entire service needs to be stopped:
- Oracle Cluster Registry (OCR): records cluster configuration information
- Voting disk: records node membership information
- Both need to be reinstalled: the 32-bit versions are not compatible with 64 bits
Stop database services, then back up the database.

10 Migration plan executed -OS 64 bits install-
- STOP DATABASE SERVICES
- BACKUP DATABASE
- REINSTALLATION OF OS 64 BITS
Nodes on the cluster were reinstalled at the same time.
[Diagram: two-node 32-bit cluster (each node: database instance, ASM instance, Oracle Cluster Ready Services, OS; virtual IP addresses, public network, private high-speed network) with the 32-bit database on the data disk group (data size: 162 GB) and flash recovery area backup destinations 1 and 2.]

11 3D Cluster Migration Intervention -OS 64 bits install-
Migrate the OS to 64 bits on the two nodes, preparing the nodes for 64 bits deployment (OS / Oracle binaries):
- Red Hat Enterprise Linux ES release 4
- Installation of drivers for storage access (IBM DS 3400)
- Installation by GCE group
[Diagram: Node 1 and Node 2 with 64-bit OS; storage with 32-bit data headers.]

12 Migration plan executed -Oracle 64 bits install-
- Install Clusterware and Oracle database
- Upgrade it to version
- Enable ASM instance (mount ASM disk groups)
- Apply patches (CPU Jan 2008)

13 3D Cluster Migration Intervention -recompiling objects-
Alternative 1: applying the migration scripts directly, without restoring the database.
On node 1, start the database in restricted mode. Oracle's recommended steps:
1. Shut down the instance on node 2
2. Change the cluster-specific parameter cluster_database=false
3. Start up in upgrade mode
4. Run the following scripts:
   ?/rdbms/admin/utlirp.sql
   ?/rdbms/admin/utlrp.sql
   SQL> shutdown immediate;
5. Change back cluster_database=true
6. Start up the database on all nodes
[Diagram, before: Node 1 / 64 bits (database instance, ASM instance, Oracle Cluster Ready Services, OS), database 32 bits, data disk group 32 bits, backup area 32 bits. After: same node, database 32 bits, data disk group 64 bits, backup area 32 bits.]
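The six steps above can be written down as an ordered runbook so that nothing is skipped or swapped during the intervention. The sketch below only illustrates the ordering and is not a tested procedure: the names `STEPS` and `render_runbook` are hypothetical, and steps for which the slide gives no command are rendered as comments instead of guessed statements.

```python
# Steps as listed on the slide. Entries tagged "manual" are actions the slide
# names without giving a command, so they are emitted as comments only.
STEPS = [
    ("manual", "1. Shut down the instance on node 2"),
    ("manual", "2. Set cluster_database=false"),
    ("sql", "startup upgrade"),
    ("sql", "@?/rdbms/admin/utlirp.sql"),
    ("sql", "@?/rdbms/admin/utlrp.sql"),
    ("sql", "shutdown immediate;"),
    ("manual", "5. Set cluster_database=true"),
    ("manual", "6. Start up the database on all nodes"),
]

def render_runbook(steps):
    """Render the steps as SQL*Plus text; manual steps become '--' comments."""
    out = []
    for kind, text in steps:
        out.append(text if kind == "sql" else f"-- MANUAL: {text}")
    return "\n".join(out) + "\n"

runbook = render_runbook(STEPS)
print(runbook)
```

Keeping the manual steps in the same list as the SQL*Plus commands makes the printed runbook usable as a checklist during the intervention.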

14 3D Cluster Migration Intervention
Alternative 2 (in case of total data loss): restoring the 32-bit database on the new 64 bits cluster installation.
On node 1, follow Oracle's Metalink note. After restoring the database and opening it in restricted mode, migrate to 64 bit:
   ?/rdbms/admin/utlirp.sql
   ?/rdbms/admin/utlrp.sql
   SQL> shutdown immediate;
   SQL> startup
[Diagram: Node 1 / 64 bits (database instance, ASM instance, Oracle Cluster Ready Services, OS), database 32 bits, data disk group 64 bits, backup area 32 bits.]

15 3D Cluster Migration Intervention -the scripts used-
UTLIRP.sql
First invalidates and then recompiles PL/SQL modules in the format required by the new database. It:
1. Alters certain dictionary tables
2. Reloads STANDARD and DBMS_STANDARD, which are necessary for using PL/SQL
3. Recompiles all PL/SQL modules (procedures, functions, packages, types, triggers, views)
No other DDL should run on the database while the script is running.
Primarily used for word size conversion.

16 3D Cluster Migration Intervention -the scripts used-
UTLRP.sql
- Recompiles all invalid PL/SQL objects in the database.
- Runs a component validation procedure for each component in the database.
Oracle recommends running it after upgrades, downgrades and patches to minimize the latencies caused by on-demand recompilation (objects are automatically revalidated when used).
objects were recompiled on production

17 3D Cluster Migration Intervention
Use Recovery Manager to search for logical or physical data corruption after migration.
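The slide does not show which RMAN commands were used for this check. One common way to do it, given here as an assumption rather than as the procedure followed at BNL, is RMAN's `BACKUP VALIDATE CHECK LOGICAL DATABASE`, which reads the datafile blocks without writing a backup, followed by a query on the `V$DATABASE_BLOCK_CORRUPTION` view. The helper below only assembles the command-file text; its name is hypothetical.

```python
def rman_validation_script() -> str:
    """Build the text of an RMAN command file that validates the database."""
    lines = [
        "# Read the datafile blocks and test them for physical and logical",
        "# corruption; no backup pieces are written.",
        "BACKUP VALIDATE CHECK LOGICAL DATABASE;",
    ]
    return "\n".join(lines) + "\n"

# Query to run in SQL*Plus afterwards; no rows means no corruption was recorded.
FOLLOW_UP_QUERY = "SELECT * FROM v$database_block_corruption;"

script = rman_validation_script()
print(script)
print(FOLLOW_UP_QUERY)
```

The generated text could be saved to a file and passed to RMAN as a command file, with the follow-up query run once validation has finished.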

18 Migration plan executed -enable database services-
- Start up all instances and verify the database is open
- Start the Streams apply process

19 Conclusion and comments
- The migration procedure for the BNL 3D ATLAS Conditions database was presented.
- Upgrading the 3D LCG Conditions database cluster to 64 bits takes advantage of the hardware resources.
- Alternative 1, the direct migration procedure, was applied: production system, 5500 objects, 25 minutes.
- The Oracle Recovery Manager tool did not find any logical or physical data corruption.
- The two alternatives presented use the same migration procedure.
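The production figure can be turned into a rough rate for planning a similar intervention elsewhere. This sketch reads the slide as "5500 objects recompiled in 25 minutes" and assumes, purely for illustration, that recompilation time scales linearly with the number of objects; real timings depend on object size, dependencies and hardware.

```python
objects_recompiled = 5500   # from the slide
elapsed_minutes = 25        # from the slide

rate_per_minute = objects_recompiled / elapsed_minutes
rate_per_second = rate_per_minute / 60

def estimate_minutes(n_objects: int, rate: float = rate_per_minute) -> float:
    """Linear estimate of recompilation time for n_objects (an assumption)."""
    return n_objects / rate

print(f"{rate_per_minute:.0f} objects/minute, {rate_per_second:.2f} objects/second")
print(f"estimated time for 11000 objects: {estimate_minutes(11000):.0f} minutes")
```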

20 Bibliography
- Oracle Database 10g Real Application Clusters Handbook, McGraw-Hill Osborne Media; 1 edition (November 22, 2006)
- Oracle Database 10g RMAN Backup & Recovery (Paperback), McGraw-Hill Osborne Media; 1 edition (November 14, 2006)
- Online documentation
- DOCUMENTS Note:
- Oracle database concepts
- 3D Twiki documentation

21 Acknowledgment
Special thanks to:
- BNL GCE RACF facility group: Dr. Jason Smith, Robert Petkus
- CERN ITD PSS group: Dawid Wocjik, Luca Canali, Jacek Wojcieszuk, Eva Dafonte Perez, Dirk Duellmann, Maria Girone
- ATLAS DBAs, PH/ATP-CO Group: Gancho Dimitrov

22 Backup slides

23 Oracle single instance
[Diagram: a server process with its PGA attached to the SYSTEM GLOBAL AREA (SGA: shared pool, Java pool, Streams pool, large pool, database buffer cache, redo log buffer); background processes System Monitor (SMON), Process Monitor (PMON), Checkpoint (CKPT), Database Writer (DBWn), Log Writer (LGWR) and Archiver (ARCn); on disk, datafiles, control files, redo log files and archive log files.]

24 Oracle cluster architecture
[Diagram: Node1 and Node 2, each with its own SGA, Database Writer (DBWn), Log Writer (LGWR) and redo log files, coordinated by the Global Cache Service (GCS) over the high-speed interconnect under a cluster manager, and sharing the datafiles.]

25 Migration plan, another approach -recovering database-
Restore the database on the new 64 bits cluster installation, following the Oracle Metalink document. On node 1:
- Restore database using RMAN:
   Restore control file
   Restore database
- Migrating to 64 bit:
   SQL> recover database until cancel using backup controlfile;
   SQL> alter database open resetlogs migrate;
   ?/rdbms/admin/utlirp.sql
   ?/rdbms/admin/utlrp.sql
   SQL> shutdown immediate;
   SQL> startup
Time: minutes
[Diagram: Node 1 / 64 bits (database instance, ASM instance, Oracle Cluster Ready Services, OS), database 32 bits, data disk group 64 bits, backup area 32 bits.]

26 Migration plan -OS 64 bits installation, Alternative 2-
Preparing the first node for upgrade: remove node 1 from the cluster while the database service stays open.
- SHUTDOWN INSTANCE1 (NO STREAM PROCESS RUNNING NODE 1)
- VERIFY STATUS DATABASE OPEN AND RUNNING ON INSTANCE 1 and stream process is running
- To remove NODE 1 from the cluster, apply the procedure presented at the last WLCG workshop by Jacek Wojcieszuk: nId=7&resId=1&materialId=slides&confId=20080
[Diagram: two-node 32-bit cluster (each node: database instance, ASM instance, Oracle Cluster Ready Services, OS; virtual IP addresses, public network, private high-speed network) with the 32-bit database on the data disk group (data size: 162 GB) and flash recovery area backup destinations 1 and 2.]

27 What if it does not work?
In case of losing the entire data, or the data being totally corrupted, during the migration process:
- Recover from the second backup destination
- The cluster will be taken out of production and the data resynchronized separately from CERN. Then it will be included in production again.
Time of intervention: 1 to 2 days
If the installation of OS/Oracle 64 bits fails:
- The cluster will be taken out of production, Oracle will be reinstalled and the data resynchronized separately from CERN. Then it will be included in production again.
Time of intervention: 1 to 2 days
User impact: none, users will get data from the closest Tier 1 3D site