1 Oracle Cluster Ready Services 11g – tips and comments Boris Gyurov IT Knowledge Ltd.

Presentation transcript:

1 Oracle Cluster Ready Services 11g – tips and comments Boris Gyurov IT Knowledge Ltd.

2 Why Oracle Cluster Ready Services? - Support for Linux
- Appeared initially to support Oracle Parallel Server on Linux
- Looked like an exotic configuration at that time
- Benefit: one can install Parallel Server on Linux

3 Why Oracle Cluster Ready Services? - Meant to Support RAC
- The lower prices and higher speed of communication equipment gave Oracle's “share everything” architecture a huge advantage: it started to scale well in 9i.
- Customers were still afraid of the complicated setup (vendor-specific clusterware, raw devices to share the storage) and the high price (an option of Oracle EE).
- Oracle's answer:
  - Oracle Cluster Ready Services
  - OCFS and ASM to share the storage
  - RAC as a part of Oracle Server SE, included in the price
- The results:
  - Tens of installations all around Bulgaria
  - Thousands of installations all around the world
  - RAC becomes a commodity

4 Why Oracle Cluster Ready Services? - Generic Code
- Generic code means generic bugs
- Generic bugs are easier and faster to find: whatever platform you run on, you can hit them, which means more testers
- Generic bugs are easier to fix: one fix for all platforms
- Generic code is cheaper to support: only one team vs. many platform-specific teams

5 Why Oracle Cluster Ready Services? - Lower Price
- No need to buy clusterware

6 Why Oracle Cluster Ready Services? - Single Support Resource
- No more ping-pong between the hardware (and clusterware) vendor and Oracle
- No more need for different experts to configure different parts
- It all comes from Oracle

7 What is Oracle Clusterware?
- Enables one system to be composed of many machines
- Enables one service to be provided by many nodes
- Enables processes to be failed over to a surviving node in case of failure
- Enables network interfaces to be failed over to a surviving node
- Monitors all the resources and relocates them as needed
- Notifies the cluster members, client applications and all subscribers of resource status changes
- Creates a base for cluster-enabled applications (such as RAC)
- Enables cluster-level resource startup/shutdown

8 Oracle Clusterware Hardware Concepts
- One or more (generally 2 or more) servers
- Inter-node communication media (most often a high-speed network)
- Public network interface
- Shared storage resources

9 Oracle Clusterware Software Concepts - The Oracle Cluster Registry (OCR)
- Contains the cluster configuration (the SYSTEM section)
- Contains the Oracle Database and Services resource definitions (the DATABASE section)
- Contains the third-party resource definitions (the CRS section)
- The ocrdump utility dumps the OCR in text or XML format and lets us browse its structure and contents

10 Oracle Clusterware Software Concepts - The Voting Disk
- The need for a voting disk: if the node interconnect (IC) fails, a node cannot tell whether the other node is down or only the IC is down. Hence each node can decide that the other is down and try to recover the cluster. The cluster would split into sub-clusters: “split brain”
- The voting disk is a file on the shared storage, shared between the nodes, where each node writes its “heartbeat”
- It provides a second communication path between the nodes, used to determine which one should go down and which one will stay and recover
- It should be mirrored (at Oracle or OS level) to prevent corruption. If the voting disk is inaccessible, the cluster goes down
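The heartbeat idea can be illustrated with a small shell sketch. This is a toy model only, not Oracle's on-disk format or arbitration algorithm: the per-node .hb files, the function names write_heartbeat and alive_nodes, and the 30-second timeout are all assumptions made for the example.

```shell
# Toy model of a voting disk: one heartbeat file per node in a shared directory.
# File layout, function names and timeout are hypothetical, not Oracle's format.

# write_heartbeat DIR NODE NOW: record NOW (epoch seconds) as NODE's last heartbeat.
write_heartbeat() {
    echo "$3" > "$1/$2.hb"
}

# alive_nodes DIR NOW TIMEOUT: print every node whose heartbeat is
# at most TIMEOUT seconds older than NOW.
alive_nodes() {
    for hb in "$1"/*.hb; do
        [ -f "$hb" ] || continue
        last=$(cat "$hb")
        if [ $(( $2 - last )) -le "$3" ]; then
            basename "$hb" .hb
        fi
    done
}

shared=$(mktemp -d)                     # stands in for the shared storage
write_heartbeat "$shared" class01 100   # class01 last wrote at t=100
write_heartbeat "$shared" class02 130   # class02 last wrote at t=130
alive_nodes "$shared" 135 30            # prints only class02
rm -r "$shared"
```

A node that can still read the shared file but sees a stale entry for its peer knows the peer has stopped writing, even when the interconnect itself gives no answer.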

11 Oracle Clusterware Processes on Linux and UNIX Systems
- crsd: performs high availability recovery and management operations such as maintaining the OCR and managing application resources. This process runs as the root user. It restarts automatically upon failure.
- evmd: event manager daemon. This process also starts the racgevt process to manage FAN server callouts.
- ocssd: manages cluster node membership and runs as the oracle user. It uses the interconnect (IC) and the voting disk; failure of this process results in a node restart.

12 Oracle Clusterware Processes on Linux and UNIX Systems
- oprocd: process monitor for the cluster. Note that this process only appears on platforms that do not use third-party vendor clusterware with Oracle Clusterware.

13 Oracle Clusterware Processes on Linux - Processes startup
From the Linux man pages, DESCRIPTION:
- The inittab file describes which processes are started at bootup and during normal operation
- An entry in the inittab file has the following format:
    id:runlevels:action:process
- Valid actions for the action field are:
    respawn  The process will be restarted whenever it terminates (e.g. getty)
    ...

14 Oracle Clusterware Processes on Linux - Processes startup
bin]$ cat /etc/inittab
# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
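What the respawn action means for the three init.* entries can be sketched as a plain shell loop. This is only an illustration of the behaviour described in the man page, not how init is implemented; the function name respawn and its MAX argument are assumptions added so that the demonstration terminates.

```shell
# respawn MAX CMD [ARGS...]: rerun CMD each time it exits, the way init
# treats a "respawn" entry. MAX is a hypothetical cap so the demo ends;
# init applies no such fixed cap.
respawn() {
    max=$1
    shift
    runs=0
    while [ "$runs" -lt "$max" ]; do
        if "$@"; then status=0; else status=$?; fi
        runs=$((runs + 1))
        echo "run $runs of '$1' exited with status $status"
    done
}

# 'false' exits immediately with status 1, standing in for a crashing daemon.
respawn 3 false
```

This is why killing crsd or evmd by hand brings them straight back: the process exits, and init starts the wrapper script again.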

15 Oracle Clusterware Processes startup on Windows
- Oracle Process Manager Daemon (OPMD): OPMD is registered with the Windows Service Control Manager (WSCM), and the startup of all Oracle Clusterware services depends on OPMD.
- On system startup, after the default time period of 60 seconds has elapsed, OPMD automatically starts all of the registered Oracle Clusterware services. This startup delay lets services outside the scope of Oracle control, such as storage access, anti-virus, or firewall services, start first.
- You can set OPMD to start manually. However, this will delay the startup of the rest of the affected Oracle Clusterware components.

16 The RACG Infrastructure
- Takes care of the Oracle-specific resources
- One racgimon process is spawned for each database or ASM instance to monitor its health
~]$ ps -ef|grep racg
oracle :31 ? 00:00:04 /u01/app/oracle/product/11.1/db_1/bin/racgimon startd racdb

17 The RACG Infrastructure
- CRSD also spawns other child processes to perform different actions (kill, start/stop resources, change configurations etc.)
- racgeut kills timed-out actions
    Usage: racgeut [-e...=...]
- racgmain starts/stops/checks/manages resources
    Usage:
    racgmain [resource name] start|stop|check
    racgmain startorp|failsrvsa dbname instname [srvname]
    racgmain startorp|failsrvsa nodename
    racgmain cond_resname cond_state func [args...]
- racgvip (run as root) checks and relocates the VIP

18 The Virtual IP (VIP) Concept
- The VIP is an IP address controlled by CRS
- Should be from the public subnet
- Should be resolvable through DNS or /etc/hosts
- Used by the RAC database to avoid TCP/IP timeouts when recognizing node-down or interface-down events
- Used by third-party applications so that they can still be reached at the same IP after being moved to a surviving node on failover
- Should be used instead of the static public IP

19 Using CRS with Third Party APPS Overview
- An application profile should be added to the OCR. The main attributes are:
  - Action Program: an executable to start/stop/check the application
  - Privileges: which user can start/stop the application
  - Resource: a resource name for your application
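An action program is an ordinary executable that CRS calls with a single argument (start, stop or check) and that reports success with exit status 0. A minimal sketch of one for Apache might look like the following. It is not the script used in the demonstration: the apachectl path, the PID file location and the function name apache_action are assumptions to adjust for your system.

```shell
#!/bin/sh
# Hypothetical sketch of an action program for Apache.
# CRS calls it with one argument: start, stop or check.
# APACHECTL and PIDFILE are assumed locations; adjust them for your system.
APACHECTL="${APACHECTL:-/usr/sbin/apachectl}"
PIDFILE="${PIDFILE:-/var/run/httpd.pid}"

apache_action() {
    case "$1" in
        start) "$APACHECTL" start ;;
        stop)  "$APACHECTL" stop ;;
        check)
            # Healthy only if the PID file exists and that process is alive.
            [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
            ;;
        *)
            echo "usage: $0 {start|stop|check}" >&2
            return 1
            ;;
    esac
}

# Run only when an argument is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    apache_action "$1"
    exit $?
fi
```

The check branch matters most: CRS calls it periodically, and a non-zero status is what triggers a restart or a failover of the resource.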

20 Using CRS with Third Party APPS Creating the profile
~]$ crs_profile -create apache_crs -t application -dir ./ -a /root/apache_crs.sh -r ora.class01.vip
~]$ ll
total 164
-rw-r--r-- 1 oracle oinstall 760 Aug 19 17:24 apache_crs.cap
drwxr-xr-x 2 oracle oinstall 4096 Aug 13 18:58 Desktop
-rw-r--r-- 1 oracle oinstall Aug 14 12:47 ocr_bef.dmp
-rw-r--r-- 1 oracle oinstall Aug 15 16:19 OCRDUMPFILE
~]$

21 Using CRS with Third Party APPS Registering the Profile
~]$ crs_register apache_crs -dir ./
~]$ crs_stat |grep -A 5 apache
NAME=apache_crs
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
NAME=ora.class01.LISTENER_CLASS01.lsnr

22 Using CRS with Third Party APPS Setting the Permissions
Setting the owner
oracle]# /u01/app/oracle/product/11.1/crs11/bin/crs_setperm apache_crs -o root
Setting the rights
oracle]# /u01/app/oracle/product/11.1/crs11/bin/crs_setperm apache_crs -u user:oracle:r-x

23 Using CRS with Third Party APPS Starting and Stopping the resource
Checking the state
~]$ crs_stat -t
Name            Type         Target   State    Host
apache_crs      application  OFFLINE  OFFLINE
Starting the resource
~]$ crs_start apache_crs
Attempting to start `apache_crs` on member `class01`
Start of `apache_crs` on member `class01` succeeded.
~]$ crs_stat -t
Name            Type         Target   State    Host
apache_crs      application  ONLINE   ONLINE   class01

24 Using CRS with Third Party APPS Starting and Stopping the resource
Stopping the resource
~]$ crs_stop apache_crs
Attempting to stop `apache_crs` on member `class01`
Stop of `apache_crs` on member `class01` succeeded.
~]$ crs_stat -t
Name            Type         Target   State    Host
apache_crs      application  OFFLINE  OFFLINE

25 Using CRS with Third Party APPS Failover
Step 1: Apache and the VIP are running on node 1
~]$ crs_stat -t
Name            Type         Target   State    Host
apache_crs      application  ONLINE   ONLINE   class01
ora lsnr        application  ONLINE   ONLINE   class01
ora....s01.gsd  application  ONLINE   ONLINE   class01
ora....s01.ons  application  ONLINE   ONLINE   class01
ora....s01.vip  application  ONLINE   ONLINE   class01
Here we pull the power supply cable from node 1

26 Using CRS with Third Party APPS Failover
Step 2: Apache and the VIP go offline
~]$ crs_stat -t
Name            Type         Target   State    Host
apache_crs      application  ONLINE   OFFLINE
ora lsnr        application  ONLINE   OFFLINE
ora....s01.gsd  application  ONLINE   OFFLINE
ora....s01.ons  application  ONLINE   OFFLINE
ora....s01.vip  application  ONLINE   OFFLINE
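The state in this step, where the target is ONLINE but the actual state is OFFLINE, is the one worth watching for during a failover test. A small filter over the long crs_stat format (the NAME=/TYPE=/TARGET=/STATE= lines seen when the profile was registered) can list such resources. The function name down_resources and the sample input are assumptions for illustration; on a cluster node the input would come from crs_stat itself.

```shell
# down_resources: read `crs_stat` output (NAME=/TYPE=/TARGET=/STATE= lines)
# on stdin and print each resource that should be ONLINE but is not.
down_resources() {
    awk -F= '
        $1 == "NAME"   { name = $2 }
        $1 == "TARGET" { target = $2 }
        $1 == "STATE"  { if (target == "ONLINE" && $2 !~ /^ONLINE/) print name }
    '
}

# Sample input in the long crs_stat format; on a cluster node you would
# run `crs_stat | down_resources` instead. The sample values are made up.
down_resources <<'EOF'
NAME=apache_crs
TYPE=application
TARGET=ONLINE
STATE=OFFLINE

NAME=ora.class01.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on class02
EOF
```

Resources whose target is OFFLINE are skipped on purpose, since they were stopped deliberately and are not failures.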

27 Using CRS with Third Party APPS Failover
Step 3: Apache and the VIP go online on node 2
~]$ crs_stat -t
Name            Type         Target   State    Host
apache_crs      application  ONLINE   ONLINE   class02
ora lsnr        application  ONLINE   ONLINE   class01
ora....s01.gsd  application  ONLINE   ONLINE   class01
ora....s01.ons  application  ONLINE   ONLINE   class01
ora....s01.vip  application  ONLINE   ONLINE   class02
NOTE: The customer does not need to change the IP requested via the browser. Apache is still accessible at the VIP

28 Using CRS with Third Party APPS VIP Note
- Oracle does not recommend using the same VIP for several applications. In our case we use the database VIP for Apache as well.
- To comply with that, we should create a new VIP dedicated to the Apache server and use it instead of the database VIP.
- It would operate exactly like the database VIP, but it would be a separate resource

29 Using CRS with Third Party APPS Failover
Step 4: Node 1 comes back. The VIP goes back to node 1. Apache is still present on node 2. Apache is not reachable at the VIP at that moment
~]$ crs_stat -t
Name            Type         Target   State    Host
apache_crs      application  ONLINE   ONLINE   class02
ora lsnr        application  ONLINE   OFFLINE
ora....s01.gsd  application  ONLINE   OFFLINE
ora....s01.ons  application  ONLINE   OFFLINE
ora....s01.vip  application  ONLINE   ONLINE   class01

30 Using CRS with Third Party APPS Failover
Step 5: Apache also goes back to node 1, since it is declared to be dependent on the node 1 VIP. It is reachable again
~]$ crs_stat -t
Name            Type         Target   State    Host
apache_crs      application  ONLINE   ONLINE   class01
ora lsnr        application  ONLINE   ONLINE   class01
ora....s01.gsd  application  ONLINE   ONLINE   class01
ora....s01.ons  application  ONLINE   ONLINE   class01
ora....s01.vip  application  ONLINE   ONLINE   class01

31 Using CRS with Third Party APPS Using its own VIP
Creating a new, application-specific VIP
~]$ crs_profile -create apache_vip -dir ./ -t application -a \
/u01/app/oracle/product/11.1/crs11/bin/usrvip \
-o oi=eth1,ov= ,on= ,ap=0
- Setting the ap (active placement) option to 0 tells the system not to re-evaluate the resource placement when a node is added.
- Our VIP is not tied to a particular node. It starts on any node at startup, fails over to any surviving node in case of failure, and does not return when the original node starts again
Setting permissions
~]# ./crs_setperm apache_vip -o root
~]# ./crs_setperm apache_vip -u user:oracle:r-x

32 Using CRS with Third Party APPS Using its own VIP
Making apache_crs dependent on the new apache_vip
- apache_crs currently depends on ora.class01.vip. To change that:
oracle]# ./crs_register apache_crs -update -r apache_vip
- Now apache_crs will follow apache_vip to every node:
  - When apache_vip starts on a node, apache_crs will start on the same node
  - When apache_vip fails over to ANY surviving node, apache_crs will fail over to the same node
  - When the failed node starts up again, apache_vip will not go back to it (active placement is off), and neither will apache_crs

33 Using CRS with Third Party APPS Using its own VIP – we got a Service
- No particular node. We never know where the application runs, but we always access it at the apache_vip
- We need to share binaries
- We need to share the configuration files
- We need to share everything the application needs to operate, so that each node can access it in the same directory tree
- And OCFS is here to help

34 The clusters and the Oracle Universal Installer
- OUI supports cluster-level installations: installing CRS and Oracle Database on all the cluster nodes simultaneously
- Scripts provided under install_directory/install:
  - runSSHSetup.sh: sets up user equivalence
  - addNode.sh: adds a node to an existing cluster (calls OUI)
  - attachHome.sh/detachHome.sh: attach/detach existing homes to/from the Oracle Inventory
- Under install_directory, runcluvfy.sh checks all the prerequisites

35 The clusters and the Oracle Universal Installer
- The Oracle Inventory now tracks which cluster members contain particular home directories

36
~]$ ls /u01/app/oraInventory/ContentsXML/
comps.xml inventory.xml libs.xml
~]$ cat /u01/app/oraInventory/ContentsXML/inventory.xml
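The contents of inventory.xml did not survive in this transcript. As a rough sketch of how the per-home node list can be read, the following assumes the usual layout in which each HOME element carries a NODE_LIST with one NODE element per cluster member. The sample file, the home name OraCrs11g_home and the function name home_nodes are assumptions for illustration, not the output from the slide.

```shell
# home_nodes FILE: print "home_name node_name" for every NODE entry in FILE.
# Assumes NAME is the first attribute of the HOME and NODE elements.
home_nodes() {
    awk -F'"' '
        /<HOME / { home = $2 }        # first quoted value is the home name
        /<NODE / { print home, $2 }   # first quoted value is the node name
    ' "$1"
}

# Hypothetical inventory fragment standing in for the real inventory.xml.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<HOME NAME="OraCrs11g_home" LOC="/u01/app/oracle/product/11.1/crs11" TYPE="O" IDX="1" CRS="true">
   <NODE_LIST>
      <NODE NAME="class01"/>
      <NODE NAME="class02"/>
   </NODE_LIST>
</HOME>
EOF
home_nodes "$sample"
rm -f "$sample"
```

On a real node the call would be home_nodes /u01/app/oraInventory/ContentsXML/inventory.xml, which is a quick way to confirm the inventory after adding or removing a node.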

37 The clusters and the Oracle Universal Installer – the command line options
bin]$ ./runInstaller -help
clusterware oracle.crs, Version of Cluster ready services installed.
-addNode  For adding node(s) to the installation. Wrapped by addNode.sh
-attachHome  For attaching homes to the OUI inventory. Wrapped by attachHome.sh
-detachHome  For detaching homes from the OUI inventory without deleting the inventory directory inside the Oracle home.

38 The clusters and the Oracle Universal Installer – the command line options
-updateNodeList  For updating the node list for this home in the OUI inventory. Particularly useful when removing a node from the cluster
-remoteshell  Unix-specific option. Used only for cluster installs; specifies the path to the remote shell program on the local cluster node.
And many more

39 The Bottom Line (or what I like)
- CRS looks good, reliable and mature since 10gR2
- Now we have a complete set of tools to change almost everything in the configuration
- Now we can multiplex the OCR and the voting disk for better reliability
- Now Oracle fully supports adding and removing nodes from the cluster, along with the utilities for that

40 The Bottom Line (or what I don't like)
- There are many utilities for management, often duplicating each other's functionality
- It is still easy to mess things up (say, mix up the private and the public IPs)
- Although possible, reconfiguration (say, fixing mixed-up private and public IPs) is still quite a pain: a lot of commands, often not very intuitive
- Some of the tasks (for example managing the inventory while adding and removing nodes) have to be done by hand, typing commands which are a sort of “black magic”
- There is still work to be done in documenting CRS

41 Q&A