Oracle Cluster Ready Services 11g – tips and comments
Boris Gyurov, IT Knowledge Ltd.


2 Why Oracle Cluster Ready Services? - Support for Linux
- First appeared to support Oracle Parallel Server 8.1.7 on Linux
- Looked like an exotic configuration at the time
Benefit:
- Parallel Server could be installed on Linux

3 Why Oracle Cluster Ready Services? - Meant to Support RAC
The lower prices and higher speed of communication equipment gave Oracle's “shared everything” architecture a huge advantage: it started to scale well in 9i. Customers were still put off by the complicated setup (vendor-specific clusterware, raw devices to share the storage) and the high price (RAC was an option of Oracle EE).
Oracle's answer:
- Oracle Cluster Ready Services
- OCFS and ASM to share the storage
- RAC as part of Oracle Server SE, included in the price
The results:
- Tens of installations all around Bulgaria
- Thousands of installations all around the world
- RAC becomes a commodity

4 Why Oracle Cluster Ready Services? - Generic Code
- Generic code means generic bugs
- Generic bugs are easier and faster to find: no matter which platform you run on, you can hit them, so there are more testers
- Generic bugs are easier to fix: one fix for all platforms
- Generic code is cheaper to support: one team instead of many platform-specific teams

5 Why Oracle Cluster Ready Services? - Lower Price
- No need to buy clusterware

6 Why Oracle Cluster Ready Services? - Single Support Resource
- No more ping-pong between the hardware (and clusterware) vendor and Oracle
- No more need for different experts to configure different parts
- It all comes from Oracle

7 What is Oracle Clusterware?
- Enables one system to be composed of many machines
- Enables one service to be provided by many nodes
- Enables processes to fail over to a surviving node in case of failure
- Enables network interfaces to fail over to a surviving node
- Monitors all resources and relocates them as needed
- Notifies cluster members, client applications and all subscribers of resource status changes
- Provides a base for cluster-enabled applications (such as RAC)
- Enables cluster-level resource startup/shutdown

8 Oracle Clusterware Hardware Concepts
- One or more (generally two or more) servers
- Inter-node communication medium (most often a high-speed network)
- Public network interface
- Shared storage resources

9 Oracle Clusterware Software Concepts - The Oracle Cluster Registry (OCR)
- Contains the cluster configuration (the SYSTEM section)
- Contains the Oracle Database and service resource definitions (the DATABASE section)
- Contains the third-party resource definitions (the CRS section)
- The ocrdump utility dumps the OCR in text or XML format and lets us browse its structure and contents
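An ocrdump text dump can be large. As a hedged sketch, the bash function below counts how many keys sit under each top-level OCR section (SYSTEM, DATABASE, CRS) in a dump such as the default OCRDUMPFILE. It assumes each key is printed on its own line in brackets, e.g. [SYSTEM.crs]; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Count OCR keys per top-level section (SYSTEM, DATABASE, CRS) in a text dump
# produced by ocrdump (e.g. the default OCRDUMPFILE).
# Assumption: every key is printed on its own line as [SECTION.sub.key].
ocr_section_counts() {
  local dumpfile=${1:-OCRDUMPFILE}
  awk '
    /^\[/ && index($0, "]") > 2 {
      key = substr($0, 2, index($0, "]") - 2)   # text between the brackets
      split(key, parts, ".")                     # SYSTEM.crs -> SYSTEM
      count[parts[1]]++
    }
    END { for (s in count) print s, count[s] }
  ' "$dumpfile" | sort
}
```

For example, run ocrdump as root to write OCRDUMPFILE into the current directory, then run ocr_section_counts OCRDUMPFILE. To look at a single branch, ocrdump also accepts -stdout and -keyname (e.g. ocrdump -stdout -keyname SYSTEM).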

10 Oracle Clusterware Software Concepts - The Voting Disk
Why a voting disk is needed: if the interconnect fails, a node cannot tell whether the other node is down or only the interconnect is down. Each node could then decide that the other one is down and try to recover the cluster, and the cluster would split into independent sub-clusters (“split brain”).
- The voting disk is a file on the shared storage, shared between the nodes, where each node writes a heartbeat
- It provides a second communication path between the nodes, used to determine which node should go down and which should stay up and recover
- It should be mirrored (at Oracle or OS level) to protect against corruption
- If the voting disk is inaccessible, the cluster goes down
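The advice to mirror the voting disk follows from a majority rule: a node can stay in the cluster only while it can reach more than half of the configured voting disks. The sketch below models that rule in bash; the function name is hypothetical, and it illustrates the arithmetic only, not how ocssd works internally.

```bash
#!/usr/bin/env bash
# Model of the voting-disk majority rule: a node may stay in the cluster only
# while it can access more than half of the configured voting disks.
# Illustrative only; this is not how ocssd implements the check.
has_voting_majority() {          # usage: has_voting_majority CONFIGURED ACCESSIBLE
  local configured=$1 accessible=$2
  (( accessible * 2 > configured ))
}

# Show why an odd number of voting disks is preferred.
for n in 1 2 3; do
  if has_voting_majority "$n" $(( n - 1 )); then
    echo "$n disks: survives losing one"
  else
    echo "$n disks: does not survive losing one"
  fi
done
```

The loop reports that one or two voting disks do not survive the loss of a disk, while three do: with two disks, the one that remains is exactly half, not a majority. This is why voting disks are usually configured in odd numbers.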

11 Oracle Clusterware Processes on Linux and UNIX Systems
- crsd: performs high availability recovery and management operations such as maintaining the OCR and managing application resources. It runs as root and restarts automatically upon failure.
- evmd: event manager daemon. It also starts the racgevt process to manage FAN server callouts.
- ocssd: manages cluster node membership and runs as the oracle user. It uses the interconnect and the voting disk; failure of this process results in a node restart.

12 Oracle Clusterware Processes on Linux and UNIX Systems
- oprocd: process monitor for the cluster. Note that this process appears only on platforms that do not use third-party vendor clusterware with Oracle Clusterware.

13 Oracle Clusterware Processes on Linux - Process Startup
From the Linux man page inittab(5):
DESCRIPTION
The inittab file describes which processes are started at bootup and during normal operation...
An entry in the inittab file has the following format:
id:runlevels:action:process
...
Valid actions for the action field are:
respawn  The process will be restarted whenever it terminates (e.g. getty).
...

14 Oracle Clusterware Processes on Linux - Process Startup
[oracle@class01 bin]$ cat /etc/inittab
...
# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
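A minimal sketch for checking which processes init keeps alive: the bash function below prints the id, runlevels and process of every respawn entry, following the id:runlevels:action:process format quoted from inittab(5). The function name is hypothetical; on the system above its output would include the h1, h2 and h3 CRS wrappers and the X display manager entry.

```bash
#!/usr/bin/env bash
# List the inittab entries whose action is "respawn": init restarts these
# processes whenever they terminate, which is how the init.evmd, init.cssd
# and init.crsd wrappers are kept alive.
# Entry format (inittab(5)): id:runlevels:action:process
respawn_entries() {
  local inittab=${1:-/etc/inittab}
  awk -F: '
    /^[ \t]*#/ || NF < 4 { next }             # skip comments and blank lines
    $3 == "respawn" {
      proc = $0
      sub(/^[^:]*:[^:]*:[^:]*:/, "", proc)    # keep everything after the 3rd colon
      printf "%s\t%s\t%s\n", $1, $2, proc     # id, runlevels, process
    }
  ' "$inittab"
}
```

For example, respawn_entries | grep init.crsd shows whether the CRS daemon wrapper is registered and for which runlevels.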

15 Oracle Clusterware Process Startup on Windows
Oracle Process Manager Daemon (OPMD): OPMD is registered with the Windows Service Control Manager (WSCM), and the startup of all Oracle Clusterware services depends on OPMD. On system startup, after the default period of 60 seconds has elapsed, OPMD automatically starts all registered Oracle Clusterware services. This delay lets services outside Oracle's control, such as storage access, anti-virus or firewall services, start first. You can set OPMD to start manually. However, this will delay the startup of the rest of the affected Oracle Clusterware services.

16 The RACG Infrastructure
- Takes care of the Oracle-specific resources
- One racgimon process is spawned for each database or ASM instance to monitor its health
[oracle@class01 ~]$ ps -ef|grep racg
oracle 5822 1 0 11:31 ? 00:00:04 /u01/app/oracle/product/11.1/db_1/bin/racgimon startd racdb
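To see which databases currently have a racgimon monitor, one can filter ps -ef output on the command fields instead of grepping the whole line, which also keeps the grep or awk process itself out of the result. A hedged sketch, assuming the command line has the form .../racgimon startd <dbname> shown on this slide and the standard ps -ef column layout (command in field 8); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# List the databases that have a racgimon monitor, reading `ps -ef` output on
# stdin. Matching on the command fields (rather than grepping the whole line)
# avoids matching the grep/awk process itself.
# Assumption: the command is ".../racgimon startd <dbname>" in fields 8-10.
monitored_databases() {
  awk '$8 ~ /racgimon$/ && $9 == "startd" { print $10 }' | sort -u
}
```

Usage: ps -ef | monitored_databases, which on the node above would print racdb.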

17 The RACG Infrastructure
CRSD also spawns other child processes to perform different actions (kill, start/stop resources, change configurations, etc.):
- racgeut, to kill timed-out actions
  Usage: racgeut [-e...=...]
- racgmain, to start/stop/check/manage resources
  Usage: racgmain [resource name] start|stop|check
         racgmain startorp|failsrvsa dbname instname [srvname]
         racgmain startorp|failsrvsa nodename
         racgmain cond_resname cond_state func [args...]
- racgvip (run as root), to check and relocate the VIP

18 The Virtual IP (VIP) Concept
- The VIP is an IP address controlled by CRS
- It should be from the public subnet
- It should be resolvable through DNS or /etc/hosts
- The RAC database uses it to avoid TCP/IP timeouts when a node or interface goes down
- Third-party applications use it so that they can still be reached at the same IP after being moved to a surviving node on failover
- It should be used instead of the static public IP
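Since the VIP should come from the public subnet, a quick sanity check before creating it is to compare the VIP and a node's public IP under the netmask. A minimal sketch in pure bash arithmetic for IPv4 dotted-quad addresses; the function names are hypothetical, and no CRS commands are involved.

```bash
#!/usr/bin/env bash
# Check that a VIP lies in the same subnet as a node's public IP.
# Pure bash arithmetic on dotted-quad IPv4 addresses; no CRS commands used.
ip_to_int() {                         # 192.168.16.110 -> 3232239726
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

same_subnet() {                       # usage: same_subnet IP1 IP2 NETMASK
  local x y m
  x=$(ip_to_int "$1"); y=$(ip_to_int "$2"); m=$(ip_to_int "$3")
  (( (x & m) == (y & m) ))
}
```

For example, same_subnet 192.168.16.110 192.168.16.11 255.255.255.0 && echo "VIP is on the public subnet", where 192.168.16.11 stands for a hypothetical public IP. Name resolution can be checked separately with getent hosts followed by the VIP host name.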

19 Using CRS with Third-Party Apps - Overview
An application profile should be added to the OCR. Its main attributes are:
- Action program: an executable that starts, stops and checks the application
- Privileges: which users can start/stop the application
- Resource: a resource name for your application
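The action program is the part we write ourselves. Below is a minimal sketch of what /root/apache_crs.sh could look like, assuming that CRS calls it with start, stop or check and treats exit status 0 as success (for check: the application is running). The apachectl path and the pid file location are assumptions for a typical Linux Apache install; adjust them to your setup.

```bash
#!/bin/bash
# apache_crs.sh - sketch of a CRS action program for Apache.
# CRS calls it with start, stop or check and reads the exit code:
# 0 = success / resource running, non-zero = failure / not running.
APACHECTL=${APACHECTL:-/usr/sbin/apachectl}    # assumed location
PIDFILE=${PIDFILE:-/var/run/httpd.pid}          # assumed PidFile setting

apache_action() {
  case "$1" in
    start) "$APACHECTL" start ;;
    stop)  "$APACHECTL" stop ;;
    check)
      # Running if the pid file is non-empty and that process is alive.
      [ -s "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null ;;
    *)
      echo "usage: $0 {start|stop|check}" >&2
      return 2 ;;
  esac
}

# Dispatch only when called with an argument, so the function can be sourced.
if [ $# -gt 0 ]; then
  apache_action "$1"
  exit $?
fi
```

The final if block runs only when an argument is given, so the function can be sourced and tested without touching Apache. For check, kill -0 only confirms that the recorded process exists; a more thorough script could also request a page through the VIP.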

20 Using CRS with Third-Party Apps - Creating the Profile
[oracle@class01 ~]$ crs_profile -create apache_crs -t application -dir ./ -a /root/apache_crs.sh -r ora.class01.vip
[oracle@class01 ~]$ ll
total 164
-rw-r--r-- 1 oracle oinstall   760 Aug 19 17:24 apache_crs.cap
drwxr-xr-x 2 oracle oinstall  4096 Aug 13 18:58 Desktop
-rw-r--r-- 1 oracle oinstall 43387 Aug 14 12:47 ocr_bef.dmp
-rw-r--r-- 1 oracle oinstall 56929 Aug 15 16:19 OCRDUMPFILE
[oracle@class01 ~]$

21 Using CRS with Third-Party Apps - Registering the Profile
[oracle@class01 ~]$ crs_register apache_crs -dir ./
[oracle@class01 ~]$ crs_stat |grep -A 5 apache
NAME=apache_crs
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE

NAME=ora.class01.LISTENER_CLASS01.lsnr
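crs_stat -t truncates long resource names (ora....01.lsnr in the failover slides). The hedged sketch below turns the default NAME=/TARGET=/STATE= output into one line per resource with the full name, target, state and host. It assumes STATE reads either “ONLINE on <host>” or just “OFFLINE”; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Turn the default (long) crs_stat output, read on stdin, into one line per
# resource: NAME TARGET STATE HOST. Unlike crs_stat -t, names are not truncated.
# Assumption: STATE is "ONLINE on <host>" or just "OFFLINE".
crs_stat_table() {
  awk -F= '
    $1 == "NAME"   { name = $2 }
    $1 == "TARGET" { target = $2 }
    $1 == "STATE"  {
      n = split($2, s, " ")              # e.g. "ONLINE on class01"
      host = (n >= 3) ? s[3] : "-"       # no host when the resource is offline
      print name, target, s[1], host
    }
  '
}
```

For example, crs_stat | crs_stat_table | awk '$2 != $3' lists the resources whose state differs from their target.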

22 Using CRS with Third-Party Apps - Setting the Permissions
Setting the owner:
[root@class01 oracle]# /u01/app/oracle/product/11.1/crs11/bin/crs_setperm apache_crs -o root
Setting the rights:
[root@class01 oracle]# /u01/app/oracle/product/11.1/crs11/bin/crs_setperm apache_crs -u user:oracle:r-x

23 Using CRS with Third-Party Apps - Starting and Stopping the Resource
Checking the state:
[oracle@class01 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
apache_crs     application    OFFLINE   OFFLINE
Starting the resource:
[oracle@class01 ~]$ crs_start apache_crs
Attempting to start `apache_crs` on member `class01`
Start of `apache_crs` on member `class01` succeeded.
[oracle@class01 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
apache_crs     application    ONLINE    ONLINE    class01

24 Using CRS with Third-Party Apps - Starting and Stopping the Resource
Stopping the resource:
[oracle@class01 ~]$ crs_stop apache_crs
Attempting to stop `apache_crs` on member `class01`
Stop of `apache_crs` on member `class01` succeeded.
[oracle@class01 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
apache_crs     application    OFFLINE   OFFLINE

25 Using CRS with Third-Party Apps - Failover
Step 1: Apache and the VIP are running on node 1
[oracle@class02 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
apache_crs     application    ONLINE    ONLINE    class01
ora....01.lsnr application    ONLINE    ONLINE    class01
ora....s01.gsd application    ONLINE    ONLINE    class01
ora....s01.ons application    ONLINE    ONLINE    class01
ora....s01.vip application    ONLINE    ONLINE    class01
Here we pull the power cable from node 1.

26 Using CRS with Third-Party Apps - Failover
Step 2: Apache and the VIP go offline
[oracle@class02 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
apache_crs     application    ONLINE    OFFLINE
ora....01.lsnr application    ONLINE    OFFLINE
ora....s01.gsd application    ONLINE    OFFLINE
ora....s01.ons application    ONLINE    OFFLINE
ora....s01.vip application    ONLINE    OFFLINE

27 Using CRS with Third-Party Apps - Failover
Step 3: Apache and the VIP come online on node 2
[oracle@class02 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
apache_crs     application    ONLINE    ONLINE    class02
ora....01.lsnr application    ONLINE    ONLINE    class01
ora....s01.gsd application    ONLINE    ONLINE    class01
ora....s01.ons application    ONLINE    ONLINE    class01
ora....s01.vip application    ONLINE    ONLINE    class02
NOTE: The client does not have to change the IP it requests in the browser. Apache is still accessible at the VIP.
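During a failover test like this it helps to wait for a resource to reach a given state instead of rerunning crs_stat -t by hand. A minimal polling sketch, assuming crs_stat <resource> prints the same NAME=/STATE= lines shown on slide 21; CRS_STAT, the 5-second poll interval and the default timeout are illustrative choices, not Oracle defaults.

```bash
#!/usr/bin/env bash
# Poll a CRS resource until it reaches the wanted state, e.g. to wait for
# apache_crs to come back ONLINE after a failover.
# CRS_STAT can point at the full path of crs_stat (assumed to be on PATH).
CRS_STAT=${CRS_STAT:-crs_stat}

resource_state() {                 # prints ONLINE, OFFLINE, ... for one resource
  "$CRS_STAT" "$1" | awk -F= '$1 == "STATE" { split($2, s, " "); print s[1] }'
}

wait_for_state() {                 # usage: wait_for_state RESOURCE STATE [TIMEOUT_SECONDS]
  local res=$1 want=$2 timeout=${3:-120} waited=0
  until [ "$(resource_state "$res")" = "$want" ]; do
    if (( waited >= timeout )); then
      echo "$res not $want after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    waited=$(( waited + 5 ))
  done
}
```

For example, after pulling the cable: wait_for_state apache_crs ONLINE 300 && crs_stat -t.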

28 Using CRS with Third-Party Apps - VIP Note
Oracle does not recommend using the same VIP for several applications. In our case we use the database VIP for Apache as well. To comply with that recommendation we should create a new VIP dedicated to the Apache server and use it instead of the database VIP. It would operate exactly like the database VIP, but it would be a separate address.

29 Using CRS with Third-Party Apps - Failover
Step 4: Node 1 comes back. The VIP goes back to node 1 while Apache is still on node 2, so Apache is not reachable at the VIP at that moment
[oracle@class02 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
apache_crs     application    ONLINE    ONLINE    class02
ora....01.lsnr application    ONLINE    OFFLINE
ora....s01.gsd application    ONLINE    OFFLINE
ora....s01.ons application    ONLINE    OFFLINE
ora....s01.vip application    ONLINE    ONLINE    class01

30 Using CRS with Third-Party Apps - Failover
Step 5: Apache also goes back to node 1, since it is declared dependent on the node 1 VIP. It is reachable again
[oracle@class02 ~]$ crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
apache_crs     application    ONLINE    ONLINE    class01
ora....01.lsnr application    ONLINE    ONLINE    class01
ora....s01.gsd application    ONLINE    ONLINE    class01
ora....s01.ons application    ONLINE    ONLINE    class01
ora....s01.vip application    ONLINE    ONLINE    class01

31 Using CRS with Third-Party Apps - Using Its Own VIP
Creating a new, application-specific VIP:
[oracle@class01 ~]$ crs_profile -create apache_vip -dir ./ -t application -a \
/u01/app/oracle/product/11.1/crs11/bin/usrvip \
-o oi=eth1,ov=192.168.16.110,on=255.255.255.0,ap=0
The ap (active placement) option set to 0 tells the system not to re-evaluate the resource placement when a cluster node is added or restarted. Our VIP is not tied to a particular node: it starts on any node at startup, fails over to any surviving node on failure, and does not return when the original node starts again.
Setting permissions:
[root@class01 ~]# ./crs_setperm apache_vip -o root
[root@class01 ~]# ./crs_setperm apache_vip -u user:oracle:r-x

32 Using CRS with Third-Party Apps - Using Its Own VIP
Making apache_crs dependent on the new apache_vip. apache_crs currently depends on ora.class01.vip. To change that:
[root@class02 oracle]# ./crs_register apache_crs -update -r apache_vip
Now apache_crs follows apache_vip to every node:
- When apache_vip starts on a node, apache_crs starts on the same node
- When apache_vip fails over to ANY surviving node, apache_crs fails over to the same node
- When the failed node starts up again, apache_vip does not go back to it (active placement), and neither does apache_crs
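To confirm that the update took effect, the resource profile can be printed and its REQUIRED_RESOURCES attribute checked. A minimal sketch, assuming the profile comes as ATTRIBUTE=value lines (the layout of the .cap files written by crs_profile); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print the value of one attribute from a resource profile read on stdin, for
# example to confirm REQUIRED_RESOURCES after crs_register -update.
# Assumption: the profile is printed as ATTRIBUTE=value lines.
profile_attr() {                  # usage: ... | profile_attr ATTRIBUTE
  awk -v a="$1" 'index($0, a "=") == 1 { print substr($0, length(a) + 2) }'
}
```

For example, crs_stat -p apache_crs | profile_attr REQUIRED_RESOURCES should now print apache_vip, and profile_attr ACTIVE_PLACEMENT < apache_vip.cap shows the ap setting from the profile created on slide 31.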

33 Using CRS with Third-Party Apps - Using Its Own VIP: We Got a Service
- No particular node: we never know where the application runs, but we always access it at apache_vip
- We need to share the binaries
- We need to share the configuration files
- We need to share everything the application needs to operate, so that each node can access it in the same directory tree
- And OCFS is here to help
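A quick way to confirm that everything the application needs really sits on the shared file system is to check the file system type of each path on every node. A hedged sketch, assuming GNU coreutils stat on Linux, which reports OCFS2 as "ocfs2"; the function name and the Apache paths in the usage example are hypothetical.

```bash
#!/usr/bin/env bash
# Report which of the given paths are NOT on an OCFS2 file system, as a quick
# check that binaries, configuration and data are really shared.
# Assumption: GNU coreutils stat, which reports OCFS2 as "ocfs2".
not_on_ocfs2() {                  # usage: not_on_ocfs2 PATH [PATH...]
  local p fstype
  for p in "$@"; do
    fstype=$(stat -f -c %T "$p" 2>/dev/null)
    [ "$fstype" = "ocfs2" ] || echo "$p (${fstype:-unknown})"
  done
}
```

For example, running not_on_ocfs2 /u02/apache/bin /u02/apache/conf /u02/apache/htdocs on each node should print nothing when all three directories are on OCFS2.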

34 The Clusters and the Oracle Universal Installer
- OUI supports cluster-level installations: it installs CRS and Oracle Database on all cluster nodes simultaneously
- Scripts provided under install_directory/install:
  - runSSHSetup.sh: sets up user equivalence
  - addNode.sh: adds a node to an existing cluster (calls OUI)
  - attachHome.sh/detachHome.sh: attach/detach existing homes to/from the Oracle Inventory
- Under install_directory, runcluvfy.sh checks all the prerequisites
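runcluvfy.sh runs the Cluster Verification Utility from the installation media before anything is installed. The sketch below wraps the CRS pre-installation stage check for a list of nodes; it assumes it is run from the install directory, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Run the Cluster Verification Utility pre-installation checks for CRS on a
# list of nodes. CLUVFY defaults to runcluvfy.sh in the current directory
# (the install directory).
CLUVFY=${CLUVFY:-./runcluvfy.sh}

precheck_crs() {                  # usage: precheck_crs NODE [NODE...]
  local nodes
  nodes=$(IFS=,; echo "$*")       # class01 class02 -> class01,class02
  "$CLUVFY" stage -pre crsinst -n "$nodes" -verbose
}
```

For example, precheck_crs class01 class02 runs ./runcluvfy.sh stage -pre crsinst -n class01,class02 -verbose.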

35 The Clusters and the Oracle Universal Installer
The Oracle Inventory now keeps track of which cluster members contain particular home directories.

36
[oracle@class01 ~]$ ls /u01/app/oraInventory/ContentsXML/
comps.xml inventory.xml libs.xml
[oracle@class01 ~]$ cat /u01/app/oraInventory/ContentsXML/inventory.xml
11.1.0.6.0 2.1.0.6.0
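As a hedged sketch, the bash function below lists each Oracle home registered in the central inventory together with the cluster nodes recorded for it. It assumes inventory.xml stores each home as a <HOME ... LOC="..."> element containing <NODE_LIST><NODE NAME="..."/></NODE_LIST>, and it uses a simple awk tag scanner rather than a real XML parser, so verify it against your own file. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# List each Oracle home in the central inventory with its recorded nodes.
# Assumption: <HOME ... LOC="..."> elements holding <NODE NAME="..."/> entries.
inventory_homes() {
  local inv=${1:-/u01/app/oraInventory/ContentsXML/inventory.xml}
  awk '
    function attr(rec, name) {                 # value of name="..." in rec
      if (match(rec, name "=\"[^\"]*\""))
        return substr(rec, RSTART + length(name) + 2, RLENGTH - length(name) - 3)
      return ""
    }
    BEGIN { RS = "<" }                         # one record per XML tag
    /^HOME[ \t\n]/ {
      home = attr($0, "LOC"); nodes = ""
      if ($0 ~ /\/>[ \t\n]*$/) print home, "-" # self-closing: no node list
    }
    /^NODE[ \t\n]/ { nodes = nodes (nodes == "" ? "" : ",") attr($0, "NAME") }
    /^\/HOME>/     { print home, (nodes == "" ? "-" : nodes) }
  ' "$inv"
}
```

A home printed with "-" has no node list, i.e. it is registered as a single-node home.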

37 The Clusters and the Oracle Universal Installer - Command-Line Options
[oracle@class01 bin]$ ./runInstaller -help
...
-clusterware oracle.crs,  Version of Cluster Ready Services installed.
-addNode  For adding node(s) to the installation. Wrapped by addNode.sh
-attachHome  For attaching homes to the OUI inventory. Wrapped by attachHome.sh
-detachHome  For detaching homes from the OUI inventory without deleting the inventory directory inside the Oracle home.

38 The Clusters and the Oracle Universal Installer - Command-Line Options
-updateNodeList  For updating the node list for this home in the OUI inventory. Particularly useful when removing a node from the cluster
-remoteshell  Unix-specific option. Used only for cluster installs; specifies the path to the remote shell program on the local cluster node.
And many more
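-updateNodeList is typically run after a node has been removed, so that the inventory lists only the surviving nodes for a home. A minimal wrapper sketch using the ORACLE_HOME=... "CLUSTER_NODES={...}" argument form; the function name, home path and node names are hypothetical, and the exact arguments required for a clusterware home may differ, so check the documentation for your release.

```bash
#!/usr/bin/env bash
# Update the node list recorded for one Oracle home, e.g. after removing a
# node. RUNINSTALLER defaults to the OUI launcher inside $ORACLE_HOME.
RUNINSTALLER=${RUNINSTALLER:-$ORACLE_HOME/oui/bin/runInstaller}

update_node_list() {              # usage: update_node_list HOME NODE [NODE...]
  local home=$1 nodes
  shift
  nodes=$(IFS=,; echo "$*")       # remaining nodes, comma separated
  "$RUNINSTALLER" -updateNodeList ORACLE_HOME="$home" "CLUSTER_NODES={$nodes}"
}
```

For example, after removing class02, running update_node_list /u01/app/oracle/product/11.1/db_1 class01 on class01 records class01 as the only node for that home.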

39 The Bottom Line (or What I Like)
- CRS has looked good, reliable and mature since 10gR2
- We now have a complete set of tools to change almost everything in the configuration
- We can now multiplex the OCR and the voting disk for better reliability
- Oracle now fully supports adding and removing nodes from the cluster, along with the utilities for that

40 The Bottom Line (or What I Don't Like)
- There are many management utilities, often duplicating each other's functionality
- It is still easy to mess things up (for example, mixing up the private and public IPs)
- Although possible, reconfiguration (for example, fixing mixed-up private and public IPs) is still quite a pain: lots of commands, often not very intuitive
- Some tasks (for example, managing the inventory while adding and removing nodes) have to be done by hand, typing commands that are a sort of “black magic”
- There is still work to be done on documenting CRS

41 Q&A

