Overview of Real Application Clustering Features and Functionality


Oracle RAC: Overview of Real Application Clustering Features and Functionality

Overview
- What is RAC?
- Cache Fusion
- Failover and Load-balancing
- Transparent Application Failover (TAF)
- Other RAC Features

RAC – What is it?
- Multiple instances of Oracle running on up to 8 nodes
- Multiple instances share a single physical database
- All instances can simultaneously execute transactions against the single database
- Caches are synchronized using Oracle’s Global Cache Management technology (Cache Fusion)

Oracle RAC allows multiple nodes to run multiple instances of Oracle while accessing a single physical database. On Linux, the maximum number of nodes supported in a certified configuration is eight.

History of Oracle RAC
- Previous Oracle clustering products:
  - Oracle FailSafe on Windows
  - OPS (Oracle Parallel Server) on multiple platforms
- OPS to RAC: 7.3 OPS → 8i OPS → 9i RAC → 10g RAC
- The clustering mechanism used to be more dependent on the operating system
- With 10g RAC, the clustered database is built into Oracle

Oracle has had clustering functionality in its product for a long time. Previously, the clustering technology was more dependent on the operating system. Oracle chose to incorporate more of the clustering functionality into the Oracle database rather than rely on how the OS does clustering. Oracle 9i and 10g RAC have more of the key clustering capabilities within Oracle itself, so clustering is done differently than it was with OPS. From the Oracle 9i RAC whitepaper on the Oracle website (see http://otn.oracle.com/products/oracle9i/pdf/appclusters_cache.pdf): “A cluster database lays on top of a hardware cluster. The hardware cluster architecture and the database cluster architecture may be the same; or may differ.”

Oracle RAC Features
- Full Cache Fusion
- Enhanced coordination of cache management and distributed lock manager (DLM)
- Lock simplification and automation
- Global Cache Service coordinates local buffer cache and remote block transfers
- Enhanced IPC
- Resource configuration simplification and automation
- Improved cluster-aware tools
- Enhanced DBCA
- Oracle Enterprise Manager and Grid Control integration enhancements

Improved cluster-aware tools:
- Oracle Universal Installer (OUI)
- Enterprise Manager (EM)
- Database Configuration Assistant (DBCA)
- Net Assistant (NetCA)
- Recovery Manager (RMAN)
- Command line interface (SRVCTL)

The Database Configuration Assistant (DBCA) has been enhanced for clustered operation for the fundamental operations:
- Create cluster database
- Create the server initialization file (SPFILE)
  - Centralized, persistent Real Application Clusters configuration storage
  - Eliminates consistency problems with per-node, text-file-based configuration files
- Add and delete instances

Oracle Enterprise Manager has been enhanced at both the database level and at the drill-down level for individual instances. These are some of the enhancements for the database as a whole:
- Report generation: generate and view reports for targets of type cluster database and cluster database instance
- Redo log assignment: assign redo log groups to specific threads
- Wizards/tools: full cluster support
- SPFILE handling: view and update the server-side initialization parameter file
- Details of in-doubt transactions

At the drill-down level for each instance, Oracle Enterprise Manager provides detailed information:
- Session handling: list status of connected users, view latest SQL for the session, kill a session
- Lock details: SQL DML enqueues, transaction enqueues and row-level locks
- Resource monitoring: performance statistics of active resource plans

For the cluster database: create/modify resource consumer groups and define/modify/activate resource plans.

RAC uses “Shared Everything”
[Diagram: Users; four Servers; one shared Database]
Real Application Clusters are designed as a “Shared Everything” architecture. This means that both server resources and storage resources are shared, with a single database image. This is beneficial for scalability and flexibility. However, the shared nature of all of the resources (particularly storage) makes it more important than ever to institute a High Availability architecture to avoid single points of failure.

How RAC clustering is done
- One set of data
- All nodes in the cluster see the same set of data
- All nodes have access
- Any node can update the data

Oracle RAC is “Shared Everything” clustering. This allows all nodes in the cluster to have access to the data at the same time. There is only one set of data that all of the nodes can use at all times.

Increased Manageability
- One virtual system to configure and manage
- Single Oracle Database
- Single management console
- Single system image for the database integrated with the cluster
- Cluster-wide monitoring and diagnostics
- Oracle Enterprise Manager Integration (9i)
- Oracle DBConsole and Grid Control Integration (10g)

According to Oracle Corporation, the ultimate goal for Real Application Clusters is to provide manageability that is comparable to a single computer with a single instance of the Oracle database. In other words, for the common management tasks, the goal is to have the system look and behave like a single system.

What’s shared; What’s not
Shared:
- Disk access
- Resources that manage data
- All instances have common data and control files
Not shared:
- Each node has its own dedicated:
  - System memory
  - Operating system
  - Database instance
  - Application software
- Each instance has individual log files and rollback segments

Each node has its own dedicated system memory as well as its own operating system, database instance and application software.

RAC can perform
- Load-balancing
- Failover

Because the number of nodes in the cluster can grow as needed, and because they all have access to all of the same data, RAC can be used for both load-balancing and failover.
Load-balancing: If you have the same application loaded on all of the nodes in the cluster, users can be distributed across all of the nodes (either in round-robin fashion or according to load). In either case, any user connecting through any node will see the same data.
Failover: Oracle RAC can also be used for failover. Since any 2 nodes will be configured exactly the same and have access to the same set of data, if one server in the cluster goes down, the users can be transferred to another node in the cluster and the replacement node(s) will take over the duties of the failed server.

Load-Balancing through the Listener
[Diagram: Client; Listeners; Node 1, Node 2, Node 3, Node 4; Database]
The listener is a process that resides on the cluster server nodes. This process listens for incoming client connection requests and manages the traffic to the server. The listener brokers the client request, handing off the request to the server. Every time a client (or server acting as a client) requests a network session with a server, a listener receives the actual request. If the client's information matches the listener's information, then the listener grants a connection to the server.

How workload is balanced
- Nodes report CPU usage to listeners
[Diagram: Client; Node 1, Node 2; Database]
Database instances register with the listeners when they start. Through the PMON process, nodes in the cluster report their CPU usage to the listeners they are registered with.

How workload is balanced
- Listeners choose least busy node when request comes in from client
[Diagram: Client; Node 1, Node 2; Database]
Then, when a request comes in from a client, the listener can assign the client to the least busy server.
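
The selection step on these two slides can be sketched in a few lines of Python. This is only a toy model, not Oracle Net code: the `Listener` class, its method names, and the CPU percentages are all made up for illustration, and the real listener works from load information registered by the instances.

```python
from dataclasses import dataclass, field


@dataclass
class Listener:
    """Toy listener: remembers the CPU load last reported by each node."""
    loads: dict[str, float] = field(default_factory=dict)

    def report_load(self, node: str, cpu_percent: float) -> None:
        # A node registers (or refreshes its entry) by reporting CPU usage.
        self.loads[node] = cpu_percent

    def deregister(self, node: str) -> None:
        # Forget a node, for example after it has left the cluster.
        self.loads.pop(node, None)

    def route(self) -> str:
        # Hand the incoming request to the node with the lowest reported load.
        if not self.loads:
            raise RuntimeError("no nodes registered")
        return min(self.loads, key=self.loads.get)


listener = Listener()
listener.report_load("node1", 72.0)  # hypothetical load figures
listener.report_load("node2", 35.0)
chosen = listener.route()
print(chosen)  # node2
```

Because the choice is made per request from the latest reports, a node that becomes busy later simply stops being picked until its reported load drops again.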

Load-Balancing
[Diagram: Users; Node 1, Node 2, Node 3, Node 4; Database]
Each node in the cluster has a different physical internet protocol address. However, users (or clients) connect to the database via a virtual database service name. Oracle automatically balances the user load among the multiple nodes in the cluster. The RAC database instances on the different nodes subscribe to all or some subset of database services. This allows you to choose whether specific application clients that connect to a particular database service can connect to some or all of the database nodes. If more nodes are added to the cluster, the CPU and memory resources of the new node are immediately made available to the rest of the cluster. (Data does not have to be re-partitioned.) This allows you to add nodes as you need them.

Failover
- If a node in the shared disk cluster fails, the system dynamically redistributes the workload among the surviving cluster nodes.
- RAC checks to detect node and network failures. A disk-based heartbeat mechanism uses the control file to monitor node membership, and the cluster interconnect is regularly checked to determine correct operation.
- Reduced time to recovery with concurrent resource configuration and instance (cache) recovery
- Enhanced failover reliability in 10g with the use of Virtual IP addresses (VIPs)

RAC-specific enhancements include improvements that dramatically reduce the time to recover. These improvements include increased parallelism and reduced work through smarter algorithms. The Global Cache Service now does only the minimal amount of work needed to recover from node exits and joins to the cluster. Because of its tight integration with the server processes, the Global Resource Directory is maintained optimally.
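
As a rough illustration of the two ideas on this slide (heartbeat-based membership and redistribution of work to survivors), here is a small Python sketch. It does not reflect how Oracle implements either mechanism: the `Membership` class, the `redistribute` function, the three-second timeout, and the round-robin reassignment are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Membership:
    """Toy membership table: a node is alive if its heartbeat is recent."""
    timeout: float  # seconds of silence before a node is presumed failed (assumed value)
    last_seen: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        # Record the time of the node's latest heartbeat.
        self.last_seen[node] = now

    def survivors(self, now: float) -> list[str]:
        # Nodes whose last heartbeat is within the timeout window.
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout)


def redistribute(sessions: dict[str, str], alive: list[str]) -> dict[str, str]:
    """Move sessions from failed nodes to survivors, round-robin."""
    if not alive:
        raise RuntimeError("no surviving nodes")
    placement = {}
    moved = 0
    for session, node in sorted(sessions.items()):
        if node in alive:
            placement[session] = node  # unaffected session stays put
        else:
            placement[session] = alive[moved % len(alive)]
            moved += 1
    return placement


members = Membership(timeout=3.0)
for name in ("node1", "node2", "node3"):
    members.heartbeat(name, now=0.0)
members.heartbeat("node1", now=5.0)
members.heartbeat("node3", now=5.0)   # node2 has gone silent
alive = members.survivors(now=6.0)    # ['node1', 'node3']

sessions = {"s1": "node1", "s2": "node2", "s3": "node2", "s4": "node3"}
placement = redistribute(sessions, alive)
```

Passing the clock in as `now` rather than reading the system time keeps the sketch deterministic and easy to test.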

Failover
[Diagram: Users; four Servers, one marked with an X as failed; Database]
If a node in the shared disk cluster fails, the system dynamically redistributes the workload among the surviving cluster nodes.

Transparent Application Failover
- Masks failures to end users; they don’t need to log back into the system
- Applications and users are transparently reconnected to another node
- Applications and queries continue uninterrupted
- Transactions can fail over and replay
- Login context maintained
- DML transactions are rolled back

Real Application Clusters provide near-continuous availability by hiding failures from end-user clients and application server clients. Transparent Application Failover in the database transparently re-routes application (query) clients to an available database node in the cluster when the connected node fails. Application clients do not see error messages describing loss of service. Failures are also hidden from update clients, in a similar fashion, by way of a simple application coding technique. The failover routine calls the appropriate client library function to re-route the connection. Furthermore, you can configure database clients to pre-connect, or to have redundant idle connections. These redundant connections with another database node avoid delays if thousands of users must migrate servers during a node failure.
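
The behavior listed on this slide can be mimicked with a toy session object. The sketch below is not the Oracle client library and uses none of its calls; `Node`, `NodeDown`, and `FailoverSession` are hypothetical names. It only shows the pattern: a query that hits a dead node is retried on a surviving node, the login context is kept, and uncommitted DML is discarded.

```python
class NodeDown(Exception):
    """Raised when a statement is sent to a node that is not running."""


class Node:
    def __init__(self, name):
        self.name = name
        self.up = True

    def run(self, sql):
        if not self.up:
            raise NodeDown(self.name)
        return f"{self.name}: {sql}"


class FailoverSession:
    """Toy session that re-routes to a surviving node when its node fails."""

    def __init__(self, nodes, user):
        self.nodes = nodes
        self.user = user           # login context, kept across failover
        self.current = nodes[0]
        self.pending_dml = []      # uncommitted changes made on the current node

    def _fail_over(self):
        for node in self.nodes:
            if node.up:
                self.current = node
                self.pending_dml.clear()  # uncommitted DML is rolled back
                return
        raise NodeDown("no surviving node")

    def query(self, sql):
        try:
            return self.current.run(sql)
        except NodeDown:
            self._fail_over()
            return self.current.run(sql)  # the query is simply re-issued

    def dml(self, sql):
        # A failure here propagates to the caller, which decides whether to retry.
        self.current.run(sql)
        self.pending_dml.append(sql)


node1, node2 = Node("node1"), Node("node2")
session = FailoverSession([node1, node2], user="app_user")
session.dml("UPDATE t SET x = 1")
node1.up = False                               # simulate a node failure
result = session.query("SELECT 1 FROM dual")   # served by node2
```

In this model an update client would catch `NodeDown` from `dml` and resubmit its work, which corresponds to the "simple application coding technique" the notes mention.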

RAC Improvements for Oracle 9i
Oracle 9i RAC brought several improvements in scalability over Oracle Parallel Server in the following areas:
- Full Cache Fusion
- Enhanced coordination of cache management and distributed lock manager (DLM)
- Lock simplification and automation
- Global Cache Service coordinates local buffer cache and remote block transfers
- Enhanced IPC (InterProcess Communication)
- Resource configuration simplification and automation

Oracle 10g RAC New Features
- Integrated Clusterware Management: no third-party clusterware software required
- Automatic Workload Management: application workloads can be managed through named services
- Single System Image Management: Enterprise Manager manages RAC instances as a single image
- Fast Connection Failover: fast recovery between the database and mid-tier applications
- Performance Improvements: reduced message traffic, memory usage, and use of other resources
- Zero Downtime Patching: patches may be applied one node at a time without downtime
- Cluster Verification and Improved Diagnostic Tools: new cluster diagnostic tool and improved diagnostic tools

Oracle 10g RAC offers a number of new features and improvements to existing features. They are in addition to the improvements introduced in Oracle 9i.

Full Cache Fusion
- Is a major feature of RAC starting with 9i
- The underlying technology that enables RAC
- Protocol that allows instances to combine their data caches into a shared global cache
- Allows any node to get the most up-to-date data from the cache of any other node in the cluster without having to access the disk drives again
- Improved performance with 10g

What is Cache Fusion? When do I care about it?
- A “dirty” block of data is created:
  - Data from disk is read into memory on a node
  - Data is updated on that node
  - Data hasn’t been written to disk yet
- Another node requests the data

“ABC” block of data written to the disk drives in the database
[Diagram: Node A, Node B; the disk holds “ABC”]

“ABC” block of data read into memory on Node A
[Diagram: Node A holds “ABC” in memory; the disk holds “ABC”]

“ABC” updated to “XYZ” in cache
[Diagram: Node A's cache holds “XYZ”; the disk still holds “ABC”]

Node B requests data block
[Diagram: Node B: “I want data! Gimme! Gimme!”; Node A's cache holds “XYZ”; the disk still holds “ABC”]

Node A must write data block to disk drive (previous to 9i RAC)
[Diagram: Node B: “I want data! Gimme! Gimme!”; Node A writes “XYZ” to the disk, replacing “ABC”]

Node B must read data block from disk drive (previous to 9i RAC)
[Diagram: Node B reads “XYZ” from the disk]
Before Cache Fusion, when a node that does not already have the data in memory requests a data block, the node that does have the data (and thus holds a lock on the block) must first write it to disk; only then can the other nodes access the same block. This uses disk I/O to keep the data synchronized across multiple nodes. The data has to be physically written to a drive, which involves mechanical moving components and is therefore inherently slower than passing data from memory. It also means that the various nodes must communicate regarding lock status.

Now with RAC Cache Fusion
[Diagram: Node B: “I want data! Gimme! Gimme!”; Node A sends “XYZ” directly to Node B; the disk still holds “ABC”]
- Data is transferred immediately via the interconnect
- Shared cache minimizes slow I/O

Cache Fusion in Oracle RAC allows immediate transfer of information from one node’s cache to another node without having to write to disk first.

Shared Cache Across Nodes
[Diagram: Users; four Servers, each with its own Cache; Database]
Cache Fusion uses the collective caches of all the nodes in the cluster to satisfy database requests. Requests for a data block can now be satisfied by the local cache or any of the other caches (instead of having to go to the disk drive). Expensive disk I/Os are only performed when none of the collective caches contain the necessary data and when an update transaction performs a COMMIT operation that requires disk write guarantees.
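
The lookup order described here (local cache, then another node's cache, then disk) can be walked through with a toy model that replays the “ABC”/“XYZ” example from the earlier slides. The `Cluster` class below is hypothetical and heavily simplified: it ignores locking and the Global Cache Service, and dropping stale copies on update is an assumption made to keep the model consistent, not a description of Oracle's protocol.

```python
class Cluster:
    """Toy model of per-node caches in front of one shared disk."""

    def __init__(self, node_names, disk):
        self.caches = {name: {} for name in node_names}
        self.disk = dict(disk)
        self.disk_reads = 0

    def read(self, node, block_id):
        """Return (value, source), trying the cheapest source first."""
        local = self.caches[node]
        if block_id in local:
            return local[block_id], "local cache"
        for other, cache in self.caches.items():
            if other != node and block_id in cache:
                # Block shipped from the other node's memory; no disk I/O.
                local[block_id] = cache[block_id]
                return local[block_id], f"cache of {other}"
        self.disk_reads += 1
        local[block_id] = self.disk[block_id]
        return local[block_id], "disk"

    def update(self, node, block_id, value):
        # The change lives only in this node's cache (a "dirty" block).
        for other, cache in self.caches.items():
            if other != node:
                cache.pop(block_id, None)  # assumed: stale copies are dropped
        self.caches[node][block_id] = value


cluster = Cluster(["A", "B"], disk={"block1": "ABC"})
first = cluster.read("A", "block1")    # ("ABC", "disk")
cluster.update("A", "block1", "XYZ")   # dirty block on Node A
second = cluster.read("B", "block1")   # ("XYZ", "cache of A")
third = cluster.read("B", "block1")    # ("XYZ", "local cache")
```

After these calls the disk still holds “ABC” and only one disk read has been made, which is the saving the slide attributes to the shared cache.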

Resource Simplification and Automation
- No init.ora parameters required
- Resource affinity: moves the location of the resource masters for a database file to the instance where block operations are most frequently occurring. This optimizes …
- Dynamic resource remastering: the ability to move the ownership of a resource between instances of Real Application Clusters. Dynamic resource remastering is used to implement resource affinity for increased performance.

No init.ora: In previous Oracle Parallel Server releases, there were a number of sometimes difficult parameters that needed to be set in the init.ora file. In particular, the GC_FILES_TO_LOCKS parameter was difficult to understand and set correctly. This was carried forward from Oracle7, when the DLM was external to the database. In Oracle 9i and 10g the DLM no longer exists; it has been integrated with the buffer cache manager and is now the Global Cache Service. With this integration there is no longer a need for configuration parameters, and the memory taken up by resources in the Global Resource Directory is greatly reduced compared to earlier releases.
Resource affinity and dynamic resource remastering: Resource affinity optimizes the system in situations where update transactions are being executed on one instance. When activity shifts to another instance, the resource affinity will correspondingly move to the new instance. If activity is not localized, then the resource ownership is hashed to the instances. Oracle 10g offers performance improvements in dynamic file and cache affinity.
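
The rule in the last paragraph (hash ownership across instances unless activity is localized, in which case mastership follows the activity) can be sketched as follows. Everything here is illustrative: the `ResourceDirectory` class, the CRC32 hash, and the 75% threshold are assumptions, and Oracle's actual remastering criteria are not given in these slides.

```python
import zlib
from collections import Counter


class ResourceDirectory:
    """Toy directory that decides which instance masters each resource."""

    def __init__(self, instances, affinity_threshold=0.75):
        self.instances = list(instances)
        self.affinity_threshold = affinity_threshold  # assumed value
        self.access = {}  # resource name -> Counter of accesses per instance

    def default_master(self, resource):
        # Non-localized activity: ownership is hashed across the instances.
        return self.instances[zlib.crc32(resource.encode()) % len(self.instances)]

    def record_access(self, resource, instance):
        self.access.setdefault(resource, Counter())[instance] += 1

    def master(self, resource):
        counts = self.access.get(resource)
        if counts:
            instance, hits = counts.most_common(1)[0]
            if hits / sum(counts.values()) >= self.affinity_threshold:
                return instance  # localized activity: mastership follows it
        return self.default_master(resource)


directory = ResourceDirectory(["inst1", "inst2", "inst3"])

# Nine of ten accesses to file7 come from inst2, so it becomes the master.
for _ in range(9):
    directory.record_access("file7", "inst2")
directory.record_access("file7", "inst1")
hot_master = directory.master("file7")

# file9 is used evenly by two instances, so the hashed owner is kept.
for _ in range(5):
    directory.record_access("file9", "inst1")
    directory.record_access("file9", "inst3")
spread_master = directory.master("file9")
```

If the access pattern for `file7` later shifted to another instance, the same `master` call would start returning that instance once it crossed the threshold, which is the "remastering" step in miniature.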

Review
- What does cache fusion avoid that was mandatory in previous versions of Oracle Parallel Server?
- Which Oracle process is most important in managing user session failover?
- If the purpose of the interconnect is NOT to serve as a “heartbeat”, where is the heartbeat?

Summary
- New Features
- Shared Everything Clustering
- Cache Fusion
- RAC Clustering failover & load-balancing