2
Oracle Maximum Availability Architecture
MAA Cloud Architectures and Life Cycle Best Practices
Lawrence To, Senior Director MAA
3
Topics Covered
1. MAA Evolution: On-Premises to Cloud
2. Cloud MAA Goals
3. Cloud MAA Architectures (what's doable today)
4. Cloud MAA Configuration Best Practices (what's there today)
5. Cloud MAA Life Cycle Operations
6. Cloud MAA with ATP
4
MAA Evolution: On-Premises to Cloud
Evolution stages: On-Premises Exadata and Recovery Appliance → Database / Exadata Cloud → Autonomous Database
- MAA blueprints and best practices
- MAA-integrated engineered systems (config practices, exachk, lowest brownouts, HA QoS, data protection)
- Adding MAA config and life cycle operations, shifting admin ownership to Oracle with MAA SLAs
5
Cloud MAA Goals
- Provide the best HA and DR solutions and service levels in the cloud
- MAA reference architectures to meet SLAs
- Configuration best practices for stability and reliability
- MAA life cycle operations integrated with cloud APIs and console
- ATP provides a fully managed MAA solution with guaranteed service SLAs: % service uptime with no exclusions
- Exadata MAA + validated cloud infrastructure + MAA cloud life cycle operations + cloud and product enhancements and fixes
6
Topics Covered
1. MAA Evolution: On-Premises to Cloud
2. Cloud MAA Goals
3. Cloud MAA Architectures
4. Cloud MAA Configuration Best Practices (what's there today)
5. Cloud MAA Life Cycle Operations
6. Cloud MAA with ATP
7
MAA Reference Architectures for the Cloud
Availability Service Levels (pyramid, bottom to top):
- BRONZE (Dev, Test, Prod): Backup and Recovery. Single instance DB, restartable, backup/restore; use case: zero data loss backup to the cloud. Local backups or backups to cloud; low cost with higher RTO.
- SILVER (Prod/Departmental): Bronze + zero downtime high availability, i.e. Bronze + database HA with RAC or DG FSFO; use case: zero downtime with RAC. Backups + RAC for database instance availability to protect from instance failures; optionally a customer can configure a local Data Guard copy for HA.
- GOLD (Business Critical): Silver + zero data loss HA and DR, i.e. Silver + remote DB replication with Active Data Guard; use case: zero data loss DR to the cloud. Backups + RAC + Data Guard / Active Data Guard for remote replication for DR purposes.
- PLATINUM (Mission Critical): Gold + zero downtime maintenance / migration via GoldenGate Cloud Svc., plus (future) advanced capabilities for zero application outages and zero data loss. Backups + RAC + Active Data Guard (with additional use cases) + GoldenGate; provides zero data loss RPO capability across any distance along with application continuity.
Last year only single instance DBaaS and Database Backup were available. Now we have a solution for every tier and flavor of MAA: multiple data centers, RAC, Data Guard and GoldenGate, with the same architecture and capabilities for on-premises and cloud, making this the strongest public cloud HA portfolio for Oracle environments.
8
MAA Architecture Building Blocks
What's available where? The table compares each cloud infrastructure on its backup/restore options, RAC and ADG availability, and AD/region coverage:
- OCI (BM), OCI (VM with SI or RAC) and Exa-OCI (X6/X7): backup to OCI Object Storage (manual/automatic), with backup copies across ADs; deployable across ADs, and across regions via VCN peering or the public Internet
- OCI-C (VM) and Exa-OCI-C (X5, X6 and X7): backup to OCI-C Object Storage, tiering containers to Archive Storage, backup copies and a geo-replication option; across ADs and regions where available
- OCC and ExaCC (X6, X7): backup to NFS, local Object Storage, on-premises ZDLRA, or Cloud Object Storage with tiering
Notes: ExaOCI and ExaCS have ¼, ½ and full Exadata racks. RAC on OCI is really for databases < 15 TB with moderate IOPS (1 to 24 OCPUs). For ExaCC, there is a single control plane to support Data Guard; however, individual OCC support is needed per DG. For backup, you have Object Storage or Archive Storage (minimum 90-day retention in some cases); Archive Storage is cheaper, but the data is not immediately available for restore.
9
MAA Deployment Automation in the Cloud
MAA Database Deployment Made Easy
- Simple UI / CLI / REST interfaces to deploy your preferred topologies
- Databases are provisioned with optimal parameter configurations
- MAA made easy in the cloud: Public Cloud, Classic, or Cloud at Customer
Topologies shown: BRONZE (single instance DB with Backup Service), SILVER (HA with RAC), GOLD (DR with a primary in Region/AD #1 and a standby in Region/AD #2).
10
Bronze: Single Instance Database with Backups
A low-cost MAA solution, with replicated backups, for customers that can tolerate higher RTO and RPO.
Bronze summary:
- Single instance database with backups and auto-restart capabilities with full clusterware (OCI only)
- Optional replication of backup data to a remote site
- Restore from backup to resume service following unrecoverable outages
Features:
- Oracle Restart capabilities (OCI only)
- Multitenant database (all 12c+ DBs in cloud)
- Online maintenance (manual*)
- Corruption protection (enabled on OCI)
- Flashback technologies (enabled on OCI)
- Recovery Manager and cloud storage (all Oracle cloud)
- Recovery Appliance (OCC/ExaCC only)
Topology: single instance database and database files in the primary datacenter, with cloud backups replicated to a remote datacenter.
11
Unplanned Outages and Planned Maintenance
Bronze - Single Instance Oracle Database
Unplanned outages (downtime / data loss potential):
- Database instance failure: minutes / zero
- Recoverable server failure: minutes to an hour / zero
- Data corruptions, unrecoverable server failure, database or site failures: hours to days / since last backup, or near zero with Recovery Appliance
Planned maintenance (downtime):
- Online file move, reorganization/redefinition, and certain patches: zero (performed online)
- Hardware or operating system maintenance and database patches that cannot be done online: minutes to hours
- Database upgrades (patch sets and full database releases): minutes to hours
- Platform migrations: hours to a day
- Application upgrades that modify back-end database objects: hours to a day
Notes: This is a breakdown of what you can achieve with Bronze. Near-zero data loss is achievable for most failures, and downtime is minimal for most failures and planned maintenance activities (online file move, online reorganization and redefinition, online patching). Attractive for most databases; economical.
12
Cloud Bronze MAA Next Steps
Configuration best practices: MAA audits and discussions are ongoing; most config practices are already integrated.
- Customer action: use clusterware-managed services, adjust hugepages, ensure ORL/SRL setup
Cloud backup/restore to cloud storage best practices:
- Updating cloud backup APIs (sectionsize=64GB, parallelism, compression for non-HCC), illustrated below
- White paper with DB performance examples/observations and ZDLRA integration
Evolve Multitenant MAA best practices:
- PDB Relocate, PDB Failover
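As an illustration of those backup API settings, the RMAN sketch below configures parallelism, medium compression and a 64 GB section size for backups to cloud object storage. It is a minimal sketch that assumes the Oracle Database Cloud Backup Module (libopc.so) is installed; the OPC_PFILE path and channel count are placeholders, not the exact values used by the cloud tooling.

# Channel configuration for the cloud backup module (path is illustrative)
RMAN> CONFIGURE CHANNEL DEVICE TYPE SBT PARMS 'SBT_LIBRARY=libopc.so, ENV=(OPC_PFILE=/home/oracle/opc_orcl.ora)';
# Parallel channels and compression for non-HCC data
RMAN> CONFIGURE DEVICE TYPE SBT PARALLELISM 8;
RMAN> CONFIGURE COMPRESSION ALGORITHM 'MEDIUM';
# Multi-section backup with 64 GB sections
RMAN> BACKUP DEVICE TYPE SBT SECTION SIZE 64G DATABASE;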
13
Silver Option 1: High Availability with Fast Failover
RTO of seconds for server failures, RPO near zero with ZDLRA
- Active-active clustering with Oracle RAC: all nodes active at all times, real-time failover
- Zero-downtime rolling maintenance across RAC instances: hardware and OS maintenance, qualified Oracle Database patches
- Available in OCI (VM and Exadata), OCI-C and ExaCC
Topology: RAC on ExaCS/ExaCC in Production Datacenter #1, DB Backup Service with replicated backups in DR Datacenter #2.
14
Unplanned Outages and Planned Maintenance
Silver – High Availability with Fast Failover
Unplanned outages (downtime / data loss potential):
- Database instance failure: seconds if RAC / zero
- Recoverable server failure: seconds if RAC (minutes if RAC One Node) / zero
- Data corruptions, database unable to restart, site failure: hours to days / since last backup, or near-zero with ZDLRA redo transport
Planned maintenance (downtime):
- Online file move, reorganization/redefinition, and patching: zero (performed online)
- Hardware or O.S. maintenance and database patches that can't be done online but are qualified for RAC rolling install: near zero
- Database upgrades (patch sets and full database releases): minutes to hours
- Platform migrations: hours to a day
- Application upgrades that modify back-end database objects: hours to a day
15
Cloud Silver MAA Next Steps
Reduce brownout for all infrastructure software updates:
- Reduce brownout for Dom0, DomU and GI/DB software updates
- More prechecks and Exadata optimizations coming
- Customer action: service drain and movement (sketched below)
- Customer action: tune settings (fast_start_mttr_target, CSS misscount, etc.)
- Incorporate Exadata GI upgrade best practices
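A sketch of those customer actions using standard tooling; the database, service and instance names are hypothetical, the drain timeout and MTTR target are placeholders to be tuned per application, and flags may vary by release:

# Drain and relocate a clusterware-managed service ahead of node maintenance
$ srvctl relocate service -db proddb -service oltp_svc -oldinst proddb1 -newinst proddb2 -drain_timeout 60

# Check the current CSS misscount
$ crsctl get css misscount

-- Bound instance recovery time (value illustrative)
SQL> ALTER SYSTEM SET fast_start_mttr_target = 300 SCOPE=BOTH SID='*';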
16
Reducing blackouts and brownouts
Reduce the 1-minute blackout to < 1 second for DomU updates. Note: the varying TPS rates shown on this slide are likely due to a buffer cache issue filed as a bug during the recent HA quarterly (some PDBs getting more buffer cache than others). The backing data is needed to be sure, but this is the classic symptom, assuming the flash cache wasn't oversubscribed. The bug will appear in the upcoming quarterly readout.
17
Silver Option 2: High Availability with Fast Failover (OCI)
18
OCI DG Test Results: RPO = 0 with SYNC transport and minimal performance impact; fast failover, potentially in under 30 seconds.
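A minimal DGMGRL sketch of the kind of configuration behind such results: SYNC redo transport, Maximum Availability protection mode, and fast-start failover with a 30-second threshold. The database names are hypothetical, and per MAA guidance the observer should run from a third location.

DGMGRL> EDIT DATABASE 'orcl_phx' SET PROPERTY LogXptMode = 'SYNC';
DGMGRL> EDIT DATABASE 'orcl_iad' SET PROPERTY LogXptMode = 'SYNC';
DGMGRL> EDIT CONFIGURATION SET PROTECTION MODE AS MaxAvailability;
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 30;
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> START OBSERVER;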
19
OCI – What is Available
Data Guard capabilities compared across Bare Metal DB Systems, Virtual Machine DB Systems, Virtual Machine RAC DB Systems and Exadata DB Systems, with each capability either included in the cloud tooling or left as a manual operation: Data Guard creation, Data Guard role transitions, Data Guard monitoring, Data Guard across regions, and Data Guard fast-start failover.
20
OCI Deployment Model (OCI deployment specifics)
- Provides HA, data protection and fast failover for databases across ADs
- DG synchronous mode is possible due to < 1 ms latency between ADs
- VCN peering across regions is available today: PHX – IAD and LHR – FRA
- Cannot provision across regions, so no DR via cloud tooling
21
Network Latency between Data Centers
Round-trip latency between data centers:
        IAD      LHR      FRA
PHX    60 ms   118 ms   145 ms
IAD             79 ms    99 ms
LHR                      17 ms
VCN peering exists between PHX and Ashburn (IAD).
22
Gold: Comprehensive HA/DR
RTO of seconds to minutes, RPO of zero or near-zero
- Real-time data protection, HA and DR using Active Data Guard
- Best corruption protection
- Zero or near-zero data loss
- Automatic database failover
- Offload read-only work (queries, reporting), test/dev, and backups to the standby
Topology: primary in Production Datacenter #1, Active Data Guard standby in DR Datacenter #2 (across AD or across region).
23
Oracle Data Protection
Gold – Comprehensive Data Protection
Manual checks (physical / logical block corruption):
- Dbverify, Analyze: physical block checks; logical checks for intra-block and inter-object consistency
- RMAN, ASM: physical block checks; intra-block logical checks
Runtime checks:
- Active Data Guard: continuous physical block checking at the standby; strong isolation to prevent a single point of failure; automatic repair of physical corruptions; automatic database failover; lost write corruption detection with automatic shutdown and failover; intra-block logical checks at the standby
- Database: in-memory block and redo checksums; in-memory intra-block checks
- ASM: automatic corruption detection and repair using extent pairs
- Exadata: HARD checks on write (physical and logical) plus automatic disk scrub and repair

Notes: Refer to the MOS note for all the details. Enabling runtime data corruption prevention, detection and repair provides the biggest benefit. The manual checks should be incorporated into your operational practices, but they can be expensive and do not take advantage of instant detection and automatic repair. Subsequent slides and examples focus on the runtime checks, which can be the most beneficial and least intrusive to performance. The benefit comes from database parameters such as db_block_checksum and db_block_checking (a great benefit, but checking can be expensive), ASM with ASM-level redundancy, and Exadata storage. Block checking can prevent some of the toughest logical block corruptions. Working from the bottom to the top of the corruption stack, the reference is Best Practices for Corruption Detection, Prevention, and Automatic Repair (MOS note).

Exadata: HARD checks on write stop corrupted blocks from being written. They cover the SPFILE, control files and log files, require DB_BLOCK_CHECKSUM=TYPICAL or higher, and work with ASM rebalances, which other storage vendors may not support; no other storage vendor has this comprehensive a set of HARD checks. HARD catches corruption that originated in memory or in transit over the network, but it does not protect you if the disk is corrupted after the write. The HARD checks (for writes only) verify the following: a) the block checksum; b) that the type of the block in the write buffer matches what it is intended for (for instance, a write intended to be a redo log write should not, due to a software bug, actually be a temp block write; HARD helps prevent this); c) that the intended block number matches the block number being written, i.e. a list of parameters is checked against the write buffer, so if the write is meant to update block X, HARD verifies the write buffer actually contains block X as the block number in the block header; d) the magic number on the block; e) that the block being written has the expected block size; f) the head and tail of the block, to make sure the write is not fractured; g) the log sequence number. Exadata also has disk scrubbing (for the Exadata storage software and database versions that support it), which a) periodically scrubs hard disks to detect bad sectors, and b) sends the result to a scrub resilvering table, which asks ASM to resilver the bad blocks. The goal is to proactively check for disk failures, especially on aging disks. Oracle Exadata Storage Server Software automatically scrubs hard disks periodically when the hard disks are idle.
If bad sectors are detected on a hard disk, then Oracle Exadata Storage Server Software sends a request to Oracle ASM to repair the bad sectors by reading the data from another mirror copy. By default, the hard disk scrub runs every two weeks. It scrubs the entire disk, but only when the disk is less than 25% busy, to ensure that it does not impact applications. As to why we still see disk corruptions with Exadata, some examples: 1) there can be hardware failures on the disk, for instance data safely written at I/O write time that is later corrupted by a firmware bug or bad mechanical parts causing a stray write; 2) similar to 1), bad sectors can develop; 3) software bugs can cause stray writes (HARD/scrubbing may help with this). This is only applicable with Exadata storage and 11g and 12c+ databases.

ASM: recovers from read errors on corrupted disk sectors if a good partner exists, with automatic bad block remapping. If a write fails, ASM attempts the write to a new allocation unit on the same disk and marks the previous allocation unit unusable; if that fails, ASM takes the disk offline. This addresses some logical block corruptions as well, since Oracle is aware that the block is logically corrupt and ASM can attempt to read the mirror side for valid content. In 12c, Oracle ASM disk scrubbing is a new feature that checks for logical data corruptions and repairs them automatically in normal and high redundancy disk groups. The feature is designed so that it does not impact normal I/O in production systems; the scrubbing process repairs logical corruptions using the mirror disks and leverages Oracle ASM rebalancing to minimize I/O overhead. ASM is available for 10g and up, with more corruption checks in 11g and up.

DB parameters:
DB_BLOCK_CHECKSUM: determines whether DBWn and the direct loader calculate a checksum (a number calculated from all the bytes stored in the block) and store it in the cache header of every data block and redo log block when writing to disk. The checksum is used to validate that a block is not physically corrupt, detecting corruptions caused by underlying disks, storage systems, or I/O systems. If checksum validation fails when the parameter is set to FULL, Oracle attempts to recover the block by reading it from disk (or from another instance) and applying the redo needed to fix the block. Corruptions are recorded as ORA-600 or ORA errors in the database or ASM alert logs. Checksums do not ensure logical consistency of the block contents (see DB_BLOCK_CHECKING). Checksum checks happen in memory when a process reads the data or redo block into the SGA or PGA, and a new checksum is created before an updated or new data or redo block is written. Potential clients of DB_BLOCK_CHECKSUM include all foregrounds, DBWR, LGWR, LNS, RFS, ARCH, MRP, and recovery slaves.
DB_BLOCK_CHECKING: specifies whether Oracle performs logical intra-block checking for database blocks (an in-memory semantic check). Block checking checks block contents, including header and user data, when changes are made to the block, and prevents in-memory corruptions from being written to disk. It performs a logical validation of the integrity of a block by walking through the data on the block and making sure it is self-consistent.
When DB_BLOCK_CHECKING is set to MEDIUM or FULL, block corruptions that are detected in memory are automatically repaired by reading the good block from disk and applying the required redo. If for any reason the corruption cannot be repaired, an error is reported and the data block write is prevented. All corruptions are reported as ORA-600 or ORA errors in the database or ASM alert logs.
DB_LOST_WRITE_PROTECT: enables lost write detection. A data block lost write occurs when an I/O subsystem acknowledges the completion of the block write while in fact the write did not occur in the persistent storage. Only applicable with 11.2.

RMAN: backup and restore operations automatically perform physical block checks and compare checksums. You can use the VALIDATE command to manually check for physical and logical corruptions in database files. This command performs the same types of checks as BACKUP VALIDATE, but VALIDATE can check a larger selection of objects; for example, you can validate individual blocks with the VALIDATE DATAFILE ... BLOCK command. BACKUP VALIDATE works for an entire database or for PDBs, data files and archive logs, and RESTORE VALIDATE commands exist as well. In a logical corruption, the contents of the block are logically inconsistent; examples include corruption of a row piece or index entry. If RMAN detects logical corruption, it logs the block in the alert log and server session trace file. By default, RMAN does not check for logical corruption; if you specify CHECK LOGICAL on the BACKUP command, however, RMAN tests data and index blocks for logical corruption, such as corruption of a row piece or index entry, and logs them in the alert log located in the Automatic Diagnostic Repository (ADR). The positive is comprehensive database-wide checks; the negative is that they are not real-time.

Validating tables, indexes, clusters, and materialized views: to verify the integrity of the structure of a table, index, cluster, or materialized view, use the ANALYZE statement with the VALIDATE STRUCTURE option. If the structure is valid, no error is returned; if the structure is corrupt, you receive an error message. For example, in rare cases such as hardware or other system failures, an index can become corrupted and not perform correctly. When validating the index, you can confirm that every entry in the index points to the correct row of the associated table. If the index is corrupt, you can drop and re-create it. If a table, index, or cluster is corrupt, you should drop it and re-create it. If a materialized view is corrupt, perform a complete refresh and ensure that you have remedied the problem; if the problem is not corrected, drop and re-create the materialized view. The following statement analyzes the emp table: ANALYZE TABLE emp VALIDATE STRUCTURE; You can validate an object and all dependent objects (for example, indexes) by including the CASCADE option. The following statement validates the emp table and all associated indexes: ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE; By default the CASCADE option performs a complete validation. Because this operation can be resource intensive, you can perform a faster version of the validation by using the FAST clause. This version checks for the existence of corruptions using an optimized check algorithm, but does not report details about the corruption. If the FAST check finds a corruption, you can then use the CASCADE option without the FAST clause to locate it.
The following statement performs a fast validation on the emp table and all associated indexes: ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE FAST; You can also perform structure validation online while DML is occurring against the object being validated. There can be a slight performance impact when validating with ongoing DML affecting the object, but this is offset by the flexibility of being able to perform ANALYZE online. The following statement validates the emp table and all associated indexes online: ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE ONLINE;
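Tying the notes above together, a minimal sketch of enabling the runtime protection parameters plus a manual RMAN validation follows; the levels shown (FULL, MEDIUM, TYPICAL) are illustrative choices that must be weighed against the performance considerations described above.

SQL> ALTER SYSTEM SET db_block_checksum = FULL SCOPE=BOTH;
SQL> ALTER SYSTEM SET db_block_checking = MEDIUM SCOPE=BOTH;
SQL> ALTER SYSTEM SET db_lost_write_protect = TYPICAL SCOPE=BOTH;

# Manual, database-wide physical and logical validation
RMAN> BACKUP VALIDATE CHECK LOGICAL DATABASE;

SQL> SELECT * FROM v$database_block_corruption;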
24
Unplanned Outages and Planned Maintenance
Gold – Comprehensive HA and Data Protection
Unplanned outages (downtime / data loss potential):
- Database instance failure: seconds / zero
- Recoverable server failure: seconds / zero
- Data corruptions, database unable to restart, site failure: zero to minutes / near-zero if ASYNC, zero if SYNC
Planned maintenance (downtime):
- Online file move, reorganization/redefinition, and patching: zero (performed online)
- Hardware or operating system maintenance and database patches that cannot be done online but are qualified for RAC rolling install: near zero
- Database upgrades (patch sets, full database releases): minutes to hours
- Platform migrations: hours to a day
- Application upgrades that modify database objects: hours to days
25
Cloud Gold MAA Next Steps
- Continue to reduce brownout for Data Guard role transitions
- Incorporate MAA Data Guard fast-start failover best practices: DG Broker and DG settings; Observer HA setup (future); Far Sync integration (future)
- Automate DG migration solutions (future)
- Incorporate the Data Guard DBMS_ROLLING solution for database upgrades (sketched below): how do we manage unsupported data types? How do we manage batch applications? How do we minimize downtime? What happens if something goes wrong?
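A skeleton of the DBMS_ROLLING flow mentioned above; the future primary name is hypothetical, and the actual upgrade of the transient logical standby happens between START_PLAN and SWITCHOVER.

SQL> EXEC DBMS_ROLLING.INIT_PLAN(future_primary => 'orcl_stby');
SQL> EXEC DBMS_ROLLING.BUILD_PLAN;
SQL> EXEC DBMS_ROLLING.START_PLAN;
-- upgrade the transient logical standby to the new release here
SQL> EXEC DBMS_ROLLING.SWITCHOVER;
SQL> EXEC DBMS_ROLLING.FINISH_PLAN;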
26
Tuning DG Failover in the Cloud especially for ATP
Tuning and fixes to reduce downtime, especially around closing and opening PDBs. Timings have already been reduced for future DB releases.
27
Topics Covered
1. MAA Evolution: On-Premises to Cloud
2. Cloud MAA Goals
3. Cloud MAA Architectures (what's doable today)
4. Cloud MAA Configuration Best Practices (what's there today)
5. Cloud MAA Life Cycle Operations
6. Cloud MAA with ATP
28
Ongoing work: the goal is to incorporate MAA configuration best practices checks.
- Refer to the HA Best Practices guide for essentially the checklist
- Refer to the MAA DG Best Practices papers for our overall checklist
- Refer to Exadata best practices (exachk: Oracle Exadata Best Practices)
- Refer to the relevant MOS notes and software recommendations
29
Database checklist: OCI deployments are currently missing clusterware-managed services and large pages configuration.
30
Exadata uses exachk.
31
Cloud MAA configuration next steps
Exachk during the life cycle. GOAL: deployment with no gaps.
- Cloud APIs to update to the latest exachk/orachk
- Run exachk automatically as part of pre and post software updates (illustrated below)
- Auto-repair when possible (ATP)
Cloud software planning:
- MOS: Exadata Critical Issues
- MOS: Exadata Cloud Service Software Versions
- MOS: Exadata Software Versions
- Exachk Software Planning Module
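An illustrative way to run the check manually today, pending that automation; the MAA profile flag is an assumption about the exachk version in use, and the report names are placeholders.

# Run before and after the software update, then compare the two reports
$ ./exachk -profile maa
$ ./exachk -diff exachk_pre_report exachk_post_report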
32
Topics Covered
1. MAA Evolution: On-Premises to Cloud
2. Cloud MAA Goals
3. Cloud MAA Architectures (what's doable today)
4. Cloud MAA Configuration Best Practices (what's there today)
5. Cloud Migration and MAA Life Cycle Operations
6. Cloud MAA with ATP
33
MAA Migration Solutions
1) Data Transfer Service, backup/restore, Data Pump, RMAN, PDB operations (downtime can be hours to days)
2) Simple Data Guard solution (downtime < 5 minutes): migration with simple, straightforward configurations. Prerequisites: compatible platform, standby-first compatible Oracle Home software versions. Optional: TDE encryption on the target. (An RMAN standby-instantiation sketch follows below.)
3) Advanced Data Guard solution that requires a database upgrade. If the current application and database can use a transient logical standby for a no-downtime upgrade, use Method 1; otherwise use Method 2. Method 1: transient logical solution (downtime ranges from seconds to less than 30 minutes). Method 2: Data Guard switchover and upgrade solution (downtime < 2 hours). Prerequisite: compatible platform. Optional: plug into a cloud 12c+ CDB.
4) GoldenGate hub solution for ATP and cloud (future) (downtime in seconds)

Overview of the Data Transfer Service: Oracle offers offline data transfer solutions that let you migrate data to Oracle Cloud Infrastructure. Moving data over the public internet is not always feasible due to high network costs, unreliable network connectivity, long transfer times, and security concerns. These transfer solutions address those pain points, are easy to use, and provide significantly faster data upload compared to over-the-wire data transfer.
- Data Transfer Disk: you send your data as files on encrypted commodity hard disk drives to an Oracle transfer site. Operators at the Oracle transfer site upload the files into your designated Object Storage bucket in your tenancy. This solution requires you to source and purchase the disks used to transfer data to Oracle Cloud Infrastructure; the disks are shipped back to you after the data is successfully uploaded. See Data Transfer Disk for details.
- Data Transfer Appliance: you send your data as files on secure, high-capacity, Oracle-supplied storage appliances to an Oracle transfer site. Operators at the Oracle transfer site upload the data into your designated Object Storage bucket in your tenancy. This solution supports data transfer when you are migrating a large volume of data and when using disks is not a practical alternative. You do not need to write any code or purchase any hardware; Oracle supplies the transfer appliance and all of the software required to manage the transfer.
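One common way to instantiate the cloud standby for option 2 is RMAN active duplication. The sketch below uses hypothetical connect identifiers (onprem_db, cloud_stby) and assumes the cloud standby instance is started NOMOUNT with networking and TDE prerequisites already in place; it is not the full procedure from the MOS note.

# Connect to the on-premises primary and the cloud standby instance
RMAN> CONNECT TARGET sys@onprem_db
RMAN> CONNECT AUXILIARY sys@cloud_stby
# Create the standby over the network from the running primary
RMAN> DUPLICATE TARGET DATABASE FOR STANDBY FROM ACTIVE DATABASE DORECOVER NOFILENAMECHECK;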
34
Simple Migration to Cloud (MOS 2386116.1)
35
Advanced Migration with Data Guard (MOS 2326901.1)
36
GoldenGate Hub for ATP (MAA POC phase)
37
MAA Lifecycle Automation in the Cloud
Easy lifecycle management for cloud databases: database lifecycle, backup & recovery, snapshots & cloning, patching, Data Guard operations, migration.
- Backup & recovery: full or point-in-time recovery; instantiate new instances from backups
- Data Guard operations: failover, switchover, reinstate
- Database and Grid Infrastructure patching: push-button deployment
- Via UI or REST APIs (see the CLI sketch below)
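The same operations can be driven programmatically through the REST APIs or the OCI CLI; the sketch below shows a Data Guard switchover call with placeholder OCIDs. Treat the exact parameter names as an assumption to be verified against the CLI help for your version.

$ oci db data-guard-association switchover \
      --database-id <primary_database_ocid> \
      --data-guard-association-id <dg_association_ocid> \
      --database-admin-password <sys_password>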
38
MAA Life Cycle Operational matrix
Platforms (columns): DBCS OCI-C, ExaCS OCI-C, ExaCS OCI, OCI DBaaS BM, OCI DBaaS VM, OCI DBCS, OCC/ExaCC.
Operations (rows): Backup/Restore; Data Guard – HA; Data Guard – DR; SW Updates (DB); SW Upgrade (GI | DB); Migration; PDB LCM.
Coverage per cell ranges from APIs/Console and APIs, through some APIs* and manual, to available and n/a.
* DG creation, DG role transitions, basic monitoring. Future: FSFO with optional Far Sync, multiple standbys.
39
ExaOCI Backup and Restore to / from object store (WIP ATP too)
Backup to the object store, measured by compression level, RMAN channel count and section size: runs with no compression and with medium compression used (24 * 2) = 48 and (16 * 2) = 32 channels with a 16 GB section size, yielding effective backup rates from roughly 3 TB/hr up to 13-19 TB/hr, at about 1.5% - 2.5% CPU without compression and 12% - 20% CPU with medium compression.
Restore (applies the L0 plus all L1s, worst-case DR), with (24 * 2) = 48 channels and 16 GB sections: < 1 TB/hr at 24% - 27% CPU up to ~2 TB/hr at 28% - 30% CPU.
Notes: this is the detailed slide; the network throughput shown is the effective throughput. For the restore operations, note that we did not do a restore validate: we wiped all the datafiles and restored the level 0 plus all the incrementals to simulate an ugly disaster where the database is lost just before its weekly level 0. This is a dramatic case, and we will be refining our testing for benchmarking numbers. The CPU utilization figures are for backups taken without workload; there will be other scenarios with workload.
40
Appendix
41
Grid Infrastructure and Database Software
Grid Infrastructure:
- Patch: rolling, in-place RU apply using OPatch, initiated from the UI or CLI
- Upgrade: to 12.2 using PILOT (MOS note), following Exadata best practices
Database:
- Patch: rolling, out-of-place using pre-built images, initiated from the UI or CLI
- MAA project: evaluate patch timing using exadbcpatchmulti orchestration; initial total patch time 93 min, expected phase 1 improvement to 69 min
MAA next steps:
- Evaluate Exadata best practices for GI 18c PILOT-based upgrade and RHP integration (with an ATP focus)
- Further reduce patch timing and application brownout
- Integrate exachk into the pre/post patching flow
Notes: The GI upgrade process for ExaCS using the PILOT tool follows the Exadata 12.2 GI upgrade practices developed by the MAA team (MOS note). The MAA timing study of exadbcpatchmulti patching orchestration (2 nodes, 1 database, minimal workload, no draining) measured 93 minutes total patch time; improvements come in multiple phases, with phase 1 expected to give a ~24 min reduction (not yet validated) and additional improvement in phase 2.
42
Exadata Software Infrastructure Updates (cell, ibswitch, dom0)
Always rolling: patchmgr initiated by the ECRA CLI, run by Oracle Cloud Ops.
Customer domU updates: the process is the same as on-premises; use the latest patchmgr (i.e. dbserver.patch.zip). Simplified cloud-specific documentation is available (MOS note).
MAA next steps (with an ATP focus too):
- Evaluate domU update integration with cloud tooling
- Integrate RHPHelper to gracefully drain connections during dom0/domU updates
- Integrate exachk into the pre/post patching flow
- Reduce application brownout for all software updates
Notes: For domU updates, the simplified cloud-specific steps are referenced in an MOS note written and tested by the MAA team. They are simplified relative to "standard" Exadata because of the reduced number of configurations that have to be considered, and cloud-specific because they assume the default OS user configuration (e.g. root has no key pairs generated).
43
Topics Covered
1. MAA Evolution: On-Premises to Cloud
2. Cloud MAA Goals
3. Cloud MAA Architectures (what's doable today)
4. Cloud MAA Configuration Best Practices (what's there today)
5. Cloud MAA Life Cycle Operations
6. Cloud MAA with ATP
44
MAA Evolution: Autonomous Database
Customer vs. Oracle responsibilities across the three platforms:
- On-Premises / On-Premises Exadata: the customer owns infrastructure management, architecture, database management, configuration and tuning, lifecycle operations, and application performance. Exadata is the best integrated MAA DB platform; MAA blueprints apply, with feedback to products and features.
- Database / Exadata Cloud: Oracle owns and manages the infrastructure; the customer keeps architecture, database management (tooling), configuration and tuning, lifecycle operations (tooling), and application performance. Oracle owns and manages the best integrated MAA DB platform, with cloud automation for provisioning and life cycle operations and MAA blueprints.
- Autonomous Database: Oracle owns the infrastructure plus architecture, configuration and tuning, database management, and lifecycle operations; the customer chooses the SLA policy and focuses on application performance. Policy-driven deployments, MAA integrated in the cloud, fully automated: a self-driving, self-securing, self-repairing database.
45
High Availability Policy
- RAC database in a single Availability Domain, with redundant storage and networking
- Nightly backup that is replicated across ADs
- Protects from the most common sources of downtime such as hardware failures, software crashes, and quarterly software updates
- Service uptime SLA per month: 99.95%, i.e. less than 22 minutes of downtime*
- Suitable for test, development and non-mission-critical production databases
Topology: primary database in Region #1 with database backups in the DB Backup Service.
* SLA excludes AD or regional failures, data corruptions and certain planned maintenance tasks like major upgrades
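The 22-minute figure follows directly from the percentage, assuming a 30-day month: allowed downtime = (100% - 99.95%) of 30 x 24 x 60 minutes = 0.0005 x 43,200 ≈ 21.6 minutes per month, just under the quoted 22 minutes.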
46
Extreme Availability Policy
- RAC database, redundant networking and storage, Active Data Guard, and backup
- Protection from hardware failures, crashes, corruptions, patches, upgrades, disasters
- Service uptime SLA per month: NRX% (NRX = No Ridiculous Exclusions); 99.995% uptime = at most 2m 12s of downtime per month
- Goal is for application impact from any one event to be well under 30 seconds
- Suitable for mission-critical production databases
- Open question: can the policy be set at the CDB or pod level?
Topology: primary database in Region #1, AD #1; Active Data Guard standby database and backup in Region #1, AD #2.
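The corresponding arithmetic for 99.995%: 0.00005 x 43,830 minutes (an average month of about 30.44 days) ≈ 2.19 minutes, which is roughly the 2 minutes 12 seconds quoted.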
47
ATP Patching
Quarterly patching of all components (on demand for critical security issues): firmware, OS, hypervisor, clusterware, database.
- Usage of a gold image
- Patches are applied in a rolling fashion with the RAC cluster and Exadata storage
- The database is continuously available to the application
- Applications that implement Oracle Application Continuity best practices will run without interruption (service configuration sketched below)
- MAA checklists and evaluations: prechecks, reliability, fallback, cloud scale, low brownout to meet SLAs, and smart repair
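As a sketch of the Application Continuity service settings those best practices configure; in ATP this is handled for you, and the database name, service name and timing values below are hypothetical placeholders.

$ srvctl modify service -db proddb -service oltp_svc \
      -failovertype TRANSACTION -commit_outcome TRUE \
      -replay_init_time 900 -retention 86400 \
      -failoverretry 30 -failoverdelay 10 -notification TRUE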
48
ATP Major Release Upgrades
Upgrade to a new major release by:
1. Temporarily converting the physical standby to a logical standby
2. Upgrading the logical standby
3. Switching the service over to the upgraded standby
4. Converting the original primary back to a physical standby
- Downtime for a major release upgrade is reduced to near zero (just the app switch)
- Safer than an in-place major release upgrade of the production database
- Based on DBMS_ROLLING procedures
- If the application uses features that are not supported by logical standby, they need to be disabled for the length of the upgrade, or the upgrade must be done offline
- MAA checklists and evaluations: can we meet service downtime SLAs? Why not use PDB or GoldenGate solutions?
49
Uptime Availability Enablers
- Real Application Clusters provides transparent, near-zero downtime patching and near-zero downtime for server failover
- Active Data Guard provides DB upgrade in seconds, corruption prevention and auto repair, disaster failover, and reporting offload
- Online Redefinition provides online changes to table and index definitions
- Other building blocks include ASM, RMAN, Flashback, Multitenant, etc.
- Exadata provides hardware fault tolerance, the fastest detection of faults and sick components, and the lowest brownout
- Maximum Availability Architecture integration provides a proven enterprise mission-critical architecture, configuration best practices and life cycle operations
- Cloud automation: complete automation and testing of the full stack from database to disks ensures high availability and prevents configuration and operator issues
50
Automated Lifecycle Management
Backup & Repair Data Guard HA/DR Operations Software Updates (Patching) Software Upgrades Monitoring & Notification Elastic Capacity Management Migration End-to-end Security Migration Notify Monitoring Security Lifecycle Repair and Recovery Updates Optimize Manage Confidential – Oracle Internal/Restricted/Highly Restricted
51
External Reference Maximum Availability Architecture
- MAA Home
- On-Premises MAA best practices
- Exadata MAA best practices
- Recovery Appliance MAA
- Cloud MAA