Oracle database Health-Check Common Issues


1 Oracle database Health-Check Common Issues
April 2016

2 Who am I?
rodrigo.righetti@accenture.com, @rrighetti, righettidba.com
IT guy since 1992 (an 8-bit MSX was my first PC)
Oracle DBA since 1999 (Brazil -> UK -> USA)
Performance and SQL Tuning
Exadata/SuperCluster
AEG Latin America – Practice Director
Copyright © 2016 Accenture All rights reserved.

3 Agenda
Who are we? AEG
Database Health-Check
Common Issues

4 Accenture Enkitec Group – Oracle Capabilities
Global systems integrator focused on the Oracle platform
Consultants average 15+ years of Oracle experience
Worldwide leader in Exadata implementations
21 Oracle ACE members
Innovation Center
Elite Oracle Specializations: Oracle Exadata, Oracle Database, Oracle GoldenGate, Oracle Data Integrator, Oracle Data Warehouse, Oracle Real Application Cluster, Oracle Performance Tuning, Oracle Database Security
Expertise, Success, Thought Leadership: our consultants have been published in multiple subject areas and additional online resources that demonstrate Accenture’s experience and expertise with the OES platform

5 Oracle database Health-Check
Provided by Accenture Enkitec Group (AEG)
WHAT? Multidimensional review of an Oracle database
WHY? Usually as preparation for an upgrade or re-platform
HOW? Identifying concerns and opportunities
DELIVERABLE? Report with findings and recommendations

6 Multidimensional Analysis
System Architecture
Database Configuration
Resource Management
Database Performance
System Capacity Planning
Application Performance
SQL Tuning
Custom Focus Areas

7 Diagnostics Tools
edb360, sqld360, orachk, AWR, ASH, ADDM, alert log, OS stats, traces

8 Health-Check Sample Report

9 Health-Check (HC) Process
The DBA collects the initial diagnostics (e.g. edb360 + orachk + alert logs + …).
Accenture Enkitec Group (AEG) performs the expert analysis and prepares a report with Findings & Recommendations.
AEG delivers a verbal presentation and the detailed report to the client.
Timeline:
  Diagnostics collection (edb360 + orachk + …): 1-3 days
  Health-Check expert analysis and report preparation: 2-3 weeks
  Presentation of Findings & Recommendations: 1-2 hours

10 Health-Check Common Issues
What do we consider common issues? Issues that we have found on more than 50% of the databases we review.
How critical are they? DBAs usually perceive them as low impact, which is one reason they are not taken care of. In reality many of them have a significant impact on overall system performance, security and stability.

11 Database Over Indexing
On almost 100% of our engagements we find over-indexed databases. It is not uncommon to see 30% or more of the indexes unused (edb360: "Indexes not recently used").
Indexes are costly to maintain during DML operations:
  More redo is generated.
  More space is required to store the indexes.
  More I/O and memory access is performed to maintain the indexes.
Simple test case to measure the impact. Benchmark on your own system!
  OLTP: INSERT .. VALUES, row by row
  Bulk: INSERT SELECT

            Seconds           Percentage Increase   GB processed
            OLTP     Bulk     %OLTP    %Bulk        LIO_OLTP   redosize_OLTP   LIO_Bulk   redosize_Bulk
NoIndex     21.31    7.51     -        -            8.63       0.37            0.99       0.11
1 Index     37.36    30.23    175%     403%         31.86      0.66            3.62       0.23
5 Indexes   142.13   108.41   667%     1444%        128.46     2.09            16.42      1.07

LIO = Logical I/O. The percentages are elapsed time relative to the NoIndex run (e.g. 37.36 / 21.31 ≈ 175%).

12 Database Over Indexing
So what do I do? Can I drop them all? OF COURSE NOT. NO. BIG NO…
Monitor index usage for a period in which the whole application stack has been executed. Then mark the unused indexes invisible: the optimizer will no longer be able to use them, but they will still be maintained. Once you have confirmation that an index is not needed (NO ONE HAS COMPLAINED), you can drop it.
To monitor index usage:
  ALTER INDEX INDEX_NAME MONITORING USAGE;
An index rebuild will mark the index as USED, even on 12c.
Before 12c you can monitor using the V$OBJECT_USAGE view, but it only shows data for the schema owner; create a custom version: except-ice-cream/
On 12c use DBA_OBJECT_USAGE.
To mark an index invisible:
  ALTER INDEX INDEX_NAME INVISIBLE;
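The monitor, hide, then drop sequence can be sketched end to end as below. This is only an illustration: the owner and index name (app_owner.orders_status_ix) are hypothetical, and DBA_OBJECT_USAGE assumes 12c or later.

```sql
-- Hypothetical index; substitute a candidate from your own review.
ALTER INDEX app_owner.orders_status_ix MONITORING USAGE;

-- After a full application cycle: monitored indexes never flagged as used.
SELECT owner, index_name, table_name, monitoring, used, start_monitoring
  FROM dba_object_usage
 WHERE used = 'NO'
 ORDER BY owner, index_name;

-- Hide the index from the optimizer; DML still maintains it.
ALTER INDEX app_owner.orders_status_ix INVISIBLE;

-- Quick rollback if an execution plan regresses.
ALTER INDEX app_owner.orders_status_ix VISIBLE;

-- Only once nobody has complained:
-- DROP INDEX app_owner.orders_status_ix;
```

Keeping the index invisible for a while makes the rollback a metadata change instead of a full index rebuild.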

13 Sequences Prone to Contention
Sequences with a small cache (0-100) and with ORDER.
High contention to update the SYS.SEQ$ dictionary table.
The problem is exacerbated on RAC environments: high GC contention overhead.
Impact test of the overhead caused by "update seq$":
  begin
    for i in ... loop
      insert into test_seq_cache values (perf_issue.nextval);
    end loop;
  end;
Results (columns: TotalTime, Updates Seq$, LIO_seq$, LIO_Total):
  1000 cache: 48 secs, 1002, 4076
  Nocache: 258 secs (5.3X slower), 1000X updates, 1000X LIO
On RAC it can get much worse!
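A possible way to find candidate sequences and raise their cache, as a rough sketch. The owner app_owner is hypothetical, the cache value is only an example, and NOORDER is appropriate only if the application does not depend on strictly ordered values across RAC instances.

```sql
-- Application sequences with a small cache or with ORDER.
SELECT sequence_owner, sequence_name, cache_size, order_flag
  FROM dba_sequences
 WHERE sequence_owner NOT IN ('SYS', 'SYSTEM')
   AND (cache_size <= 100 OR order_flag = 'Y')
 ORDER BY sequence_owner, sequence_name;

-- Example change; app_owner is a hypothetical schema.
ALTER SEQUENCE app_owner.perf_issue CACHE 1000 NOORDER;
```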

14 RecycleBin
Recycle bin with thousands of objects.
In addition to consuming a lot of storage capacity, too many objects in the recycle bin may lead to slow performance of queries on DBA_FREE_SPACE (see MOS).
It may also increase backup time and space requirements.
Periodically purge the recycle bin:
  Schema owner -> purge recyclebin;
  Entire DB as SYSDBA -> purge dba_recyclebin;
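Before purging, it can help to see how much is actually in there. A minimal sketch using the DBA_RECYCLEBIN view (SPACE is reported in database blocks):

```sql
-- Recycle bin contents per owner and object type.
SELECT owner, type, COUNT(*) AS objects, SUM(space) AS blocks
  FROM dba_recyclebin
 GROUP BY owner, type
 ORDER BY objects DESC;
```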

15 Long Running RMAN Backups
Common causes:
  Direct backup to “tape”
  No use of incremental backups
  No use of the Block Change Tracking feature
This scenario leads not only to long backups, but to long recoveries as well.
TO IMPROVE:
  Back up to disk (FRA) first and then to tape
  Use incremental backups during the week and a full backup on the weekend
  Enable Block Change Tracking
By the way, have you ever tested your recovery strategy?
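Enabling and verifying Block Change Tracking might look like the sketch below. It assumes Enterprise Edition; the file path in the commented alternative is hypothetical.

```sql
-- Is block change tracking enabled?
SELECT status, filename, bytes FROM v$block_change_tracking;

-- Enable it; with Oracle Managed Files (DB_CREATE_FILE_DEST set) no file name is needed.
ALTER DATABASE ENABLE BLOCK CHANGE TRACKING;
-- Otherwise name the file explicitly (hypothetical path):
-- ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE '/u01/app/oracle/bct/bct.f';

-- Did recent incremental backups actually use the tracking file?
SELECT file#, incremental_level, completion_time,
       used_change_tracking, blocks_read, datafile_blocks
  FROM v$backup_datafile
 WHERE incremental_level > 0
 ORDER BY completion_time DESC;
```

A low BLOCKS_READ relative to DATAFILE_BLOCKS suggests the incremental backup is skipping unchanged blocks as intended.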

16 Corruption Prevention
DB_BLOCK_CHECKSUM: early detection of corruption
  MOST USEFUL to detect faulty disk operations.
  Default TYPICAL: 1%-2% overhead.
  FULL: 4%-5% overhead. Update/delete statements recompute the checksum, which catches in-memory corruption and stops it from making it to disk.
  Oracle gives every log entry a checksum before writing it to the current redo log.
DB_BLOCK_CHECKING (_db_always_check_system_ts=TRUE)
  Most useful to detect memory corruption and integrity problems.
  LOW: basic block header checks after block contents change in memory.
  MEDIUM: also performs semantic block checking for all non-index-organized table blocks.
  FULL: also performs semantic checks for index blocks.
  This may cause up to 10% more overhead, depending on the workload.
Risk Perception Impact: Faster or Safer?
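To see where a database currently stands and to change the settings, something along these lines could be used. The values chosen are only an example of the trade-off discussed above, not a recommendation for every workload.

```sql
-- Current corruption-prevention settings.
SELECT name, value, isdefault
  FROM v$parameter
 WHERE name IN ('db_block_checksum', 'db_block_checking');

-- Example values; benchmark the overhead on your own workload first.
ALTER SYSTEM SET db_block_checksum = FULL   SCOPE = BOTH SID = '*';
ALTER SYSTEM SET db_block_checking = MEDIUM SCOPE = BOTH SID = '*';
```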

17 FORCE LOGGING
Many databases out there run NOLOGGING operations without considering the recovery impact and/or Data Guard corruption.
Set the database to run with FORCE LOGGING to avoid potentially unrecoverable situations. Consider this recommendation if the restore process in place does not account for reloading the data from Direct Path Load (DPL) NOLOGGING operations.
Making this change will increase redo generation and the execution time of DPL operations. It is a trade-off between the performance SLA and the data availability SLA.
Benchmark DPL operations with NOLOGGING vs. LOGGING. In many applications the difference is so marginal that setting the database to FORCE LOGGING causes no noticeable performance degradation while providing higher recoverability.
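A short sketch of how the current state could be checked before and after the change:

```sql
-- Is the database already in FORCE LOGGING mode?
SELECT force_logging FROM v$database;

-- Datafiles that have been touched by NOLOGGING operations.
SELECT file#, name, unrecoverable_change#, unrecoverable_time
  FROM v$datafile
 WHERE unrecoverable_time IS NOT NULL
 ORDER BY unrecoverable_time DESC;

-- Enforce logging database-wide.
ALTER DATABASE FORCE LOGGING;
```

Datafiles with an UNRECOVERABLE_TIME later than their last backup are the ones at risk after a restore.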

18 AUD$ and FGA_LOG$ on SYSTEM TBS
The audit tables SYS.AUD$ and SYS.FGA_LOG$ are stored in the SYSTEM tablespace by default.
The SYSTEM tablespace is a Manual Segment Space Management (MSSM) tablespace. Those tables can receive heavy insert activity, which can cause freelist contention.
Oracle recommends moving them to an Automatic Segment Space Management (ASSM) tablespace. Reference: MOS
On 12c consider the use of the Unified Audit Trail.

  BEGIN
    DBMS_AUDIT_MGMT.set_audit_trail_location(
      audit_trail_type           => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,  -- this moves table AUD$
      audit_trail_location_value => 'SYSAUX');
  END;
  /

  BEGIN
    DBMS_AUDIT_MGMT.set_audit_trail_location(
      audit_trail_type           => DBMS_AUDIT_MGMT.AUDIT_TRAIL_FGA_STD,  -- this moves table FGA_LOG$
      audit_trail_location_value => 'SYSAUX');
  END;
  /

19 ROW-by-ROW = SLOW-by-SLOW
Row-by-row method (suboptimal):

  BEGIN
    -- One UPDATE statement is executed per row returned by the query.
    FOR rec IN (SELECT id FROM asset WHERE promotion_id IS NULL) LOOP
      UPDATE asset SET promotion_id = 999 WHERE id = rec.id;
    END LOOP;
  END;
  /

Set-based method (optimal):

  -- A single statement updates all qualifying rows at once.
  UPDATE asset a
     SET a.promotion_id = 999
   WHERE EXISTS (SELECT 1 FROM asset WHERE id = a.id AND promotion_id IS NULL);

20 AWR Retention
By default, snapshots of the relevant data are taken every hour and retained for 7 days (8 days from 11g onward).
  A week of retention is usually not enough to baseline the entire application cycle.
  A 1-hour interval is usually not granular enough for troubleshooting and for visualizing spikes.
Consider at least 30 days of retention and a 30-minute interval.
In some cases, for trend analysis and growth projection, it is useful to keep longer periods; be mindful of the space used in SYSAUX.
A smaller interval can also help with troubleshooting, but once the analysis is complete, return to 30 minutes.

  BEGIN
    DBMS_WORKLOAD_REPOSITORY.modify_snapshot_settings(
      retention => 43200,  -- minutes (30 days)
      interval  => 30);    -- minutes
  END;
  /
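The current settings and the space AWR occupies can be checked with queries like the following sketch:

```sql
-- Current snapshot interval and retention.
SELECT snap_interval, retention FROM dba_hist_wr_control;

-- Space used by AWR in SYSAUX.
SELECT occupant_name, ROUND(space_usage_kbytes / 1024) AS used_mb
  FROM v$sysaux_occupants
 WHERE occupant_name = 'SM/AWR';
```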

21 PROCESSES and SESSIONS / Connection Pool
Problems with dynamic connection pools:
  Connection STORMs
  Wasted resources with a high number of inactive sessions
  Latch contention and complete hang situations
  CPU over-allocation: high run queue, scheduling and context-switch overhead
To FIX: use a static and “small” connection pool; you want control and stability.
  Reduces the potential for connection storms
  Reduces waste, with fewer inactive sessions
  Better CPU utilization and scalability
Please watch on YouTube:
  Real-World Performance: Large Dynamic Connection Pools - Part 1
  Real-World Performance: Large Dynamic Connection Pools - Part 2
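One way to gauge how far a pool overshoots is to compare the high-water marks with the configured limits and to count sessions by state. A sketch:

```sql
-- High-water marks versus the PROCESSES and SESSIONS limits.
SELECT resource_name, current_utilization, max_utilization, limit_value
  FROM v$resource_limit
 WHERE resource_name IN ('processes', 'sessions');

-- User sessions per client machine, program and state.
SELECT machine, program, status, COUNT(*) AS session_count
  FROM v$session
 WHERE type = 'USER'
 GROUP BY machine, program, status
 ORDER BY session_count DESC;
```

A large INACTIVE count from an application server is a typical sign of an oversized or dynamic pool.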

22 Parallel Utilization
PARALLEL_MAX_SERVERS too high; it should stay <= 2X the number of CPU threads.
PARALLEL_MIN_SERVERS too low (= 1); it should be = PARALLEL_MAX_SERVERS.
PARALLEL_THREADS_PER_CPU too high when = 2 or more; ideally 1.
DEGREE higher than 1 set on tables and indexes. This technique is not recommended, since it is a loose control that allows multiple concurrent users to flood the system with parallel plans.
No Resource Manager plan to control PX slave utilization.
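The parameters and the object-level DEGREE settings can be reviewed with something like the sketch below. The table name in the final statement (app_owner.big_table) is hypothetical.

```sql
-- Parallel-related parameters next to the CPU count.
SELECT name, value, isdefault
  FROM v$parameter
 WHERE name IN ('cpu_count', 'parallel_max_servers',
                'parallel_min_servers', 'parallel_threads_per_cpu');

-- Tables and indexes decorated with a DEGREE other than 1 (includes DEFAULT).
SELECT 'TABLE' AS object_type, owner, table_name AS object_name, TRIM(degree) AS degree
  FROM dba_tables
 WHERE TRIM(degree) NOT IN ('0', '1')
UNION ALL
SELECT 'INDEX', owner, index_name, TRIM(degree)
  FROM dba_indexes
 WHERE TRIM(degree) NOT IN ('0', '1')
 ORDER BY 2, 3;

-- Remove the decoration (hypothetical table).
ALTER TABLE app_owner.big_table NOPARALLEL;
```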

23 Outdated, Stale Stats
The default automatic statistics gathering is usually enough for 80-90% of the cases.
10-20% of the tables/indexes may require customized configurations.
Temporary tables with stats, many times with 0 as num_rows:
  Lead to miscalculated cardinality and suboptimal execution plans, possibly ”Cartesian joins”.
  Use dynamic sampling (no stats), gather with a representative sample, or gather at execution time.
E.g.: use of INCREMENTAL for large partitioned tables.
Avoid the ”out-of-range” effect:
  Exacerbated by the default 10% modifications rule on large tables.
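The checks and customizations above could be sketched as follows. The schema and table names (APP_OWNER, GTT_STAGE, SALES_FACT) and the 5% threshold are hypothetical examples.

```sql
-- Tables and partitions currently flagged as stale.
SELECT owner, table_name, partition_name, num_rows, last_analyzed
  FROM dba_tab_statistics
 WHERE stale_stats = 'YES'
   AND owner NOT IN ('SYS', 'SYSTEM')
 ORDER BY owner, table_name;

-- Temporary tables that carry statistics.
SELECT owner, table_name, num_rows, last_analyzed
  FROM dba_tables
 WHERE temporary = 'Y'
   AND last_analyzed IS NOT NULL;

BEGIN
  -- Drop and lock stats on a temporary table so dynamic sampling is used.
  DBMS_STATS.delete_table_stats(ownname => 'APP_OWNER', tabname => 'GTT_STAGE');
  DBMS_STATS.lock_table_stats(ownname => 'APP_OWNER', tabname => 'GTT_STAGE');
  -- Incremental stats and a lower staleness threshold for a large partitioned table.
  DBMS_STATS.set_table_prefs('APP_OWNER', 'SALES_FACT', 'INCREMENTAL', 'TRUE');
  DBMS_STATS.set_table_prefs('APP_OWNER', 'SALES_FACT', 'STALE_PERCENT', '5');
END;
/
```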

24 Database Version/PatchSet
Bugs
Security Patches
Release Schedule of Current Database Releases (Doc ID )

25 Wasted Space / Large Non-Partitioned tables
Wasted space in the order of TB.
Large non-partitioned tables: 20GB+, a few TB in some cases.
Indexes larger than the table itself:
  This is a symptom of monotonically growing values for the column on which the index is built, combined with a data purge from the corresponding table. Basically, the space made available (by deleting old values) is reused in the table but is not reused in the index. Values may end up sparsely deleted across index blocks, and thus the blocks are never released back to free space simply because there are still some values in them.
Lack of an ILM (Information Lifecycle Management) process.
Recommendations:
  Consider partitioning by date those tables bigger than 10GB, in order to benefit from partition pruning.
  Consider coalescing, shrinking or reorganizing objects that have excessive wasted space.
  Implement ILM with partitioning and compression.
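Candidates for partitioning or reorganization can be located with dictionary queries like this sketch. The index in the final statement (app_owner.orders_created_ix) is hypothetical.

```sql
-- Non-partitioned tables larger than 10 GB.
SELECT owner, segment_name, ROUND(bytes / 1024 / 1024 / 1024, 1) AS gb
  FROM dba_segments
 WHERE segment_type = 'TABLE'
   AND bytes > 10 * 1024 * 1024 * 1024
 ORDER BY bytes DESC;

-- Non-partitioned indexes larger than their table.
SELECT i.owner, i.index_name, i.table_name,
       ROUND(si.bytes / 1048576) AS index_mb,
       ROUND(st.bytes / 1048576) AS table_mb
  FROM dba_indexes i
  JOIN dba_segments si
    ON si.owner = i.owner AND si.segment_name = i.index_name AND si.segment_type = 'INDEX'
  JOIN dba_segments st
    ON st.owner = i.table_owner AND st.segment_name = i.table_name AND st.segment_type = 'TABLE'
 WHERE si.bytes > st.bytes
 ORDER BY si.bytes DESC;

-- Merge sparsely populated leaf blocks (hypothetical index).
ALTER INDEX app_owner.orders_created_ix COALESCE;
```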

26 Security
Common findings:
  DBA and other elevated roles unnecessarily granted.
  Public database links.
  No application of the “least necessary privilege” principle.
  Relaxed or completely absent security policies and auditing.
  No ”Segregation of Duties”.
  And many more…
Recommendations:
  Review the users that have sensitive roles granted and revoke the privileges from the users that don’t need them. Make sure you perform a thorough impact evaluation before revoking a privilege.
  Practice the principle of least privilege: grant database users only the privileges actually required to perform their jobs efficiently.
  Replace all public database links with private ones, permanently remove the unused ones, and consider creating them with users that have the correct/necessary permissions based on the least privilege concept.
  12c: DBMS_PRIVILEGE_CAPTURE
  Implement SOD: Database Vault
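A starting point for the review, as a minimal sketch against the standard dictionary views:

```sql
-- Who holds the DBA role?
SELECT grantee, granted_role, admin_option
  FROM dba_role_privs
 WHERE granted_role = 'DBA'
   AND grantee NOT IN ('SYS', 'SYSTEM')
 ORDER BY grantee;

-- Public database links.
SELECT db_link, username, host, created
  FROM dba_db_links
 WHERE owner = 'PUBLIC';
```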

27 Invalid Objects
Execution of statements referencing those objects would report errors like ORA-4061, ORA or ORA-4068.
Recompile/validate the invalid objects.
Drop the ones that are no longer needed.
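Listing and recompiling invalid objects could look like this sketch. UTL_RECOMP is normally run as SYSDBA, ideally in a quiet period.

```sql
-- Invalid objects per owner and type.
SELECT owner, object_type, COUNT(*) AS invalid_objects
  FROM dba_objects
 WHERE status = 'INVALID'
 GROUP BY owner, object_type
 ORDER BY owner, object_type;

-- Recompile everything that is invalid, serially.
BEGIN
  UTL_RECOMP.recomp_serial();
END;
/
```

Objects that remain invalid after recompilation are the ones to investigate or drop.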

28 Segments using SYSTEM Tablespace
The SYSTEM tablespace should never be used by non-SYS/SYSTEM objects.
Move objects that are not owned by SYS or SYSTEM out of the SYSTEM tablespace.
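A query along these lines can list the offending segments. Note that a few Oracle-supplied schemas (OUTLN, for example) legitimately own segments in SYSTEM, so review the output before moving anything.

```sql
-- Segments in SYSTEM that are not owned by SYS or SYSTEM.
SELECT owner, segment_type, segment_name, ROUND(bytes / 1048576) AS mb
  FROM dba_segments
 WHERE tablespace_name = 'SYSTEM'
   AND owner NOT IN ('SYS', 'SYSTEM')
 ORDER BY bytes DESC;
```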

29 SGA Pool Resizes
The SGA is being dynamically resized.
SGA_MAX_SIZE and SGA_TARGET are set to XX GB, but the individual pools have no minimum set, or the minimum is too small, which allows the SGA components to resize freely.
The risk of not setting minimum pool values is that dynamic reallocation may flush a significant amount of memory, causing system performance instability.
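Resize activity can be reviewed in V$SGA_RESIZE_OPS, and with SGA_TARGET set, explicit pool sizes act as minimums. A sketch, where the sizes are placeholders to be derived from your own resize history:

```sql
-- Recent SGA component resize operations.
SELECT component, oper_type, oper_mode,
       ROUND(initial_size / 1048576) AS initial_mb,
       ROUND(final_size / 1048576)   AS final_mb,
       start_time
  FROM v$sga_resize_ops
 ORDER BY start_time DESC;

-- Placeholder minimums; with SGA_TARGET set these become floors.
ALTER SYSTEM SET shared_pool_size = 4G  SCOPE = BOTH SID = '*';
ALTER SYSTEM SET db_cache_size    = 16G SCOPE = BOTH SID = '*';
```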

30 Controlfile multiplexed/Redo Log mirrors
Control files multiplexed to the same disk group/filesystem.
  Hard to recover from in case of a disaster.
Redo log groups created with a single member.
  This is a critical issue: if a redo log group has a single member, any corruption or loss of that file will result in a database crash and data loss.
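Both conditions are easy to check. In this sketch the disk group +RECO and group number 1 are hypothetical.

```sql
-- Redo log groups with fewer than two members.
SELECT group#, thread#, members, status
  FROM v$log
 WHERE members < 2;

-- Control file locations; they should not all share one disk group or filesystem.
SELECT name FROM v$controlfile;

-- Add a second member on a different disk group (hypothetical names).
ALTER DATABASE ADD LOGFILE MEMBER '+RECO' TO GROUP 1;
```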

31 Multi-Block Reads
Oracle recommends not setting DB_FILE_MULTIBLOCK_READ_COUNT, so that an optimal value can be chosen automatically.
A non-default setting not only affects the CBO costing calculation, but also limits how many blocks a full scan can read in a single operation.
With the default configuration, the CBO uses 8 for costing, while full scans use the maximum number of blocks the OS allows in a single multi-block read.
Reset DB_FILE_MULTIBLOCK_READ_COUNT in all of your instance parameter files.
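Checking and resetting the parameter could be done as in this sketch. The reset is written to the spfile and takes effect at the next instance restart.

```sql
-- Has the parameter been set explicitly?
SELECT name, value, isdefault
  FROM v$parameter
 WHERE name = 'db_file_multiblock_read_count';

-- Remove the explicit setting from the spfile for all instances.
ALTER SYSTEM RESET db_file_multiblock_read_count SCOPE = SPFILE SID = '*';
```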

32 Maintenance Window – Resource Manager Switch
Unexpected performance degradation due to CPU throttling.
Use Resource Manager and maintenance windows, but create a plan that is designed for your needs.
Many shops have their own Resource Manager plan but forget to change the plan of the windows.

  BEGIN
    DBMS_SCHEDULER.set_attribute(
      name      => 'SYS.MONDAY_WINDOW',
      attribute => 'RESOURCE_PLAN',
      value     => 'MY_NEW_PLAN');
  END;
  /
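Since every window carries its own plan, it may be worth listing them all and switching them in one pass. In this sketch MY_NEW_PLAN is the example plan name from the slide, and the filter on DEFAULT_MAINTENANCE_PLAN assumes the windows still use the Oracle default.

```sql
-- Which resource plan does each scheduler window switch to?
SELECT window_name, resource_plan, enabled
  FROM dba_scheduler_windows
 ORDER BY window_name;

-- Point every window that still uses the default plan to the custom plan.
BEGIN
  FOR w IN (SELECT window_name
              FROM dba_scheduler_windows
             WHERE resource_plan = 'DEFAULT_MAINTENANCE_PLAN') LOOP
    DBMS_SCHEDULER.set_attribute(
      name      => 'SYS.' || w.window_name,
      attribute => 'RESOURCE_PLAN',
      value     => 'MY_NEW_PLAN');
  END LOOP;
END;
/
```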

33 EXADATA- Why Smart Scans may not happen?
Unsatisfied requisites:
  Full Scan (Table or Index)
  Direct Path Read, which in turn requires a Full Scan
Segment mostly in memory:
  Implicit
  Force Full DB Caching
  Big Table Caching
  In-Memory PX
Simply not available:
  Clustered Tables
  Index Organized Tables (IOT)
  Tables with ROWDEPENDENCIES
Function is not offloadable
Most of the blocks in the buffer cache
Migrated/Chained Rows
Read consistency: uncommitted changes
MV Full Refresh may not use Smart Scan if not using PX (recursive query “bug”).

34 Tools that I like
Is Smart Scan used? fsx.sql or sqlmon.sql (Kerry Osborne and Carlos Sierra scripts)
IO saved? fsx.sql or sqlmon.sql
Why is Smart Scan efficiency lower than expected? mystat.sql or mystats.sql
edb360, sqld360
Snapper (Tanel Poder)
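For a quick check without the scripts, the offload columns in V$SQL answer the first two questions. The query below is a simplified sketch in the same spirit, not a copy of fsx.sql; &sql_id is a SQL*Plus substitution variable.

```sql
-- Was the statement offloaded, and how much interconnect I/O was saved?
SELECT sql_id, child_number,
       io_cell_offload_eligible_bytes AS eligible_bytes,
       io_interconnect_bytes          AS interconnect_bytes,
       CASE WHEN io_cell_offload_eligible_bytes > 0 THEN 'YES' ELSE 'NO' END AS offloaded,
       CASE WHEN io_cell_offload_eligible_bytes > 0
            THEN ROUND(100 * (io_cell_offload_eligible_bytes - io_interconnect_bytes)
                           / io_cell_offload_eligible_bytes, 1)
       END AS io_saved_pct
  FROM v$sql
 WHERE sql_id = '&sql_id';
```

Zero eligible bytes means no Smart Scan was attempted for that child cursor.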

35 Quick demo
edb360/sqld360

