Tuning Oracle for Blackboard

How to ensure that your Oracle DB for Blackboard performs at the level you expect
Volker Kleinschmidt, Blackboard Client Support
© Blackboard, Inc. All rights reserved.

Audience check: who's already on 10g? Mention: 10gR2 not yet supported. Currently recommended versions: , w. lxcscom patch

Session Overview
The Cost-Based Optimizer
Initialization Parameters
The Importance of DB-Statistics
Troubleshooting Bad Performance
The Performance Report
Further Tuning Tips

About Forward-Looking Statements
We may make statements regarding our product development and service offering initiatives, including the content of future product upgrades, updates or functionality in development. While such statements represent our current intentions, they may be modified, delayed or abandoned without prior notice and there is no assurance that such offering, upgrades, updates or functionality will become available unless and until they have been made generally available to our customers.

The Cost-Based Optimizer
Oracle 8i: recommended
Oracle 9i: indispensable
Oracle 10g: only option
Rule-based Optimizer is dead
BB code has always relied on CBO, not RBO
Must ensure that CBO works at optimal level!

What does the CBO do?
Parses the query and generates many different execution plans for it
Estimates the cost of each plan and keeps the cheapest option found so far
Bases its calculations on what it knows about your system: statistics about your database, and init parameters

Without current DB statistics, the CBO makes default assumptions about how big each table is etc., which leads to *wrong* decisions much of the time.

When the CBO doesn’t work…
The CBO can only make the right decisions if it has enough information about your system and your data
Missing/outdated statistics are deadly
So are bad init parameters
Shipping defaults are miserable!

Oracle's default configuration assumes that the DBA will adjust parameters to match his setup and hardware. Without an experienced DBA, this assumption is clearly wrong, and can lead to terrible performance.

Initialization Parameters
Memory management
(Mis-)guiding the Optimizer
More than meets the eye

What init parameters are most important for performance, where are they stored and how are they changed?

Initialization Parameter Files
Stored in $ORACLE_HOME/dbs/
Oracle8i: init$ORACLE_SID.ora (pfile = parameter file), editable and human-readable text
Oracle9i/10g: spfile$ORACLE_SID.ora (spfile = server parameter file), binary, non-editable
Advantage of spfile: can dynamically alter system and save changes to spfile to preserve them through a DB restart

With 8i, many parameters are not alterable on a running system. 9i offers much more flexibility, as long as one is running with an spfile, since changes are preserved through restarts when made in the spfile.

Changing an init parameter (9i)
--to test a parameter:
ALTER SESSION SET param=value;
--to make a semi-persistent change on the running system:
ALTER SYSTEM SET param=value SCOPE=MEMORY;
--to make a change in the spfile only, i.e. it only becomes effective after a DB restart (test first!):
ALTER SYSTEM SET param=value SCOPE=SPFILE;
--to do both, after the value has been tested (!):
ALTER SYSTEM SET param=value SCOPE=BOTH;

pfile vs. spfile
Need to keep an editable version (pfile) and a backup copy (a named spfile) to protect against mistakes that make the system nonfunctional
oracle# sqlplus '/ AS SYSDBA'
CREATE PFILE='initSID.ora' FROM SPFILE;
Similarly: CREATE spfile=… FROM pfile=…
To use a named backup spfile if the main spfile is corrupt, we can start the DB with a pfile containing only one parameter: spfile=…
Must first remove the bad default spfile though
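The backup-and-recover sequence above could look roughly like the following in SQL*Plus. This is only a sketch: every path and file name here is hypothetical, and the commands assume a session connected AS SYSDBA.

```sql
-- All paths and file names below are hypothetical examples.

-- 1. Dump the running spfile into an editable text pfile.
CREATE PFILE='/u01/backup/initBB.ora' FROM SPFILE;

-- 2. Build a named backup spfile from that known-good pfile.
CREATE SPFILE='/u01/backup/spfileBB_good.ora' FROM PFILE='/u01/backup/initBB.ora';

-- 3. Recovery: after moving the bad default spfile out of the way,
--    start the instance from a one-line pfile that points at the backup.
--    Assumed contents of /u01/backup/init_recover.ora:
--      spfile='/u01/backup/spfileBB_good.ora'
STARTUP PFILE='/u01/backup/init_recover.ora'
```

Keeping the text pfile alongside the named spfile means there is always one copy that can be inspected and repaired with a plain editor.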

Memory Management: Sizing the SGA & PGA
Let Oracle do it (9i/10g):
sga_max_size=(what it mustn't get above)
10g: sga_target=(what you'd like it to be)
pga_aggregate_target=(desired total size)
workarea_size_policy=AUTO (!!!)
Should use only dedicated server!
Do NOT set legacy parameters like hash_area_size, sort_area_size (in 9i, they're only for shared server)

SGA = system global area = what the DB as a whole needs in terms of memory: data buffers, SQL code buffers, etc.
PGA = program global area (per-process memory) = memory that each process needs/uses to do sorting, hashing, merging, etc.
There's also the UGA (User Global Area). With dedicated server, this is part of the PGA (since each user has his own process). With shared server, it's in the SGA (since the shared servers all need to access this session data). This is where any variables you define would be held, plus administrative session overhead etc.
Note that sga_max_size must be supported by the kernel settings for maximum shared memory segment size (crucial on Solaris, still important for performance on Linux), otherwise you get multiple disjoint shared memory segments.

Dedicated server
Dedicated Server = one server process oracle$ORACLE_SID per client connection (TNS).
Proper way of operating with Blackboard, since pooling is already done on the client side (Java connection pool; few fixed Perl DBI connections per mod_perl process).
Note: the setup guide states we only support dedicated server, no MTS/shared server!

Dedicated server requires the $ORACLE_HOME/network/admin/listener.ora file on the DB server to be configured for this service, i.e. the listener needs to be told to listen for requests to this SID. Do not configure it to listen by SERVICE_NAME, since Blackboard connects by SID, not by SERVICE_NAME.
The client can either be configured to talk to this SID via a tnsnames.ora file, or can use a direct TNS connect string, which is what the BB apps do. Hence no tnsnames.ora is needed for BB, though we do create one in /usr/local/blackboard/config/oracle/network/admin with a service alias named "bbadmin". This is similar to the ODBC system data source set up on Windows, which also isn't really used.

Shared server
Shared Server (formerly known as MTS = Multi-Threaded Server) = small pool of server processes (ora_s###_$SID) held ready to serve incoming requests.
Used mainly in situations where DB-server memory is scarce.
Dispatchers (ora_d###_$SID) hook up client requests to servers.
Dispatchers auto-register with the listener, using SERVICE_NAME, not SID.
$ORACLE_HOME/bin/lsnrctl services
will tell you whether there's a shared server and/or dedicated server listening (the SID should be listed twice, once for dedicated server using TCP, once for the bequeath protocol = local connection). SIDXDB = shared server.

Not appropriate for Blackboard, since BB expects each pool connection to actually have a working connection at all times and not have to wait (if the Java app has to wait for a pool connection, fine, but once it has one assigned, it shouldn't have to wait any further).

Why not shared server?
Shared Server connections demand that work be submitted in small chunks, i.e. application code needs to be specially prepared for that
So set (max_)shared_servers=0 and max_dispatchers=0, and reset/remove any legacy mts_... parameters.

If application code is not prepared to submit its work in small chunks, a few clients can lock out the others with long-running requests hogging scarce resources. This does not happen with dedicated server. Since BB has not only quick brief OLTP requests, but also long-running tasks such as course copy, it is not suited for shared server.

Memory Management
Ora10g: sga_target is all you need with Automatic Shared Memory Management (ASMM) if statistics_level='typical' or 'all'
Ora9i: sga_max_size must be ≥ db_cache_size + log_buffer + shared_pool_size + large_pool_size
But: sga_max_size is only an upper bound! No automatic redistribution in 9i! Still need to tune the various buffer sizes…

Note: statistics_level controls what system statistics are gathered; it has nothing to do with table or index stats. It also needs to be set to at least 'typical' for PGA_AGGREGATE_TARGET to work. 'typical' is the default. 'basic' means "off". With statistics_level='basic', the various advisory views are not populated, so tuning the DB becomes much harder. Overhead due to this statistics collection is negligible.
Ora10g: sga_target can be adjusted on the fly via ALTER SYSTEM, as long as it stays below sga_max_size.

Finding the right pool sizes
There is no golden rule
Start with "reasonably low" values
Use BB perf report to see what values to adjust (section with advice about current params and suggested changes)
Make 10% adjustments, run it again
Details later in this presentation

There also are various advisory views for db_cache_size etc.; see the Oracle 9i/10g Reference and Performance Tuning manuals.
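As one illustration of the advisory views mentioned above, the buffer cache advisory can be read with a query along these lines. It is a sketch only; it assumes the default buffer pool, the standard block size, and that the advisory is switched on (db_cache_advice=ON, which follows from statistics_level=TYPICAL).

```sql
-- Estimated physical reads for a range of candidate buffer cache sizes.
-- size_for_estimate is reported in MB; size_factor = 1 is the current size.
SELECT size_for_estimate         AS cache_mb,
       size_factor,
       estd_physical_read_factor,
       estd_physical_reads
  FROM v$db_cache_advice
 WHERE name = 'DEFAULT'
   AND advice_status = 'ON'
   AND block_size = (SELECT TO_NUMBER(value)
                       FROM v$parameter
                      WHERE name = 'db_block_size')
 ORDER BY size_for_estimate;
```

If estd_physical_read_factor barely drops for sizes above the current one, more buffer cache is unlikely to help, and the RAM may be better spent elsewhere.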

How big should the PGA be?
Each BB connection = 1 ora server process (using dedicated server!)
8i: PGA = #processes * size/process; for each process add bitmap_merge_area_size + hash_area_size + sort_area_size
9i+: PGA_AGGREGATE_TARGET=200MB means each of 200 processes has 1MB in theory, but in reality as much as it needs, dynamically allocated from the 200MB

Normally, most processes aren't simultaneously sorting or hashing, so allocating much memory to each process is very wasteful, and usually there isn't enough RAM to serve all needs that way. Using a memory pool (of "normal" max size PGA_AGGREGATE_TARGET) from which to allocate what's needed is much more efficient.
The max size of this pool doesn't mean that further requests will fail (e.g. if a process creates a lot of temporary memory objects like a large array, more than the max size might be allocated). However, any further requests will result in sorts/hashes being done on disk, leading to very bad performance. This situation, when more memory than PGA_AGGREGATE_TARGET is requested by the server processes, is called over-allocation and should be avoided at all costs.

Monitoring PGA usage
Initially set to 16% RAM (Oracle Corp.)
Use V$PGASTAT to monitor over allocation count and cache hit percentage
Run normal (!) workload, then raise PGA_AGGREGATE_TARGET as suggested by V$PGA_TARGET_ADVICE to achieve no over allocations and high cache hit percentage, restart, check again

The Oracle Corp. suggestion to allocate 16% of physical RAM to the PGA as a starting value is to be taken with a grain of salt, especially if you have multiple services running on a big box. Let's say "a few hundred megabytes".
The cache hit percentage should ideally be 100%, but you may not have the RAM to support that, and other areas (SGA: buffer cache and shared pool) may be able to make better use of scarce RAM. Still, you should be close to 100%.
Since the advisories reflect numbers since the last DB restart, altering the parameters, restarting and then collecting new advisory data is a process that can take several days. You want to collect data during normal workload, i.e. NOT during low usage periods such as very early morning when a restart would be easily feasible.
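A minimal sketch of the two checks described above, using the standard V$PGASTAT statistic names and V$PGA_TARGET_ADVICE columns. Both views report cumulative figures since instance startup, so they are only meaningful after a period of normal workload.

```sql
-- Current PGA health: over-allocations should be 0, cache hit % close to 100.
SELECT name, value, unit
  FROM v$pgastat
 WHERE name IN ('aggregate PGA target parameter',
                'total PGA allocated',
                'over allocation count',
                'cache hit percentage');

-- What would happen at other PGA_AGGREGATE_TARGET values?
-- Look for the smallest target with estd_overalloc_count = 0
-- and an acceptable estimated cache hit percentage.
SELECT ROUND(pga_target_for_estimate / 1024 / 1024) AS target_mb,
       pga_target_factor,
       estd_pga_cache_hit_percentage,
       estd_overalloc_count
  FROM v$pga_target_advice
 ORDER BY pga_target_for_estimate;
```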

(Mis-)Guiding the Optimizer
Bad shipping defaults in Oracle9i:
optimizer_index_caching=0 says: "you don't normally have any index blocks cached in RAM" (percent value)
More realistic: (in SSO: 50-60)
optimizer_index_cost_adj=100 says: "index access is just as expensive as full table scans"
More realistic: (i.e. cost is one fifth or so)
Note: what's realistic depends on SGA tuning

Obviously, if you have a small box with little RAM and hence a small db_cache_size (data buffer cache), a lower value is realistic for optimizer_index_caching than on a large box with gobs of RAM and a large SGA. Nevertheless, most indexes practically used by the application can be expected to have been recently used and hence still be in the cache.
The optimizer_index_cost_adj parameter actually compares the relative cost of single-block access (which is the technique used by index-based access) vs. multi-block access (full table scans read multiple blocks at once, which is a more efficient use of the slow disks). Surely single-block access from disk is actually more expensive than multi-block access from disk, so setting this parameter low presumes that most single-block accesses will in fact be reading from memory, i.e. having this value very low only makes sense if the previous value is rather high.
So, on memory-starved systems set the pair optimizer_index_caching/optimizer_index_cost_adj to 50/50, on high-memory systems to something like 80/25 or even 90/20.
Note: what value is realistic here also depends on the value of db_file_multiblock_read_count (i.e. "how many blocks do we read at once during a multi-block access?"). Be extra careful when playing with that parameter, it's a great way to cheat your optimizer into oblivion.
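To try a candidate pair of values without touching the instance, the two parameters can be set for a single session and the problem query re-explained. The 80/25 pair below is simply the high-memory example from the notes, not a recommendation for any particular system.

```sql
-- Session-only test: nothing here survives logout or affects other sessions.
ALTER SESSION SET optimizer_index_caching  = 80;  -- assume ~80% of index blocks are cached
ALTER SESSION SET optimizer_index_cost_adj = 25;  -- cost index access at a quarter of the default

-- Now re-run EXPLAIN PLAN for the slow query in this same session
-- and compare the plan with the one produced under the defaults (0 / 100).
```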

(Mis-)Guiding the Optimizer
These params must be edited in pfile/spfile, cannot ALTER SYSTEM, so need DB restart
But can ALTER SESSION to test/try new values
Want to know what the "true" values are, i.e. what Oracle observed physically?
DBMS_STATS.GATHER_SYSTEM_STATS
(must be run during TYPICAL workload)
(overrides your init params; to clean up if problems result run DELETE_SYSTEM_STATS)

If running GATHER_SYSTEM_STATS clears up a particular performance problem, you know almost for sure that bad init parameters prevented the CBO from making the right decisions before, so you know where to start.
Note: this does not collect table/index stats, and missing table stats also lead to bad/wrong CBO decisions, so before messing with your init params make sure you have current table/index stats!
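A hedged sketch of the system statistics workflow just described, run from SQL*Plus as a DBA. The 120-minute window is an arbitrary assumption; pick a window that covers typical load.

```sql
-- Collect workload system statistics over a 120-minute window (assumed value).
EXEC DBMS_STATS.GATHER_SYSTEM_STATS(gathering_mode => 'INTERVAL', interval => 120);

-- After the window has closed, inspect what Oracle measured
-- (single-block vs. multi-block read times, CPU speed, etc.).
SELECT sname, pname, pval1
  FROM sys.aux_stats$;

-- Back out if plans get worse: the CBO falls back to the init parameters.
EXEC DBMS_STATS.DELETE_SYSTEM_STATS;
```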

Guiding the Optimizer
optimizer_mode=CHOOSE lets CBO decide (based on stats) whether to quickly return some rows or optimize for complete result set
optimizer_features_enable=9.2.0 (minimum); can temporarily turn this down for troubleshooting bad CBO decisions
cursor_sharing=SIMILAR (or FORCE) improves plans on skewed columns
optimizer_dynamic_sampling=2 (later…)
timed_statistics=TRUE (default)
statistics_level=TYPICAL (default)

On cursor_sharing: BB code uses bind variables (i.e. avoids explicit literal strings in queries) almost all of the time. This allows the CBO to avoid "hard" parses, where each query is parsed and evaluated from scratch, and lets it re-use most of its previous work in determining optimal execution plans. However, sometimes simply re-using the plan from last time isn't appropriate, e.g. if the last time we queried with a rare literal value as selector an index-based access was good, but this time we're using a very common literal value as selector, a full table scan would be better. cursor_sharing=SIMILAR lets the CBO peek at the literal value and determine based on table/column statistics whether this is a very common or rare value, and choose the plan accordingly. So a hard parse may be executed to choose a new, optimal plan due to a literal value on a skewed column, i.e. one with certain values occurring very (in)frequently.
If using optimizer_features_enable to troubleshoot the CBO by temporarily limiting its choices, don't relax when you've made things work: you need to bring this back up and resolve the root cause, or it'll be sure to come back to bite you elsewhere!
As with the SGA/PGA advisories, collected system statistics are also useful for tuning your optimizer, since there are various performance views that allow you to make educated guesses for realistic settings, assuming you had a typical workload while these were collected. So don't look at something like the average_wait in v$system_event if you've never restarted your DB and had wildly varying usage, but use it to monitor the effects of changing a parameter after restarting the DB and running under normal use for a few hours.
Ref. "The Search for Intelligent Life in the CBO" for details.

More than meets the eye
9i/10g manage PGA/SGA automatically
…but legacy parameters can break this!
Are your current parameters defaults? V$PARAMETER.isdefault, V$SPPARAMETER.isspecified
Check whether you set something you don't really need, e.g. hash_area_size
Run your DB with spfile, not pfile

Don't just take an 8i init.ora and run your 9i DB with it. Create an spfile specifically for 9i and only set what you know you want. Don't set legacy/deprecated/obsoleted parameters such as buffer_pool_size. Use advisory views to guide you.
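The two dictionary checks named above can be run as follows; this is a plain sketch using the standard ISDEFAULT and ISSPECIFIED columns.

```sql
-- Parameters whose running value differs from the Oracle default.
SELECT name, value
  FROM v$parameter
 WHERE isdefault = 'FALSE'
 ORDER BY name;

-- Parameters explicitly recorded in the spfile.
SELECT name, value
  FROM v$spparameter
 WHERE isspecified = 'TRUE'
 ORDER BY name;
```

Anything in either list that nobody can justify (hash_area_size, sort_area_size, leftover mts_ settings) is a candidate for removal.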

The Importance of Database-Statistics
What you don't know will cost you, in the form of full table scans

What are database statistics?
Physical info about your DB, e.g. number of rows, index depth, number of leaf blocks in index tree etc.
Helps CBO determine how many rows to expect from a (sub-)query
Includes histogram data to judge unbalanced distribution, e.g. uncommonly frequent values

Status Quo: analyze_my
Blackboard installs a stats gathering job via its own analyze_my package
Uses ANALYZE table method
Job analyzes tables only
In Oracle 9i+, the job breaks with default permissions, since analyze_my needs access to V_$PARAMETER in the SYS schema
SELECT what, broken, failures, schema_user, last_date FROM dba_jobs;

Granting Dictionary Access
Oracle8i: no issue: o7_dictionary_accessibility is on
Can run 9i with this same init parameter (gives all DB users select access to all SYS objects)
Or:
GRANT SELECT ANY DICTIONARY TO BB_BB60;
then same command for BBADMIN, BB_BB60_STATS (does same thing for Blackboard users only)
Or:
GRANT SELECT_CATALOG_ROLE TO …;
GRANT SELECT ON V_$PARAMETER TO …;
(most specific option, lets Blackboard users access SYS objects in interactive sessions, and the one object needed for analyze_my within stored procedures, where roles don't work)

Fixing the broken jobs
After granting the necessary privileges, need to recompile packages:
ALTER PACKAGE bb_bb60.analyze_my COMPILE; (etc.)
Now the job is ready to be re-scheduled; need to do that as each Blackboard user, e.g. connect as bb_bb60, then run
EXEC DBMS_JOB.BROKEN(job#,FALSE);
May also need to re-SUBMIT it for tomorrow
Find job# by querying user_jobs
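Put together, the repair might look like this when connected as the schema owner (bb_bb60 here). The job number 42 is purely hypothetical; use whatever user_jobs reports. Passing a next_date to BROKEN is shown as one possible alternative to re-submitting the job.

```sql
-- Recompile after the grants are in place.
ALTER PACKAGE bb_bb60.analyze_my COMPILE;

-- Find the job number and confirm it is flagged as broken.
SELECT job, what, broken, failures, next_date
  FROM user_jobs;

-- 42 is a hypothetical job number taken from the query above.
-- Clear the broken flag and schedule the next run for tomorrow at midnight.
EXEC DBMS_JOB.BROKEN(42, FALSE, TRUNC(SYSDATE) + 1);
COMMIT;
```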

ANALYZE is OUT
DBMS_STATS preferred over ANALYZE
New approach:
GATHER_SCHEMA_STATS( ownname=>'BB_BB60', cascade=>TRUE, method_opt=>'FOR ALL INDEXED COLUMNS SIZE AUTO');
cascade analyzes indexes, method_opt controls histogram generation, SIZE AUTO means get detailed histograms only on columns we actually need them for (see sys.col_usage$ view)

sys.col_usage$ records which columns were ever queried against as selector columns for a query, i.e. which columns are candidates for histograms. It's cumulative, not db-restart-sensitive.

Scheduling DBMS_STATS call
Schedule weekly for Sunday morning
Will likely replace analyze_my in the future
Makes a huge difference for skewed columns due to histogram generation
Can be scheduled as DBA or as schema owner, does not depend on DB users having data dictionary access
Can augment with (bi-)nightly GATHER_SCHEMA_STATS( …, options=>'GATHER STALE') to replace nightly analyze_my runs
GATHER STALE requires that tables are monitored via
EXEC DBMS_STATS.ALTER_SCHEMA_TAB_MONITORING('BB_BB60');
Table monitoring is also needed for the 'SIZE AUTO' clause
Note there's a doc bug: the documentation refers to the non-existing procedure ALTER_SCHEMA_TABLE_MONITORING

Which columns are actually used as query conditions is recorded in SYS.COL_USAGE$ if table monitoring is on. 'SIZE AUTO' gathers histograms for those used columns only. We can then use DBMS_STATS.GATHER_SCHEMA_STATS(..., options=>'GATHER STALE') to gather stats only on tables that had significant changes since the last stats gathering.
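One way to set up the weekly run is through DBMS_JOB, sketched below for SQL*Plus. The 06:00 start time is an assumption, and the day name 'SUNDAY' depends on the session's NLS date language, so adjust both as needed.

```sql
-- Turn on table monitoring first (needed for GATHER STALE and SIZE AUTO).
EXEC DBMS_STATS.ALTER_SCHEMA_TAB_MONITORING('BB_BB60');

-- Submit a job that gathers schema stats every Sunday at 06:00 (assumed time).
VARIABLE jobno NUMBER
BEGIN
  DBMS_JOB.SUBMIT(
    job       => :jobno,
    what      => 'DBMS_STATS.GATHER_SCHEMA_STATS(ownname => ''BB_BB60'', '
              || 'cascade => TRUE, '
              || 'method_opt => ''FOR ALL INDEXED COLUMNS SIZE AUTO'');',
    next_date => NEXT_DAY(TRUNC(SYSDATE), 'SUNDAY') + 6/24,
    interval  => 'NEXT_DAY(TRUNC(SYSDATE), ''SUNDAY'') + 6/24');
  COMMIT;
END;
/
PRINT jobno
```

A second job with options => 'GATHER STALE' and a daily interval could cover the nightly runs in the same way.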

Histograms? Say what?
They tell CBO about "typical" data values
e.g. 'Y' and 'N' should be equally likely, but look at layout.default_ind => only ~1 row has default_ind='Y', rest is 'N'
As skewed as it gets!
CBO needs histogram to choose index when looking for default_ind='Y', i.e. when looking for default portal layout (most common case!)
Note: histograms are only beneficial on skewed columns, mostly on indexed ones

Each customizable tab has one default layout, so the number of rows in table LAYOUT with default_ind='Y' equals the number of customizable tabs on your portal. The number of actual custom layouts is very large in comparison, potentially #users * #tabs.
Histograms are most beneficial on indexed columns, because if there's no index, there's little benefit in knowing that the data distribution is skewed: we just have no index to go by when looking at this column. However, they can also be useful to let the CBO know how many rows to expect when filtering by a query condition. They can affect the order of execution significantly, e.g. ATTEMPT.LATEST_IND='Y' is much more common than 'N', so filtering by it is not very useful.
Quiz question: why would one never collect histograms on a primary key? (Unique index means every value is selective, and the CBO knows that too.)
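The skew described above is easy to confirm directly, and the second query shows whether a histogram has actually been gathered for the column. This is a sketch run as the schema owner, using the LAYOUT table and DEFAULT_IND column named on the slide.

```sql
-- How skewed is the column? Expect a handful of 'Y' rows against many 'N' rows.
SELECT default_ind, COUNT(*) AS row_count
  FROM layout
 GROUP BY default_ind
 ORDER BY row_count;

-- Has a histogram been gathered for it? Each returned row is one bucket endpoint.
SELECT column_name, endpoint_number, endpoint_value
  FROM user_tab_histograms
 WHERE table_name  = 'LAYOUT'
   AND column_name = 'DEFAULT_IND'
 ORDER BY endpoint_number;
```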

Why is the DB so busy?
Look at v$session_longops to see what's keeping the DB busy
Perf report has memory hogs, I/O hogs
Get explain plans for those queries
Analyze why CBO chooses full table scans when it shouldn't
Does GATHER_SYSTEM_STATS fix it?
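A minimal query against v$session_longops for operations that are still running might look like this; full table scans typically show up with an opname of 'Table Scan'.

```sql
-- Long-running operations that have not finished yet, slowest first.
SELECT sid,
       serial#,
       opname,
       target,
       sofar,
       totalwork,
       ROUND(100 * sofar / totalwork, 1) AS pct_done,
       elapsed_seconds,
       time_remaining
  FROM v$session_longops
 WHERE totalwork > 0
   AND sofar < totalwork
 ORDER BY elapsed_seconds DESC;
```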

Are you missing statistics?
Stats gathering jobs might be broken
Check last_analyzed in user_tables, user_indexes
Use the optimizer_dynamic_sampling=2 init param: protects against missing stats; does fast dynamic sampling in memory when stats are missing on the tables involved in a query; also gathers dynamic stats on temporary tables
Analyze table(s) on the fly, does that fix it?
Do you have histograms? Check user_histograms

--for which tables and columns have I actually computed non-default histograms, i.e. those with more than two buckets?
SQL> select distinct table_name||'.'||column_name from user_tab_histograms where endpoint_number>2;
Before you first run the new stats gathering job, this will return nothing.

--where do we get a LOT of histogram buckets? (Note: this alone doesn't mean anything about selectivity, it's more about how many distinct values occur in the table.)
SQL> select table_name||'.'||column_name||':'||count(*) from user_tab_histograms group by table_name,column_name having count(*)>100 order by count(*) desc;
TABLE_NAME||'.'||COLUMN_NAME||
ACTIVITY_ACCUMULATOR.TIMESTAMP:201
COURSE_CONTENTS.PARENT_PK1:200
COURSE_NAVIGATION_ITEM.INTERNAL_HANDLE:199
X_QTI_ASI_DATA.ANCESTOR_PK1:198
COURSE_USERS.USERS_PK1:197
X_QTI_RESULT_DATA.ANCESTOR_PK1:196
CNV_INVALID_CHARACTERS.PK1:193
USERS. 181
MSG_MAIN.FORUMMAIN_PK1:179
GRADEBOOK_MAIN.GRADEBOOK_TYPE_PK1:178
GRADEBOOK_MAIN.GRADEBOOK_TRANSLATOR_PK1:176
X_COURSE_CONTENTS.ANCESTOR_PK1:174
GRADEBOOK_MAIN.CRSMAIN_PK1:173
MSG_MAIN.USERS_PK1:168
X_MSG_MAIN.ANCESTOR_PK1:158
GRADEBOOK_GRADE.GRADEBOOK_MAIN_PK1:156
COURSE_CONTENTS.CRSMAIN_PK1:124
ACTIVITY_ACCUMULATOR.COURSE_PK1:122
QTI_ASI_DATA.POSITION:101
19 rows selected.
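The last_analyzed check mentioned at the top of this slide can be sketched as follows, run as the schema owner. The 7-day threshold is an assumption chosen to match a weekly gathering schedule.

```sql
-- Tables that were never analyzed, or not within the last 7 days (assumed threshold).
SELECT table_name, num_rows, last_analyzed
  FROM user_tables
 WHERE last_analyzed IS NULL
    OR last_analyzed < SYSDATE - 7
 ORDER BY last_analyzed NULLS FIRST;

-- Same check for indexes.
SELECT index_name, table_name, last_analyzed
  FROM user_indexes
 WHERE last_analyzed IS NULL
    OR last_analyzed < SYSDATE - 7
 ORDER BY last_analyzed NULLS FIRST;
```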

Explain Plan
@$ORACLE_HOME/rdbms/admin/utlxplan creates PLAN_TABLE
@.../utlxpls queries it after an EXPLAIN
EXPLAIN PLAN FOR query
Or use AUTOTRACE for automatic explaining of all your session's queries
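A short SQL*Plus session tying these steps together is sketched below. It assumes PLAN_TABLE already exists, uses the LAYOUT query from the histogram discussion as the example, and displays the plan with DBMS_XPLAN (available from 9i Release 2) instead of the utlxpls script; AUTOTRACE additionally needs the PLUSTRACE role or equivalent privileges.

```sql
-- Explain one statement and display the plan.
EXPLAIN PLAN FOR
  SELECT * FROM layout WHERE default_ind = 'Y';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Or let SQL*Plus show the plan for every statement in the session
-- without actually fetching the result rows.
SET AUTOTRACE TRACEONLY EXPLAIN
SELECT * FROM layout WHERE default_ind = 'Y';
SET AUTOTRACE OFF
```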

Tools for more CBO info
Statspack
tkprof
10053 trace
Beyond the scope of this session

How to run Perf Report
tools/perf_reports/run_reports.sh (run as root or bbuser)
Output in logs/perf_reports/
Depends on: GRANT SELECT ANY DICTIONARY TO BBADMIN
Don't run *often*, don't run 2 in parallel
Tuning advice is based on workload since DB startup, which should be "typical"

Some bugs in Perf Report in 6.3 and 7.0
run_sql_reports.sh: comment out the line setting ORA_NLS33 (to a non-existing path)
get_tomcat_trace.sh, line 31: delete that extraneous colon
get_os_stats.sh, line 16: delete "/usr/ucb/" for Solaris
These don't apply to our topic though
Can also run queries in perf_reports/sql manually from sqlplus, spool output to file

Ask Tom – and RTFM
Reference manuals are your friend
Oracle makes full set of manuals available for free download
Learn more at
Expert One-on-One Oracle (T.Kyte)
Effective Oracle by Design (T.Kyte)
Expert Oracle Database Architecture (T.Kyte)

Useful links
J.M. Hunter: excellent links & plenty of good articles
Jonathan Lewis
Howard Rogers
Steve Adams
Beware of self-proclaimed experts that don't document results
Technical forums can help & hurt

Assorted useful articles and reference material, in no particular order:
Simon Sheppard: Init.ora parameter quick overview
Roger Schrag, Database Specialists: Using Explain Plan and Tkprof to tune your applications
Wolfgang Breitling, Centrex Corporation:
A Look under the hood of CBO: The Event
What is new in the CBO and the event trace in Oracle 9i
Fallacies of the Cost Based Optimizer
Bjoern Ensig, Miracle A/S:
Bind Variables And Cursor Sharing - New Directions In Oracle 9i
Statspack: Quickly Identify your worst performance problem
Tom Kyte, Oracle Corp:
Beta chapters 4&5 from Expert Oracle10g Edition
How to get AUTOTRACE working
Brian Peasland, Oracle Pipeline: Tuning PGA_AGGREGATE_TARGET in Oracle 9i
Tim Gorman, SageLogix: The Search For Intelligent Life in the Cost-Based Optimizer (8i)

Wrapping up
Give the optimizer accurate info, and it will find the right plans
Correct statistics are the foundation
Monitoring SGA needs is secondary!
Watch for outdated settings
Blackboard Support helps with catastrophic problems only, Consulting Services does fine-tuning
