Presentation on theme: "Washington Area Informix/DB2 Users Group July 26, 2006 DB2 Performance Tuning: A Practical Approach By Jim Cleveland Bluepoint Consulting."— Presentation transcript:
Washington Area Informix/DB2 Users Group July 26, 2006 DB2 Performance Tuning: A Practical Approach By Jim Cleveland Bluepoint Consulting Inc.
Introduction What is our aim? –Performance tuning is: Analyzing the systems configuration Analyzing the systems configuration Observing performance characteristics Observing performance characteristics Formulating recommendations intended to enhance performance Formulating recommendations intended to enhance performance (Rinse, shampoo, repeat) (Rinse, shampoo, repeat) –Metrics driven model involves: Gathering system performance data (at all levels) Gathering system performance data (at all levels) Focusing on a particular quantifiable performance measurement Focusing on a particular quantifiable performance measurement Adjusting performance parameters or modifying architectural characteristics Adjusting performance parameters or modifying architectural characteristics Gauging the effect of those changes on overall performance and selected measurement Gauging the effect of those changes on overall performance and selected measurement
The Diagnostic Tools – DB2 DB2 Snapshot Monitors –Internal counters set at global or session levels using monitor switches –Monitors collect cumulative statistics either from the initiation of a CLP session (switches and stats are local to session), since time of last reboot (global switches set and stats available to all users), or last reset monitors command at either level –Useful for collecting point-in-time information regarding performance metrics with respect to overall db/dbm behavior. Also able to provide more specific information on: Locking – at given moment lists all locks & types held by all applications Locking – at given moment lists all locks & types held by all applications Bufferpools – cumulative stats for memory, physical & logical I/Os, synch/asych Bufferpools – cumulative stats for memory, physical & logical I/Os, synch/asych Sorts – provides detailed statistics regard sortheap usage, overflows, # active etc. Sorts – provides detailed statistics regard sortheap usage, overflows, # active etc. Tablespaces – detailed I/O stats and other activity for a given tablespace Tablespaces – detailed I/O stats and other activity for a given tablespace UOW – displays status of application Unit of Work at point-in-time UOW – displays status of application Unit of Work at point-in-time Dynamic SQL – shows contents of package cache and related stats at point-in-time Dynamic SQL – shows contents of package cache and related stats at point-in-time
The Diagnostic Tools – DB2 DB2 Event Monitors Event Monitors are the DB2 facility used to collect information regarding a specific event or chain of events within the context of database operations. –They are event driven versus continuously collected as with snapshots –Must be explicitly created and activated via DB2 commands or APIs –Are the best way to effectively diagnose and resolve deadlock issues from the database perspective, since deadlocks require tracing a chain of events through time –Output may be directed to a DB2 table and results then analyzed using SQL. For example, output from statement Event Monitor can be used to: identify and rank most frequently executed or most time consuming SQL identify and rank most frequently executed or most time consuming SQL track the sequence of statements leading up to a lock timeout or deadlock track the sequence of statements leading up to a lock timeout or deadlock track connections to the database track connections to the database breakdown overall SQL statement execution time into sub-operations such as sort time and use or system CPU time breakdown overall SQL statement execution time into sub-operations such as sort time and use or system CPU time
The Diagnostic Tools - Unix Concurrent Monitoring of OS Resource Usage is Imperative. When collecting DB2 snapshot/event data it will be useful (necessary even?) to have, at a minimum, time stamped output from: –iostat [interval in sec. #iterations] provides detailed information, broken down by hdisk for % of time disk busy, read/write transfer rates and totals, xacts per sec. Particularly useful for identifying heavily used (hot) disks which may be the cause I/O bottlenecks (i.e. – hdisks with continuous activity > 40% deserve attention) –vmstat [interval in sec. #iterations] provides information regarding CPU activity (% for system, user, and time waiting for I/O), available/free memory and paging activity. Useful for identifying shortage of CPU resources - overall activity > 70% indicates there may be inadequate headroom for handling peak loads DB2 processes may back up in wait queue. –Other useful OS performance monitoring/config commands: filemon - drill down to retrieve detailed I/O statistics on individual devices filemon - drill down to retrieve detailed I/O statistics on individual devices netstat - identify latency in transactions due to network data transfer time netstat - identify latency in transactions due to network data transfer time ps - provides capability to identify DB2 (or other) processes ps - provides capability to identify DB2 (or other) processes sar - system activity report, combines metrics from iostat, vmstat, and others sar - system activity report, combines metrics from iostat, vmstat, and others lsps – describes paging space lsps – describes paging space lscfg/smit – system hardware inventory command and configuration utilitiy lscfg/smit – system hardware inventory command and configuration utilitiy
The Diagnostic Tools – Unix/AIX In the best of all possible worlds...get topas installed (or nmon):
Resort to bribery….
The DB2 Process Model To understand the context in which performance issues arise we need to understand how the various DB2 components function together as a whole. That is, how the DB2 process model works –
Identifying the Problem – Step I Collect Preliminary Configuration and Performance Information –Do a cursory configuration and architecture review (beforehand, if possible) –Capturing db and dbm level snapshot stats over a representative period of time provides a good jumping off/drill-down point for non-specific performance issues (i.e. – focus is not on a single transaction or process). –Simple shell script run from Unix prompt or scheduled using crontab will suffice to collect data over specified period (sub # of iterations for i1, interval len for i2): For example: #!/bin/ksh date; x=o; db2 update monitor switches using lock sort on bufferpool on uow on table on statement on; db2 reset monitor all do while x < i1; # determines number of iterations, total monitoring period date db2 get snapshot for dbm; db2 get snapshot for db on xxxxxx; sleep i2 # interval in seconds for loop, longer interval for longer monitoring period x=x+1enddo As mentioned cross time indexed OS level stats should be collected concurrently – same script using iostat/vmstat maybe.
Identifying the Problem – Step II The question at this stage is: on what tasks and in what proportion is DB2 spending its time? To identify common performance issues, breakdown as follows: –Locking - time waiting on resolution of lock & deadlock issues –Sorts - time spent performing sorts, hash joins, (other?) –Logging Activity - time for log writes, reads and overhead –System CPU - time executing system calls –User CPU - time spent on SQL prep, ovhd for locking and sorting activity –General I/O - no single cause, likely a confluence of factors for e.g. ts design, container mapping, bp configuration, db/dbm params etc. (Specific calculations for above available in Admin Guide: Performance)
The Tuning Process – a Roadmap
Sort Issues SQL containing GROUP BY, ORDER BY, or DISTINCT will result in a sort if no index is available to order the result set. Also in prep for merge join. To identify sort problems, look for the following: –Sort overflows are nominally < 3% (Total Overflows/Total Sorts * 100) –More than minimal post-threshold sorts or rejected piped sorts –Sort heap high water mark > SHEAPTHRESH (post-threshold sorts) –The # of active sorts at a given time is greater than x2 ( or 3 σ) baseline –Any large or significant small hash join overflows, any hash loops –Rows selected >> rows will generally causes ^ sorts need for indexes –> 5 sorts per statement (Total sorts/Total Statements) Problem SQL
Adjusting Sort Parameters Increasing Sort Heap/Threshold (remember this is treating the symptom!): –may allow for larger in memory sorts preventing disk spills –fewer sort passes/merge joins and may also cut down overall exec time –same benefits for hash joins, dynamic bit index ANDing, & star joins Caveats: large sort memory may cause excessive paging, in memory sort rather than index scan, and sort space may come at the expense of bufferpool space Rule Of Thumb: –Calculate average sort space = (sort_heap allocated/avg. # active sorts) use as baseline sortheap then adjust to minimize overflows, pipe rejects, hash overflows –Sheapthresh = max(avg # appls connected, avg. active sorts) * max(sortheap among dbs in instance) adjust to minimize post-thresh sorts, hash overflows etc.
Logging Issues Transaction logging and overhead can represent a large part of I/O time in OLTP/mixed environments. To identify logging problems look for: –Hot (> 40%) hdisks where logs located (use db2 snaps for raw devices) –Snapshots showing frequent use of secondary logs, large volume of writes/reads to logs, log high watermark approached, ^ opens/closes writes/reads to logs, log high watermark approached, ^ opens/closes –Logging in Non-OLTP environment? Check db2diag for ROLLBACKs Potential adjustments/remedies : –Increase size & # of primary logs, SOFTMAX, MINCOMMIT, LOGBUFSIZE –Relocate logs on faster devices? –Design options: compound SQL, group > 5 statements into Stored Proc.
Prefetch Wait Time – Refining the Cause To determine why prefetch is not happening effectively examine (for both tablespaces and individual tables): Asynch Read Pct: Asynch Reads/Buffer Pool Physical Reads Asynch Read Pct: Asynch Reads/Buffer Pool Physical Reads Asynch Read Ratio: Asynch Pages Read/Logical Reads Asynch Read Ratio: Asynch Pages Read/Logical Reads Asynch Pages Per Req (APPR): Asynch Pages Read/Asynch Requests Asynch Pages Per Req (APPR): Asynch Pages Read/Asynch Requests Asynch Read Time: Asynch Read time in ms/Overall Read Time ms Asynch Read Time: Asynch Read time in ms/Overall Read Time ms Number of unread prefetch pages Number of unread prefetch pages Is wait caused by an underlying I/O bandwidth limitation? Is wait caused by an underlying I/O bandwidth limitation? Bufferpool configuration Bufferpool configuration Purpose of calculations is to determine: Is prefetch taking place on correct tables/tablespaces? Is prefetch taking place on correct tables/tablespaces? In large enough chunks? In large enough chunks? In proper proportion to overall logical and synchronous I/O? In proper proportion to overall logical and synchronous I/O? Without being overwritten by other pfetchers? Without being overwritten by other pfetchers?
Excessive Prefetch Time – Possible Remedies Check db/dbm configuration and architecture: –DB2_PARALLEL_IO=* regvar, SEQDETECT=Y, # of IO_SERVERS –Physical parameters - page/extent/prefetch sized/RAID strip size Pfetch size = n * [# devices in array] * extent size where n Є 1,2,3… Incremental improvement may be achieved by: –Implementing Big-Block I/O –Adjust the number and size of prefetch queues –bps only need to be sized to provide for effective pfetch –Need separate bps for tables w/high ARRs or contention will occur –Table/jfs reorganization - Pre-fetch advantage is offset by fragmented DB2/OS data check contiguity at OS level and w/DB2 REORGCHK –I/O bandwidth limiting factor? Adequate parallelism with existing devices? Solution: map to more containers or more/faster devices. devices? Solution: map to more containers or more/faster devices.
Detection & Classifying of Locking Issues To detect and diagnose locking problems, look for: Snapshots show significant Time Spent Waiting on Locks (compared to overall execution time) Snapshots show significant Time Spent Waiting on Locks (compared to overall execution time) Snapshots show lock escalations Snapshots show lock escalations Locklist High Water Mark > 50% of total space Locklist High Water Mark > 50% of total space db2diag.log shows repeated SQL0911N (rc 1 or 2) meaning Lock timeouts or deadlocks detected and resolved db2diag.log shows repeated SQL0911N (rc 1 or 2) meaning Lock timeouts or deadlocks detected and resolved Look for ROLLBACKs in diaglog as well deadlocks Look for ROLLBACKs in diaglog as well deadlocks
Locking Problem Mitigation Locking is fundamentally an application logic issue. That said, recourse on db side includes: –Insure LOCKTIMEOUT changed from default -1 to e.g –Set LOCKTABLE for heavily updated tables –Provide sufficient LOCKLIST space to prevent lock escalation –Tune db parameters MAXLOCKS, LOCKTIMEOUT & DLCHKTIME –Set registry variables LOCK_TO_RB, KEEPTABLELOCK, MAX_NON_TABLE_LOCKS, DB2_EVALUNCOMMITTED, DB2_SKIPDELETED and DB2_SKIPINSERTED –Type II indexes ^ concurrency if migrated from v7, Use REORG to convert Application development principles: –Frequent of COMMITs (use event monitor or db2diag.log to track) –Specifying FOR FETCH cursors whenever possible –Lowest isolation level for entire application or single statement
System or User CPU User CPU is often a result of waiting on locks or sorts. If thats still the case after after previous steps, application design consult is required. Barring that, the need is to identify processes or particular SQL statement consuming CPU cycles: –Use ps -elf to identify processes at OS level –List applications show detail for connections within DB2 –Event monitors to isolate apps/SQL using extensive CPU resources Stack trace, explained in the DB2 Problem Determination Guides, is the ultimate means of identifying process (and time) flow within DB2.
Paging Issues – AIX VMM allows system to assign more memory to a process than physically exists. Properly tuned DB2 system on dedicated server should incur NO paging whatsoever. Page faults/sec > 50 or > pi/po problem. Paging means memory is overcommitted – common sources in 32-bit envirnoments: -Over allocation of bufferpools or sortheap -Database memory allocated based on avg. # applications -Private memory allocated by large number of connections Remedies: -Identify and release overcommitted memory (use db2mtrk/visualizer) -AIX minperm and maxperm parameters -Paging space on disk x2 physical memory -Pin bufferpools in memory using DB2_PINNED_BP -Release agent private memory using DB2 regvars MEMDISCLAIM/MAXFREE -Remember that Intra-parallelism and FCM take up 2 segments (i.e. – less for BPs)
Whatever is left is I/O, right? Sorta-kinda-maybe. I/O efficiency can be addressed within context of existing configuration But, later well need consider performance drivers inherent in design. Checklist of common opportunities for improving I/O performance: Parallel I/O – Registry variables, num_ioservers, adequate # containers (hdisks) Parallel I/O – Registry variables, num_ioservers, adequate # containers (hdisks) Turn off Memory Mapped I/O (MMAP_R/W) but beware of i-node latching Turn off Memory Mapped I/O (MMAP_R/W) but beware of i-node latching Consider DMS RAW DEVICE containers. DMS file are worst of both worlds Consider DMS RAW DEVICE containers. DMS file are worst of both worlds Examine behavior of HASH JOINs – may degrade performance wo/resources Examine behavior of HASH JOINs – may degrade performance wo/resources RUNSTATS and REORGs – automate! Dont leave to discretion of overtaxed DBA! RUNSTATS and REORGs – automate! Dont leave to discretion of overtaxed DBA! OLTP specifics – consider optimization 2/3, AVG_APPLS = 1, watch logging closely, OLTP specifics – consider optimization 2/3, AVG_APPLS = 1, watch logging closely, CHNGPGS_THRESH 20 – 30%, smaller pages more efficient I/O, MINFREE and CHNGPGS_THRESH 20 – 30%, smaller pages more efficient I/O, MINFREE and PCTFREE to avoid block splitting/overflows PCTFREE to avoid block splitting/overflows DSS specifics – consider INTRAPARALLEL, optimization > 5? AA=connections, DSS specifics – consider INTRAPARALLEL, optimization > 5? AA=connections, watch sorts closesly, 32 K pages? (but remember only 255 rows per watch sorts closesly, 32 K pages? (but remember only 255 rows per page), DLCHKTIME ^, large result sets - rqioblock to 64 K page), DLCHKTIME ^, large result sets - rqioblock to 64 K
Laundry List of I/O issues (continued) ESS: stripe size is 64 K, stripe ts containers across LUNs, keep in mind LUNs can be on same rank – (same array), do not duplicate RAID at OS FAStT: stripe size configurable from 8 – 256 K, multiple RAID configs Less prevalent, but no less debilitating architectural/configuration issues: –Page size correct for row size and access method? –Unless there is a distinct advantage, (synch/asych I/O) fewer bufferpools –Benchmark index/data separation – not always advantageous –Pay attention to victim page steals – adjust # of cleaner/threshold –CPU Speed -1, let optimizer decide –File open/close overhead MAXFILEOPEN/ulimits settings
AIX Recs - The good, the bad & the ugly General settings, rules, suggestions, incantations: –Use AIX 5.2 with latest maintenance release if possible –Use 64 bit DB2 with 64 bit kernel, 32 bit / 32 bit, recompile & rebind SPs etc. when migration to 64 bit takes place, point to 64 bit libraries SPs etc. when migration to 64 bit takes place, point to 64 bit libraries –Steady state load of hardware config (CPU, Memory, I/O) should use no more than 70% of available resources to allow for peak loads/unseen overhead more than 70% of available resources to allow for peak loads/unseen overhead –Number of licensed users and maxuproc set to max DB2 connections –Maxperm/minperm settings along with DB2 reg vars for memory disclaim –Maxmemory/minmemory to regulate AIX –Ulimits – maxprocs increased, unlimited for other params –Make sure asynchronous I/O (aio) is enabled w/adequate # of agents –Paging space at least x2 physical memory –If pinning memory using regvar, vmtune tuning parameter ? –Disabling MMAP reads/writes in DB2 means jfs cache bybassed…can provide significant I/O improvement, but i-node latches 3 containers –Make sure all processors in an SMP environment are enabled! Ask about WLM and DLPAR settings. –Check in smit for LP/PP ratio – is there disk mirroring on top of RAID-5? –db2oscnfg – makes recs for OS kernel settings in HP-UX and Solaris
SQL Analysis & Tips The Golden Rule for Join Performance : –Columns used to join tables or that are used in a group by, order by or distinct clause should be indexed to improve performance Other useful SQL rules: –Optimizer is all about SARGABLE predicates (stage 1, indexable, –For join with N tables, there must be N-1 relationships defined to avoid a Cartesian product -- Corollary: establishing higher selectivity early in plan. –All SQL optimization efforts marginalized with outdated statistics or fragged data –Use optimize/fetch only rows clause if possible –Test alternate optimization classes for problem SQL (i.e. – 5, 7 & 9) –Use singleton select instead of cursor – use routines to identify node in EEE, make cursors READ ONLY where possible non-deletable no exclusive locks, BLOCKING ALL option also –Use UR or lowest possible isolation level when possible to avoid waiting on locks –Consider Global Temporary tables – scanning workfile faster sometimes than subquery –Run logic as function instead of procedure