Presentation is loading. Please wait.

Presentation is loading. Please wait.

How STATSPACK Was Used to Solve Common Performance Issues Brian Hitchcock OCP 8, 8i, 9i DBA Sun Microsystems

Similar presentations


Presentation on theme: "How STATSPACK Was Used to Solve Common Performance Issues Brian Hitchcock OCP 8, 8i, 9i DBA Sun Microsystems"— Presentation transcript:

1 www.brianhitchcock.net

2 How STATSPACK Was Used to Solve Common Performance Issues Brian Hitchcock OCP 8, 8i, 9i DBA Sun Microsystems brian.hitchcock@sun.com Session id: 31885 Dedicated to Pramitha Chowrira, the Goddess of the Rockies, Mike Waldron, the Student who became the Master, Sheryl Driscoll, who leads purely for the glory and Ann Bischoff, who took me in trade for a developer... www.brianhitchcock.net

3 What STATSPACK Is  Set of SQL and PL/SQL  Collects performance data from v$ tables  Stores collected data in separate tables  Each collection of data is a ‘snapshot’  Reports deltas in data between snapshots  Supports ad-hoc SQL queries of the snapshot data

4 www.brianhitchcock.net STATSPACK Details  Works for 8.1.7 onwards  Gathers data for a single instance  Snapshot levels – Determine how much data is collected – Defaults are fine  Snapshot interval of 15 minutes suggested  Long report periods miss transient events – Reports over an instance restart are not valid

5 www.brianhitchcock.net STATSPACK -- Good  Free (very cool!)  Gathers a wide range of data – You don’t know what you’re looking for at first – Root cause isn’t usually obvious  Standard process to collect performance data – Gathers the same data on all instances – Easy to share with vendors, support groups

6 www.brianhitchcock.net STATSPACK -- Not Perfect  Gathers a wide range of data – Ocean of data – Any information? – Easy to get lost  Does not tell you what the problem is – Shows what is happening in the instance – You need to figure out if this is a problem or not  Does not tell you the solution  Does not tell you that you are done tuning...

7 www.brianhitchcock.net How to Interpret Output?  Requires experience with your system – No single way to analyze output – Must have history of your system – Look for possible problem areas – Trial and error to change problem behavior – Only you can tell if you have a performance problem

8 www.brianhitchcock.net STATSPACK Report Sections – Instance Summary, Efficiency – Top 5 Wait Events, Wait Events – SQL Ordered by Gets, Reads, Executions – Instance Activity Stats – Tablespace IO Stats Ordered by IOs, Tblspc-file – Buffer Pool Statistics – Rollback Segment Stats, Storage – Latch Activity, Sleep, Miss Sources – Dictionary Cache Stats – Library Cache Activity – SGA Memory Summary, Breakdown – init.ora Parameters

9 www.brianhitchcock.net Documentation of output  No comprehensive documentation – Oracle 8i Reference  Appendix A -- Wait Events defined  Appendix B -- Enqueue Names defined  Appendix C -- Statistics Descriptions – Database Performance Guide and Ref 9.0.1  Chapters 21-23, Supplied packages, how to use – $ORACLE_HOME/rdbms/admin/spdoc.txt – ORACLE High-Performance Tuning with STATSPACK, Donald K. Burleson, Oracle Press ISBN 0-07-213378-3  No explanation of – What output means for your system

10 www.brianhitchcock.net Configuration Used  Oracle 8.1.7.2  Snapshots every 15 minutes – snapshots taken continuously  Default STATSPACK snapshot ‘level’  Application loads and analyzes web site click stream data – Lots of data – More data all the time – We don’t know what vendor code looks like

11 www.brianhitchcock.net Actual Use  4 Performance issues in 2002 – Case 1) Reports Running Slow  STATSPACK output didn’t show the problem – Case 2) Vendor Demo Slow  STATSPACK output didn’t show the problem – Case 3) Data Load Slow  STATSPACK output led to 18x speedup (1800%) – Case 4) Data Load Time Varies  STATSPACK output led to the root cause

12 www.brianhitchcock.net Case 1) Reports Running Slow  Vendor code allows users to setup reports – Vendor code generates SQL for report  Long run interferes with next day’s data load  STATSPACK captures SQL – Generate explain plan(s)  Report SQL doesn’t generate where clause properly to use partition pruning  Vendor refuses to change their code  We simply removed the reports – Performance issue ‘resolved’

13 www.brianhitchcock.net Case 2) Vendor Demo Slow  Due to issues like Case 1) – New vendor sets up demo, data load slow – Data load runs twice as fast at vendor – Statspack output doesn’t show anything obvious – Compare configuration of vendor and our dbs – Vendor has only one redo log file per group – We had two redo log files per group – We drop one file per group, performance issue resolved

14 www.brianhitchcock.net Case 3) Data Load Slow  First time loading new type of web log data  No baseline to compare with – Classical performance tuning doesn’t always apply to the real world  Data load so slow no time for daily reporting  Must run faster or the data won’t be loaded  We don’t know if this load will run faster  Do we have a ‘performance’ issue? – Yes, data load must run faster to be useful – No, perhaps this is as fast as it can be...

15 www.brianhitchcock.net Case 3) Data Load Slow  SQL -- Highest Gets per Exec SQL ordered by Gets for DB: BHDATA04 Instance: BHDATA04 Snaps: 4426 -4427 -> End Buffer Gets Threshold: 10000 -> Note that resources reported for PL/SQL includes the resources used by all SQL statements called within the PL/SQL code. As individual SQL statements are also reported, it is possible and valid for the summed total % to exceed 100 Buffer Gets Executions Gets per Exec % Total Hash Value --------------- ------------ -------------- ------- ------------ 634,304 68 9,328.0 97.5 3948948039 SELECT t526.keyvalueid FROM bh_lqueryvalue t526 WHERE t526.query infoid = :ph0 ORDER BY t526.keyvalueid ASC

16 www.brianhitchcock.net Bad SQL?  SQL shouldn’t cost much – Select looking for one row – Table has two indexes – Explain plan shows ‘index full scan’ – Should show ‘index range scan’ – Explain plan with hint to force one index – Verify cost of each index – Optimizer is choosing wrong index!  Drop the costly index! – Indexes added by vendor ‘to be safe’...

17 www.brianhitchcock.net Explain Plan Force Index1 SQL> truncate table plan_table; Table truncated. SQL> explain plan set Statement_Id = 'TEST' for SELECT /*+ INDEX(t526 X_LQRYVL_QUERYIDKYVL) */ t526.keyvalueid FROM bh_lqueryvalue t526 WHERE t526.queryinfoid = 100 ORDER BY t526.keyvalueid ASC; 2 3 4 5 Explained. SQL> @$ORACLE_HOME/rdbms/admin/utlxpls Plan Table -------------------------------------------------------------------------------- | Operation | Name | Rows | Bytes| Cost | Pstart| Pstop | -------------------------------------------------------------------------------- | SELECT STATEMENT | | 3 | 24 | 1 | | | | INDEX RANGE SCAN |X_LQRYVL_ | 3 | 24 | 3 | | | --------------------------------------------------------------------------------  Cost 4

18 www.brianhitchcock.net Explain Plan Force Index2 SQL> explain plan set Statement_Id = 'TEST' for SELECT /*+ INDEX(t526 X_LQYVAL_KYVLQRYID) */ t526.keyvalueid FROM bh_lqueryvalue t526 WHERE t526.queryinfoid = 100 ORDER BY t526.keyvalueid ASC; 2 3 4 5 Explained. SQL> @$ORACLE_HOME/rdbms/admin/utlxpls Plan Table -------------------------------------------------------------------------------- | Operation | Name | Rows | Bytes| Cost | Pstart| Pstop | -------------------------------------------------------------------------------- | SELECT STATEMENT | | 3 | 24 | 2334 | | | | INDEX FULL SCAN |X_LQYVAL_ | 3 | 24 | 9333 | | | -------------------------------------------------------------------------------- SQL>  Cost 11,667

19 www.brianhitchcock.net Solution  After dropping costly index – data load time was 18 hours, became 1 hour – 18:1 improvement (1800%)  Why did optimizer choose wrong index? – No idea, Oracle requested running 18 hour data load to gather instance data – Business users said “NO!” – Indexes created by vendor, no need for both indexes  Know when to quit tuning!

20 www.brianhitchcock.net What About Wait Events?  Popular DBAs – Wait Events are all that matters  I want (desperately) to be popular too… – Return to Case 3) – Examine Top 5 Wait Events section – Try to understand what is causing the wait time

21 www.brianhitchcock.net Case 3) Wait Events  Top 5 Wait Events Top 5 Wait Events ~~~~~~~~~~~~~~~~~ Wait % Total Event Waits Time (cs) Wt Time -------------------------------------------- ------------ ------------ ------- PX Deq: Execution Msg 335 1,156 45.30 latch free 1,740 522 20.45 control file parallel write 88 329 12.89 db file sequential read 253 323 12.66 log file parallel write 2 42 1.65 -------------------------------------------------------------  PX -- parallel query issues? – Contact Oracle Tech Support

22 www.brianhitchcock.net Ask the Experts  Oracle Tech Support – Many wait events in STATSPACK output should be ignored (a bug perhaps? Or an RFE?) – Requests the full STATSPACK report – Tells me that the latch free wait event must be addressed – set session_cached_cursors = 100 – Performance improvement will be ‘significant’ – Data load now takes 19 hours (about 10% worse)

23 www.brianhitchcock.net What Happened?  How could the experts miss the bad SQL?  Wait events are important – If the total wait time is the largest problem  In this case – Bad SQL dominated the overall run time  Wait event analysis – Many events should be ignored – Need to determine how much of total run time is due to wait events only  Go back and fix the SQL issue

24 www.brianhitchcock.net Total CPU Time  Total CPU time is 27475 10s of milliseconds – 10 milliseconds is 1 centisecond (cs) = 0.01 sec – 27457 --> 27475 cs = 274.75 seconds – Report interval was 4.60 minutes (276 seconds) – confused? How many cs left until Happy Hour?  From Instance Activity Stats Instance Activity Stats for DB: BHDATA04 Instance: BHDATA04 Snaps: 4426 -4427 Statistic Total per Second per Trans --------------------------------- ---------------- ------------ ------------ CPU used by this session 27,441 99.4 27,441.0 CPU used when call started 27,475 99.6 27,475.0 …

25 www.brianhitchcock.net Total Wait Time  Look at all Wait Events Wait Events for DB: BHDATA04 Instance: BHDATA04 Snaps: 4426 -4427 -> cs - centisecond - 100th of a second -> ms - millisecond - 1000th of a second -> ordered by wait time desc, waits desc (idle events last) Avg Total Wait wait Waits Event Waits Timeouts Time (cs) (ms) /txn ---------------------------- ------------ ---------- ----------- ------ ------ PX Deq: Execution Msg 335 0 1,156 35 335.0 latch free 1,740 1,046 522 3 ###### control file parallel write 88 0 329 37 88.0 db file sequential read 253 0 323 13 253.0 log file parallel write 2 0 42 210 2.0 enqueue 61 0 40 7 61.0 refresh controlfile command 602 0 37 1 602.0 PX Deq: Msg Fragment 64 0 29 5 64.0 PX Deq: Parse Reply 59 0 26 4 59.0 control file sequential read 2,673 0 16 0 ###### log file sync 1 0 15 150 1.0 PX Deq: Signal ACK 2 2 11 55 2.0 PX Deq: Join ACK 57 0 3 1 57.0 PX Deq: Execute Reply 5 0 3 6 5.0 SQL*Net more data to client 37 0 0 0 37.0 file open 6 0 0 0 6.0 db file parallel write 4 0 0 0 4.0 PX Idle Wait 4,316 4,287 877,818 2034 ###### SQL*Net message from client 2,213 0 27,615 125 ###### SQL*Net message to client 2,214 0 1 0 ###### ------------------------------------------------------------- -----------> Total Wait Time 907997 cs

26 www.brianhitchcock.net Real Total Wait Time  Remove idle events – MetaLink Note:191103.1 PQ Wait Events – STATSPACK report should filter out idle events – Database Performance Guide and Ref 9.0.1  Explains more about this ‘feature’

27 www.brianhitchcock.net Real Total Wait Time  Idle Events Wait Events for DB: BHDATA04 Instance: BHDATA04 Snaps: 4426 -4427 -> cs - centisecond - 100th of a second -> ms - millisecond - 1000th of a second -> ordered by wait time desc, waits desc (idle events last) Avg Total Wait wait Waits Event Waits Timeouts Time (cs) (ms) /txn ---------------------------- ------------ ---------- ----------- ------ ------ PX Deq: Execution Msg 335 0 1,156 35 335.0 <----- remove latch free 1,740 1,046 522 3 ###### control file parallel write 88 0 329 37 88.0 db file sequential read 253 0 323 13 253.0 log file parallel write 2 0 42 210 2.0 enqueue 61 0 40 7 61.0 refresh controlfile command 602 0 37 1 602.0 PX Deq: Msg Fragment 64 0 29 5 64.0 PX Deq: Parse Reply 59 0 26 4 59.0 control file sequential read 2,673 0 16 0 ###### log file sync 1 0 15 150 1.0 PX Deq: Signal ACK 2 2 11 55 2.0 <----- remove PX Deq: Join ACK 57 0 3 1 57.0 PX Deq: Execute Reply 5 0 3 6 5.0 SQL*Net more data to client 37 0 0 0 37.0 file open 6 0 0 0 6.0 db file parallel write 4 0 0 0 4.0 PX Idle Wait 4,316 4,287 877,818 2034 ###### <----- remove SQL*Net message from client 2,213 0 27,615 125 ###### <----- remove SQL*Net message to client 2,214 0 1 0 ###### ------------------------------------------------------------- ----------> Total Wait Time 1397 cs

28 www.brianhitchcock.net Total Response Time  Total CPU Time + Total Wait Time – 27475 cs + 1397 cs = 28872 cs  Total Wait Time – 1397/28872 = 0.05 – 5% of Total Response Time  Wait Time was never an issue!  If you don’t remove the idle events – 907997/(27475+907997) = 97%

29 www.brianhitchcock.net Bad SQL Rules  Slow data load time – Time due to Bad SQL – Time due All Others  Including Wait Events

30 www.brianhitchcock.net Case 4) Data Load Time Varies  Data Load Time – Normal 6.5 hours, Long 16 hours – Varies randomly, no pattern  Generate STATSPACK report – Normal, Long – Compare reports – Look for differences between reports  Tablespace IO Stats Section – Normal -- 11 tablespaces accessed – Long -- 74 tablespaces accessed

31 www.brianhitchcock.net Problem and Solution  Vendor data load shouldn’t touch all tables  What process would access all tables?  Production db supported by another group – We aren’t allowed to connect as ‘oracle’ – Can’t see what they might be running (cron?)  Turns out – Production DBAs decided we needed full exports – We weren’t notified – Stop the exports, performance issue goes away!

32 www.brianhitchcock.net What About Cache Hit Rates?  Back to the subject of experts – Remember when it was cool to discuss hit rates?  For Case 4) – Compute buffer cache hit ratio from tables – Tables larger than physical memory – Can’t have all pages in memory at once – Buffer cache hit ratio won’t be 100%  Even if we had 100% – Bad SQL (index) was the real problem – Buffer cache hit ratio wasn’t relevant

33 www.brianhitchcock.net Select Buffer Cache Hit Ratio  Data Load without Exports running

34 www.brianhitchcock.net Total Wait Time?  For Case 4), 15 minute report interval – Wait Time is 28% of total time Top 5 Wait Events ~~~~~~~~~~~~~~~~~ Wait % Total Event Waits Time (cs) Wt Time --------------------------------------- ------------ ------------ ------- PX Deq: Table Q Normal 495,566 1,017,115 30.12 <-- remove slave wait 25,580 744,985 22.06 PX Deq Credit: send blkd 406,262 678,556 20.10 <-- remove PX Deq: Execution Msg 33,009 438,309 12.98 <-- remove latch free 42,575 134,840 3.99 ------------------------------------------------------------- 879825 cs total time = 2237915 + 879825 = 3117740 Wait time is 879825/3117740 = 28% Instance Activity Stats for DB: BHDATA01 Instance: BHDATA01 Snaps: 13817 -1381 Statistic Total per Second per Trans --------------------------------- ---------------- ------------ ------------ CPU used by this session 1,422,921 1,556.8 154.4 CPU used when call started 2,237,915 2,448.5 242.8 <-----

35 www.brianhitchcock.net Review  For the 4 issues we had, STATSPACK output – Was useful for all 4  Provided standard set of data for all involved – Fixed 2 issues  Provided the data that led to the root cause  Verified that the fix was working  Performance improvements were substantial – Tuning process much faster with STATSPACK – Same process worked for all 4 issues – Decided not to look for further improvements  Wait Time analysis might be useful...

36 www.brianhitchcock.net oraperf.com Analyzer  Website oraperf.com – Submit STATSPACK report – Analyzer reviews report – Generates detailed analysis  CPU time  Wait time – Gives specific advice – Not perfect, but it is fast and free!  Has same issues with idle wait events as STATSPACK report

37 www.brianhitchcock.net oraperf.com  Who or what is oraperf.com? – From the website... Oraperf.com is run by Anjo Kolk. Anjo has worked for over 16 years at Oracle (1985-2001). While at Oracle he worked in different countries and different departments. Many people generate utlbstat/utlestats and statspack reports, but don't know how to interpret the data. People that do look at these are reports also mostly looking at the wrong information and end up making the wrong tuning decisions. That is why the reports are analyzed based on the YAPP method. The YAPP method will show what component of the total response time should be tuned first. YAPP-Method -- Yet Another Performance Profiling Method http://oraperf.com/download/yapp_anjo_kolk.pdf

38 www.brianhitchcock.net oraperf.com -- Case 3)  Upload report from slow data load  Analyzer shows – Response time  91.63% CPU Time  8.37% Wait Time – Advice?  Reduce the number of buffer gets or executions  Wait time – Matters only as a % of total response time

39 www.brianhitchcock.net oraperf.com -- Case 3)  Upload report from fast data load  Analyzer shows – Response time  5.16% CPU Time  94.84% Wait Time – Advice?  Tune PX Deq: Execution Msg event ­But this is an idle event...  Non-idle wait time is only about 25% total time

40 www.brianhitchcock.net oraperf.com -- Case 3)  Conclusion – oraperf.com analyzer provides another tool for performance tuning – Well worth using if only to compute  Response Time  CPU Time  Wait Time ­Check for idle wait events...

41 www.brianhitchcock.net No Excuses  Install STATSPACK  Generate two snapshots  Generate standard report  Upload to oraperf.com  Review advice  Fast, free performance analysis!

42 www.brianhitchcock.net Installing STATSPACK  Create separate tablespace  Create PERFSTAT user  Execute SQL script to create tables  Setup job to execute snapshots  Setup process to purge data over time  Set timed_statistics = TRUE – Not required, but needed to get wait time data

43 www.brianhitchcock.net Installing STATSPACK  As user ‘SYS’ create tablespace perfstat datafile '/xxx/xxx/perfstat_01.dbf' size 500M; cd $ORACLE_HOME/rdbms/admin sqlplus sys @spcreate.sql Enter value for default_tablespace: perfstat Enter value for temporary_tablespace: temp

44 www.brianhitchcock.net Generate Standard Report  Report SQL supplied by Oracle sqlplus perfstat/perfstat@ execute statspack.snap sqlplus perfstat/perfstat@ @$ORACLE_HOME/rdbms/admin/spreport.sql Enter value for begin_snap: 1 Enter value for end_snap: 2 Enter value for report_name: testing

45 www.brianhitchcock.net Select STATSPACK Data  Query the tables directly select to_char(snap_time,'yyyy-mm-dd HH24') mydate, new.name buffer_pool_name, (((new.consistent_gets-old.consistent_gets)+ (new.db_block_gets-old.db_block_gets))-(new.physical_reads-old.physical_reads)) / ((new.consistent_gets-old.consistent_gets)+ (new.db_block_gets-old.db_block_gets)) bhr from perfstat.stats$buffer_pool_statistics old, perfstat.stats$buffer_pool_statistics new, perfstat.stats$snapshot sn where new.snap_id > 13125 and new.snap_id < 13149 and new.name = old.name and new.snap_id = sn.snap_id and old.snap_id = sn.snap_id-1; Based on SQL from ORACLE High-Performance Tuning with STATSPACK Donald K. Burleson Oracle Press ISBN 0-07-213378-3

46 www.brianhitchcock.net Buffer Cache Hit Ratio Case 4)  Output of SQL on previous slide yr. mo dy Hr BUFFER_POOL_NAME BHR ------------- -------------------- ----- 2002-09-17 02 DEFAULT 1.00 2002-09-17 02 DEFAULT.94 2002-09-17 02 DEFAULT.87 2002-09-17 03 DEFAULT.90 2002-09-17 03 DEFAULT.93 2002-09-17 03 DEFAULT.97 2002-09-17 03 DEFAULT.82 2002-09-17 04 DEFAULT.78 2002-09-17 04 DEFAULT.76 2002-09-17 04 DEFAULT.80 2002-09-17 05 DEFAULT.69 2002-09-17 05 DEFAULT.73 2002-09-17 05 DEFAULT.76 2002-09-17 05 DEFAULT.69 2002-09-17 06 DEFAULT.81 2002-09-17 06 DEFAULT 1.00 2002-09-17 07 DEFAULT 1.00

47 www.brianhitchcock.net Space Used  Snapshot size varies with – Number of tablespaces – Number of SQL statements captured  Db1 21 tablespaces --> 0.15 Mb/snapshot  Db2 376 tablespaces --> 0.37 Mb/snapshot  Assuming a snapshot every 15 minutes – 96 snapshots per day – Db1 --> 14.4 Mb/day – Db2 --> 35.6 Mb/day

48 www.brianhitchcock.net Removing Snapshot Data  Oracle supplied SQL – SQL removes snapshot data for a range of snapshot id numbers  Example sqlplus perfstat/perfstat@ @$ORACLE_HOME/rdbms/admin/sppurge...... (listing of all existing snapshots)... Specify the Lo Snap Id and Hi Snap Id range to purge ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Enter value for losnapid: 4001 Using 4001 for lower bound. Enter value for hisnapid: 5000 Using 5000 for upper bound. Deleting snapshots 4001 - 5000. commit; Note: large deletes may fill rollback segments

49 www.brianhitchcock.net Summary  STATSPACK – Free, easy to install, easy to run – Output can be very useful or confusing – Real-world use has resulted in big performance gains – Useful for all instances – Standard way to gather performance data

50 www.brianhitchcock.net Reminder – please complete the OracleWorld session survey Thank you.


Download ppt "How STATSPACK Was Used to Solve Common Performance Issues Brian Hitchcock OCP 8, 8i, 9i DBA Sun Microsystems"

Similar presentations


Ads by Google