Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Oracle Operational Timing Data Paper 36571 Cary Millsap

Similar presentations


Presentation on theme: "1 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Oracle Operational Timing Data Paper 36571 Cary Millsap"— Presentation transcript:

1 www.hotsos.comSlide 1 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Oracle Operational Timing Data Paper 36571 Cary Millsap (cary.millsap@hotsos.com) Hotsos Enterprises, Ltd. OracleWorld 2003 / San Francisco, California USA 8:30a–9:30a Wednesday 10 September 2003

2 www.hotsos.comSlide 2 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Additional resources for you… Paper update –hotsos.com Questions –Hotsos booth in exhibit hall Book –Coming soon (est. July)

3 www.hotsos.comSlide 3 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Motive

4 www.hotsos.comSlide 4 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Three great discoveries are revolutionizing the effectiveness of Oracle performance analysts today. User action focus Response time focus Amdahls Law Recommendation –Eli Goldratts The Goal –Goal readers are now doing the best work of their lives.

5 www.hotsos.comSlide 5 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Great Discovery #1: user action focus. You cant extrapolate detail from an aggregate. Some problems cant be detected by observing system-wide data Hence, system-wide focus is not reliable This is one reason tuning is so hard

6 www.hotsos.comSlide 6 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Example: Whats the problem here? Its obvious, right? CIOs #1 problem –We need PYUGEN to execute 2x faster DBAs viewpoint –The systems number one wait event is latch free

7 www.hotsos.comSlide 7 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Wrong. You can not trust system-wide data to guide your way. Resource profile for PYUGEN Response Time Component Duration # Calls Dur/Call ----------------------------- --------------- ----------- ------------ SQL*Net message from client 984.0s 49.6% 95,161 0.010340s SQL*Net more data from client 418.8s 21.1% 3,345 0.125208s db file sequential read 279.3s 14.1% 45,084 0.006196s CPU service 248.7s 12.5% 222,760 0.001116s unaccounted-for 27.9s 1.4% latch free 23.7s 1.2% 34,695 0.000683s log file sync 1.1s 0.1% 506 0.002154s SQL*Net more data to client 0.8s 0.0% 15,982 0.000052s log file switch completion 0.3s 0.0% 3 0.093333s enqueue 0.3s 0.0% 106 0.002358s SQL*Net message to client 0.2s 0.0% 95,161 0.000003s other 0.2s 0.0% ----------------------------- --------------- ----------- ------------ Total 1,985.4s 100.0%

8 www.hotsos.comSlide 8 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Great Discovery #2: response time focus. You cant tell how long something took by counting how many times it happened. Try it…

9 www.hotsos.comSlide 9 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Example: Whats the problem here? Its obvious, right? Response Time Component # Calls ------------------------------ --------- CPU service 18,750 SQL*Net message to client 6,094 SQL*Net message from client 6,094 db file sequential read 1,740 log file sync 681 SQL*Net more data to client 108 SQL*Net more data from client 71 db file scattered read 34 direct path read 5 free buffer waits 4 log buffer space 2 direct path write 2 log file switch completion 1 latch free 1

10 www.hotsos.comSlide 10 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Wrong. You can not trust call counts to guide your way. Response Time Component Duration # Calls Dur/Call ------------------------------ ------------------ --------- ----------- SQL*Net message from client 166.60s 91.7% 6,094 0.027338s CPU service 9.65s 5.3% 18,750 0.000515s unaccounted-for 2.22s 1.2% db file sequential read 1.59s 0.9% 1,740 0.000914s log file sync 1.12s 0.6% 681 0.001645s SQL*Net more data from client 0.25s 0.1% 71 0.003521s SQL*Net more data to client 0.11s 0.1% 108 0.001019s free buffer waits 0.09s 0.0% 4 0.022500s SQL*Net message to client 0.04s 0.0% 6,094 0.000007s db file scattered read 0.04s 0.0% 34 0.001176s log file switch completion 0.03s 0.0% 1 0.030000s log buffer space 0.01s 0.0% 2 0.005000s latch free 0.01s 0.0% 1 0.010000s direct path read 0.00s 0.0% 5 0.000000s direct path write 0.00s 0.0% 2 0.000000s ------------------------------ ------------------ --------- ----------- Total 181.76s 100.0%

11 www.hotsos.comSlide 11 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Great Discovery #3: Amdahls Law. Performance improvement is proportional to how much a program uses the thing you improved. Gene Amdahls realization, formalized in 1967 –Many tuning efforts are misplaced –This is one reason so many successful performance improvement projects create no impact

12 www.hotsos.comSlide 12 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. These great discoveries lead us to a simple, obviously efficient Oracle optimization strategy… Work first to reduce the biggest response time component of a business most important user action. Sounds easy So why dont people do it? Because its hard to get response time data for a specific user action

13 www.hotsos.comSlide 13 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Oracle Operational Timing Data

14 www.hotsos.comSlide 14 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. So why do Oracle analysts use counts instead of timings? Because counts are easier to obtain Counts are easy –Select from V$SESSTAT union V$SESSION_EVENT at t 0 –Select again at t 1 –Then compute results[t 1 ] – results[t 0 ] –Maximum error is ±1 event (not a big deal) But times are hard –Same method produces potentially enormous duration errors –Example…

15 www.hotsos.comSlide 15 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Accounting for response time with SQL queries of Oracle fixed view data is hard. $ perl vprof.pl --username=system --password=manager 8 Press to mark time t0: Press to mark time t1: Resource Profile for Session 8 t0 = 15:31:50.507000 t1 = 15:32:02.696000 interval duration = 12.189000s accounted-for duration = 23.510000s Response Time Component Duration (seconds) Calls Dur/Call ---------------------------------- -------------------- ------------ ---------- SQL*Net message from client 22.78 186.9% 2 11.390000 db file sequential read 0.60 4.9% 199 0.003015 CPU service 0.09 0.7% 2 0.045000 db file scattered read 0.04 0.3% 4 0.010000 SQL*Net message to client 0.00 0.0% 2 0.000000 unaccounted-for -11.32 -92.9% 1 -11.321000 ---------------------------------- -------------------- ------------ ---------- Total 12.19 100.0% Source: Millsap, C. 2003. Optimizing Oracle Performance, Sebastopol CA: OReilly (July 2003)

16 www.hotsos.comSlide 16 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. One easy way out of the problem is to use Oracles extended SQL trace data. Easy to activate Low overhead if used sensibly Produces a chronological record Contains all the performance data you need in most cases Doesnt require any special tools to get started

17 www.hotsos.comSlide 17 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. The easiest way to activate extended SQL trace is to add a few statements to your applications source code. alter session set timed_statistics=true alter session set max_dump_file_size=unlimited alter session set tracefile_identifier='POX20031031a' alter session set events '10046 trace name context forever, level 8' /* code to be traced goes here */ alter session set events '10046 trace name context off'

18 www.hotsos.comSlide 18 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Raw trace data lines are not so hard to understand. PARSE #54:c=20000,e=11526,p=0,cr=2,cu=0,mis=1,r=0,dep=1,og=0,tim=101703 9304725071 EXEC #1:c=10000,e=12137,p=0,cr=22,cu=0,mis=0,r=1,dep=0,og=4,tim=1017039 275981174 FETCH #3:c=10000,e=306,p=0,cr=3,cu=0,mis=0,r=1,dep=2,og=4,tim=101703927 5973158 WAIT #1: nam='SQL*Net message to client' ela= 40 p1=1650815232 p2=1 p3= 0 WAIT #1: nam='SQL*Net message from client' ela= 1709 p1=1650815232 p2=1 p3=0 WAIT #34: nam='db file sequential read' ela= 14118 p1=52 p2=2755 p3=1 WAIT #44: nam='latch free' ela= 1327989 p1=-1721538020 p2=87 p3=13

19 www.hotsos.comSlide 19 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. The five fields Ill speak about today are the cursor id, c, e, nam, and ela (see MetaLink 39817.1 for more info). FieldDescription #n#n n is the id of the cursor upon which the action is executing c The approximate total CPU capacity (user-mode + kernel-mode) consumed by the database call e The approximate wall time that elapsed during the database call nam The name assigned by an Oracle kernel developer to a sequence of instructions (often including a system call) in the Oracle kernel ela The approximate wall time that elapsed during the wait event

20 www.hotsos.comSlide 20 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. There are two types of wait event: events issued within db calls, and calls issued between db calls. Events issued between db calls are recognizable by nam –SQL*Net message from client –SQL*Net message to client –single-task message –pipe get –rdbms ipc message –pmon timer –smon timer Most other events are issued within db calls

21 www.hotsos.comSlide 21 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. A db calls elapsed time is approximately its total CPU consumption plus the total duration of its wait events. WAIT #4: nam='db file sequential read' ela= 13060 p1=1 p2=53903 p3=1 WAIT #4: nam='db file sequential read' ela= 6978 p1=1 p2=4726 p3=1 FETCH #4:c=0,e=21340,p=2,cr=3,cu=0,mis=0,r=0,dep=1,og=4,tim=10330641379 53092

22 www.hotsos.comSlide 22 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. A whole trace files elapsed duration is the sum of the db call durations plus the sum of the between-call stuff. PARSE #9:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=4,tim=1716466757 WAIT #9: nam='SQL*Net message to client' ela= 0 p1=1413697536 p2=1 p3=0 WAIT #9: nam='SQL*Net message from client' ela= 3 p1=1413697536 p2=1 p3 =0... PARSE #9:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=1716466760

23 www.hotsos.comSlide 23 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Extended SQL trace is reliable for diagnosing the root cause of virtually any performance problem. Method is reliable for even more problem types by isolating the un- measured components –M – measurement intrusion effect –E – quantization error –N – time spent not executing –U – un-instrumented Oracle code (systematic error + bugs) –S – CPU consumption that is counted twice

24 www.hotsos.comSlide 24 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. What you can do with the timing data is spectacular. Performance problems cannot hide from you No more solving the wrong problem No more stabs in the dark or trial-and-error tuning No more multi-month performance diagnosis projects No more CTD [Vaidyanatha and Deshpande (2000)] You either solve the right problem quickly or prove that solving it is not worth the effort.

25 www.hotsos.comSlide 25 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Performance problems cannot hide from this method. Partial laundry list of problems weve solved –System-wide problems, and specific user-action problems –Query mistakes: bad SQL, faulty indexing, data density, … –Application mistakes: excessive parsing, no arrays, … –Serialization mistakes: locks, latches, memory buffers, … –Network mistakes: protocol selection, bad segments, … –Disk I/O mistakes: poorly sized caches, imbalances, … –Capacity shortages: swapping, paging, …

26 www.hotsos.comSlide 26 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Why I use extended SQL trace data instead of Oracles V$ fixed views… I dont use V$ data anymore, because its a mess –Cant poll fast enough with SQL –Too complicated to attach directly to SGA –Too many data sources ( V$SESSTAT, V$SESSION_EVENT, V$LATCH, V$LOCK, V$FILESTAT, V$WAITSTAT, V$SQL, …) –No notion of e ; therefore, cant isolate M, E, N, U, S –No read consistency –Statistics are unreliable ( CPU time used by this session, SECONDS_IN_WAIT, …) –Statistics are susceptible to overflow –Hard (impossible?) to determine recursive SQL relationships

27 www.hotsos.comSlide 27 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Where do Oracles timing statistics actually come from? procedure dbcall { e0 = gettimeofday; # mark the wall time c0 = getrusage; # obtain resource usage statistics... # execute the db call (may call wevent) c1 = getrusage; # obtain resource usage statistics e1 = gettimeofday; # mark the wall time e = e1 – e0; # elapsed duration of dbcall c = (c1.utime + c1.stime) – (c0.utime + c0.stime); # total CPU time consumed by dbcall } procedure wevent { ela0 = gettimeofday; # mark the wall time... # execute the wait event (syscall) ela1 = gettimeofday; # mark the wall time ela = ela1 – ela0; # elapsed duration of wevent }

28 www.hotsos.comSlide 28 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Understanding where Oracle timings come from motivates all sorts of interesting research projects… How to quantify measurement intrusion effect How to bound quantization error How to bound un-instrumented call effects How to estimate the amount of kernel-mode time double-counting …

29 www.hotsos.comSlide 29 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Questions & Answers

30 www.hotsos.comSlide 30 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Oracle Operational Timing Data Paper 36571 Cary Millsap (cary.millsap@hotsos.com) Hotsos Enterprises, Ltd. OracleWorld 2003 / San Francisco, California USA 8:30a–9:30a Wednesday 10 September 2003


Download ppt "1 Copyright © 1999–2003 by Hotsos Enterprises, Ltd. Oracle Operational Timing Data Paper 36571 Cary Millsap"

Similar presentations


Ads by Google