Oracle Operational Timing Data, Paper 36571, Cary Millsap. Copyright © 1999–2003 by Hotsos Enterprises, Ltd.


1. Oracle Operational Timing Data (Paper 36571)
Cary Millsap, Hotsos Enterprises, Ltd.
OracleWorld 2003 / San Francisco, California USA
8:30a–9:30a Wednesday 10 September 2003
Copyright © 1999–2003 by Hotsos Enterprises, Ltd.

2. Additional resources for you…
Paper update
  – hotsos.com
Questions
  – Hotsos booth in exhibit hall
Book
  – Coming soon (est. July)

3. Motive

4. Three great discoveries are revolutionizing the effectiveness of Oracle performance analysts today.
User action focus
Response time focus
Amdahl's Law
Recommendation
  – Eli Goldratt's The Goal
  – Goal readers are now doing the best work of their lives.

5. Great Discovery #1: user action focus.
You can't extrapolate detail from an aggregate.
Some problems can't be detected by observing system-wide data.
Hence, system-wide focus is not reliable.
This is one reason tuning is so hard.

6. Example: What's the problem here? It's obvious, right?
CIO's #1 problem
  – We need PYUGEN to execute 2x faster
DBA's viewpoint
  – The system's number one wait event is latch free

7. Wrong. You cannot trust system-wide data to guide your way.
Resource profile for PYUGEN

Response Time Component            Duration           # Calls   Dur/Call
SQL*Net message from client          984.0s   49.6%   95,       s
SQL*Net more data from client        418.8s   21.1%   3,        s
db file sequential read              279.3s   14.1%   45,       s
CPU service                          248.7s   12.5%   222,      s
unaccounted-for                       27.9s    1.4%
latch free                            23.7s    1.2%   34,       s
log file sync                          1.1s    0.1%             s
SQL*Net more data to client            0.8s    0.0%   15,       s
log file switch completion             0.3s    0.0%             s
enqueue                                0.3s    0.0%             s
SQL*Net message to client              0.2s    0.0%   95,       s
other                                  0.2s    0.0%
Total                              1,985.4s  100.0%

8. Great Discovery #2: response time focus.
You can't tell how long something took by counting how many times it happened.
Try it…

9. Example: What's the problem here? It's obvious, right?

Response Time Component          # Calls
CPU service                       18,750
SQL*Net message to client          6,094
SQL*Net message from client        6,094
db file sequential read            1,740
log file sync                        681
SQL*Net more data to client          108
SQL*Net more data from client         71
db file scattered read                34
direct path read                       5
free buffer waits                      4
log buffer space                       2
direct path write                      2
log file switch completion             1
latch free                             1

10. Wrong. You cannot trust call counts to guide your way.

Response Time Component            Duration          # Calls   Dur/Call
SQL*Net message from client             s    91.7%   6,        s
CPU service                         9.65s     5.3%   18,       s
unaccounted-for                     2.22s     1.2%
db file sequential read             1.59s     0.9%   1,        s
log file sync                       1.12s     0.6%             s
SQL*Net more data from client       0.25s     0.1%             s
SQL*Net more data to client         0.11s     0.1%             s
free buffer waits                   0.09s     0.0%             s
SQL*Net message to client           0.04s     0.0%   6,        s
db file scattered read              0.04s     0.0%             s
log file switch completion          0.03s     0.0%             s
log buffer space                    0.01s     0.0%             s
latch free                          0.01s     0.0%             s
direct path read                    0.00s     0.0%             s
direct path write                   0.00s     0.0%             s
Total                                   s   100.0%

11. Great Discovery #3: Amdahl's Law.
Performance improvement is proportional to how much a program uses the thing you improved.
Gene Amdahl's realization, formalized in 1967
  – Many tuning efforts are misplaced
  – This is one reason so many successful performance improvement projects create no impact
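
As a rough illustration of Amdahl's Law, the sketch below applies it to the durations shown in the PYUGEN resource profile on slide 7. The function names `best_case_saving` and `speedup` are made up for this example and are not part of the presentation.

```python
# Durations in seconds, copied from the PYUGEN resource profile on slide 7.
PROFILE = {
    "SQL*Net message from client": 984.0,
    "SQL*Net more data from client": 418.8,
    "db file sequential read": 279.3,
    "CPU service": 248.7,
    "unaccounted-for": 27.9,
    "latch free": 23.7,
}
TOTAL = 1985.4  # total response time reported on slide 7


def best_case_saving(component, reduction=1.0):
    """Fraction of total response time saved if `component` shrinks by
    `reduction` (1.0 means the component is eliminated entirely)."""
    return PROFILE[component] * reduction / TOTAL


def speedup(component, reduction=1.0):
    """Overall speedup factor implied by that saving (Amdahl's Law)."""
    return 1.0 / (1.0 - best_case_saving(component, reduction))


if __name__ == "__main__":
    for name in ("latch free", "SQL*Net message from client"):
        print(f"{name}: saves {best_case_saving(name):.1%}, "
              f"speedup {speedup(name):.2f}x")
```

Under these numbers, eliminating latch free waits entirely would save only about 1.2% of PYUGEN's response time, which is why the system-wide "number one wait event" is the wrong target for this user action.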

12. These great discoveries lead us to a simple, obviously efficient Oracle optimization strategy…
Work first to reduce the biggest response time component of a business's most important user action.
Sounds easy.
So why don't people do it?
Because it's hard to get response time data for a specific user action.

13. Oracle Operational Timing Data

14. So why do Oracle analysts use counts instead of timings?
Because counts are easier to obtain.
Counts are easy
  – Select from V$SESSTAT union V$SESSION_EVENT at t0
  – Select again at t1
  – Then compute results[t1] - results[t0]
  – Maximum error is ±1 event (not a big deal)
But times are hard
  – Same method produces potentially enormous duration errors
  – Example…
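
The snapshot-and-subtract method for counts can be sketched as follows. This is a minimal illustration that assumes each snapshot has already been fetched into a dictionary of statistic name to cumulative count; the `delta` helper and the sample values are hypothetical and do not come from the presentation.

```python
def delta(snap0, snap1):
    """Return results[t1] - results[t0] for every statistic in snap1.

    A statistic missing from snap0 is treated as having started at 0.
    """
    return {name: snap1[name] - snap0.get(name, 0) for name in snap1}


if __name__ == "__main__":
    # Hypothetical cumulative counts for one session at t0 and t1.
    t0 = {"db file sequential read": 1200, "latch free": 3}
    t1 = {"db file sequential read": 1740, "latch free": 4, "log file sync": 2}
    print(delta(t0, t1))
```

The slide's point is that this arithmetic is safe for counts but can be badly wrong when the same subtraction is applied to durations.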

15. Accounting for response time with SQL queries of Oracle fixed view data is hard.

  $ perl vprof.pl --username=system --password=manager 8
  Press to mark time t0:
  Press to mark time t1:
  Resource Profile for Session 8
  t0 = 15:31:
  t1 = 15:32:
  interval duration = s
  accounted-for duration = s

  Response Time Component          Duration (seconds)   Calls   Dur/Call
  SQL*Net message from client      %
  db file sequential read          %
  CPU service                      %
  db file scattered read           %
  SQL*Net message to client        %
  unaccounted-for                  %
  Total                            %

Source: Millsap, C. Optimizing Oracle Performance. Sebastopol CA: O'Reilly (July 2003)

16. One easy way out of the problem is to use Oracle's extended SQL trace data.
Easy to activate
Low overhead if used sensibly
Produces a chronological record
Contains all the performance data you need in most cases
Doesn't require any special tools to get started

17. The easiest way to activate extended SQL trace is to add a few statements to your application's source code.

  alter session set timed_statistics=true;
  alter session set max_dump_file_size=unlimited;
  alter session set tracefile_identifier='POX a';
  alter session set events '10046 trace name context forever, level 8';
  /* code to be traced goes here */
  alter session set events '10046 trace name context off';

18. Raw trace data lines are not so hard to understand.

  PARSE #54:c=20000,e=11526,p=0,cr=2,cu=0,mis=1,r=0,dep=1,og=0,tim=
  EXEC #1:c=10000,e=12137,p=0,cr=22,cu=0,mis=0,r=1,dep=0,og=4,tim=
  FETCH #3:c=10000,e=306,p=0,cr=3,cu=0,mis=0,r=1,dep=2,og=4,tim=
  WAIT #1: nam='SQL*Net message to client' ela= 40 p1= p2=1 p3= 0
  WAIT #1: nam='SQL*Net message from client' ela= 1709 p1= p2=1 p3=0
  WAIT #34: nam='db file sequential read' ela= p1=52 p2=2755 p3=1
  WAIT #44: nam='latch free' ela= p1= p2=87 p3=13

19. The five fields I'll speak about today are the cursor id, c, e, nam, and ela (see MetaLink for more info).

Field   Description
#n      n is the id of the cursor upon which the action is executing
c       The approximate total CPU capacity (user-mode + kernel-mode) consumed by the database call
e       The approximate wall time that elapsed during the database call
nam     The name assigned by an Oracle kernel developer to a sequence of instructions (often including a system call) in the Oracle kernel
ela     The approximate wall time that elapsed during the wait event
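
To make the five fields concrete, here is a small sketch of how they might be pulled out of raw trace lines like those on slide 18. The regular expressions and the `parse_line` helper are assumptions written for this example, not a documented trace-file grammar; the units of c, e, and ela depend on the Oracle release, so the values are returned as plain integers.

```python
import re

# Database calls look like: EXEC #1:c=10000,e=12137,...
DBCALL_RE = re.compile(r"^(PARSE|EXEC|FETCH) #(\d+):(.*)$")
# Wait events look like: WAIT #1: nam='...' ela= 1709 p1=...
WAIT_RE = re.compile(r"^WAIT #(\d+): nam='([^']+)' ela=\s*(\d+)")


def parse_line(line):
    """Return a dict with the cursor id and timing fields, or None if the
    line is not a recognizable db call or wait event."""
    line = line.strip()
    m = DBCALL_RE.match(line)
    if m:
        kind, cursor, rest = m.groups()
        # Split "k=v,k=v,..." into a dict; an empty value such as "tim=" is kept as "".
        fields = dict(pair.split("=", 1) for pair in rest.split(",") if "=" in pair)
        return {"type": kind, "cursor": int(cursor),
                "c": int(fields["c"]), "e": int(fields["e"])}
    m = WAIT_RE.match(line)
    if m:
        cursor, nam, ela = m.groups()
        return {"type": "WAIT", "cursor": int(cursor), "nam": nam, "ela": int(ela)}
    return None


if __name__ == "__main__":
    print(parse_line("EXEC #1:c=10000,e=12137,p=0,cr=22,cu=0,mis=0,r=1,dep=0,og=4,tim="))
    print(parse_line("WAIT #1: nam='SQL*Net message from client' ela= 1709 p1= p2=1 p3=0"))
```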

20. There are two types of wait event: events issued within db calls, and events issued between db calls.
Events issued between db calls are recognizable by nam
  – SQL*Net message from client
  – SQL*Net message to client
  – single-task message
  – pipe get
  – rdbms ipc message
  – pmon timer
  – smon timer
Most other events are issued within db calls
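
The classification by nam can be expressed as a simple lookup. This sketch uses only the event names listed on the slide; the `is_between_call` helper is a hypothetical name, and the set would need extending for events the slide does not mention.

```python
# Event names the slide lists as being issued between db calls.
BETWEEN_CALL_EVENTS = frozenset({
    "SQL*Net message from client",
    "SQL*Net message to client",
    "single-task message",
    "pipe get",
    "rdbms ipc message",
    "pmon timer",
    "smon timer",
})


def is_between_call(nam):
    """True if the wait event named `nam` is issued between db calls."""
    return nam in BETWEEN_CALL_EVENTS


if __name__ == "__main__":
    for nam in ("SQL*Net message from client", "db file sequential read"):
        kind = "between" if is_between_call(nam) else "within"
        print(f"{nam}: {kind} db calls")
```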

21. A db call's elapsed time is approximately its total CPU consumption plus the total duration of its wait events.

  WAIT #4: nam='db file sequential read' ela= p1=1 p2=53903 p3=1
  WAIT #4: nam='db file sequential read' ela= 6978 p1=1 p2=4726 p3=1
  FETCH #4:c=0,e=21340,p=2,cr=3,cu=0,mis=0,r=0,dep=1,og=4,tim=
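
The relationship e ≈ c + Σ ela also gives a way to compute what is left over. The sketch below is illustrative only: `unaccounted_for` is a hypothetical helper, and because the first ela value is missing from the transcript, the 14000 used in the usage example is an invented placeholder rather than the slide's real number.

```python
def unaccounted_for(e, c, elas):
    """Elapsed time of a db call not explained by its CPU time plus the
    durations of the wait events issued within it."""
    return e - (c + sum(elas))


if __name__ == "__main__":
    # e, c and the 6978 wait come from the FETCH #4 example on the slide;
    # the 14000 is a made-up stand-in for the ela value lost in the transcript.
    leftover = unaccounted_for(e=21340, c=0, elas=[14000, 6978])
    print(f"unaccounted-for: {leftover}")
```

A small positive or negative remainder is expected, since each term is only an approximate measurement.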

22. A whole trace file's elapsed duration is the sum of the db call durations plus the sum of the between-call stuff.

  PARSE #9:c=0,e=0,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=4,tim=
  WAIT #9: nam='SQL*Net message to client' ela= 0 p1= p2=1 p3=0
  WAIT #9: nam='SQL*Net message from client' ela= 3 p1= p2=1 p3=0
  ...
  PARSE #9:c=0,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=
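
A minimal sketch of that sum follows, assuming the trace has already been parsed into a list of dictionaries. The event records, the `total_elapsed` function, and the choice to count only db calls at dep=0 (on the assumption that recursive calls are already included in their parent's e) are all assumptions of this example, not statements from the slide.

```python
BETWEEN_CALL = {"SQL*Net message from client", "SQL*Net message to client"}


def total_elapsed(events):
    """Sum db call durations plus between-call wait durations."""
    total = 0
    for ev in events:
        if ev["type"] == "WAIT":
            # Waits within a db call are already part of that call's e.
            if ev["nam"] in BETWEEN_CALL:
                total += ev["ela"]
        elif ev.get("dep", 0) == 0:
            # Assumption: recursive (dep > 0) calls roll up into their parent.
            total += ev["e"]
    return total


if __name__ == "__main__":
    # Hypothetical parsed events, not taken from the slide.
    events = [
        {"type": "PARSE", "e": 120, "dep": 0},
        {"type": "WAIT", "nam": "SQL*Net message to client", "ela": 5},
        {"type": "WAIT", "nam": "SQL*Net message from client", "ela": 3000},
        {"type": "PARSE", "e": 50, "dep": 1},
        {"type": "WAIT", "nam": "db file sequential read", "ela": 600},
        {"type": "EXEC", "e": 800, "dep": 0},
    ]
    print(total_elapsed(events))
```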

23. Extended SQL trace is reliable for diagnosing the root cause of virtually any performance problem.
Method is reliable for even more problem types by isolating the unmeasured components
  – M – measurement intrusion effect
  – E – quantization error
  – N – time spent not executing
  – U – un-instrumented Oracle code (systematic error + bugs)
  – S – CPU consumption that is counted twice

24. What you can do with the timing data is spectacular.
Performance problems cannot hide from you
No more solving the wrong problem
No more stabs in the dark or trial-and-error tuning
No more multi-month performance diagnosis projects
No more CTD [Vaidyanatha and Deshpande (2000)]
You either solve the right problem quickly or prove that solving it is not worth the effort.

25. Performance problems cannot hide from this method.
Partial laundry list of problems we've solved
  – System-wide problems, and specific user-action problems
  – Query mistakes: bad SQL, faulty indexing, data density, …
  – Application mistakes: excessive parsing, no arrays, …
  – Serialization mistakes: locks, latches, memory buffers, …
  – Network mistakes: protocol selection, bad segments, …
  – Disk I/O mistakes: poorly sized caches, imbalances, …
  – Capacity shortages: swapping, paging, …

26. Why I use extended SQL trace data instead of Oracle's V$ fixed views…
I don't use V$ data anymore, because it's a mess
  – Can't poll fast enough with SQL
  – Too complicated to attach directly to SGA
  – Too many data sources (V$SESSTAT, V$SESSION_EVENT, V$LATCH, V$LOCK, V$FILESTAT, V$WAITSTAT, V$SQL, …)
  – No notion of e; therefore, can't isolate M, E, N, U, S
  – No read consistency
  – Statistics are unreliable (CPU time used by this session, SECONDS_IN_WAIT, …)
  – Statistics are susceptible to overflow
  – Hard (impossible?) to determine recursive SQL relationships

27. Where do Oracle's timing statistics actually come from?

  procedure dbcall {
      e0 = gettimeofday;   # mark the wall time
      c0 = getrusage;      # obtain resource usage statistics
      ...                  # execute the db call (may call wevent)
      c1 = getrusage;      # obtain resource usage statistics
      e1 = gettimeofday;   # mark the wall time
      e = e1 - e0;         # elapsed duration of dbcall
      c = (c1.utime + c1.stime) - (c0.utime + c0.stime);   # total CPU time consumed by dbcall
  }

  procedure wevent {
      ela0 = gettimeofday;   # mark the wall time
      ...                    # execute the wait event (syscall)
      ela1 = gettimeofday;   # mark the wall time
      ela = ela1 - ela0;     # elapsed duration of wevent
  }
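
The same bracketing pattern can be tried outside Oracle. The sketch below is a Python analogue, not Oracle's implementation: it substitutes `time.perf_counter()` for gettimeofday and `time.process_time()` (user plus system CPU time of the current process) for getrusage, and `timed_call` is a hypothetical name.

```python
import time


def timed_call(fn, *args):
    """Run fn(*args) and return (result, e, c), where e is elapsed wall
    time and c is CPU time consumed by this process, both in seconds."""
    e0 = time.perf_counter()   # mark the wall time
    c0 = time.process_time()   # CPU time (user + system) so far
    result = fn(*args)         # the work being measured
    c1 = time.process_time()
    e1 = time.perf_counter()
    return result, e1 - e0, c1 - c0


if __name__ == "__main__":
    # Sleeping consumes wall time but almost no CPU, so e is much larger than c.
    _, e, c = timed_call(time.sleep, 0.05)
    print(f"e={e:.4f}s c={c:.4f}s e-c={e - c:.4f}s")
```

Timing a sleep shows the gap between e and c that, in a trace file, would have to be explained by wait events or by the unmeasured components.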

28. Understanding where Oracle timings come from motivates all sorts of interesting research projects…
How to quantify measurement intrusion effect
How to bound quantization error
How to bound un-instrumented call effects
How to estimate the amount of kernel-mode time double-counting
…

29. Questions & Answers

30. Oracle Operational Timing Data
Cary Millsap, Hotsos Enterprises, Ltd.
OracleWorld 2003 / San Francisco, California USA
8:30a–9:30a Wednesday 10 September 2003

