Inside SQL Server Wait Types


1 Inside SQL Server Wait Types
SQL 2005 and SQL 2008 Bob Ward Microsoft Corporation

2 Microsoft CSS at PASS 2009
Pre-Conference Seminar
Tackling Top Reporting Services Issues, Mon 11/2, 8:30am-4:30pm, Adam Saxton
Main Conference Talks
(DBA-500-SC) Inside SQL Server Wait Types, Tues 11/3, 10:15 – 11:45am, 3AB, Bob Ward
(DBA-X69-C) Implementing and Supporting SQL 2008 Failover Clustering, Tues 11/3, 1:30-2:45pm, 4C1-2, Shon Hauck
(BIA-X45-C) Top customer support issues in Analysis Services, Wed 11/4, 1:30-2:45pm, 2AB, John Sirmon
(AD-X43-C) Troubleshooting applications accessing SQL Server, Thurs 11/5, 1:00-2:15pm, Abirami Iyer and Lakshmi Jonnakuti
SQL Server Clinic, Room 611, 11/3 – 11/5/2009, After Keynote – 6:00pm…ish
Notes: 1 minute

3 Welcome to My World
This is a “500” level talk
I assume SQL knowledge
DMVs, Code, Debugger, and APIs
We will move fast and furious
This means your brain may hurt
Stuff you can use
Stay for questions as long as you want
All scripts will be available
Notes: 2 minutes

4 What is a wait type?
Not the best of docs
We created this to help us find bottlenecks
In a galaxy far, far away we had locks, I/O and network
But as time has moved on… we went a bit overboard
The name of the type is up to the developer
485 in SQL Server 2008
Resource: I/O, Network, Thread, Memory
Synchronization: Locks, Latches, and “bunch of others”
Forced: Yield or Sleep
External: Preemption
Queue: Background tasks
Notes: 3 minutes
That 485 doesn’t include the different types of non-BUF latch classes, of which there are 145.
There were only 202 in SQL Server 2005 (but 190 of the new ones are for a new concept called PREEMPTIVE waits).
You will be happy to know that in SQL Server 2008 R2 we only have 490 wait types and 144 latch classes.
STAY TO THE END OF THE TALK! I have a plan to promote better documentation for these wait types and will discuss it at the end of the talk.

5 How does a wait type work?
Developer writes code that “runs”
Developer knows they might execute code that “waits”
Developer “sets” a wait type
Developer calls SQLOS routines to “wait”
Code saves last wait type
Anyone querying the DMVs sees the wait type and accumulated wait time
Code is signaled to “wake-up”
Code clears wait type, time, and last wait type
Notes: 2 minutes
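The sequence on this slide can be sketched as a toy model. This is not SQLOS code: the `Task` and `WaitStats` classes and the `begin_wait` / `end_wait` helpers are hypothetical names invented for illustration, and timestamps are passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Task:
    """Toy stand-in for a task (hypothetical, not a real SQLOS structure)."""
    wait_type: Optional[str] = None
    last_wait_type: Optional[str] = None
    wait_started: Optional[float] = None


@dataclass
class WaitStats:
    """Accumulated totals, loosely in the spirit of sys.dm_os_wait_stats."""
    waiting_tasks_count: Dict[str, int] = field(default_factory=dict)
    wait_time_ms: Dict[str, float] = field(default_factory=dict)


def begin_wait(task: Task, wait_type: str, now: float) -> None:
    """The developer 'sets' a wait type just before calling the wait routine."""
    task.wait_type = wait_type
    task.last_wait_type = wait_type  # the code saves the last wait type
    task.wait_started = now


def end_wait(task: Task, stats: WaitStats, now: float) -> float:
    """The task was signaled: accumulate the elapsed time, then clear state."""
    elapsed_ms = (now - task.wait_started) * 1000.0
    name = task.wait_type
    stats.waiting_tasks_count[name] = stats.waiting_tasks_count.get(name, 0) + 1
    stats.wait_time_ms[name] = stats.wait_time_ms.get(name, 0.0) + elapsed_ms
    # Per the slide: wait type, time, and last wait type are all cleared.
    task.wait_type = None
    task.last_wait_type = None
    task.wait_started = None
    return elapsed_ms
```

While a task sits between `begin_wait` and `end_wait`, an observer reading `task.wait_type` sees the live wait, which is the role the DMVs play in the real engine.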

6 Let’s look at an example
We know we need to wait
Common for a SELECT
Request LCK_M_IS (Intent Shared) lock
A conflict exists
Setup a SOS_WaitInfo with LCK_M_IS
Call LockOwner::Sleep
Use SOS_EventAuto class to wait
Understands SQLOS scheduling
Notes: 5 mins – debug folder
Demo instructions:
Show mavs.sql
Open delete_josh.sql to delete Josh from the starting lineup. Run it.
Open insert_sean.sql to insert Sean Marion. Run it. It blocks.
Break into the debugger and find the blocking thread by searching for LockOwner::Sleep
Wait() results in SignalObjectAndWait()
Ultimately it always comes down to WaitForSingleObject() or SignalObjectAndWait()
SOS_EventAuto is a wrapper for the Windows Kernel Event object

7 Where do wait types show up?
Historical stats: sys.dm_os_wait_stats
Live state: sys.dm_exec_requests, sys.dm_os_waiting_tasks
Legacy: sys.sysprocesses
Tracing in 2008: Extended Events
In the tools: Management Data Warehouse, Activity Monitor, Performance Monitor Counters (Wait Statistics Counter)
Notes: 10 mins
sys.dm_os_wait_stats
  signal_wait_time_ms will be discussed later in the talk.
sys.dm_exec_requests
  TASK_MANAGER requests are a “pool” and don’t operate like other “background” tasks (such as LazyWriter).
  blocking_session_id is only valid for locks and latches.
  last_wait_type is the true previous wait_type when a new wait occurs, but it is cleared when the task is no longer waiting (as is wait_type). The “RESOURCE MONITOR” task is the exception to this rule: it will show a PREEMPTIVE last_wait_type even when wait_type is NULL. SOS_SCHEDULER_YIELD can look like this as well if tasks are switching fairly quickly.
sys.dm_os_waiting_tasks
  Note that some background tasks cycle wait times (sleep and wake up) while others just accumulate “forever”. The ones that accumulate forever get “signaled” to wake up, rather than sleeping for a while and then waking up on an interval.
  Note that CHECKPOINT_QUEUE seems “infinite”. This is because, unlike previous versions, we now get signaled by other threads to checkpoint or truncate a log. LogWriter signals Checkpoint in 2005 and 2008 to truncate the log for databases (such as tempdb).
sys.sysprocesses
  The wait_type hex code mapping has changed. See the appendix slide for more information.
  One possible reason to use this is that parallel workers will show up here.
  It also shows sessions and requests together.
Extended Events
  We will talk about this later, but you can “trace” waits now using this feature in SQL Server 2008.
Management Data Warehouse
  I have an appendix slide talking about waits, but show quickly how to look at the report and talk about what we capture to produce it.
  Uses sys.dm_os_wait_stats and builds a category of waits in the MDW database.
Activity Monitor
  Uses sys.dm_os_wait_stats and sys.dm_os_waiting_tasks.
  Use Profiler to see how we categorize the wait_types. A bunch are “Other”.
  Notice how we don’t show background waiting. (Alert: there is a problem here with CLR and FTS where high waits show up for background tasks.)
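Because sys.dm_os_wait_stats holds historical, accumulated totals, a common way to read it is to take two snapshots and compare them. The following is a minimal sketch of that arithmetic, assuming the rows were already fetched with something like `SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms FROM sys.dm_os_wait_stats`. The `WaitRow` type and `wait_delta` function are hypothetical names, not part of any tool shown in the talk.

```python
from typing import Dict, List, NamedTuple


class WaitRow(NamedTuple):
    """One row of sys.dm_os_wait_stats (only the columns used here)."""
    wait_type: str
    waiting_tasks_count: int
    wait_time_ms: int
    signal_wait_time_ms: int


def wait_delta(before: List[WaitRow], after: List[WaitRow]) -> List[Dict]:
    """Return per-wait-type growth between two snapshots, largest first."""
    baseline = {row.wait_type: row for row in before}
    deltas = []
    for row in after:
        base = baseline.get(row.wait_type, WaitRow(row.wait_type, 0, 0, 0))
        d_count = row.waiting_tasks_count - base.waiting_tasks_count
        d_wait = row.wait_time_ms - base.wait_time_ms
        d_signal = row.signal_wait_time_ms - base.signal_wait_time_ms
        if d_count <= 0 and d_wait <= 0:
            continue  # nothing happened for this wait type in the interval
        deltas.append({
            "wait_type": row.wait_type,
            "waiting_tasks_count": d_count,
            "wait_time_ms": d_wait,
            "signal_wait_time_ms": d_signal,
            # wait_time_ms includes the signal wait, so subtract it to get
            # the time spent waiting on the resource itself.
            "resource_wait_ms": d_wait - d_signal,
        })
    deltas.sort(key=lambda d: d["wait_time_ms"], reverse=True)
    return deltas
```

Sorting by the interval's wait time rather than the lifetime total keeps old, long-finished waits from hiding what is happening now.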

8 Dive into Wait Types

9 Common Wait Types
PAGELATCH and PAGEIOLATCH: BUF latch - sync
  PAGELATCH hint: System table or allocation
  PAGEIOLATCH hint: I/O delay (see Andrew Kelly’s talk on Capturing and Analyzing File & Wait Stats)
LCK_XX: Locks - sync
  Hint: Your app
ASYNC_NETWORK_IO: Resource
  Hint: Network or your app
Make up ~50 of the wait types: 21 for locks, 24 for latches
Notes: 3 mins

10 Some Waits may not be bottlenecks
MISCELLANEOUS
Should be called “not waiting”
Background Task Waits
LAZYWRITER_SLEEP
SQLTRACE_BUFFER_FLUSH
CHECKPOINT_QUEUE
REQUEST_FOR_DEADLOCK_SEARCH
BOL calls these Queue Waits
CLR_AUTO_EVENT
Normal for SQL CLR
Notes: 3 mins
MISCELLANEOUS is the “default” wait type but is not used to actually wait on anything. You should not see waiting_tasks_count > 0 for it in SQL Server 2005 and 2008 (it might have been used in SQL2K). So why does it show up twice? Because we originally had a wait type we were using in another project. It is not even compiled into the code anymore, but we left the “entry” around and marked it MISCELLANEOUS.
LAZYWRITER_SLEEP is the “LazyWriter”.
SQLTRACE_BUFFER_FLUSH is a background task to flush SQLTrace buffers to files (remember one is on by default now).
LOGMGR_QUEUE is the Log Writer.
REQUEST_FOR_DEADLOCK_SEARCH is the “Lock Monitor”.
Background task waits can be found by running this query:
    select er.session_id, er.command, er.wait_type, er.last_wait_type, er.wait_time, er.wait_resource
    from sys.dm_exec_requests er
    join sys.dm_exec_sessions es
      on es.session_id = er.session_id and es.is_user_process = 0
CLR_AUTO_EVENT is a normal wait used for SQL Server to host CLR, so it will show up as a normal wait when you use SQLCLR assemblies. One exception is described in the following blog post
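One way to apply this slide when reading wait statistics is to drop the background waits before ranking. The sketch below does that for a list of `(wait_type, wait_time_ms)` pairs. The exclusion list contains only the wait types named on this slide and in its notes; the function name and the `uses_sqlclr` switch are assumptions made for illustration, and a real exclusion list would need to be much longer.

```python
from typing import Iterable, List, Tuple

# Wait types this slide describes as background or "not waiting".
BACKGROUND_WAITS = frozenset({
    "MISCELLANEOUS",
    "LAZYWRITER_SLEEP",
    "SQLTRACE_BUFFER_FLUSH",
    "CHECKPOINT_QUEUE",
    "LOGMGR_QUEUE",
    "REQUEST_FOR_DEADLOCK_SEARCH",
})


def likely_bottlenecks(
    rows: Iterable[Tuple[str, int]], uses_sqlclr: bool = False
) -> List[Tuple[str, int]]:
    """Drop background waits and return the rest, highest wait time first.

    CLR_AUTO_EVENT is only treated as benign when the caller states that
    SQLCLR assemblies are in use, since the slide calls it normal in that case.
    """
    ignored = set(BACKGROUND_WAITS)
    if uses_sqlclr:
        ignored.add("CLR_AUTO_EVENT")
    kept = [(wait_type, ms) for wait_type, ms in rows if wait_type not in ignored]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

Keeping the list in one named constant makes it easy to review against the documentation rather than burying the names in a WHERE clause.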

11 Busting the Myth of CXPACKET
Category: Sync
Used to synchronize parallel query workers
Just means you have a parallel query
Craig Freedman Talk is a must read
Do you expect parallel queries?
Do you have high wait times?
wait_resource shows coordination
High wait times mean long running parallel queries
Look at the Tasks
Which one is not CXPACKET?
Some other wait may be the issue
What Should I Do?
You may not need to do anything
Find queries and tune them
Use the MAXDOP query hint
Modify ‘max degree of parallelism’
Notes: 8 mins
DEMO:
Load up repro.sql in one query window
Load up dmvs.sql in another query window
Show the DMV query output and show the wait_resource. Talk about how waiting_tasks could have more than one wait for a given task because of the way consumers and producers work. Show the nodeID as well in the resource.
Show the estimated plan and show the NodeID in the plan.
Notes on wait_resource for CXPACKET waits:
e_waitPipeNewRow – Producer waiting on consumer
e_waitPipeGetRow – Consumer waiting on producer
Note that sys.dm_os_waiting_tasks may show more than 1 row for a given task in these cases, because a consumer can wait on more than one producer and one producer may be feeding more than one consumer.
NodeID is the operator in the plan. Show an estimated plan and the NodeId from it.
You might see some tasks “get started early” before another operator is complete. If so, this may look like e_waitPortOpen. Don’t jump to these
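The “Which one is not CXPACKET?” check can be sketched as a small filter over rows from sys.dm_os_waiting_tasks. This assumes the rows were already fetched as dictionaries carrying at least `session_id` and `wait_type` (both real columns of that DMV); the function name `non_cxpacket_waits` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def non_cxpacket_waits(waiting_tasks: Iterable[dict]) -> Dict[int, List[dict]]:
    """For sessions with CXPACKET waiters, return their other waits.

    Sessions that show no CXPACKET wait at all are skipped, because the
    question only applies to parallel queries.
    """
    by_session = defaultdict(list)
    for row in waiting_tasks:
        by_session[row["session_id"]].append(row)

    result = {}
    for session_id, rows in by_session.items():
        if not any(r["wait_type"] == "CXPACKET" for r in rows):
            continue
        others = [r for r in rows if r["wait_type"] != "CXPACKET"]
        if others:
            # These tasks are the ones the CXPACKET waiters may be held up by.
            result[session_id] = others
    return result
```

A session that appears in the result is a parallel query where at least one worker is stuck on something else, which is usually the wait worth investigating.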

12 The non BUFFER Latch
Category: Sync
Not just for BUFs
Thread sync of memory structure
Latch can be generic
As opposed to PAGELATCH or PAGEIOLATCH
Appears as LATCH_XX
sys.dm_os_latch_stats
sys.dm_exec_requests.wait_resource
Latch class
How many are there?
138 latch classes in SQL Server 2005
145 in SQL Server 2008
Same modes as BUFs (KP, SH, UP, EX, DT)
Notes: 2 mins

13 FGCB_ADD_REMOVE latch
Category: Sync
[Diagram: SQL Server Engine. Several INSERTs “Need space” in mydb.mdf. One INSERT says “I need to grow” and is shown with LATCH_EX: FGCB_ADD_REMOVE on the FGCB (Autogrow); another INSERT is shown with LATCH_SH: FGCB_ADD_REMOVE.]
Notes: 5 mins
We will see this problem in action later in the talk in combination with PREEMPTIVE waits.
Moral of the story: Use instant file initialization, but… it doesn’t work for the tlog.

14 SOS_SCHEDULER_YIELD
Category: Forced
A task that does not “naturally” wait must yield
I/O, Lock, Latch
What if we don’t do this right?
************************
*
* BEGIN STACK DUMP:
* 10/17/09 15:51:52 spid 0
* Non-yielding Scheduler
Examples
No I/O needed for pages
T-SQL variables only or just “expressions”
Query compile
Small hashes and sorts
Indicators
High count -> CPU intensive query
High wait time -> CPU intensive queries competing or someone not yielding very well
Could be preemptive thread(s)
Notes: 5 mins
Demo:
Run sqlcpudemo.sql in 1 query window
Run wait_stats.sql in another window
Launch Task Manager
Show high wait count but low wait times from sys.dm_os_wait_stats
Add another one and now watch the wait time go up. Do it again.
Look at CPU in Task Manager
Talk about 2 CPU cases where this wait_type may not show up as large:
Something that doesn’t yield
Pre-emptive worker that chews up CPU (but this may show up with PREEMPTIVE wait type)

15 THREADPOOL
Category: Resource
Applies to any task
Receive TDS packet
Engine creates SQLOS Task
Find available worker on scheduler
If none, set THREADPOOL wait type
Next available worker runs task
TDS Login
Login Timeout
These are pure victims
Look for other waits
Often a long blocking chain
DO NOT assume you need more worker threads
Only seen in stats and tasks
You may need DAC to see it live
PENDING tasks and work_queue_count in schedulers > 0
Request = task + worker
Notes: 5 mins
You can’t see this in sys.dm_exec_requests because a request means your task is bound to a worker.
Want to demo this yourself? Fix ‘max worker threads’ to 255. Now create a blocking problem where > 255 connections are blocked. Now try to connect and you will hang and eventually time out. Use DAC to connect and look at sys.dm_os_tasks and sys.dm_os_waiting_tasks.
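The two indicators named on this slide (work_queue_count > 0 in the schedulers, and tasks waiting on THREADPOOL) can be combined in a small check. This is a sketch under the assumption that rows from sys.dm_os_schedulers and sys.dm_os_waiting_tasks were already fetched as dictionaries, most likely over the DAC; the function name and the shape of the returned dictionary are hypothetical.

```python
from typing import Dict, Iterable


def threadpool_pressure(
    schedulers: Iterable[dict], waiting_tasks: Iterable[dict]
) -> Dict:
    """Summarize signs that tasks are waiting for a worker thread."""
    # Schedulers with tasks queued and no worker available to run them.
    queued = {
        s["scheduler_id"]: s["work_queue_count"]
        for s in schedulers
        if s["work_queue_count"] > 0
    }
    # Tasks that have not been bound to a worker yet.
    waiters = sum(1 for t in waiting_tasks if t["wait_type"] == "THREADPOOL")
    return {
        "queued_by_scheduler": queued,
        "threadpool_waiters": waiters,
        "worker_starvation_suspected": bool(queued) or waiters > 0,
    }
```

A positive result only says workers are exhausted. As the slide stresses, the THREADPOOL waiters are victims, so the next step is to find what the existing workers are blocked on rather than to raise ‘max worker threads’.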

16 What about I/O Waits?
[Diagram labels, in the order they were extracted:]
WRITELOG (Sync): Log Writer, COMMIT TRAN, Flush Log Buffer, Log Cache, Mylog.ldf
LOGBUFFER (Resource): Log Buffer, INSERT, Request Log Buffer, All buffers in use
IO_COMPLETION (Resource): Copy model, SQLTrace File, Mylog.ldf and .mdf, Page I/O, Sort I/O
DISKIO_SUSPEND (Sync): File, VDI App, BACKUP WITH SNAPSHOT
ASYNC_IO_COMPLETION (Resource): Create database files, Engine Workers, Zero Log Files, Backup media
Notes: 5 mins
Other “Log” Wait Types Explained:
LOGMGR_QUEUE – The normal wait of log writer for work to do
LOGMGR – When shutting down a database, this is used to wait for the Log Writes to complete.
LOGMGR_RESERVE_APPEND – If a log grow fails, we try to truncate, then wait 1 second before growing again. This 1 second “SLEEP” uses this wait type.
LOGMGR_FLUSH – This is only used in special debug builds internally at Microsoft.
LOGGENERATION – Only set and used in internal testing builds at Microsoft

17 Queries, Memory, and RESOURCE semaphores
RESOURCE_SEMAPHORE (Query Memory)
  Hashes and sorts
  Limited memory or too many concurrent users
  MEMORYCLERK_SQLQUERYEXEC and MEMORYCLERK_SQLQERESERVATIONS clerks
  dm_exec_query_resource_semaphores
  dm_exec_query_memory_grants
  RESOURCE_SEMAPHORE_SMALL_QUERY waits
RESOURCE_SEMAPHORE_QUERY_COMPILE (compiles)
  Why are you compiling so much?
  Factor of limited memory or “memory hungry” compiles
  Throttled on a system of levels (gateways) with thresholds
  High Query Memory lowers thresholds
  Not often seen on 64bit systems
sys.dm_os_memory_brokers
DBCC MEMORYSTATUS
Notes: 10 mins
If you get RESOURCE_SEMAPHORE waits you might also see RESOURCE_SEMAPHORE_MUTEX waits with small wait times. This is a synchronization wait used to protect memory structures needed by all “resource semaphore” users to help with their resource access (i.e. resource semaphore statistics).
sys.dm_os_memory_brokers gives you information about how procedure cache, compile, and query memory interact.
DBCC MEMORYSTATUS has important information for the compile gateways.
Demo:
Load up repro.sql and show its plan
Load it up in 3 other windows
Load up dmvs.sql and advanced_dmvs.sql
Run the queries
Run the DMVs to show the wait and look at the other DMVs
Run the brokers.sql script to look at the brokers and DBCC MEMORYSTATUS

18 Pre-emptive Waits
Category: External
Workers go pre-emptive when calling “external” APIs that may take “some time”
Windows API
Xproc
May wrap more code than just the API
What does this look like pre-SQL 2008?
Status = RUNNING
Wait_type = NULL
What does this look like in SQL 2008?
Wait_type = PREEMPTIVE_XXXX
************************
*
* BEGIN STACK DUMP:
* 10/17/09 15:51:52 spid 0
* Non-yielding Scheduler
Notes: 3 mins

19 What are some I might see?
~190 of these
PREEMPTIVE_XX_<API>
PREEMPTIVE_OS_GETPROCADDRESS
  Description: Wraps calls to GetProcAddress() and xproc function
  Scenario: Measure of xproc execution time
PREEMPTIVE_OS_WRITEFILEGATHER
  Description: Wraps calls to WriteFileGather() to zero out a section of a file
  Scenario: Long autogrow for tlog file or database files (if not using instant file init)
PREEMPTIVE_OS_LOOKUPACCOUNTSID
  Description: Wrapped calls to LookupAccountSid()
  Scenario: Mostly used during login authentication. Long waits could indicate DC issues.
PREEMPTIVE_OLEDBOPS
  Description: Wrapped around various code fragments that will call OLE-DB methods for linked server queries.
  Scenario: Helps fill in gaps where OLEDB wait not set.
Notes: 10 mins
You can see some “categories” in the name such as OS, COM, OLEDB, DTC, etc.
Demo:
Load dmvs.sql into SSMS
Run .\repro.ps1
Run dmvs queries
Point out that the lead blocker in SQL 2005 would have been a wait_type of NULL
Point out latch stats information
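The PREEMPTIVE_XX_<API> naming pattern can be pulled apart with a few lines of string handling, which is handy for grouping these waits by category. This is a sketch only: `split_preemptive` is a hypothetical helper, and it assumes the category is the first underscore-delimited token after the prefix, which does not hold for names such as PREEMPTIVE_OLEDBOPS that carry no separate API part.

```python
from typing import Optional, Tuple


def split_preemptive(wait_type: str) -> Optional[Tuple[Optional[str], str]]:
    """Split a PREEMPTIVE wait type into (category, api).

    Returns None for wait types that are not PREEMPTIVE waits, and a
    category of None when the name has no separate API component.
    """
    prefix = "PREEMPTIVE_"
    if not wait_type.startswith(prefix):
        return None
    rest = wait_type[len(prefix):]
    category, separator, api = rest.partition("_")
    if not separator:
        # e.g. PREEMPTIVE_OLEDBOPS: one token, so no category/API split.
        return (None, rest)
    return (category, api)
```

For example, grouping a wait stats result by the first element of the returned tuple gives a quick total for all OS calls versus all DTC calls.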

20 Extended Events and Waits
wait_info: “Normal” waits
wait_info_external: Pre-emptive waits
wait_type: dm_xe_map_values
opcode: Begin and End
Timings: Duration, Total, Max
System_Health Session
  On by Default
  Has These
  Get query, session, or stack dump
SQLCAT Waits Stats Per Session Project
Notes: 5 mins
DEMO: Check the system_health session to see what waits were already tracked.

21 There are other “waits”
log_reuse_wait: “Why can’t I truncate the log”
sessions: PRECONNECT status
“loader lock wait”: A poorly written DLL
Spinlocks: backoffs in sys.dm_os_spinlock_stats
Resource Governor: You decide to throttle
Notes: 3 mins
log_reuse_wait – In sys.databases. Contains a string indicating why the log could not be truncated (e.g. active transaction).
“loader lock wait” – Advanced scenario, but a nasty situation where a DLL is holding up other threads because of lengthy code in dllmain(). sys.dm_os_threads has an indicator for this.
Sessions – Status = PRECONNECT means we are stalled in a logon trigger or RG classifier function.
Spinlocks – Very lightweight sync mechanism that is at the lowest level. Usually high CPU. Look at sys.dm_os_spinlock_stats. Backoffs mean we call Windows Sleep() directly because of heavy collisions. Spinlocks use Windows Sleep and don’t mark a wait type (probably because of some type of issue where we would need a spinlock to mark the wait type).
Resource Governor – Your own injection into waiting. Look at the sys.dm_os_resource_governor_XXXXXXX DMVs.

22 The Wait Type Repository Blog
Where is THE LIST?
In a header file in the source code
and in sys.dm_xe_map_values for SQL Server 2008
The BOL list
KB article on waittypes is only for SQL 2000 and prior
The plan
The Wait Type Repository Blog
Post new findings on this blog post
Comment on the blog or send to
Use the blog to update the BOL
Blog may contain scenarios and more details
Notes: 3 mins
There are some cases where the BOL needs to be changed and more detail added. For example, the docs say of DBMIRROR_DBM_EVENT that future compatibility is not guaranteed, when we should just document what it means (which is a wait on a log flush for the secondary log to be hardened).
On the blog we will do things like list out wait_types that should not even be in the DMVs (and file DCRs to remove them from the wait_stats DMV). We will list out ones that are likely not bottlenecks, like QUEUE waits for background tasks.

23 Resources
Our CSS Escalation Blog
The Wait Type Repository Blog Post
BOL reference on sys.dm_os_wait_stats
SQLCAT Waits Stats Per Session CodePlex
Craig Freedman blog posts on Parallelism
CLR Wait Types blog post
SQL Server 2005 Waits and Queues Whitepaper
The System_Health XEvent Session Blog

24 Appendix

25 What does MDW tell you about I/O Waits
sync reads, sorts, SQLTrace I/O, load CLR assembly
Buffer Pool I/O for pages
Backups, Recovery, DBM
WRITELOG wait time = Log Flush Wait (perfmon)
LOGBUFFER is just waiting on folks waiting on WRITELOG

26 The mapping has changed
sys.sysprocesses.waittype is a binary value
Binary to string mapping changed in SQL 2005
KB wrong for 2005 and 2008
lastwaittype may NOT be current mapping if wait_type != NULL
sys.dm_xe_map_values has the correct mapping… kind of
Notes: The type names in sys.dm_xe_map_values may not match the wait_type in other DMVs 100% exactly (but are close enough), and the “fillers” are also in this list.

27 What About These?
SLEEP_TASK
  Fixed time
  Hard to figure out scenario
DBMIRROR_DBM_EVENT
  Log shipping delayed to secondary
OLEDB
  Wrapped around linked server OLE-DB API calls
  Wait time will fluctuate since set and cleared for each call
  wait_resource is remote server and remote SPID
  PREEMPTIVE_XX type can now also show up
CMEMTHREAD
  Thread synchronization for memory allocation
  High wait times = A likely bug
Hot stored proc in SQL Server 2005
Category tags on the slide: Forced, Resource, External, Sync
Notes: 3 mins

30 Thank you
Thank you for attending this session and the 2009 PASS Summit in Seattle

