Troubleshoot Customer Performance Problems Like a Microsoft Engineer. Tim Chapman, Senior Field Engineer, Microsoft.

1 Troubleshoot Customer Performance Problems Like a Microsoft Engineer. Tim Chapman, Senior Field Engineer, Microsoft

2 About me: Tim Chapman, Sr. Dedicated Premier Field Engineer at Microsoft. Contributing author of SQL Server 2012 Bible and SQL Server MVP Deep Dives 2. @chapmandew

3 Agenda: A brief introduction to SQLDiag and Diag Manager. A troubleshooting overview, including a quick look at Waits and Queues.

4 SQLDiag: Command-line utility that ships with SQL Server. Located in the installation Binn directory. Gathers perfmon logs, error logs, profiler traces, blocking information, etc. Requires an XML configuration file, which specifies what to collect. Can add custom collectors, which lets you grab the information you need. You execute a PSSDIAG file, which in turn uses SQLDIAG under the covers: PSSDIAG -> SQLDIAG -> Collectors.

5 Diag Manager: What we use to create a pssdiag for you. GUI tool used to create the configuration file. Free download from Codeplex. The more you configure to trace, the more impact you may have on performance (I rarely use Trace).

6 Capturing Custom Data Collections: This is the REAL power of using Diag Manager & SQL Nexus. Diag Manager can capture the output of any scripts you specify, and SQL Nexus can import that output into a database. Once imported, you can run your own diagnostic scripts to find problems.

7 Diag Manager Custom Collection: Add your SQL scripts to the _MyCollectors DiagManager folder (C:\Program Files (x86)\Microsoft\Pssdiag\CustomDiagnostics\_MyCollectors). Make sure the resultsets have a tag that uniquely identifies them; we will use this tag to import the data into SQL Nexus.

8 SQL Nexus: Tool used to import and report on SQLDiag output. Allows you to develop custom collections and reports. Available on Codeplex (http://sqlnexus.codeplex.com/), which means that the source code is available. RML Utilities must be installed prior to installing SQL Nexus. RML Utilities for SQL Server (x86): http://www.microsoft.com/en-us/download/details.aspx?id=8161 RML Utilities for SQL Server (x64): http://www.microsoft.com/en-us/download/details.aspx?id=4511

9 For more on capturing data: My 24 Hours of PASS session on PSSDiag & SQL Nexus, http://www.sqlpass.org/24hours/2014/summitpreview/Sessions/SessionDetails.aspx?sid=7293

10 SQL Nexus Custom Diagnostics: Uses a custom import process. By modifying an XML configuration file, you can have SQL Nexus import your custom data collection from PSSDiag. Add the name of the rowset to TextRowsetsCustom.xml, located where you installed SQL Nexus. Tip: You must add a tag to your custom data collection to identify the recordset.

11 Troubleshooting Techniques: There are many different techniques for troubleshooting. Performance troubleshooting is an iterative process: define the problem, gather data, analyze data, implement solution, validate solution.

12 Waits and Queues: A performance tuning method to help identify bottlenecks. Waits = time spent waiting on resources inside the database engine; use system DMVs to view this information. Queues = measure system resources and utilization; use perfmon to view this information. Often there is a correlation between wait types and performance counters.

13 How can Waits & Queues help me? Avoid troubleshooting the “wrong” bottleneck. Get the biggest payoff for your performance tuning efforts. Find non-obvious performance bottlenecks (e.g., waits on memory grants). Excellent first step prior to deep tuning using other tools (SQL Profiler, logs, DMVs, execution plans).

14 Examples: An end-user or customer tells you about a vague performance issue and you’re not sure where to start. You are looking for ways to speed up a database-driven application and need to identify the primary bottleneck. Benchmarking and trending (great for before-and-after testing during development).

15 Performance Methodology: Often, there is more than one resource bottleneck on the system. Approach: (1) identify the resource that is the main bottleneck; (2) find the statements using the most of that resource; (3) tune or optimize usage; (4) repeat the process.

16 How tasks are scheduled (high level): Session created -> Batch submitted -> Task(s) created -> Workers bound to tasks -> Task executed on scheduler.

17 Session States: Running = only one session can be running or executing per scheduler at a time. Runnable = sessions waiting for CPU; the next SPID in the runnable queue is scheduled to start running. Waiting = sessions wait in the waiter list until resources become available. View these states with sys.dm_exec_requests or sys.dm_os_waiting_tasks.

18 Execution Model, Single Scheduler [diagram with three lists: Running, Runnable (Signal Waits), and Waiting. SPID 72 is running; SPIDs 70, 64, and 84 are runnable; SPID 89 waits on CXPACKET, SPID 102 on IO_COMPLETION, and SPID 75 on LCK_M_S. SPID 72 then waits on LCK_M_X, SPID 70 starts running, and SPID 102 is signalled and becomes runnable.]
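The single-scheduler picture on this slide can be sketched as a toy simulation. This is only an illustrative Python model, not how SQL Server implements scheduling: the `Scheduler` class and its `wait_on` and `signal` methods are hypothetical names, and the SPIDs and wait types are the ones shown on the slide.

```python
from collections import deque


class Scheduler:
    """Toy model of one scheduler: a single running SPID, a FIFO runnable
    queue, and a waiter list keyed by SPID. Hypothetical illustration only."""

    def __init__(self, running, runnable, waiting):
        self.running = running            # SPID currently on the CPU, or None
        self.runnable = deque(runnable)   # SPIDs waiting for CPU (signal wait)
        self.waiting = dict(waiting)      # SPID -> wait type (resource wait)

    def wait_on(self, wait_type):
        """The running SPID needs a resource: move it to the waiter list and
        hand the CPU to the SPID at the head of the runnable queue."""
        self.waiting[self.running] = wait_type
        self.running = self.runnable.popleft() if self.runnable else None

    def signal(self, spid):
        """The resource for `spid` is now available: move it from the waiter
        list to the back of the runnable queue (or run it if the CPU is idle)."""
        del self.waiting[spid]
        if self.running is None:
            self.running = spid
        else:
            self.runnable.append(spid)


sched = Scheduler(
    running=72,
    runnable=[70, 64, 84],
    waiting={89: "CXPACKET", 102: "IO_COMPLETION", 75: "LCK_M_S"},
)
sched.wait_on("LCK_M_X")  # SPID 72 blocks on a lock; SPID 70 gets the CPU
sched.signal(102)         # SPID 102's I/O completes; it joins the runnable queue

print(sched.running)         # 70
print(list(sched.runnable))  # [64, 84, 102]
print(sched.waiting)         # {89: 'CXPACKET', 75: 'LCK_M_S', 72: 'LCK_M_X'}
```

In this model, the time SPID 102 spends in the runnable queue after being signalled is its signal wait; the time it spent in the waiter list before that is its resource wait.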

19 wait_time_ms versus signal_wait_time_ms (sys.dm_os_wait_stats): Resource Wait Time + Signal Wait Time = Total Wait Time.
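As a small worked example of that relationship: wait_time_ms in sys.dm_os_wait_stats is the total and already includes signal_wait_time_ms, so the resource portion is the difference. The function name `split_wait_time` and the sample numbers below are made up for illustration.

```python
def split_wait_time(wait_time_ms, signal_wait_time_ms):
    """Return (resource_wait_ms, signal_pct) for one wait type.

    wait_time_ms is the total wait and includes the signal wait, so the
    resource wait is the difference between the two columns.
    """
    resource_wait_ms = wait_time_ms - signal_wait_time_ms
    # Share of the total wait spent in the runnable queue, waiting for CPU.
    signal_pct = 100.0 * signal_wait_time_ms / wait_time_ms if wait_time_ms else 0.0
    return resource_wait_ms, signal_pct


# Hypothetical values for a single row of sys.dm_os_wait_stats.
resource_ms, signal_pct = split_wait_time(wait_time_ms=12_000, signal_wait_time_ms=1_500)
print(resource_ms, signal_pct)  # 10500 12.5
```

A large signal share means sessions spend much of their wait time in the runnable queue, that is, waiting for CPU rather than for the resource itself.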

20 Example: Mapping Waits to Queues. Wait types: PAGEIOLATCH, ASYNC_IO_COMPLETION, WRITELOG. Perfmon counters: Average Disk Queue Length (consistently high); Disk sec/read and Disk sec/write are also high; Log Bytes Flushed/sec is high. Conclusion? The IO subsystem may be the issue (high I/O bandwidth issues). Next, identify whether any un-tuned queries contribute to the IO.

21 How to read sys.dm_os_wait_stats: wait_type = the resource unit that SQL Server is waiting on; waiting_tasks_count = number of tasks that have spent time on this resource; wait_time_ms = total time spent waiting overall; max_wait_time_ms = max time spent waiting; signal_wait_time_ms = time spent in the runnable queue.
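One way to read these columns together is to rank wait types by their share of total wait time and look at the average wait per task. The sketch below assumes the rows have already been fetched into Python dicts; `rank_waits` and the sample figures are hypothetical, and only the column names come from the DMV.

```python
def rank_waits(rows):
    """Rank wait types by total wait time.

    rows: list of dicts with the sys.dm_os_wait_stats columns
    wait_type, waiting_tasks_count, and wait_time_ms.
    Returns a list sorted by wait_time_ms (largest first), adding each
    type's percentage of all wait time and its average wait per task.
    """
    total_ms = sum(r["wait_time_ms"] for r in rows)
    ranked = []
    for r in sorted(rows, key=lambda r: r["wait_time_ms"], reverse=True):
        count = r["waiting_tasks_count"]
        ranked.append({
            "wait_type": r["wait_type"],
            "pct_of_total": round(100.0 * r["wait_time_ms"] / total_ms, 1) if total_ms else 0.0,
            "avg_wait_ms": round(r["wait_time_ms"] / count, 2) if count else 0.0,
        })
    return ranked


# Made-up sample rows for illustration.
sample = [
    {"wait_type": "CXPACKET", "waiting_tasks_count": 400, "wait_time_ms": 10_000},
    {"wait_type": "WRITELOG", "waiting_tasks_count": 20_000, "wait_time_ms": 60_000},
    {"wait_type": "PAGEIOLATCH_SH", "waiting_tasks_count": 1_500, "wait_time_ms": 30_000},
]
for row in rank_waits(sample):
    print(row)
```

With these sample numbers WRITELOG accounts for 60% of the wait time even though its average wait (3 ms) is the shortest, which is why both figures are worth looking at.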

22 A few notes on wait stats: Not all wait types are useful to report on. Waits are accumulated after the event occurs. Granularity depends on event type. Viewing as a whole may or may not be useful. Time slicing can be more useful for trending and/or problem analysis.
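Time slicing can be sketched as taking two snapshots of the cumulative wait_time_ms values and subtracting them, so that only the waits accumulated during the interval remain. This is a minimal Python illustration under that assumption; `wait_delta` and the snapshot values are hypothetical.

```python
def wait_delta(earlier, later):
    """Return the wait time accumulated between two snapshots.

    earlier, later: dicts mapping wait_type -> cumulative wait_time_ms,
    as captured from sys.dm_os_wait_stats at two points in time.
    Wait types with no growth are omitted. A negative difference (for
    example after the counters were reset by a restart) is also skipped.
    """
    delta = {}
    for wait_type, later_ms in later.items():
        diff = later_ms - earlier.get(wait_type, 0)
        if diff > 0:
            delta[wait_type] = diff
    return delta


# Made-up snapshots taken some minutes apart.
snap_1 = {"WRITELOG": 50_000, "CXPACKET": 20_000, "LCK_M_X": 700}
snap_2 = {"WRITELOG": 58_000, "CXPACKET": 20_000, "LCK_M_X": 1_900, "PAGEIOLATCH_SH": 300}

print(wait_delta(snap_1, snap_2))
# {'WRITELOG': 8000, 'LCK_M_X': 1200, 'PAGEIOLATCH_SH': 300}
```

Compared with the cumulative totals, the slice shows what the server was waiting on during the problem window rather than since the counters last started.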

23 Common High Waits: If you’re seeing an extremely high % of a single wait on your system, it is very likely one of the following.

24 WRITELOG: Occurs when we flush a database change to the transaction log file. Perfmon counters: Logical Disk: Avg Disk Sec/Write; Logical Disk: Avg Disk Bytes/Write; Database: Log Flushes/Sec; Database: Log Flush Write Time (ms). DMVs: sys.dm_io_virtual_file_stats, sys.dm_io_pending_io_requests.

25 CXPACKET: Parallel queries are happening. Not a good or bad thing. Sometimes less desirable for OLTP workloads. Need to know how the application is supposed to be processing before you can make a definite judgment. DMVs: sys.dm_os_waiting_tasks, sys.dm_exec_query_stats.

26 SOS_SCHEDULER_YIELD: Occurs when a thread voluntarily releases its hold on the scheduler to allow another thread to perform its work. Not necessarily a problem unless it consumes a very high % of wait time on the system. Perfmon counters: Processor: % User Time; System: Context Switches/sec. DMVs: sys.dm_os_spinlock_stats, sys.dm_exec_requests.

27 PAGEIOLATCH_*: Latching a buf structure to move a page to memory from disk. Long waits may indicate a disk or memory issue. Perfmon counters: Logical Disk: Avg Disk Sec/Read; Logical Disk: Disk Bytes/Sec; Buffer Manager: Checkpoint Pages/sec; Buffer Manager: Page Life Expectancy. DMVs: sys.dm_io_virtual_file_stats, sys.dm_io_pending_io_requests.
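To relate sys.dm_io_virtual_file_stats to the Avg Disk Sec/Read and Avg Disk Sec/Write counters, a per-file average latency can be derived by dividing the cumulative stall time by the number of operations. The sketch below assumes the rows were already fetched into Python dicts; the helper names and sample values are hypothetical, while the column names (num_of_reads, io_stall_read_ms, num_of_writes, io_stall_write_ms) are those of the DMV.

```python
def avg_latency_ms(io_stall_ms, num_of_ops):
    """Average stall per I/O in milliseconds; 0.0 when no I/O has occurred."""
    return io_stall_ms / num_of_ops if num_of_ops else 0.0


def file_latencies(rows):
    """Map (database_id, file_id) -> (avg_read_ms, avg_write_ms)."""
    return {
        (r["database_id"], r["file_id"]): (
            avg_latency_ms(r["io_stall_read_ms"], r["num_of_reads"]),
            avg_latency_ms(r["io_stall_write_ms"], r["num_of_writes"]),
        )
        for r in rows
    }


# Made-up rows: file 1 stands in for a data file, file 2 for a log file.
sample_files = [
    {"database_id": 5, "file_id": 1, "num_of_reads": 4_000, "io_stall_read_ms": 100_000,
     "num_of_writes": 1_000, "io_stall_write_ms": 3_000},
    {"database_id": 5, "file_id": 2, "num_of_reads": 10, "io_stall_read_ms": 20,
     "num_of_writes": 50_000, "io_stall_write_ms": 400_000},
]
for key, (read_ms, write_ms) in file_latencies(sample_files).items():
    print(key, read_ms, write_ms)
# (5, 1) 25.0 3.0
# (5, 2) 2.0 8.0
```

Because the DMV values are cumulative, these are averages over the whole period since the counters started; differencing two snapshots gives the latency for a specific window.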

28 PAGELATCH_*: A task is waiting for a page latch not associated with an IO request. Can be caused by inserts into the same page or contention on allocation pages. DMVs: sys.dm_exec_requests, sys.dm_os_waiting_tasks, sys.dm_os_buffer_descriptors.

29 ASYNC_NETWORK_IO: Typically occurs because the client requesting data from SQL Server isn’t processing it fast enough. Look at how the client is processing data. Perfmon counters: Network Adapter: Current Bandwidth; Network Adapter: Bytes Total/sec; Network Adapter: Output Queue Length.

30 OLEDB: Occurs when SQL calls the OLE DB Provider. Often associated with third-party tools that heavily call DMVs. Can also be associated with linked server calls, RPC calls, OpenQuery, OpenRowset, or Profiler. DMVs: sys.dm_exec_requests.

31 LCK_*: Waiting to acquire a lock. We accumulate these AFTER the lock has been released. Perfmon counters: Access Methods: Table Lock Escalations/sec; Locks: Lock Wait Time (ms); Locks: Lock Waits/sec; Memory Manager: Lock Memory (KB). Also check: missing indexes. DMVs: sys.dm_db_index_operational_stats, sys.dm_tran_locks, sys.dm_exec_requests.

32 RESOURCE_SEMAPHORE: Waiting for a memory grant due to a high number of concurrent queries or excessive memory grant requests. Not uncommon for DW workloads. Resource Governor can help. Perfmon counters: Memory Manager: Memory Grants Pending; Memory Manager: Memory Grants Outstanding. DMVs: sys.dm_exec_query_memory_grants, sys.dm_exec_query_resource_semaphores.

33 LATCH_*: Occurs when a latch is acquired on some non-buffer construct. Mostly internal uses; usually not a lot you can do about it. Tools: sys.dm_os_latch_stats, Extended Events.

34 tempdb: A performance talk is not complete without mentioning tempdb. Can become a bottleneck if not properly sized/allocated. Faster drives here are better (hint: SSD). We use tempdb for a LOT (to name a few): temporary tables (#) and table variables (@); internal work tables (spools); spills (hash/sort/exchange); version store.

35 tempdb cont.: Make sure all files are equally sized upon creation. For # of files, we recommend: <8 cores = use 8 tempdb files; >=8 cores = use 8 unless you still have latch contention, then add 4 at a time afterwards.

36 And now… Perfmon: Incredibly useful for SQL Server troubleshooting. Not enough people use this tool. The “hard” part is knowing what to monitor, but we are here to help with that today. Introduces very little overhead on a SQL Server system.

37 Running Performance Monitor: Start -> Run -> perfmon causes perfmon to open with only a single counter. Start -> Run -> perfmon /sys causes perfmon to open with your saved counters, and allows you to save .PerfmonCfg settings, which you can move to other machines.

38 Useful Memory Counters: SQL Server:Buffer Manager (Page Life Expectancy, Checkpoint Pages/Sec, Free Pages, Lazy Writes/Sec); Memory Manager: Memory Grants Pending; Process: Working Set; Memory: Available MBytes.

39 Useful CPU Counters: Very high and sustained CPU usage is a clear sign of an underlying problem. Counters: Processor: % Privileged Time; Processor: % Processor Time; Process: *; SQL Statistics: Batch Requests/Sec; Databases: Transactions/Sec; SQL Statistics: Compiles/Sec.

40 Useful Process Counters: Very useful to debug misbehaving processes on a machine. Always look here to see if processes, such as anti-virus, are competing with SQL Server. Useful counters: IO Data Bytes/sec, % Processor Time, Working Set, Private Bytes.

41 Useful IO Counters: Logical Disk monitors logical partitions on the machine (drive letters). Physical Disk monitors physical disks on the machine. Counters: Avg Disk Sec/Read, Avg Disk Sec/Write, % Idle Time, Disk Transfers/sec.

42 Power Settings: The server power settings can be one of the easiest performance gains. Changing from the default (Balanced) to High Performance can lead to gains of 20% or more. Perfmon counter: Processor Information: % Processor Performance.

43 Application Settings: Look at Process counters for specific processes. Find out if the application is requesting too much data. Application time-outs aren’t necessarily a SQL-related problem. Avoid applications installed on the same machine as SQL.

44 Filter Drivers: Typically exist in the form of anti-virus (AV) software. If you insist on using AV on the database instance, make sure to exclude the SQL process and all SQL data types. Look at Process: IO Data Bytes/sec for processes other than SQL Server that are driving high IO levels.

45 Performance Dashboard Reports: Set of SSRS performance reports that integrate into SSMS. Great for tracking down trickier performance issues.

46 How to set up the Performance Dashboard: 1. Install the setup script, which loads a series of sprocs into msdb on the SQL instance. 2. On your local copy of SSMS, load the main SSRS Performance Dashboard report. 3. Find slow things!

47 Performance Dashboard

48 Performance Analysis of Logs (PAL) Tool: Free tool used to analyze Perfmon logs. Allows you to set custom thresholds or use thresholds already configured for your workload; there is a SQL Server workload that looks at SQL Server counters. Available on Codeplex: http://pal.codeplex.com/ It does take some analysis time, so be prepared to wait if you need to analyze a lot of perfmon information.

49 PAL Output: The output is color coded to let you know the areas to focus on. You do have some control over this through the threshold files. Not everything in red actually means something. You must know what to look for.

50 PAL Output: Graphs show thresholds. Alerts are summarized in time slices.

51 When to Use Which Tool? PAL (PAL Download) is great for overall system performance: benchmarking and getting acquainted with a workload. PSSDIAG/Nexus: more targeted performance analysis; when you need to view SQL internal resources (waits, blocking chains, query plans); short timespan for collection.

