Presentation is loading. Please wait.

Presentation is loading. Please wait.

Building a Performance Monitoring System using XEvents and DMVs

Similar presentations


Presentation on theme: "Building a Performance Monitoring System using XEvents and DMVs"— Presentation transcript:

1 Building a Performance Monitoring System using XEvents and DMVs
Ola Hallengren, Saxo Bank SQLUG Malmö

2 About me Ola Hallengren https://ola.hallengren.com
DBA in Saxo Bank, a Danish investment bank Microsoft MVP – Data Platform 2 | SQLUG Malmö

3 Agenda How we built a monitoring solution using Extended Events and DMVs Techniques that we used Demos 3 | SQLUG Malmö

4 XEvents – First steps Creating an Extended Events session using the SSMS GUI or a script, with Event File as target Query the events using the SSMS GUI or through XQuery on the production server (or copy the files to another server and query them there) 4 | SQLUG Malmö

5 XEvents – Challenges XQuery is slow
XQuery is not easy to write if you are not familiar with it Querying the events on the production server puts a load on the server (and even more if many DBAs are doing it at the same time) You need to have access to the production server to query the events (makes it difficult to give access to developers) If you copy the files to another server, you are not getting in new events Difficult to correlate events with data from DMVs (like SQL Texts and Query Plans) 5 | SQLUG Malmö

6 XEvents Monitoring - Requirements
It should be running all the time on all servers Events should be stored in a central database Events should be available for querying very close to real-time (so that it can be used in live incidents) If the monitoring solution is down, no events should be lost (it should just catch up when it starts again) Data should be easily available for DBAs and developers, and without using XQuery No XQuery on the production servers (for performance reasons) Collection of SQL Texts and Query Plans (triggered by events) SQL Server 2012 and later 6 | SQLUG Malmö

7 XEvents Monitoring – Design
A company default Extended Events session (database_health) running on all SQL Servers with Event File as target PowerShell scripts (running on a central server) collecting events every 30 seconds Using sys.fn_xe_file_target_read_file to read new events Storing data into a central database Views to access data XQuery is performed either at load time in an instead-of-trigger or in the views when data is accessed PostActions to collect SQL Texts and Query Plans 7 | SQLUG Malmö

8 XEvents Monitoring – Overview
Events Database Job Server running PowerShell scripts DBAs Database Servers 8 | SQLUG Malmö

9 Scenario I: Timeout An application is getting a command timeout in AdventureWorks. What is going on? Use ExtendedEvents.AbortedExecutions to see the aborted query See how the columns statement_last, statement, and query_plan are available, even though they are not in the events (this information is coming from sys.dm_exec_sql_text) If the wait_type = LCK_* it is waiting for locks (it is being blocked) and we can use ExtendedEvents.BlockedProcesses to see who the blocker is We can also see the root blocker 9 | SQLUG Malmö

10 Scenario II: Deadlocks
An application is getting a deadlock in AdventureWorks Use ExtendedEvents.Deadlocks to see the deadlock graph The deadlock graph is parsed in ExtendedEvents.DeadlockProcesses and ExtendedEvents.DeadlockResources We can also see the “Transaction was deadlocked on lock resources with another process and has been chosen as the deadlock victim.” errors in ExtendedEvents.Errors 10 | SQLUG Malmö

11 Scenario III: Errors An application is inserting data in a batch and are getting “String or binary data would be truncated.” - errors Use ExtendedEvents.Errors to see the errors We can see the statement, but we want to see the actual values that the application tried to insert We can add the action sql_text to get the input buffer Can generate very large amount of event data in short time, if there is a batch with many errors There will be one event for each error, and the sql_text of each event will have the complete batch 11 | SQLUG Malmö

12 The session database_health
SQL Server comes with a system_health default session that contains a lot of useful information We have created a company default extended event session that is running on all servers (database_health) Different thresholds on different servers (higher duration thresholds on OLTP - servers, than data warehouse - servers) Running with target Event File 12 | SQLUG Malmö

13 How an event is traveling - I
An event passes the predicate evaluation (filters) Additional information (Actions) is collected (e.g. session_id) The event is buffered to the memory buffers The event is written to an event file (default 30 seconds latency) 13 | SQLUG Malmö

14 How an event is traveling - II
A job runs a PowerShell script on the job server (every 30 seconds) The script is querying sys.fn_xe_file_target_read_file The first time it is getting all events from the files After that it is passing the last file name and file offset, that it has in its events database (so getting only new events) The events are inserted into the events database An instead-of-trigger is fired The trigger is extracting out the most important elements and attributes using XQuery, and also does some data type conversions 14 | SQLUG Malmö

15 How an event is traveling - III
The PowerShell script is now collecting SQL Texts and Query Plans (PostActions) Joins and additional logic (and sometimes more XQuery) in views 15 | SQLUG Malmö

16 Latency Production Demo MAX_DISPATCH_LATENCY 30 s 1 s
PowerShell Job Schedule 10 s Time to read and insert events < 1 s Total ≈ 60 s ≈ 11 s 16 | SQLUG Malmö

17 Where to do the XQuery? Extracting out elements and attributes into its own columns at load time in an instead-of-trigger is optimal for query performance, but has a cost in load performance and storage Doing the XQuery in the views is optimal for load performance and storage, but has a cost in query performance The attribute timestamp has to be extracted out at load time (as you want to be able to look at the latest events fast) In general try to avoid queries that need to do XQuery on large number of events When the performance for a query is not acceptable, then it is time to move some of the elements or attributes to its own columns 17 | SQLUG Malmö

18 Blocking The blocked_process_report event is very useful when investigating blocking problems The event is only triggered if ‘blocked process threshold’ has been enabled on the server It should not be set lower than 5 seconds Handled by the same thread in SQL Server that is searching for deadlocks A blocked_process_report event has always one blocked and one blocking process Every time the thread wakes up and is looking for blocking it has a new monitor_loop_id (filter on monitor_loop_id to get a snapshot of the blocking) <blocked-process-report monitorLoop="1369"> 18 | SQLUG Malmö

19 Blocking – Using the Execution Stack
The executionStack in the blocked_process_report can be used to see which stored procedures and statements that are involved The first frame is always the inner statement 19 | SQLUG Malmö

20 Getting SQL Texts To get an SQL Text you need an sql_handle
The handle can be used in sys.dm_exec_sql_text to get the text If you also have a start_offset and an end_offset you can extract the statement from the text The sql_handle is a hash of the text The sql_handle and offsets are available in the action tsql_frame (various events), the executionStack (blocked_process_report and xml_deadlock_report), and also in DMVs like sys.dm_exec_requests By storing the text with the handle, the next time an event comes with the same handle you don’t need to go and get it (as you already have it) 20 | SQLUG Malmö

21 Getting Query Plans – The plan_handle
The action plan_handle is a “A token that refers to the compiled plan that the query is part of.” The plan_handle can be used in sys.dm_exec_text_query_plan to get the query plan The problem is that a query plan can change while keeping the same plan_handle 21 | SQLUG Malmö

22 Getting Query Plans – Statement - level
In events like wait_info and sp_statement_completed, and DMVs like sys.dm_exec_requests you have this information available: plan_handle start_offset end_offset query_hash query_plan_hash You can store this information with the plan and the next time you come across the same combination, you don’t need to get the plan (as you already have it) 22 | SQLUG Malmö

23 Getting Query Plans – Module - level
When you only have a plan_handle (like in module_end) then you need to go out and get the plan fast You should also verify that there hasn’t been a recompile after the event (as it is then not the right plan). You can do that like this: WHERE NOT EXISTS(SELECT * FROM sys.dm_exec_query_stats WHERE plan_handle AND creation_time 23 | SQLUG Malmö

24 High frequency polling of events
Things to consider when you are polling for new events frequently using sys.fn_xe_file_target_read_file: Use small files! It is faster to query a small file than a large file (even if you specify a file name and a file offset) Only specify a wildcard in the [path] when there has been a file rollover (check the current file in sys.dm_xe_session_targets)! When you specify a wildcard, then SQL Server will access all files (even if you specify a file name and a file offset) 24 | SQLUG Malmö

25 The offset is invalid for log file …
When you are querying sys.fn_xe_file_target_read_file with a file name and a file offset it can happen that you get an error like this: “The offset is invalid for log file "...\database_health_0_ xel". Specify an offset that exists in the log file and retry your query.” You get this error if all the files have been rolled over since you read events the last time (so the file name you come with no longer exists) This is if there has been very large number of events generated in short time or if the monitoring solution has been down Increasing the number of files reduces the risk of this happening 25 | SQLUG Malmö

26 Questions? The code is available at You can contact me at 26 | SQLUG Malmö


Download ppt "Building a Performance Monitoring System using XEvents and DMVs"

Similar presentations


Ads by Google