Building a Performance Monitoring System using XEvents and DMVs


Building a Performance Monitoring System using XEvents and DMVs
Ola Hallengren, Saxo Bank
SQLUG Malmö - 2016

About me
- Ola Hallengren
- https://ola.hallengren.com
- E-mail: ola@hallengren.com
- DBA at Saxo Bank, a Danish investment bank
- Microsoft MVP – Data Platform

Agenda
- How we built a monitoring solution using Extended Events and DMVs
- Techniques that we used
- Demos

XEvents – First steps
- Create an Extended Events session using the SSMS GUI or a script, with Event File as the target
- Query the events using the SSMS GUI or through XQuery on the production server (or copy the files to another server and query them there)

XEvents – Challenges
- XQuery is slow
- XQuery is not easy to write if you are not familiar with it
- Querying the events on the production server puts load on the server (and even more if many DBAs are doing it at the same time)
- You need access to the production server to query the events (which makes it difficult to give access to developers)
- If you copy the files to another server, you do not get new events as they arrive
- It is difficult to correlate events with data from DMVs (such as SQL texts and query plans)

XEvents Monitoring - Requirements
- It should be running all the time on all servers
- Events should be stored in a central database
- Events should be available for querying very close to real time (so that they can be used in live incidents)
- If the monitoring solution is down, no events should be lost (it should just catch up when it starts again)
- Data should be easily available to DBAs and developers, without using XQuery
- No XQuery on the production servers (for performance reasons)
- Collection of SQL texts and query plans (triggered by events)
- SQL Server 2012 and later

XEvents Monitoring – Design
- A company default Extended Events session (database_health) running on all SQL Servers, with Event File as the target
- PowerShell scripts (running on a central server) collecting events every 30 seconds
- Using sys.fn_xe_file_target_read_file to read new events
- Storing data in a central database
- Views to access the data
- XQuery is performed either at load time in an INSTEAD OF trigger or in the views when data is accessed
- PostActions to collect SQL texts and query plans

XEvents Monitoring – Overview
[Diagram: Database Servers, Job Server running PowerShell scripts, Events Database, DBAs]

Scenario I: Timeout
An application is getting a command timeout in AdventureWorks. What is going on?
- Use ExtendedEvents.AbortedExecutions to see the aborted query
- See how the columns statement_last, statement, and query_plan are available, even though they are not in the events (this information is coming from sys.dm_exec_sql_text)
- If wait_type is LCK_*, the query is waiting for locks (it is being blocked), and we can use ExtendedEvents.BlockedProcesses to see who the blocker is
- We can also see the root blocker

Scenario II: Deadlocks
An application is getting a deadlock in AdventureWorks.
- Use ExtendedEvents.Deadlocks to see the deadlock graph
- The deadlock graph is parsed in ExtendedEvents.DeadlockProcesses and ExtendedEvents.DeadlockResources
- We can also see the “Transaction was deadlocked on lock resources with another process and has been chosen as the deadlock victim.” errors in ExtendedEvents.Errors

Scenario III: Errors
An application is inserting data in a batch and is getting “String or binary data would be truncated.” errors.
- Use ExtendedEvents.Errors to see the errors
- We can see the statement, but we want to see the actual values that the application tried to insert
- We can add the action sql_text to get the input buffer
- This can generate a very large amount of event data in a short time if there is a batch with many errors
- There will be one event for each error, and the sql_text of each event will contain the complete batch

The session database_health
- SQL Server comes with a default system_health session that contains a lot of useful information
- We have created a company default Extended Events session that is running on all servers (database_health)
- Different thresholds on different servers (higher duration thresholds on OLTP servers than on data warehouse servers)
- Running with Event File as the target

How an event is traveling - I
- An event passes the predicate evaluation (filters)
- Additional information (Actions) is collected (e.g. session_id)
- The event is buffered in the memory buffers
- The event is written to an event file (default 30 seconds latency)

How an event is traveling - II
- A job runs a PowerShell script on the job server (every 30 seconds)
- The script queries sys.fn_xe_file_target_read_file
- The first time, it gets all events from the files
- After that, it passes the last file name and file offset that it has in its events database (so it gets only new events)
- The events are inserted into the events database
- An INSTEAD OF trigger is fired
- The trigger extracts the most important elements and attributes using XQuery, and also does some data type conversions
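The incremental read described above can be sketched as follows. This is a minimal Python illustration, not the PowerShell code from the talk: `Checkpoint`, `read_parameters`, and `advance` are hypothetical names, the `?` markers assume an ODBC-style driver, and the sketch assumes rows come back in file order so that the last row marks the newest position.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# T-SQL sent to the monitored server. The arguments are path, mdpath,
# initial_file_name and initial_offset; mdpath is passed as NULL.
READ_SQL = (
    "SELECT file_name, file_offset, event_data "
    "FROM sys.fn_xe_file_target_read_file(?, NULL, ?, ?)"
)


@dataclass(frozen=True)
class Checkpoint:
    """Last position read, as stored in the central events database (hypothetical type)."""
    file_name: str
    file_offset: int


def read_parameters(path: str, checkpoint: Optional[Checkpoint]) -> Tuple:
    """Parameters for READ_SQL: everything on the first run, only new events afterwards."""
    if checkpoint is None:
        return (path, None, None)
    return (path, checkpoint.file_name, checkpoint.file_offset)


def advance(checkpoint: Optional[Checkpoint],
            rows: Iterable[Tuple[str, int, str]]) -> Optional[Checkpoint]:
    """Return the checkpoint after processing rows of (file_name, file_offset, event_data).

    Assumes the rows arrive in file order; an empty result leaves the checkpoint unchanged.
    """
    for file_name, file_offset, _event_data in rows:
        checkpoint = Checkpoint(file_name, file_offset)
    return checkpoint
```

Storing the checkpoint in the same database as the events means a restart of the collector simply resumes from the last stored position.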

How an event is traveling - III
- The PowerShell script now collects SQL texts and query plans (PostActions)
- Joins and additional logic (and sometimes more XQuery) are in the views

Latency
                                  Production   Demo
MAX_DISPATCH_LATENCY              30 s         1 s
PowerShell Job Schedule           30 s         10 s
Time to read and insert events    < 1 s        < 1 s
Total                             ≈ 60 s       ≈ 11 s
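The totals are a simple sum of the worst case at each stage. A small sketch of the arithmetic, assuming the production job schedule is the 30 seconds mentioned on the design slide and taking 1 second as an upper bound for the read-and-insert step (the function name is made up for illustration):

```python
def worst_case_latency_s(max_dispatch_latency_s: float,
                         job_interval_s: float,
                         read_and_insert_s: float) -> float:
    """Upper bound on the time from an event firing to it being queryable centrally.

    The event can wait a full dispatch interval in the memory buffers, then a
    full polling interval in the event file, and then it has to be read and inserted.
    """
    return max_dispatch_latency_s + job_interval_s + read_and_insert_s


# Read-and-insert is "< 1 s" on the slide, so 1 is used as an upper bound here.
production_s = worst_case_latency_s(30, 30, 1)
demo_s = worst_case_latency_s(1, 10, 1)
```

With these inputs the bounds come out just above the slide's approximate totals, because the slide rounds the sub-second read time away.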

Where to do the XQuery?
- Extracting elements and attributes into their own columns at load time in an INSTEAD OF trigger is optimal for query performance, but has a cost in load performance and storage
- Doing the XQuery in the views is optimal for load performance and storage, but has a cost in query performance
- The timestamp attribute has to be extracted at load time (as you want to be able to look at the latest events fast)
- In general, try to avoid queries that need to do XQuery on a large number of events
- When the performance of a query is not acceptable, it is time to move some of the elements or attributes to their own columns
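The trade-off can be illustrated outside SQL Server. The sketch below uses Python's standard XML parser in place of XQuery: `extract_at_load` promotes the timestamp and a chosen set of fields to columns once, while `extract_on_demand` parses the stored XML every time a field is read. The sample event is made up (including the numeric wait type value), and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up event in the usual shape: <data> elements from the event itself,
# <action> elements for the collected actions.
SAMPLE_EVENT = (
    '<event name="wait_info" package="sqlos" timestamp="2016-05-12T08:15:30.123Z">'
    '<data name="wait_type"><value>66</value><text>LCK_M_S</text></data>'
    '<data name="duration"><value>5012</value></data>'
    '<action name="session_id" package="sqlserver"><value>54</value></action>'
    '</event>'
)


def read_field(root, name):
    """Return the text of a named <data> or <action> child, preferring <text> over <value>."""
    for node in root:
        if node.tag in ("data", "action") and node.get("name") == name:
            text, value = node.find("text"), node.find("value")
            chosen = text if text is not None else value
            return chosen.text if chosen is not None else None
    return None


def extract_at_load(event_xml, promoted=("duration",)):
    """Load-time shredding: parse once, store promoted fields next to the raw XML."""
    root = ET.fromstring(event_xml)
    row = {
        "event_name": root.get("name"),
        "timestamp": root.get("timestamp"),  # always promoted, so recent events are cheap to find
        "event_data": event_xml,
    }
    for name in promoted:
        row[name] = read_field(root, name)
    return row


def extract_on_demand(row, name):
    """View-time shredding: nothing extra is stored, but every read parses the XML again."""
    return read_field(ET.fromstring(row["event_data"]), name)
```

Moving a field from `extract_on_demand` to the `promoted` tuple is the equivalent of moving an element or attribute to its own column.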

Blocking
- The blocked_process_report event is very useful when investigating blocking problems
- The event is only triggered if ‘blocked process threshold’ has been enabled on the server
- The threshold should not be set lower than 5 seconds
- It is handled by the same thread in SQL Server that searches for deadlocks
- A blocked_process_report event always has one blocked and one blocking process
- Every time the thread wakes up and looks for blocking, it has a new monitor_loop_id (filter on monitor_loop_id to get a snapshot of the blocking)
- Example: <blocked-process-report monitorLoop="1369">
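Grouping reports by monitor loop and following the blocker chain can be sketched like this. It is a Python illustration under assumptions: the report XML is reduced to the monitorLoop attribute and the spid of each process, and `sample_report`, `snapshots`, and `root_blockers` are hypothetical helpers rather than objects from the talk's code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def sample_report(loop, blocked_spid, blocking_spid):
    """Build a stripped-down blocked process report for illustration only."""
    return (
        f'<blocked-process-report monitorLoop="{loop}">'
        f'<blocked-process><process spid="{blocked_spid}"/></blocked-process>'
        f'<blocking-process><process spid="{blocking_spid}"/></blocking-process>'
        '</blocked-process-report>'
    )


def parse_report(report_xml):
    """Return (monitor loop, blocked spid, blocking spid) for one report."""
    root = ET.fromstring(report_xml)
    blocked = root.find("blocked-process/process")
    blocking = root.find("blocking-process/process")
    return int(root.get("monitorLoop")), int(blocked.get("spid")), int(blocking.get("spid"))


def snapshots(reports):
    """Group (blocked, blocking) pairs by monitor loop: one snapshot per loop."""
    by_loop = defaultdict(list)
    for report_xml in reports:
        loop, blocked, blocking = parse_report(report_xml)
        by_loop[loop].append((blocked, blocking))
    return dict(by_loop)


def root_blockers(pairs):
    """Sessions that block others in the snapshot without being blocked themselves."""
    blocked = {blocked_spid for blocked_spid, _ in pairs}
    return {blocking_spid for _, blocking_spid in pairs if blocking_spid not in blocked}
```

Because each report holds exactly one blocked and one blocking process, a chain of three sessions shows up as two reports in the same monitor loop.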

Blocking – Using the Execution Stack
- The executionStack in the blocked_process_report can be used to see which stored procedures and statements are involved
- The first frame is always the innermost statement
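Reading the frames could look like the sketch below. It is a Python illustration with a made-up, shortened report; it assumes the frame attributes are named sqlhandle, stmtstart, and stmtend, and that a missing stmtstart or stmtend means "start of text" or "end of text" respectively. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up report: a statement inside a procedure (first frame) called from a batch.
SAMPLE_REPORT = (
    '<blocked-process-report monitorLoop="1369">'
    '<blocked-process><process spid="60"><executionStack>'
    '<frame line="5" stmtstart="120" stmtend="260" sqlhandle="0x03000500AA"/>'
    '<frame line="1" sqlhandle="0x01000500BB"/>'
    '</executionStack></process></blocked-process>'
    '<blocking-process><process spid="55"><executionStack/></process></blocking-process>'
    '</blocked-process-report>'
)


def blocked_frames(report_xml):
    """Return the blocked process's frames, innermost statement first."""
    root = ET.fromstring(report_xml)
    frames = root.findall("blocked-process/process/executionStack/frame")
    return [
        {
            "sqlhandle": frame.get("sqlhandle"),
            # Assumption: absent offsets mean the whole text (0 and -1).
            "stmtstart": int(frame.get("stmtstart", "0")),
            "stmtend": int(frame.get("stmtend", "-1")),
        }
        for frame in frames
    ]
```

Each returned frame carries what is needed to look up the SQL text and cut out the statement.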

Getting SQL Texts
- To get an SQL text you need an sql_handle
- The handle can be used in sys.dm_exec_sql_text to get the text
- If you also have a start_offset and an end_offset, you can extract the statement from the text
- The sql_handle is a hash of the text
- The sql_handle and offsets are available in the action tsql_frame (various events), the executionStack (blocked_process_report and xml_deadlock_report), and also in DMVs like sys.dm_exec_requests
- By storing the text with the handle, you do not need to fetch it again the next time an event comes with the same handle (as you already have it)
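Both ideas, cutting the statement out of the text and caching the text per handle, can be sketched in a few lines of Python. The sketch assumes the offsets are byte offsets into the text stored as two-byte Unicode characters, that the end offset points at the last character of the statement, and that -1 means "to the end of the text". `SqlTextCache` and its `fetch_text` callback are hypothetical stand-ins for a lookup against sys.dm_exec_sql_text.

```python
def extract_statement(text: str, start_offset: int, end_offset: int) -> str:
    """Cut one statement out of a batch or module text using byte offsets."""
    raw = text.encode("utf-16-le")  # two bytes per character for ordinary text
    # end_offset addresses the last character, so include its two bytes.
    end = len(raw) if end_offset == -1 else end_offset + 2
    return raw[start_offset:end].decode("utf-16-le")


class SqlTextCache:
    """Keep texts by sql_handle so each handle is fetched from the server only once."""

    def __init__(self, fetch_text):
        self._fetch_text = fetch_text  # hypothetical callable: sql_handle -> full text
        self._texts = {}
        self.fetches = 0

    def statement(self, sql_handle: str, start_offset: int, end_offset: int) -> str:
        if sql_handle not in self._texts:
            self._texts[sql_handle] = self._fetch_text(sql_handle)
            self.fetches += 1
        return extract_statement(self._texts[sql_handle], start_offset, end_offset)
```

For the batch `SELECT 1; SELECT 2;` the second statement starts at character 10, which is byte offset 20 under these assumptions.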

Getting Query Plans – The plan_handle
- The action plan_handle is “A token that refers to the compiled plan that the query is part of.”
- The plan_handle can be used in sys.dm_exec_text_query_plan to get the query plan
- The problem is that a query plan can change while keeping the same plan_handle

Getting Query Plans – Statement-level
- In events like wait_info and sp_statement_completed, and in DMVs like sys.dm_exec_requests, you have this information available:
  - plan_handle
  - start_offset
  - end_offset
  - query_hash
  - query_plan_hash
- You can store this information with the plan, and the next time you come across the same combination, you do not need to get the plan (as you already have it)
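Using the five values together as a lookup key can be sketched as below. This is a Python illustration; `PlanKey`, `PlanStore`, and the `fetch_plan` callback (standing in for a call to sys.dm_exec_text_query_plan) are hypothetical names, not part of the talk's code.

```python
from typing import Callable, Dict, NamedTuple


class PlanKey(NamedTuple):
    """The combination that identifies one statement-level plan."""
    plan_handle: str
    start_offset: int
    end_offset: int
    query_hash: str
    query_plan_hash: str


class PlanStore:
    """Fetch a plan only the first time a given combination is seen."""

    def __init__(self, fetch_plan: Callable[[PlanKey], str]):
        self._fetch_plan = fetch_plan  # hypothetical callable returning the plan XML
        self._plans: Dict[PlanKey, str] = {}
        self.fetches = 0

    def plan_for(self, key: PlanKey) -> str:
        if key not in self._plans:
            self._plans[key] = self._fetch_plan(key)
            self.fetches += 1
        return self._plans[key]
```

Including query_plan_hash in the key is what lets a changed plan under the same plan_handle be detected: the combination no longer matches, so the plan is fetched again.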

Getting Query Plans – Module-level
- When you only have a plan_handle (as in module_end), you need to go and get the plan fast
- You should also verify that there has not been a recompile after the event (as it is then not the right plan). You can do that like this:
    WHERE NOT EXISTS (SELECT *
                      FROM sys.dm_exec_query_stats
                      WHERE plan_handle = @plan_handle
                      AND creation_time > @timestamp)

High frequency polling of events
Things to consider when you are frequently polling for new events using sys.fn_xe_file_target_read_file:
- Use small files! It is faster to query a small file than a large file (even if you specify a file name and a file offset)
- Only specify a wildcard in the [path] when there has been a file rollover (check the current file in sys.dm_xe_session_targets)! When you specify a wildcard, SQL Server will access all files (even if you specify a file name and a file offset)
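The wildcard rule can be expressed as a small decision function. This is a Python sketch with hypothetical names; it assumes the collector already knows the session's current file (for example from sys.dm_xe_session_targets) and the file name stored with its last checkpoint.

```python
from typing import Optional


def path_for_next_read(wildcard_path: str,
                       current_file: str,
                       checkpoint_file: Optional[str]) -> str:
    """Choose the [path] argument for the next read.

    - No checkpoint yet: read everything, so use the wildcard.
    - Checkpoint is in an older file: a rollover happened, so use the wildcard
      to continue from the old file into the newer ones.
    - Otherwise: name the current file directly so only that file is accessed.
    """
    if checkpoint_file is None or checkpoint_file != current_file:
        return wildcard_path
    return current_file
```

In steady state the function returns the single current file, which keeps each poll from touching every file in the set.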

The offset is invalid for log file …
- When you are querying sys.fn_xe_file_target_read_file with a file name and a file offset, you can get an error like this: “The offset 2394624 is invalid for log file "...\database_health_0_130903806628660001.xel". Specify an offset that exists in the log file and retry your query.”
- You get this error if all the files have been rolled over since you last read events (so the file name you pass no longer exists)
- This happens if a very large number of events has been generated in a short time, or if the monitoring solution has been down
- Increasing the number of files reduces the risk of this happening
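One possible way for a collector to recover is to fall back to reading everything that is still on disk and to record that events may be missing. The Python sketch below shows that idea only; it is not taken from the talk's code. `read_events` is a hypothetical callable, and matching on the message text is an assumption made for illustration.

```python
INVALID_OFFSET_MARKER = "is invalid for log file"


def read_with_fallback(read_events, path, checkpoint):
    """Read new events; if the checkpoint no longer exists, restart from the oldest file.

    read_events(path, checkpoint) is a hypothetical callable that raises an
    exception carrying the server's error message. Returns (events, gap), where
    gap is True when events between the old checkpoint and the oldest remaining
    file may have been lost.
    """
    try:
        return read_events(path, checkpoint), False
    except Exception as exc:
        if INVALID_OFFSET_MARKER not in str(exc):
            raise  # unrelated failure: do not hide it
        return read_events(path, None), True
```

Surfacing the gap flag lets the central database show that a period is incomplete instead of silently appearing quiet.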

Questions?
- The code is available at https://ola.hallengren.com/scripts/PerformanceStore.zip
- You can contact me at ola@hallengren.com