Performance Management
Hello – my name is Janis Griffin, and my presentation is on performance management, looking at the SQL Server 2008 Management Data Warehouse. Performance Management 2008 MDW – Janis Griffin, Senior DBA, Confio Software
Who Am I? Senior DBA for Confio Software
Twitter: @DoBoutAnything 20+ Years in SQL Server, Sybase & Oracle DBA and Developer Specialize in Performance Tuning Review Performance of 100s of Databases for Customers and Prospects Just a little bit about myself before we get started. I've been a DBA for over 20 years now, primarily focusing on SQL Server, Sybase & Oracle. I came to Confio Software over 3 years ago. Confio makes database performance tools focusing on response time analysis from an end-user perspective. One of my main responsibilities at Confio is to work with our customers and prospective customers to review performance issues in their databases and help them work on those problems by giving some ideas of how to fix them.
Agenda Management Data Warehouse (MDW) Data Collection
What is it How to set it up Centralized MDW Database Data Collection for Multiple Instances Data Collection System collection sets Disk Usage, Server Activity and Query Statistics User-defined collection sets Reporting Shortcomings Of MDW How Could It Be Better? A Comparison Here is the agenda I hope to cover today. We are going to take a look at the Management Data Warehouse, or MDW for short: find out what it is and why you should set it up. I'll go through several screenshot examples of how to set it up and give you some ideas on best practices. Then we'll take a look at data collections and what types of performance information we can get in the way of collection sets and reporting. Finally, we'll look at some of the shortcomings of MDW and how it could be better. Then we'll look at why DBAs need this data, and I'll show a few slides from other performance tools to bring it all home.
Management Data Warehouse (MDW)
Centralized MDW Database Supports 2008 & Up Holds performance data for multiple instances Consider using separate instance & server Don't want to collect data on the data collector Sizing of MDW Database & maintenance On Creation - Initially sized Data: 100MB with 50MB auto-growth Log: 10MB with 10MB auto-growth Pre-allocate Data & Log sizes to minimize auto-growth Data growth estimates are in MB per day/per DB Separate Data files and Log Files on separate drives Recovery model should reflect backup/restore strategy Default is Simple Recovery Microsoft started enhancing SQL Server with information about its performance activity in 2000, when they instrumented the code with 'wait types'. In 2005, they got better at showing this information by giving us the Dynamic Management Views (or DMVs). They continued to build on that performance data by giving us the Management Data Warehouse, or MDW, in 2008. The MDW is a centralized database which can hold performance information for multiple instances. Typically, the MDW is installed on a separate server in its own instance, or it is shared with other performance-related tools. The reason for this is that you don't usually want to collect performance data for the MDW itself. Also, by having it stand-alone, you can easily grow the MDW to collect lots of data from many other instances. When you set up an MDW database, you should carefully plan how many instances you'll be targeting for data collections, then create your MDW database with that sizing in mind. You'll want to use standard best practices as if it were any production database: separate data files from the log files on different drives, and pre-allocate space to data and log files to minimize auto-growth. Plan your recovery strategy – the default is simple recovery and is probably sufficient since the MDW is not mission critical. The simple recovery model allows you to recover data only to the most recent full database or differential backup. Transaction log backups are not available because the contents of the transaction log are truncated each time a checkpoint is issued for the database. You should know that the default sizing for MDW data is 100MB with 50MB auto-growth, and the log is 10MB with 10MB auto-growth. This is very conservative, as I've seen considerable data growth per day from very busy instances. On my laptop, with 2 instances, it grows 90MB per day. So be sure to plan on adequate disk space.
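The pre-allocation advice above can be scripted ahead of time rather than accepting the wizard's conservative defaults. A minimal sketch follows; the file paths, sizes and growth increments are assumptions you would replace with your own estimates:

```sql
-- Sketch: pre-create the MDW database with pre-allocated data and log
-- files on separate drives (paths and sizes here are illustrative only).
CREATE DATABASE MDW
ON PRIMARY
(   NAME = MDW_Data,
    FILENAME = 'D:\SQLData\MDW_Data.mdf',
    SIZE = 500MB,
    FILEGROWTH = 100MB )
LOG ON
(   NAME = MDW_Log,
    FILENAME = 'E:\SQLLogs\MDW_Log.ldf',
    SIZE = 50MB,
    FILEGROWTH = 50MB );

-- Simple recovery is usually sufficient, since MDW is not mission critical.
ALTER DATABASE MDW SET RECOVERY SIMPLE;
```

The configuration wizard's 'Create or upgrade a management data warehouse' step can then install the MDW schema into this pre-sized database.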
How To Create MDW Database
Ok, so we've talked about what MDW is, so now how do we set it up? In Management Studio, in the Object Explorer, I've expanded the Management folder and found the Data Collection option. By right-clicking on that option, I can choose 'Configure Management Data Warehouse'.
Wizard - Create MDW Database
That will bring up a wizard that walks you through creating the database and installing the schema needed to hold the data collections from the other instances. So I select 'Create or upgrade a management data warehouse'.
Create Centralized MDW Database
After choosing 'Next', I'm presented with a 'New Database' screen. This is where you can reset the default sizing and location of the database files. For the purposes of this presentation, I've set up the data file to be 500MB and the log file to be 50MB.
Manage Recovery & Access for MDW
Another database property to review is the Recovery Model – in this case, I'm going to leave it at 'simple' recovery. The second screen here shows how the Configuration Wizard allows you to map logins to different roles in the Management Data Warehouse. You may want to think about who can run reports out of MDW, as they would need the mdw_reader role. The mdw_writer role is for people who can create custom collection sets and upload them to MDW. And of course, the mdw_admin role allows access to all. Once I've filled this out, I press 'Next'.
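The same role mappings the wizard performs can be granted directly with T-SQL. A sketch, assuming the login name from this presentation as a placeholder:

```sql
-- Sketch: mapping a login to the MDW roles outside the wizard.
-- The login name is a placeholder from this demo environment.
USE MDW;
CREATE USER [CONFIO\jgriffin] FOR LOGIN [CONFIO\jgriffin];

EXEC sp_addrolemember N'mdw_reader', N'CONFIO\jgriffin';   -- can run reports
-- EXEC sp_addrolemember N'mdw_writer', N'CONFIO\jgriffin'; -- can upload custom collection sets
-- EXEC sp_addrolemember N'mdw_admin',  N'CONFIO\jgriffin'; -- full access
```

Granting only mdw_reader to report consumers keeps the write and admin paths restricted to the DBAs managing the collections.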
Create Centralized MDW Database
The wizard displays the information that it's going to use to create the database and schema. In this case, I'm creating a database called MDW in an instance named jgriffin-2\jgr2. It's going to run the schema installation script on MDW and map the CONFIO\jgriffin login. Once I press 'Finish', the MDW database is created and we can view the tables that it created. Notice that there are 2 schema owners. One is called 'core', which holds tables used internally by MDW to map the instances, and then the 'snapshots' schema holds the performance information data itself. So now that we've created the MDW database, how do we get information into it?
Set Up Data Collection Run MDW Wizard on each Instance
Point Data Collection to Centralized MDW Set up local cache directory Map Logins & Define Users Admins / Readers / Writers (custom data collectors) Start SQL Server Agent Basically, I will need to run the MDW Wizard on each SQL Server instance that I want to collect data from. As I set up the data collection, I'll point it to the centralized MDW on jgriffin-2\jgr2 to hold the data. I'll also need to set up a local cache directory to store the data collection files, which I'll talk more about later. Then I'll map logins to roles and start the SQL Server Agent on the instance. The next few slides will show these steps.
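When many instances need to point at the same MDW, the per-instance wizard steps above can also be scripted with the collector configuration procedures in msdb. A sketch, with the instance, database and cache path taken from this presentation as assumed values:

```sql
-- Sketch: scripting the data collection setup that the wizard performs.
-- Run on each monitored instance; the target MDW instance/database and
-- cache directory below are example values from this demo.
USE msdb;

EXEC dbo.sp_syscollector_set_warehouse_instance_name N'jgriffin-2\jgr2';
EXEC dbo.sp_syscollector_set_warehouse_database_name N'MDW';
EXEC dbo.sp_syscollector_set_cache_directory N'D:\MDWCache';

-- Turn the data collector on (SQL Server Agent must be running).
EXEC dbo.sp_syscollector_enable_collector;
```

Pushing a script like this out to each instance avoids clicking through the wizard dozens of times and keeps the configuration consistent.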
Run MDW Wizard on Each Instance
In Management Studio, I connect to the instance that I want to collect data from, then I bring up the Configure MDW Wizard again, and instead of creating the MDW database, I choose the second option, 'Set up data collection'. I'll run this wizard and go through these same steps for each instance that I want to collect data from. After I press Next, I'm prompted for the server and database that host the management data warehouse – I've entered my MDW at jgriffin-2\jgr2. Then I'm prompted for a local directory which will temporarily keep the data as it uploads to the data warehouse. It defaults to the %TEMP% directory if you don't specify another directory. Not all collection sets are cached, but this technique is used to loosely couple the actual collection of data from the communication and loading to MDW. More about this local cache on the next slide.
Local Cache Directory Local Server where data is collected from Instance Stores then forwards data collection (2 jobs) Collects data into local files (at some interval) Loads to MDW (at some interval) File Naming Convention servername_MSSQL10_50_instance_{GUID}_##.cache Defaults to %temp% directory If changed, make sure Server Agent has read/write perms Server Activity & Query Statistics are cached by default, not Disk Usage – real-time load SEBOX2_MSSQL10_50_S2008R2_{FD4-4EB6-AA04-CD59D9BB5714}_36.cache Basically, the cache directory is local to the server where the data is being collected. It stores the data collection files in this directory and then forwards them to the MDW at some pre-defined interval. So instead of collecting data every 10 seconds and immediately inserting that data into MDW, the local cache allows the data collection to occur asynchronously from the uploading of data into the MDW. In fact, if you have several instances doing data collections, you can stagger the uploads so that they don't clobber the MDW server. The files have the naming convention I listed here: the name is prefixed with the server name, followed by the SQL version, the instance name, a GUID, and a number; the suffix is .cache. You can see the one set up for this presentation is on SEBOX2 with an instance name of S2008R2. As I said earlier, the local directory defaults to wherever the %TEMP% environment variable points. You can change the location – which I'd recommend if you have a very busy system, to use a less active disk. If you do that, however, make sure the SQL Agent has read/write permissions; otherwise you won't be able to collect the data. Finally, of the system collection sets, 2 of the collections are cached – Server Activity and Query Stats. More about this later.
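You can confirm the cache directory and the rest of the collector configuration on a monitored instance with a quick query against the collector's configuration store in msdb:

```sql
-- Sketch: inspecting the data collector configuration on a
-- monitored instance.
SELECT parameter_name, parameter_value
FROM msdb.dbo.syscollector_config_store;
-- Rows typically include entries such as CacheDirectory,
-- CacheWindow, MDWDatabase, MDWInstance and CollectorEnabled
-- (exact rows depend on how the collector was configured).
```

An empty or %TEMP%-based CacheDirectory value here is the cue to move the cache to a less active disk on busy systems, as discussed above.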
Data Collection Jobs msdb.dbo.syscollector_collection_items_internal
So I've set up the data collection on SEBOX2\S2008R2, and as you can see, I have several jobs under the SQL Server Agent, which I enabled. Notice that collection_set_1 is named 'noncached_collect'. This job is not using the local cache – it's the Disk Usage collection and is directly inserting the data into the MDW. We can't tell from looking at the names, but collection_set 2 and 3 are the Server Activity and Query Stats collection sets. You can map the IDs to their names by looking at dbo.syscollector_collection_items_internal in the msdb database.
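A friendlier way to map those numeric IDs back to collection-set names is the documented syscollector_collection_sets view in msdb, which also shows the cache mode for each set:

```sql
-- Sketch: listing collection sets so the collection_set_N job names
-- can be matched to their friendly names.
SELECT collection_set_id,
       name,
       collection_mode,   -- 0 = cached, 1 = non-cached
       is_running
FROM msdb.dbo.syscollector_collection_sets
ORDER BY collection_set_id;
```

On a default setup this makes it obvious which set is Disk Usage (the non-cached one) versus Server Activity and Query Statistics (the cached ones).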
Data Collection Status – Log Viewer
Lastly, once I've set up data collection, I'll want to periodically check the logs to make sure there aren't any errors. To view the data collection logs, I can right-click either on the Data Collection object or on the specific collection set. Here I'm looking at the status of the Query Stats collection. The big red X was because the server got rebooted and I didn't have the SQL Agent set to start up automatically. Also, remember to view the Job History & SQL Agent logs.
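The same status information behind the Log Viewer can be queried directly, which is handy if you want to automate the "check for errors" step across many instances. A sketch against the collector's execution log view:

```sql
-- Sketch: checking recent collection and upload runs for failures
-- without opening the Log Viewer.
SELECT TOP (20)
       log_id,
       start_time,
       finish_time,
       status,           -- running / finished / failed
       failure_message   -- NULL when the run succeeded
FROM msdb.dbo.syscollector_execution_log
ORDER BY start_time DESC;
```

A scheduled job that alerts on non-NULL failure_message values gives you the basic error monitoring that MDW itself does not provide.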
What runs where? 4 Servers with 4 Instances & 1 Centralized MDW Server
[Diagram: 4 instances, each running jobs and schedules out of msdb and writing dir\*.cache files to a local directory, all uploading to the centralized MDW server.] So now I've got an active system collection set running from SEBOX2, which is populating my centralized MDW. So let's review – what runs where? Hopefully this chart is helpful. Here is an example of 4 instances that are performing data collections and uploading to a centralized MDW. As you can see, on the data-collecting servers, processes access stored procedures out of the msdb database, and those processes run out of 5 jobs set up on every instance. You can see that 2 of the jobs are writing to the local cache, while 3 of the jobs are uploading or writing directly to the MDW. So the more instances you have collecting, the more important it is to stagger the schedule of the uploads and inserts into MDW – otherwise it will get clobbered.
Reports At Each Instance
So now that I've gone to all the trouble of setting this up, what information do I get out of it? Well, you can get reports on Query Stats, Server Activity and Disk Usage, which I will be going through in the next slides. It's important to note here, however, that I'm logged on to the instance where I'm doing the data collection, not the MDW.
Query Statistics Report
Cached Mode – Collected every 10 seconds / Uploaded every 15 minutes Here is an example of the Query Stats report. Notice that I can select specific date ranges to see what was running at a specific time. In this example, I'm looking on 9/10 at 6pm for a 12-hour timeframe, through 6am on 9/11. The default retention of data for query stats is 14 days. Notice that this data is using the cached mode, or store-and-forward method: it collects data every 10 seconds and uploads to MDW every 15 minutes. The top 10 queries during this timeframe are listed, ordered by CPU; however, you can rank them by the other categories listed here. If I select a specific query, I can get graphical execution plan details and see the wait types or resources that the query waited on, as well as other stats.
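If the 14-day default retention is too short for your trending needs, the collection set can be updated in place. A sketch, assuming collection set ID 3 is Query Statistics on this demo instance (confirm the ID on your own instance first):

```sql
-- Sketch: extending Query Statistics retention beyond the 14-day
-- default. The collection_set_id value is an assumption for this
-- demo instance - verify it via syscollector_collection_sets first.
USE msdb;
EXEC dbo.sp_syscollector_update_collection_set
     @collection_set_id = 3,
     @days_until_expiration = 30;
```

Remember that longer retention means more disk in the MDW, so revisit the sizing estimates from earlier if you raise this.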
Server Activity Report
Cached Mode – Collected every 60 seconds / Uploaded every 15 minutes Here is an example of the Server Activity report. Notice that this collection set is collected every 60 seconds and is in cached mode, uploading every 15 minutes. The default history retention for this data is also 14 days. This report will give you an idea of system health, or the rate at which resources are being consumed: CPU utilization, memory usage, I/O and network, plus what the system waited on most. Also, you can click on any one of the boxes here to get further supporting statistics.
Disk Usage Report Non-Cached Mode – Both Data / Log Files Collected every 6 hours The last report you can get is the Disk Usage report. This data collection is non-cached, so it writes directly to the MDW, but it only runs every 6 hours for both the data and the log files, so it should not have a performance impact. Default retention for this data is 730 days, or 2 years. The first page of the report lists all the databases and logs, their start size, current size, average daily growth and current trend. You can drill into a specific database to get further usage and growth information.
Custom Data Collection Example
That is all the information that MDW provides unless you create custom data collections to gather more data. If you create custom data collections, you will need to write custom reports from the MDW database. This is an example of a custom data collection that I'll touch on briefly; I won't go into great detail here because it's almost another presentation in itself. If this is something you are interested in, Books Online has great documentation on how to set up custom data collections. This example shows that I'm on the instance that I want the data collection to run on – SEBOX2. I create the custom data collection in the msdb system database because that is where all the collection procedures reside. This custom data collection will collect index usage stats from the DMV sys.dm_db_index_usage_stats. Once I execute this T-SQL and refresh the Data Collection folder, I can see my new data collection, which I named DMV_INDEX_USAGE. Now if I right-click, I can view the logs to see that it's actually working; I should also have refreshed my SQL Server Agent view, because I now have 2 new jobs as well. Custom data collections don't have any reports – we will need to create them out of the MDW.
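A custom collection along the lines described above can be sketched with the Books Online pattern for the Generic T-SQL Query collector type. The schedule name is one of the collector's pre-installed schedules; treat the exact values as assumptions to adapt:

```sql
-- Sketch: a custom collection set capturing index usage stats,
-- following the Books Online pattern for the Generic T-SQL Query
-- collector type. Run on the monitored instance.
USE msdb;
DECLARE @collection_set_id int, @collection_item_id int;

EXEC dbo.sp_syscollector_create_collection_set
    @name = N'DMV_INDEX_USAGE',
    @collection_mode = 1,              -- 1 = non-cached (direct upload)
    @days_until_expiration = 14,
    @schedule_name = N'CollectorSchedule_Every_6h',
    @collection_set_id = @collection_set_id OUTPUT;

-- The query to run and the MDW output table, as the collector
-- type's XML parameters.
DECLARE @params xml = N'
<ns:TSQLQueryCollector xmlns:ns="DataCollectorType">
  <Query>
    <Value>SELECT * FROM sys.dm_db_index_usage_stats</Value>
    <OutputTable>index_usage_stats</OutputTable>
  </Query>
</ns:TSQLQueryCollector>';

EXEC dbo.sp_syscollector_create_collection_item
    @collection_set_id = @collection_set_id,
    @collector_type_uid = N'302E93D1-3424-4BE7-AA8E-84813ECF2419', -- Generic T-SQL Query collector type
    @name = N'Index Usage Stats',
    @parameters = @params,
    @collection_item_id = @collection_item_id OUTPUT;

EXEC dbo.sp_syscollector_start_collection_set
    @collection_set_id = @collection_set_id;
```

Starting the set creates the corresponding SQL Agent jobs, which is why two new jobs appear after refreshing the agent view.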
Custom Data Reports on MDW
So here is a quick example of creating a custom report. As you can see, once I've created the custom data collection, there is a new table in the MDW, owned by the custom_snapshots schema, called index_usage_stats. I can then start reporting from that table. In this example, I'm looking specifically at the database 'Activity' and returning the index names and the number of times each was accessed. This is a really simple example – there are a lot more complex data collections that you could create, and of course many more custom reports to make.
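On the MDW side, a minimal report over that generated table might look like the sketch below. The column list mirrors sys.dm_db_index_usage_stats plus the collection timestamp the collector adds; verify the generated table's exact columns on your own MDW before relying on them:

```sql
-- Sketch: reporting from the custom collection's table in the MDW.
-- Columns follow the DMV's output plus the collector's timestamp
-- column - check the generated table definition on your MDW.
SELECT collection_time,
       database_id,
       object_id,
       index_id,
       user_seeks, user_scans, user_lookups, user_updates
FROM custom_snapshots.index_usage_stats
ORDER BY collection_time DESC;
```

Joining database_id and object_id back to names (for example via sys.databases on the source instance) turns this into the friendlier per-database, per-index view shown on the slide.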
Shortcomings Of MDW No Centralized Monitoring Centralized Data Store
No Response Time Analysis – Server Health Stats & Top 10 Queries No Notifications or Alerts – Must Log In To Each Instance & View Complex Customizations – Installed On Each Instance / T-SQL Complex Architecture – 5 Jobs/Instance, Logs, Caches, Agents Tedious Setup & Conservative Defaults – Need To Run Wizard on Each Instance Limited Default Reporting – 3 Reports, Forces 'Server' Health View Limited Performance Coverage – Works Only On SQL Server Does Not Facilitate Team Collaboration – DBA Tool Only No centralized monitoring: MDW has centralized storage; however, you have to configure it on each instance, and it collects and reports information per instance. Very little response time analysis: MDW collects very basic 'server health stats' for each instance as well as the top ten queries; however, it does not easily tie both views together to find the actual individual queries causing issues. It's difficult to quickly identify which query is the spike and/or the issue. Limited customizations: MDW allows for customizations, but they are not easy to implement, and if you want to see the results of your customizations, you need to create your own reports in MDW. Ease of setup and default collections: to utilize MDW, you need to go to each instance and run through the MDW wizard, or build a set of scripts in order to push out an install. In addition, the default settings are very conservative, so they may not collect and/or keep important data points. Notifications and alerts: MDW does not give a view across all instances, which means you will need to go to each instance and manually view the data in order to understand whether an issue exists. In addition, there is no alerting mechanism for any of the data that is collected. Limited default reporting capability: there are only 3 types of reports, and they force you to view activity from a 'server' point of view. What are your end users experiencing? Complexity: SQL Server Agent is required to run on each instance monitored. Also, additional 'error' monitoring is needed for the 5 jobs per instance. Finally, if the data is cached, extra disk space and activity for the 'store & forward' process is needed to get the data to MDW. Too many pieces and parts! Overall performance coverage: works only on SQL Server instances. Does not facilitate team collaboration: mostly for DBA use – no management reports – not a good tool for communicating.
How Could It Be Better? Proactive View What if a User Complains
Firefighting – Drive to ‘Root Cause’ Blocking Issue (not tuning issue) Long Term Trends Current (real-time) Issue So how could it be better?
Proactive View Here is an example of how Ignite shows and presents response time information in what I call the 'proactive' view. To understand what might be going wrong in your instance, you need to be able to go back in time to see how this instance typically runs. In other words, you need to have an idea of what 'normal' times are in order to be able to spot the 'abnormal' times. This trend screen in Ignite shows 7 days of response time analysis for the Sybase instance on a machine called 'Gibson'. Each bar represents the total time spent in the instance for 1 day for the top 15 SQL statements running in a database. Each color in the bar represents a unique SQL statement or stored procedure call, and quickly shows you which statements are taking the most time. The legend to the right is ordered by the SQL with the highest amount of time spent, so you can quickly see that the purple one – the one I've called Sel Customers – is the top SQL statement in this instance. Notice on the left-hand side the number of accumulated hours spent in this instance. As you can see, August 10th was a particularly bad day, with processes spending almost 50 hours of accumulated time on those top 15 SQL statements. Also notice, at the bottom of the screen, that Ignite does some initial analysis for you – in the top query problem section, Ignite goes out every hour of each day, tells you which SQL you should focus on, and even gives you some ideas on how to tune it. It does this every day, so you can view this data not only for today but historically as well. It looks like, for this instance, Ignite has warnings for 3 SQL statements running on August 12th. The first one is a blocking issue; if you click on the 'more' link, it will give more information about the SQL and what it is waiting on.
User Complains Here is another case where response time analysis can help. Let's say a user comes to your office and complains that every day from 10am to noon for the last 4 days, the database seems to be really slow. You can use this data to drill into the specific timeframe the user is talking about and see exactly what was running and how long it was taking. In this example, we are actually looking at 5 days, to see what was running before the user noticed the slowdown. As you can see, the top 2 queries – the purple and pink colors on the bar – have almost doubled in response time from Aug 8th to Aug 9th, from 100 minutes in a 2-hour timeframe to 200 minutes. Also, you can see that those 2 queries have continued to grow worse over time – on the 11th, almost 400 minutes were devoted to these 2 queries. Also notice that there are other new queries in the mix. This quickly gets you to a starting point of what to look at in order to find the root cause.
Firefighting – Driving to Root Cause
And that leads us into firefighting mode. Response time analysis is the best and quickest way to find the root cause of a problem. This screenshot shows the specific queries running in the database at a specific timeframe. Notice that we are looking between 12:00 and 12:05pm on Aug 10th – a 5-minute timeslice – and we can see the queries that were running in it. Each bar represents how much time each query spent in the database during these 5 minutes. The different colors on the bar represent the resource or wait event the query is spending most of its time on. Notice that our Sel Customers query has spent most of its time waiting on the run queue and on buffer reads to complete. The second query there is primarily waiting on a semaphore, or a lock, as is the 5th query – the insertcustomer query. So not only do you know which queries, but you can quickly see what is causing the issue. This view combining queries with the wait events they are encountering is not easily found in ASE today; you would need to collect this information yourself or use a tool like Ignite which shows it. Also, each wait event listed here gives clues on how to go about tuning the SQL so you can reduce the response time on that wait event. In the case of this locking issue, it may not be a SQL tuning effort; instead it may be an application or concurrency issue. Notice the 'Blockers' tab on the right there. By collecting this data, we can quickly get specific information about the blockers and the waiters and the impact they have on our end users.
Blocking Issue Here is the information that we can view when blocking is occurring. Notice that spid 125 has caused 90 seconds, or a minute and a half, of wait time for other processes in the database. Spid 164 waited several times: first 22 seconds trying to run some dynamic SQL, and then 15 seconds waiting to run line 13 of our insertcustomer statement. Notice that we can see it is waiting for a semaphore, and we can see which user, program and machine it's coming from. At the bottom of the page, you can see what the blocking spid (125) was running while it caused the blocking. It looks like it was running a stored procedure called createInvoice. From this we can quickly tell why the blocking occurred: the createInvoice procedure was accessing the customer table while the insertcustomer statement was trying to manipulate it.
Long Term Issues, Trends & Tuning
Finally, with response time analysis, it's often useful to look back over time for long-term trends. This is an example of looking back about 2 months. You can see the tuning efforts that went on in this database instance. Not only did the purple query (our Sel Customers) go from spending hours in the database to almost nothing – other queries were tuned as well. From Jul 31st to Aug 12th, the top 15 SQL statements spent anywhere from 25 to 30 hours in the database across all the processes executing them. After Aug 12th, they are pretty much nonexistent, and those same queries spent only around 10 hours in the database. Also, notice the 2 queries from Aug 13th to Aug 16th: after tuning them, there is very little time spent in the database. This time has been given back to the end-user processes to do other things. And by reducing the amount of time the queries spend in the database, we have freed up more processing power at the database level to add more processes and new programs.
Current View Just as we can look at long-term trends, sometimes problems are happening right now. We can use Ignite's RTA for anywhere from the last hour down to the last second. We can look at sessions in real time, see blocking and active sessions, and kill them if we need to. Ignite also collects system or resource metrics along with the response time, so you can view the health of the system along with the response time for our end users. I believe there are up to 20 or 30 metrics that you can view and compare, and you can quickly add new ones if you have others that you want to see. Notice the red X in this current view on CPU utilization: Ignite will alarm if thresholds are met, to tell you whether there is a critical issue or it's in the warning stage. You can click on that 'more' link to get more information about the resource and how often it's alarming.
Summary Management Data Warehouse (MDW) Data Collection Shortcomings
Can be used for Server/Resource Health Metrics Many pieces/parts to set up & keep running Free 'first line of defense' Data Collection System collection sets Disk Usage, Server Activity and Query Statistics User-defined collection sets – needs development $$$ Shortcomings Large installations require dedicated performance tools to quickly identify and fix issues You will want to set up a centralized server for multiple instances – maybe set it off on its own Centralized database for data collection – can support collection for several servers Capacity planning: management of MDW should be as it is with any other database Simple recovery default 5-10GB per monitored instance Separate data/log files
Confio Software Wait-Based Performance Tools
Ignite8 - SQL Server, Oracle, DB2, Sybase Provides Help With Identifying Biggest Performance Issues Gathering All Details for Immediate Resolution Monitoring Continuously for Normal versus Abnormal Based in Colorado, worldwide customers Free trial at