SQL Server Statistics and its relationship with Query Optimizer


1 SQL Server Statistics and its relationship with Query Optimizer
Musab Umair Malik

2 About me
Working with SQL Server since 2007
MCITP SQL Server 2005, MCSA 2012/2014 & MCSE Data Platform
Working with S&P Global Market Intelligence for the last 4 years
Currently holding a Senior DBA position

3 Agenda
Definition
Purpose
Relationship with Query Optimizer– The Impact
Types
Some Internals
Q & A Session

4 Definition Statistics for query optimization are objects that contain statistical information about the distribution of values in one or more columns of a table or indexed view. The query optimizer uses these statistics to estimate the cardinality, or number of rows, in the query result. These cardinality estimates enable the query optimizer to create a high-quality query plan. For example, the query optimizer could use cardinality estimates to choose the index seek operator instead of the more resource-intensive index scan operator, and in doing so improve query performance. Each statistics object is created on a list of one or more table columns and includes a histogram displaying the distribution of values in the first column. Statistics objects on multiple columns also store statistical information about the correlation of values among the columns. These correlation statistics, or densities, are derived from the number of distinct rows of column values. For more information about statistics objects,

5 Purpose The query optimizer uses these statistics to estimate the cardinality, or number of rows, in the query result. These cardinality estimates enable the query optimizer to create a high-quality query plan. For example, the query optimizer could use cardinality estimates to choose the index seek operator instead of the more resource-intensive index scan operator, and in doing so improve query performance. (Source: MSDN) The query optimizer uses statistics to create query plans that improve query performance. For most queries, the query optimizer already generates the necessary statistics for a high-quality query plan; in a few cases, you need to create additional statistics or modify the query design for best results.

6 Relationship with Query Optimizer– The Impact
Estimated Number of rows vs Actual Number of rows
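
One way to see the estimated and actual row counts side by side is sketched below. The table `dbo.Orders` and its columns are hypothetical names used only for illustration; substitute a query of your own.

```sql
-- Hypothetical table and predicate, for illustration only.
SET STATISTICS PROFILE ON;

SELECT OrderID, OrderDate
FROM dbo.Orders
WHERE CustomerID = 42;

SET STATISTICS PROFILE OFF;
-- The extra result set lists one row per plan operator.
-- Compare the Rows column (actual) with EstimateRows (estimated).
```

A large gap between the two columns for the same operator is often a hint that the statistics behind that estimate are stale or missing.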

7 Relationship with Query Optimizer– The Impact
Query execution plan design depends upon statistics (among other factors)
Logical reads
CPU time
Execution plan
Join operator selection (HASH, MERGE or LOOP)
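
The logical reads and CPU time mentioned on this slide can be observed for a single query roughly as follows. `dbo.Customers` and `dbo.Orders` are hypothetical tables assumed for this sketch.

```sql
-- Report logical reads and CPU/elapsed time in the Messages output.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Hypothetical join; the optimizer picks HASH, MERGE or LOOP
-- based largely on its cardinality estimates.
SELECT c.CustomerName, COUNT(*) AS OrderCount
FROM dbo.Customers AS c
INNER JOIN dbo.Orders AS o
    ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerName;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the same query before and after a statistics update, and comparing the reads and the join operator in the plan, is one simple way to see the impact.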

8 Types
Column Statistics
Index Statistics
User Created Statistics
Filtered Statistics
Incremental Statistics

9 Column Statistics Each statistics object is created on a list of one or more table columns and includes a histogram displaying the distribution of values in the first column. Statistics objects on multiple columns also store statistical information about the correlation of values among the columns. These correlation statistics, or densities, are derived from the number of distinct rows of column values.
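
The histogram and density information described above can be inspected with `DBCC SHOW_STATISTICS`. The table name and statistics name below are assumptions made for the example.

```sql
-- Hypothetical table and statistics object.
-- Header: row count, rows sampled, last update time, number of steps.
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerID_OrderDate')
    WITH STAT_HEADER;

-- Densities for each column prefix (CustomerID; CustomerID, OrderDate).
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerID_OrderDate')
    WITH DENSITY_VECTOR;

-- Histogram on the first column only (CustomerID).
DBCC SHOW_STATISTICS ('dbo.Orders', 'IX_Orders_CustomerID_OrderDate')
    WITH HISTOGRAM;
```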

10 Index Statistics The query optimizer creates statistics for indexes on tables or views when the index is created. These statistics are created on the key columns of the index. If the index is a filtered index, the query optimizer creates filtered statistics on the same subset of rows specified for the filtered index.
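
A sketch of how to list the statistics on a table and see which ones belong to an index. It assumes a hypothetical table `dbo.Orders` and relies on index statistics sharing the name of their index.

```sql
SELECT s.name              AS statistics_name,
       i.type_desc         AS index_type,      -- NULL when not tied to an index
       s.auto_created,                         -- 1 = created by AUTO_CREATE_STATISTICS
       s.user_created,                         -- 1 = created with CREATE STATISTICS
       s.has_filter,
       s.filter_definition
FROM sys.stats AS s
LEFT JOIN sys.indexes AS i
    ON i.object_id = s.object_id
   AND i.name = s.name
WHERE s.object_id = OBJECT_ID(N'dbo.Orders');
```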

11 User Statistics For most queries, these two methods for creating statistics ensure a high-quality query plan; in a few cases, you can improve query plans by creating additional statistics with the CREATE STATISTICS statement. These additional statistics can capture statistical correlations that the query optimizer does not account for when it creates statistics for indexes or single columns. Your application might have additional statistical correlations in the table data that, if calculated into a statistics object, could enable the query optimizer to improve query plans. For example, filtered statistics on a subset of data rows or multicolumn statistics on query predicate columns might improve the query plan.
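
As an illustration of the multicolumn case, the following sketch creates a statistics object on two predicate columns. The table, column and statistics names are hypothetical.

```sql
-- Multicolumn statistics for queries that filter on both columns together.
-- The histogram covers CustomerID; densities cover (CustomerID) and (CustomerID, Status).
CREATE STATISTICS st_Orders_CustomerID_Status
ON dbo.Orders (CustomerID, Status)
WITH FULLSCAN;
```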

12 Filtered Statistics Filtered statistics can improve query performance for queries that select from well-defined subsets of data. Filtered statistics use a filter predicate to select the subset of data that is included in the statistics. Well-designed filtered statistics can improve the query execution plan compared with full-table statistics.
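
A minimal sketch of a filtered statistics object, assuming a hypothetical `dbo.Orders` table in which only a small share of rows have `Status = 'Open'`.

```sql
-- Statistics built only over the rows matching the filter predicate.
CREATE STATISTICS st_Orders_OrderDate_Open
ON dbo.Orders (OrderDate)
WHERE Status = 'Open'
WITH FULLSCAN;
```

Queries whose predicate matches the filter (for example, open orders in a date range) are the ones that may benefit from it.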

13 Incremental Statistics
A major problem with updating statistics in large tables in SQL Server is that the entire table always has to be scanned, for example when using the WITH FULLSCAN option, even if only recent data has changed. This is also true when using partitioning: even if only the newest partition has changed since the last time statistics were updated, updating statistics again requires scanning the entire table, including all the partitions that didn't change. Incremental statistics, a feature introduced in SQL Server 2014, can help with this problem. Using incremental statistics, you can update only the partition or partitions that you need, and the information on these partitions is merged with the existing information to create the final statistics object. Another advantage of incremental statistics is that the percentage of data changes required to trigger an automatic update of statistics now works at the partition level, which means that only 20% of rows changed (changes on the leading statistics column) per partition are required.
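
The following sketch shows what creating and refreshing incremental statistics might look like. It assumes a hypothetical partitioned table `dbo.SalesPartitioned` and that partition 12 is the one that changed.

```sql
-- Create per-partition (incremental) statistics on a partitioned table.
CREATE STATISTICS st_SalesPartitioned_SaleDate
ON dbo.SalesPartitioned (SaleDate)
WITH FULLSCAN, INCREMENTAL = ON;

-- Later: refresh only the partition that changed and merge it
-- into the existing statistics object.
UPDATE STATISTICS dbo.SalesPartitioned (st_SalesPartitioned_SaleDate)
WITH RESAMPLE ON PARTITIONS (12);
```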

14 Some Internals
What works automatically? (500 rows + 20%)
AUTO_CREATE_STATISTICS
AUTO_UPDATE_STATISTICS
AUTO_UPDATE_STATISTICS_ASYNC
What does not work by default?
Trace Flag 2371
What to do when automatic statistics do not work?
Temp tables – subject to statistics auto-update/recompile
Table variables – no statistics, assumes 1 row
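
To see how close a statistics object is to the 500 rows + 20% threshold, a query along these lines can help. `dbo.Orders` is a hypothetical table, and the computed column is only the classic formula, not necessarily the exact rule your server version applies.

```sql
SELECT s.name                   AS statistics_name,
       sp.last_updated,
       sp.rows,
       sp.modification_counter,                 -- changes since the last update
       500 + 0.20 * sp.rows     AS classic_threshold
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Orders');

-- When the automatic update is not enough, refresh manually:
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;

-- Trace flag 2371 lowers the threshold for large tables (server-wide setting):
-- DBCC TRACEON (2371, -1);
```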

15 Some Internals (continued)
AUTO_CREATE_STATISTICS Option When the automatic create statistics option, AUTO_CREATE_STATISTICS, is on, the query optimizer creates statistics on individual columns in the query predicate, as necessary, to improve cardinality estimates for the query plan. These single-column statistics are created on columns that do not already have a histogram in an existing statistics object. The AUTO_CREATE_STATISTICS option does not determine whether statistics get created for indexes. This option also does not generate filtered statistics. It applies strictly to single-column statistics for the full table. When the query optimizer creates statistics as a result of using the AUTO_CREATE_STATISTICS option, the statistics name starts with _WA. You can use the following query to determine if the query optimizer has created statistics for a query predicate column.
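
The query itself did not survive in this transcript. A query along the following lines, using only the `sys.stats` and `sys.stats_columns` catalog views, lists the automatically created statistics and their columns:

```sql
SELECT OBJECT_NAME(s.object_id)              AS object_name,
       COL_NAME(sc.object_id, sc.column_id)  AS column_name,
       s.name                                AS statistics_name
FROM sys.stats AS s
INNER JOIN sys.stats_columns AS sc
    ON s.stats_id = sc.stats_id
   AND s.object_id = sc.object_id
WHERE s.name LIKE '_WA%'
ORDER BY s.name;
```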

16 Some Internals (continued)
AUTO_UPDATE_STATISTICS Option When the automatic update statistics option, AUTO_UPDATE_STATISTICS, is on, the query optimizer determines when statistics might be out-of-date and then updates them when they are used by a query. Statistics become out-of-date after insert, update, delete, or merge operations change the data distribution in the table or indexed view. The query optimizer determines when statistics might be out-of-date by counting the number of data modifications since the last statistics update and comparing the number of modifications to a threshold. The threshold is based on the number of rows in the table or indexed view.
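
A short sketch of how to check and enable the option. The database name `SalesDb` is hypothetical.

```sql
-- Check the current statistics settings for the database.
SELECT name,
       is_auto_create_stats_on,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases
WHERE name = N'SalesDb';

-- Turn automatic statistics updates on.
ALTER DATABASE [SalesDb] SET AUTO_UPDATE_STATISTICS ON;
```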

17 Some Internals (continued)
AUTO_UPDATE_STATISTICS_ASYNC The asynchronous statistics update option, AUTO_UPDATE_STATISTICS_ASYNC, determines whether the query optimizer uses synchronous or asynchronous statistics updates. By default, the asynchronous statistics update option is off, and the query optimizer updates statistics synchronously. The AUTO_UPDATE_STATISTICS_ASYNC option applies to statistics objects created for indexes, single columns in query predicates, and statistics created with the CREATE STATISTICS statement.
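
Enabling the asynchronous option might look like this, again with a hypothetical database name `SalesDb`. The asynchronous setting only has an effect when AUTO_UPDATE_STATISTICS is also on.

```sql
-- Asynchronous updates require automatic updates to be enabled as well.
ALTER DATABASE [SalesDb] SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE [SalesDb] SET AUTO_UPDATE_STATISTICS_ASYNC ON;
```

With this setting, a query that finds out-of-date statistics compiles with the existing statistics instead of waiting for the update, so that query may still run with an older plan.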

18 Conclusion
Maintaining statistics is a complicated task with a huge impact on server and query performance
Understanding their importance is the core
A great way to increase performance, with a variety of options

19 THANK YOU ! Questions ?

