Shu J Scott Program Manager for Query Processing SQL Server Relational Engine, Microsoft DAT318.

- Reduce the cost of managing database applications
- Understand SQL Server Query Optimizer specifics
- Help the Optimizer produce quality query plans
- Avoid problems through design and implementation best practices

- Make app run fast independent of the platform
- SQL Server specific techniques
  - SQL Server Query Compilation Overview
  - Necessary conditions for quality query plans
  - Importance of good stats
  - Importance of indexes
  - Influencing Optimizer behavior with query hints
  - Other things to help the Optimizer
- Q&A

- Good logical DB design
- Good physical DB design (indexes)
- Proper hardware (memory, I/O, processing power)
- Minimize the number of round-trip messages between application and DB server
- Use the set-oriented power of the query processor
- Etc.

- Make app run fast independent of the platform
- SQL Server specific techniques
  - SQL Server Query Compilation Overview
  - Necessary conditions for quality query plans
    - Statistics
    - Indexes
  - Importance of good stats
  - Importance of indexes
  - Influence Optimizer behavior with query hints
- Q&A

- SQL Server collects statistics about individual columns or sets of columns
- The Query Optimizer (QO) is cost-based and uses statistics to evaluate the cost of plans
- Good stats -> good estimates -> good plans
- "No stats" causes the QO to guess, which can give bad plans
- To get detailed info on stats:
  - sys.stats
  - sp_helpstats
  - DBCC SHOW_STATISTICS
  - Other commands: see Books Online
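A minimal sketch of how the catalog view and commands listed above can be used to look at statistics. It assumes the AdventureWorks sample database; the table and the index name IX_SalesOrderHeader_CustomerID are assumptions chosen for illustration, not taken from the slide.

```sql
USE AdventureWorks;
GO
-- List every statistics object on one table and when it was last updated.
SELECT s.name,
       s.auto_created,   -- 1 = created by auto create statistics
       s.user_created,   -- 1 = created with CREATE STATISTICS
       s.no_recompute,   -- 1 = excluded from auto update
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('Sales.SalesOrderHeader');

-- Same information in the older stored-procedure form.
EXEC sp_helpstats 'Sales.SalesOrderHeader', 'ALL';

-- Header, density vector, and histogram for one statistics object
-- (index name assumed for illustration).
DBCC SHOW_STATISTICS ('Sales.SalesOrderHeader', IX_SalesOrderHeader_CustomerID);
```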

- Make app run fast independent of the platform
- SQL Server specific techniques
  - SQL Server Query Compilation Overview
  - Necessary conditions for quality query plans
  - Importance of good stats
    - Statistical data collected by SQL Server
    - Good stats -> quality query plan
    - Best practices for managing stats
  - Importance of indexes
  - Influencing Optimizer behavior
- Q&A

- Use auto create and auto update statistics
- Use FULLSCAN statistics if needed
- Avoid use of local variables in queries
- Avoid updating SP parameters prior to use in a query
- Avoid using a function with column input in a predicate
- Limit use of multi-statement TVFs and table variables
- Consider using multi-column statistics
- More frequent stats gathering for ascending keys
- Use asynchronous statistics update if needed

- Auto create and auto update statistics are on by default
- "No stats" causes bad plans
- Auto stats are right much more often than they are wrong
- If you must turn them off, don't turn them off for the whole DB
  - Selectively disable them for certain columns or tables
- Watch out for a read-only DB preventing auto create/update of stats
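One way to disable auto update selectively rather than for the whole database is sketched below. It assumes the AdventureWorks sample database; the table and index names are assumptions for illustration.

```sql
USE AdventureWorks;
GO
-- Turn off automatic statistics updates for one table only.
EXEC sp_autostats 'Sales.SalesOrderHeader', 'OFF';

-- Report the current auto update setting for each statistics object on the table.
EXEC sp_autostats 'Sales.SalesOrderHeader';

-- Alternative: refresh a single statistics object and exclude just that one
-- from future automatic updates (index name assumed for illustration).
UPDATE STATISTICS Sales.SalesOrderHeader IX_SalesOrderHeader_CustomerID
    WITH NORECOMPUTE;

-- Turn automatic updates back on for the table.
EXEC sp_autostats 'Sales.SalesOrderHeader', 'ON';
```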

Autostats limitations:
- 20% change threshold for auto update
  - When queries mostly touch new data and the data change is < 20%, histograms may not contain useful information about the data distribution
- Auto update stats *samples* even if the stats were created with FULLSCAN (e.g. during index create)
- The fixed sampling rate for a given table size can be too low if data is not randomly distributed
  - E.g. if data was sorted when loaded
- Auto-created stats are on a single column only

- FULLSCAN gives the best-quality histograms
- It is hard to determine what sampling % is ideal, so just use FULLSCAN to eliminate the sampling rate as a potential problem
- You can usually afford the time to do FULLSCAN during your nightly batch window
- FULLSCAN stats are gathered in parallel
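A short sketch of a FULLSCAN refresh that could run in a nightly batch window. The AdventureWorks table and the index name IX_SalesOrderDetail_ProductID are assumptions for illustration.

```sql
USE AdventureWorks;
GO
-- Rebuild every statistics object on the table by reading all rows.
UPDATE STATISTICS Sales.SalesOrderDetail WITH FULLSCAN;

-- Or limit the work to one statistics object (name assumed for illustration).
UPDATE STATISTICS Sales.SalesOrderDetail IX_SalesOrderDetail_ProductID
    WITH FULLSCAN;
```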

Local variables lead to selectivity "guesses" by the optimizer.
Bad:
    DECLARE @StartOrderDate datetime
    SET @StartOrderDate = '20040731'
    SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
    WHERE h.SalesOrderID = d.SalesOrderId AND h.OrderDate >= @StartOrderDate
Good:
    SELECT * FROM Sales.SalesOrderHeader h, Sales.SalesOrderDetail d
    WHERE h.SalesOrderID = d.SalesOrderId AND h.OrderDate >= '20040731'

- A stored proc and all queries in it are compiled with the param values first passed in: "parameter sniffing"
- Generally good: better estimate -> better plan
- Bad if param values are changed prior to use:
    IF @date IS NULL SET @date = '20050611'
    SELECT … WHERE … AND t.date > @date
- Better:
    IF @date IS NULL SET @date = '20050611'
    EXEC helper_proc @date
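The helper-procedure pattern can be sketched in full as follows. The table dbo.Events, its EventDate column, and both procedure names are hypothetical, introduced only for illustration. The point is that the query lives in the helper, which is compiled with the value it actually receives, after the default has been applied.

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE dbo.Events (EventID int NOT NULL PRIMARY KEY, EventDate datetime NOT NULL);
GO
-- Helper: the query is compiled with the sniffed value of @date as passed in.
CREATE PROCEDURE dbo.GetEventsSince_Helper @date datetime
AS
    SELECT EventID, EventDate
    FROM dbo.Events
    WHERE EventDate > @date;
GO
-- Outer procedure: fixes up the parameter, then delegates instead of querying.
CREATE PROCEDURE dbo.GetEventsSince @date datetime = NULL
AS
    IF @date IS NULL SET @date = '20050611';
    EXEC dbo.GetEventsSince_Helper @date;
GO
EXEC dbo.GetEventsSince;               -- helper sees '20050611'
EXEC dbo.GetEventsSince '20060101';    -- helper sees the caller's value
```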

Bad:
    SELECT * FROM Sales.SalesOrderHeader
    WHERE DATEPART(yyyy, OrderDate) = 2003
      AND DATEPART(mm, OrderDate) = 5
      AND DATEPART(dd, OrderDate) = 18
Results in a selectivity estimate guess.
Better:
    SELECT * FROM Sales.SalesOrderHeader
    WHERE OrderDate = '20030518'
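If OrderDate can carry a time-of-day component, an equality test against a date literal matches only midnight values. A minimal sketch of a half-open range that still leaves the column bare in the predicate, assuming the same AdventureWorks table:

```sql
-- Matches every OrderDate on 18 May 2003 regardless of time of day,
-- without wrapping the column in a function.
SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate >= '20030518'
  AND OrderDate <  '20030519';
```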

- Multi-statement TVFs and table variables have no stats
- All estimates for them are based on guesses
- Use a real table or temp table instead if you need stats to get a good plan
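A sketch contrasting the two choices, assuming the AdventureWorks sample database; the names @recent and #recent are hypothetical.

```sql
USE AdventureWorks;
GO
-- Table variable: no statistics, so the join is costed from a guess.
DECLARE @recent TABLE (SalesOrderID int NOT NULL PRIMARY KEY);
INSERT INTO @recent (SalesOrderID)
SELECT SalesOrderID FROM Sales.SalesOrderHeader WHERE OrderDate >= '20040701';

SELECT d.SalesOrderID, d.ProductID, d.OrderQty
FROM @recent AS r
JOIN Sales.SalesOrderDetail AS d ON d.SalesOrderID = r.SalesOrderID;
GO
-- Temp table: statistics can be created on it, so the optimizer can use
-- a row-count and distribution estimate when it compiles the join.
CREATE TABLE #recent (SalesOrderID int NOT NULL PRIMARY KEY);
INSERT INTO #recent (SalesOrderID)
SELECT SalesOrderID FROM Sales.SalesOrderHeader WHERE OrderDate >= '20040701';

SELECT d.SalesOrderID, d.ProductID, d.OrderQty
FROM #recent AS r
JOIN Sales.SalesOrderDetail AS d ON d.SalesOrderID = r.SalesOrderID;

DROP TABLE #recent;
```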

Example:
    SELECT * FROM UserLog WHERE UserName = suser_sname()
Results in a selectivity estimate guess.
Better:
    DECLARE @s nvarchar(400)
    SET @s = 'SELECT * FROM UserLog WHERE UserName = ''' + suser_sname() + ''''
    EXEC (@s)

- Condition: FirstName = 'Catherine' AND LastName = 'Abel'
- Autostats never creates multi-column statistics
- Selectivity computed from 2 single-column stats may be underestimated
- Useful to create multi-column statistics:
    CREATE STATISTICS LastFirst ON Person.Contact (LastName, FirstName)

- Table with ascending key columns
- Frequent inserts, but not enough to cause an auto stats update yet
- New key column values fall outside the histogram -> inaccurate stats
- A solution: explicitly update stats (using the default sampling rate)
  - A) every time you change the size of the table by, say, 1%, or
  - B) periodically (e.g. hourly, daily)
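Option A could be approximated along these lines. This is a sketch under assumptions: it reads rowmodctr from the sys.sysindexes compatibility view, which is only an approximate modification counter in SQL Server 2005 and later, and the AdventureWorks table and the 1% threshold are illustrative.

```sql
USE AdventureWorks;
GO
DECLARE @rows bigint, @mods bigint;

-- Row count and approximate modifications since the last statistics update,
-- taken from the heap (indid 0) or clustered index (indid 1) entry.
SELECT @rows = rowcnt, @mods = rowmodctr
FROM sys.sysindexes
WHERE id = OBJECT_ID('Sales.SalesOrderHeader')
  AND indid IN (0, 1);

-- Refresh with the default sampling rate once about 1% of the table has changed.
IF @rows > 0 AND @mods * 100 >= @rows
    UPDATE STATISTICS Sales.SalesOrderHeader;
```

For option B, the same UPDATE STATISTICS statement can simply be scheduled, for example as a SQL Server Agent job.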

- Auto update of statistics can cause a noticeable pause in an OLTP app
- Solution: asynchronous stats update
- To enable:
    ALTER DATABASE dbname SET AUTO_UPDATE_STATISTICS_ASYNC ON
- Query compilation with out-of-date statistics causes the auto update to be done in the background
- Compilation does not pause, but proceeds with the old statistics
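A small sketch of enabling the option and confirming it took effect, using AdventureWorks as an assumed database name. The asynchronous option only matters while AUTO_UPDATE_STATISTICS itself is ON, so both flags are checked.

```sql
ALTER DATABASE AdventureWorks SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE AdventureWorks SET AUTO_UPDATE_STATISTICS_ASYNC ON;

-- Both columns should show 1 for the asynchronous behavior to apply.
SELECT name,
       is_auto_update_stats_on,
       is_auto_update_stats_async_on
FROM sys.databases
WHERE name = 'AdventureWorks';
```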

- Statistics in SQL Server
- Best practices for managing statistics
- Intelligently trade off compile time vs. query plan quality
  - Minimize compile time for OLTP apps by parameterizing queries
  - Do not parameterize ad-hoc or long-running queries (so you get the best plan)
  - Watch for "parameter sniffing"-related problems

- SQL Server Query Compilation Overview
- Necessary conditions for quality query plans
- Importance of good stats
- Importance of indexes
  - General guideline
  - How SQL Server can help
- Influencing Optimizer behavior
- Other things to help the Optimizer
- Q&A

- Use well-known techniques
  - Clustered indexes to support range scans
  - Non-clustered indexes to support unique key lookups
  - Focus on predicates in frequent queries to drive index selection
- If you need a little more help from SQL Server, use:
  - Database Engine Tuning Advisor (DTA)
  - Missing Indexes feature (new in SQL Server 2005)

    SELECT CustomerID, SalesOrderNumber, SubTotal
    FROM Sales.SalesOrderHeader
    WHERE ShipMethodID > 2 AND SubTotal > 500.00
      AND Freight < 15.00 AND TerritoryID = 5;
    SELECT * FROM sys.dm_db_missing_index_details
Recommends:
    CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_TerritoryID
    ON Sales.SalesOrderHeader (TerritoryID, ShipMethodID, SubTotal, Freight)
    INCLUDE (SalesOrderNumber, CustomerID);
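The details view is easier to act on when joined to its companion views, which record how often each suggestion would have helped. A minimal sketch; the ordering expression is just one plausible way to rank suggestions, not a rule from the slide.

```sql
-- Rank missing-index suggestions by how often and how much they would have helped.
SELECT d.statement           AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_total_user_cost,
       s.avg_user_impact     -- estimated % cost reduction
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
    ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s
    ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_total_user_cost * s.avg_user_impact DESC;
```

These views are reset when the server restarts, so the output reflects only the workload seen since then.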

- SQL Server Query Compilation Overview
- Necessary conditions for quality query plans
- Importance of good stats
- Importance of indexes
- Influencing Optimizer behavior
  - OPTIMIZE FOR
  - Plan guide
  - USE PLAN
- Q&A

- If data distribution is uneven and parameter sniffing is in effect -> the cached plan is good sometimes, bad other times
- There is a specific (known-good) parameter value whose plan is good enough for almost all cases
- You want to always compile and use this "good enough" plan:
    SELECT * FROM … WHERE … OPTION (OPTIMIZE FOR (@city = 'NY'))
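A complete form of the hint inside a stored procedure might look like the sketch below. The table dbo.Customers, its columns, and the procedure name are hypothetical, introduced only for illustration.

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE dbo.Customers (
    CustomerID int          NOT NULL PRIMARY KEY,
    City       nvarchar(30) NOT NULL
);
GO
CREATE PROCEDURE dbo.GetCustomersByCity @city nvarchar(30)
AS
    -- The plan is always compiled as if @city were 'NY',
    -- whatever value the first caller happens to pass.
    SELECT CustomerID, City
    FROM dbo.Customers
    WHERE City = @city
    OPTION (OPTIMIZE FOR (@city = N'NY'));
GO
EXEC dbo.GetCustomersByCity N'Boise';
```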

- You can't change the application
- Adding a hint leads to a better plan
- Use a plan guide: sp_create_plan_guide
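A sketch of a plan guide that attaches a hint to a parameterized statement the application submits, without touching the application. The guide name, the statement text, and the table dbo.Customers are hypothetical; the statement text has to match what the application actually sends, character for character.

```sql
-- Attach OPTIMIZE FOR to a statement submitted through sp_executesql
-- with the parameter list '@city nvarchar(30)'.
EXEC sp_create_plan_guide
    @name            = N'PG_CustomersByCity',
    @stmt            = N'SELECT * FROM dbo.Customers WHERE City = @city',
    @type            = N'SQL',
    @module_or_batch = NULL,
    @params          = N'@city nvarchar(30)',
    @hints           = N'OPTION (OPTIMIZE FOR (@city = N''NY''))';

-- List existing plan guides in the current database.
SELECT name, scope_type_desc, is_disabled FROM sys.plan_guides;

-- Remove the guide when it is no longer needed.
EXEC sp_control_plan_guide N'DROP', N'PG_CustomersByCity';
```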

- You can't change the application but need to use the known-good plan
- Capture an XML showplan
- Force the QO to use it later with the USE PLAN N'xml plan' query hint
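Capturing the XML showplan can be sketched as follows, using an illustrative query against the AdventureWorks sample database (the query itself is an assumption). While SHOWPLAN_XML is ON the statement is compiled but not executed, and the plan XML is returned in place of the result set.

```sql
USE AdventureWorks;
GO
SET SHOWPLAN_XML ON;   -- must be the only statement in its batch
GO
-- Not executed: the server returns the estimated plan as XML.
SELECT CustomerID, SalesOrderNumber
FROM Sales.SalesOrderHeader
WHERE TerritoryID = 5;
GO
SET SHOWPLAN_XML OFF;
GO
```

The returned XML is then pasted as the string literal in the USE PLAN hint (typically inside a plan guide), with any single quotes in the XML doubled.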

- SQL Server Query Compilation Overview
- Necessary conditions for quality query plans
- Importance of good stats
- Importance of indexes
- Influencing Optimizer behavior
- Other things to help the Optimizer
  - Specify constraints to help the optimizer
  - Rewrite the query and reap the rewards
- Q&A

- Unique constraints (or unique indexes or primary key constraints)
- Foreign key constraints
- NOT NULL constraints
- If your data obeys these, declare them!
- Exception: if maintaining them is too expensive, don't do it
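A minimal sketch of declaring all three kinds of constraint. The tables dbo.OrderHeader and dbo.OrderDetail and every constraint name are hypothetical, introduced only for illustration.

```sql
CREATE TABLE dbo.OrderHeader (
    OrderID     int          NOT NULL
        CONSTRAINT PK_OrderHeader PRIMARY KEY,
    OrderNumber nvarchar(25) NOT NULL
        CONSTRAINT UQ_OrderHeader_OrderNumber UNIQUE
);

CREATE TABLE dbo.OrderDetail (
    OrderID    int NOT NULL   -- non-nullable foreign key column
        CONSTRAINT FK_OrderDetail_OrderHeader
        FOREIGN KEY REFERENCES dbo.OrderHeader (OrderID),
    LineNumber int NOT NULL,
    CONSTRAINT PK_OrderDetail PRIMARY KEY (OrderID, LineNumber)
);

-- A foreign key added to existing data should be validated so the
-- optimizer can rely on it; is_not_trusted = 0 means it was checked.
SELECT name, is_not_trusted
FROM sys.foreign_keys
WHERE parent_object_id = OBJECT_ID('dbo.OrderDetail');
```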

Unique constraints let the optimizer:
- Drop group by
- Drop outer joins
- Reduce sort columns
- Remove asserts in scalar-valued subqueries

- Foreign key constraints
  - Drop inner joins (requires non-nullable keys)
  - Better upper bound on join cardinality estimate
- NOT NULL constraints
  - Simplify some aggregate functions
  - Simpler predicates for universal quantification like '>all' etc.
  - Helps with dropping inner joins (see above)

    use AdventureWorks
    go
    select count(*)
    from Sales.SalesOrderHeader h join Sales.SalesOrderDetail d
    on h.SalesOrderID = d.SalesOrderID

Notice:
- No join in the plan
- The plan scans SalesOrderDetail
- The plan doesn't touch SalesOrderHeader
Why? The FK constraint on SalesOrderDetail.SalesOrderID, referencing SalesOrderHeader.SalesOrderID, implies that the join does not eliminate any rows of SalesOrderDetail.

    select h.SalesOrderID, count(*)
    from Sales.SalesOrderHeader h join Sales.SalesOrderDetail d
    on h.SalesOrderID = d.SalesOrderID
    group by h.SalesOrderID
Unique and FK constraints are in place.

Aggregate pushed down.
SQL Server Execution Times: CPU time = 110 ms, elapsed time = 336 ms

- h: SalesOrderHeader with no constraints, non-unique clustered index on SalesOrderID
- d: SalesOrderDetail with no constraints, non-unique clustered index on (SalesOrderID, SalesOrderDetailID)
    select h.SalesOrderID, count(*)
    from o as h join d as d
    on h.SalesOrderID = d.SalesOrderID
    group by h.SalesOrderID

Aggregate not pushed down.
Why? The QO doesn't know that the join doesn't filter out any rows of d.
SQL Server Execution Times: CPU time = 280 ms, elapsed time = 643 ms

- The QO can do many query rewrites
- But only you can know the full meaning of your data
- So you can do query rewrites that SQL Server could never identify
- Some examples:
  - Eliminate DISTINCT: if you know your SELECT list contains a unique column, you can remove DISTINCT
  - Add an implied predicate: Model = 'F150' -> Make = 'Ford'

    Select DISTINCT Make, Model, Year From Vehicle
Suppose that in your Vehicle table, (Make, Model, Year) is a key (though not enforced by a constraint). The query can be rewritten as:
    Select Make, Model, Year From Vehicle
This eliminates a sort or hash distinct operation.

Original query:
    Select * From T Where Model = 'F150'
Add the implied predicate: Model = 'F150' -> Make = 'Ford'
Rewritten query:
    Select * From T Where Model = 'F150' And Make = 'Ford'
If there is an index on Make but not on Model, this can give a big speedup.

Consider manually running a subquery to allow SQL Server to get a perfect cardinality estimate when columns are correlated.
Ex. original query:
    Select …, sum(Fact.measure)
    From Fact, Dim1, Dim2, … Dim8
    Where … join all tables …
    And Dim1.col1 = 'a' And Dim1.col2 = 'b' And Dim1.col3 = 'c'
    And … more dimension filters …
    Group by …
Suppose STATISTICS PROFILE shows that the estimate for qualifying Dim1 rows is 1 but the actual is 10000 -> bad plan.

    select * into dim1_temp From Dim1
    Where Dim1.col1 = 'a' And Dim1.col2 = 'b' And Dim1.col3 = 'c'
    Select …, sum(Fact.measure)
    From Fact, Dim1_temp, Dim2, … Dim8
    Where … join all tables …
    And … more dimension filters …
    Group by …
Perfect cardinality for Dim1_temp -> good plan.

- General good practices
  - Good logical DB design, indexes, hardware, etc.
  - Minimize the number of round-trip messages between application and DB server
- SQL Server-specific
  - Good stats
  - Good indexes
  - Influence Optimizer behavior if absolutely necessary
  - Use query rewrites when you know something SQL Server doesn't

- Raghu Ramakrishnan and Johannes Gehrke, Database Management Systems, 3rd edition, Ch. 20, Physical Database Design and Tuning.
- Statistics Used by the Query Optimizer in Microsoft SQL Server 2005, http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx

© 2007 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
