#SQLSatRiyadh Special Topics Joe Chang www.qdpma.com.

Presentation transcript:

#SQLSatRiyadh Special Topics Joe Chang

About Joe
SQL Server consultant since 1999
Query Optimizer execution plan cost formulas (2002)
True cost structure of SQL plan operations (2003?)
Database with distribution statistics only, no data (2004)
Decoding statblob/stats_stream – writing your own statistics
Disk IO cost structure
Tools for system monitoring, execution plan analysis
See ExecStats on

Overview
Why is performance still important today?
– Brute force? Yes, but …
Special Topics
Automating data collection
SQL Server Engine
– What do developers and DBAs need to know?

CPU & Memory, 2001 and today
2001 – 4 sockets, 4 cores; Pentium III Xeon, 900MHz; 4-8GB memory?
Xeon MP – 4 sockets, 8 cores each; 4 x 8 = 32 cores total; 768GB (48 x 16GB)
Westmere-EX 1TB; 15 cores in next generation?
Each core today is more than 10x faster than a Pentium III core
16GB $191, 32GB $794
[Diagram: a 2001 system with processors (P), L2 cache, front-side bus (FSB) and MCH, next to a four-socket system with cores C0-C7 and LLC per socket, linked by QPI, with memory interfaces (MI), PCI-E and DMI 2]

Storage, 2001 versus 2012/13
2001 – 10K HDDs at 125 IOPS each = 12.5K IOPS in total (which implies 100 drives)
– IO bandwidth limited: 1.3GB/s (1/3 of memory bandwidth)
2012/13 – SSDs at 10K+ IOPS each, 1M IOPS possible
– IO bandwidth of 10GB/s is easy
SAN vendors – questionable bandwidth
[Diagram labels: PCIe x8, PCIe x4, IB, RAID, 10GbE, QPI, 192 GB, HDD, SSD]

Performance Past, Present, Future
When will servers be so powerful that …
– Been saying this for a long time
Today – 10 to 100X overkill
– 32 cores, 60 cores later in 2013?
– Enough memory that IO is only sporadic
– Unlimited IOPS with SSD
What can go wrong?
Today’s topic

Special Topics
Data type mismatch
Multiple Optional Search Arguments (SARG)
– Function on SARG
Parameter Sniffing versus Variables
Statistics related (big topic)
OR first, then AND/OR combinations
Complex Query with sub-expressions
Parallel Execution
Not in order of priority

1a. Data type mismatch
… nvarchar(25) = N'Customer# '
SELECT * FROM CUSTOMER WHERE C_NAME = …
SELECT * FROM CUSTOMER WHERE C_NAME = …
Auto-parameter discovery?
Unable to use index seek
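The mismatch on this slide can be sketched as follows. This is a minimal illustration, assuming C_NAME is a varchar(25) column with an index on it (as in the standard TPC-H schema); the variable names and the customer name value are hypothetical placeholders. How badly the implicit conversion hurts the seek can also depend on the column's collation.

```sql
-- Assumption: CUSTOMER.C_NAME is varchar(25) and indexed (TPC-H style schema).
-- Variable names and the literal value are hypothetical.

-- Mismatched: nvarchar has higher data type precedence than varchar,
-- so the column side is implicitly converted before the comparison.
DECLARE @name_n nvarchar(25) = N'Customer#000000001';
SELECT * FROM CUSTOMER WHERE C_NAME = @name_n;

-- Matched: the variable has the same type as the column,
-- so the predicate can be used directly as a seek argument.
DECLARE @name_v varchar(25) = 'Customer#000000001';
SELECT * FROM CUSTOMER WHERE C_NAME = @name_v;
```

Comparing the two estimated plans side by side is one way to confirm whether the conversion appears on the column in a given environment.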

1b. Type Mismatch – Row Estimate
SELECT * FROM CUSTOMER WHERE C_NAME LIKE N'Customer# %'
SELECT * FROM CUSTOMER WHERE C_NAME LIKE 'Customer# %'
Row estimate error could have severe consequences in a complex query

SELECT TOP plus Row Estimate Error
SELECT TOP 1000 [Document].[ArtifactID]
FROM [Document] (NOLOCK)
WHERE [Document].[AccessControlListID_D] IN (1, , )
AND EXISTS (
SELECT [DocumentBatch].[BatchArtifactID]
FROM [DocumentBatch] (NOLOCK)
INNER JOIN [Batch] (NOLOCK) ON [Batch].ArtifactID = [DocumentBatch].[BatchArtifactID]
WHERE [DocumentBatch].[DocumentArtifactID] = [Document].[ArtifactID]
AND [Batch].[Name] LIKE N'%Value%' )
ORDER BY [Document].[ArtifactID]
Data type mismatch – results in a high row estimate
TOP clause – with a high estimate, finding the first 1000 rows looks easy
In fact, there are few rows that match the SARG
Wrong plan for evaluating a large number of rows

2. Multiple Optional SARG
… int = 1
SELECT * FROM LINEITEM
WHERE (… IS NULL OR L_ORDERKEY = …)
AND (… IS NULL OR L_PARTKEY = …)
AND (… IS NOT NULL OR … IS NOT NULL)

Dynamically Built Parameterized SQL
… int = … nvarchar(100) = N'/* Comment */ SELECT * FROM LINEITEM WHERE … = … int'
IF (… IS NOT NULL) … + N' AND L_ORDERKEY = …'
IF (… IS NOT NULL) … + N' AND L_PARTKEY = …'
IF block is easier for few options
Dynamically built parameterized SQL is better for many options
Considering a /* comment */ in the statement text to help identify it
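A minimal sketch of the dynamically built, parameterized approach, assuming two optional int parameters. The names @OrderKey, @PartKey, @SQL and @Param, the comment text, and the `WHERE 1 = 1` starting predicate are assumptions made for this illustration; only sp_executesql and the LINEITEM columns come from SQL Server and the slide.

```sql
-- Hypothetical parameter names and values; NULL means "not supplied".
DECLARE @OrderKey int = NULL, @PartKey int = 1;

-- The leading comment makes the statement easy to find in the plan cache.
DECLARE @SQL nvarchar(500) =
    N'/* optional-sarg search */ SELECT * FROM LINEITEM WHERE 1 = 1';
DECLARE @Param nvarchar(100) = N'@OrderKey int, @PartKey int';

-- Append a predicate only for the parameters that were actually supplied,
-- so each combination gets its own statement text and its own plan.
IF (@OrderKey IS NOT NULL) SET @SQL = @SQL + N' AND L_ORDERKEY = @OrderKey';
IF (@PartKey IS NOT NULL)  SET @SQL = @SQL + N' AND L_PARTKEY = @PartKey';

-- Values are still passed as parameters, not concatenated into the text.
EXEC sp_executesql @SQL, @Param, @OrderKey = @OrderKey, @PartKey = @PartKey;
```

sp_executesql accepts parameters that are declared but not referenced in the statement, which is why the same parameter list can be reused for every combination.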

IF block
… int = 1
IF (… IS NOT NULL)
SELECT * FROM LINEITEM WHERE (L_ORDERKEY = …) AND (… IS NULL OR L_PARTKEY = …)
ELSE IF (… IS NOT NULL)
SELECT * FROM LINEITEM WHERE (L_PARTKEY = …)
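Written out with hypothetical variable names (@OrderKey and @PartKey are assumptions, not the slide's names), the IF-block form might look like this:

```sql
-- Hypothetical parameter names and values; NULL means "not supplied".
DECLARE @OrderKey int = NULL, @PartKey int = 1;

IF (@OrderKey IS NOT NULL)
    -- L_ORDERKEY is always a usable search argument in this branch.
    SELECT * FROM LINEITEM
    WHERE L_ORDERKEY = @OrderKey
      AND (@PartKey IS NULL OR L_PARTKEY = @PartKey);
ELSE IF (@PartKey IS NOT NULL)
    -- Only the part key was supplied.
    SELECT * FROM LINEITEM
    WHERE L_PARTKEY = @PartKey;
```

Each branch is a separate statement with its own plan, which is why this stays manageable only for a small number of optional arguments.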

2b. Function on column SARG
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE YEAR(L_SHIPDATE) = 1995 AND MONTH(L_SHIPDATE) = 1
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE BETWEEN ' ' AND ' '
… int = 1
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE … AND …
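The difference between the first two forms can be sketched as below. The date literals are derived from the YEAR/MONTH predicate (January 1995) and the half-open range is a choice made for this illustration; the variable names in the last query are hypothetical.

```sql
-- Function on the column: the optimizer cannot seek on an index on L_SHIPDATE.
SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
WHERE YEAR(L_SHIPDATE) = 1995 AND MONTH(L_SHIPDATE) = 1;

-- Same month expressed as a range on the bare column: usable as a search argument.
-- A half-open range (>= start, < next month) also stays correct if the column carries a time part.
SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
WHERE L_SHIPDATE >= '19950101' AND L_SHIPDATE < '19950201';

-- Variable-driven variant; @StartDate and @Months are hypothetical names.
DECLARE @StartDate date = '19950101', @Months int = 1;
SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
WHERE L_SHIPDATE >= @StartDate
  AND L_SHIPDATE < DATEADD(month, @Months, @StartDate);
```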

Estimated versus Actual Plan - rows
Estimated Plan – 1 row???
Actual Plan – actual rows 77,356

3. Parameter Sniffing
-- first call, procedure compiles with these parameters
exec … = … = ' '
-- subsequent calls, procedure executes with original plan
exec … = … = ' '
Need different execution plans for narrow and wide range
Options:
1) WITH RECOMPILE
2) Main procedure calls 1 of 2 identical sub-procedures
– One sub-procedure is only called for narrow range
– Other called for wide range
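Option 2 could be sketched roughly as follows. All procedure names, the parameter names, the query body and the 7-day cutoff are hypothetical; the point is only that each sub-procedure is compiled for, and reused with, one kind of range. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Sub-procedure that is only ever called (and therefore compiled) with a narrow range.
CREATE PROCEDURE dbo.p_ShipSummary_Narrow @StartDate date, @EndDate date
AS
SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
WHERE L_SHIPDATE BETWEEN @StartDate AND @EndDate;
GO

-- Identical body, but only ever called with a wide range.
CREATE PROCEDURE dbo.p_ShipSummary_Wide @StartDate date, @EndDate date
AS
SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
WHERE L_SHIPDATE BETWEEN @StartDate AND @EndDate;
GO

-- Main procedure: routes the call so each sub-procedure keeps a plan suited to its range.
CREATE PROCEDURE dbo.p_ShipSummary @StartDate date, @EndDate date
AS
IF DATEDIFF(day, @StartDate, @EndDate) <= 7   -- hypothetical cutoff
    EXEC dbo.p_ShipSummary_Narrow @StartDate, @EndDate;
ELSE
    EXEC dbo.p_ShipSummary_Wide @StartDate, @EndDate;
GO
```

Option 1 trades this extra code for a compile on every execution, which may or may not be acceptable depending on call frequency.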

4. Statistics
Auto-recompute points
Sampling strategy
– Percentage
– Random pages versus random rows
– Histogram Equal and Range Rows
– Out of bounds, value does not exist
– etc
Reference: Statistics Used by the Query Optimizer in SQL Server 2008. Writers: Eric N. Hanson and Yavor Angelov. Contributor: Lubor Kollar

Statistics Structure
Stored (mostly) in binary field
Scalar values
Density Vector
Histogram
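The three parts listed above can be inspected with DBCC SHOW_STATISTICS. In this sketch the table name dbo.LINEITEM and the statistics name IX_LINEITEM_SHIPDATE are hypothetical placeholders.

```sql
-- Hypothetical table and statistics (index) names.
-- Scalar values: rows, rows sampled, steps, last update time.
DBCC SHOW_STATISTICS ('dbo.LINEITEM', 'IX_LINEITEM_SHIPDATE') WITH STAT_HEADER;

-- Density vector: one density value per leading-column prefix.
DBCC SHOW_STATISTICS ('dbo.LINEITEM', 'IX_LINEITEM_SHIPDATE') WITH DENSITY_VECTOR;

-- Histogram: RANGE_HI_KEY, EQ_ROWS, RANGE_ROWS and related columns per step.
DBCC SHOW_STATISTICS ('dbo.LINEITEM', 'IX_LINEITEM_SHIPDATE') WITH HISTOGRAM;
```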

Statistics Auto/Re-Compute
Automatically generated on query compile
Recompute at 6 rows, 500, every 20%?
Has this changed?
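One way to see how close each statistic is to a recompute point is to compare its modification counter with the classic 500 + 20% of rows rule. This is a sketch only: the table name is hypothetical, sys.dm_db_stats_properties is available only in more recent builds, and the actual threshold varies by version and settings, as the slide's question marks suggest.

```sql
-- Hypothetical table name; the threshold column is the classic rule of thumb,
-- not necessarily the rule a given SQL Server version applies.
SELECT s.name                    AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter,
       500 + 0.20 * sp.rows      AS classic_recompute_threshold
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.LINEITEM');
```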

Statistics Sampling
Sampling theory
– True random sample
– Sample error: square root of N
– Relative error: 1/√N
SQL Server sampling
– All rows in random pages
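Where the default page-based sample is suspected of misrepresenting the data, the sample size can be set explicitly. A brief sketch, with a hypothetical table name, statistics name and percentage:

```sql
-- Hypothetical names and sample size.
-- Resample every statistic on the table from roughly 10 percent of the data.
UPDATE STATISTICS dbo.LINEITEM WITH SAMPLE 10 PERCENT;

-- Read every row for one specific statistic; slower, but no sampling error.
UPDATE STATISTICS dbo.LINEITEM IX_LINEITEM_SHIPDATE WITH FULLSCAN;
```

Because SQL Server samples all rows from randomly chosen pages, values that are physically clustered on disk can be over- or under-represented at low sampling rates.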

Row Estimate - Statistics
Skewed data distribution
Out of bounds
Value does not exist

Loop Join - Table Scan on Inner Source
Estimated output from the first 2 tables (at right) is zero or 1 row.
Most efficient join to the third table (without an index on the join column) is a loop join with a scan.
If the actual row count is 2 or more, then a full scan is performed for each row from the outer source.
Default statistics rules may lead to serious ETL issues
Consider custom strategy

Compile Parameter Not Exists
Main procedure has cursor around view_Servers
First server in view_Servers is 'CAESIUM'
Cursor executes sub-procedure for each Server
sql: SELECT MAX(ID) FROM TReplWS WHERE Hostname = …
But CAESIUM does not exist in TReplWS!

Good and Bad Plan?

SqlPlan Compile Parameters

<StmtSimple varchar(50) = ISNULL(MAX(id),0) FROM TReplWS WHERE Hostname StatementId="1" StatementCompId="43" StatementType="SELECT" StatementSubTreeCost=" " StatementEstRows="1" StatementOptmLevel="FULL" QueryHash="0x671D2B3E17E538F1" QueryPlanHash="0xEB64FB22C47E1CF2" StatementOptmEarlyAbortReason="GoodEnoughPlanFound">
<StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />
<RelOp NodeId="0" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar" EstimateRows="1" EstimateIO="0" EstimateCPU="1e-007" AvgRowSize="15" EstimatedTotalSubtreeCost=" " Parallel="0" EstimateRebinds="0" EstimateRewinds="0">
Compile parameter values at bottom of sqlplan file
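Instead of opening each sqlplan file, the compile-time parameter values can also be pulled from cached plans. The following is a sketch, assuming the caller has VIEW SERVER STATE permission; the filter on TReplWS simply reuses the table name from the example, and TOP (20) is an arbitrary limit.

```sql
-- Reads the ParameterList element of cached showplan XML.
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT TOP (20)
       st.text                                                AS batch_text,
       pr.value('@Column', 'nvarchar(128)')                   AS parameter_name,
       pr.value('@ParameterCompiledValue', 'nvarchar(4000)')  AS compiled_value
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
-- One row per parameter recorded at compile time.
CROSS APPLY qp.query_plan.nodes('//ParameterList/ColumnReference') AS p(pr)
WHERE st.text LIKE N'%TReplWS%';
```

A compiled value that does not exist in the table, such as 'CAESIUM' in the example, is a hint that the cached plan was built for an unrepresentative first call.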

5a Single Table OR
-- Single table
SELECT * FROM LINEITEM WHERE L_ORDERKEY = 1 OR L_PARTKEY = …

5a Join 2 Tables, OR in SARG
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = … OR O_CUSTKEY = …

5a UNION instead of OR
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = …
UNION
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE O_CUSTKEY = …
Caution: the select list should include key columns to ensure correct rows, because UNION removes duplicates
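To illustrate the caution about keys: in the standard TPC-H schema a LINEITEM row is identified by (L_ORDERKEY, L_LINENUMBER), so two line items of the same order with the same part, ship date and quantity would be collapsed into one row by UNION unless L_LINENUMBER is selected. A sketch under that schema assumption, with hypothetical key values:

```sql
-- Assumption: LINEITEM primary key is (L_ORDERKEY, L_LINENUMBER), as in TPC-H.
-- The values 1 and 2 are hypothetical.
SELECT O_ORDERDATE, O_ORDERKEY, L_LINENUMBER, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = 1
UNION
-- Rows matching both predicates appear in both branches and are removed once by UNION;
-- with the full key in the select list, only true duplicates are removed.
SELECT O_ORDERDATE, O_ORDERKEY, L_LINENUMBER, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE O_CUSTKEY = 2;
```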

5b AND/OR Combinations
Hash Join is a good method to process many rows
– Requirement is an equality join condition
In complex SQL with AND/OR or IN/NOT IN combinations
– Query optimizer may not be able to determine that an equality join condition exists
– Execution plan will use loop join,
– and an attempt to force a hash join will be rejected
Re-write using UNION in place of OR
And LEFT JOIN in place of NOT IN
SELECT xx FROM A WHERE col1 IN (expr1) AND col2 NOT IN (expr2)
SELECT xx FROM A WHERE (expr1) AND (expr2 OR expr3)
More on AND/OR combinations:
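The NOT IN to LEFT JOIN rewrite might look like the sketch below. Tables A, B and C and their columns are hypothetical stand-ins for the slide's expr1 and expr2. The two forms return the same rows only if A.col2 and C.col2 contain no NULLs, which is assumed here.

```sql
-- Hypothetical tables: A(id, col1, col2), B(col1), C(col2).
-- Assumption: A.col2 and C.col2 are declared NOT NULL.

-- Original shape: IN combined with NOT IN.
SELECT a.id
FROM A AS a
WHERE a.col1 IN (SELECT b.col1 FROM B AS b)
  AND a.col2 NOT IN (SELECT c.col2 FROM C AS c);

-- Rewrite: the NOT IN becomes an explicit equality join plus an IS NULL filter,
-- which exposes the equality join condition a hash join requires.
SELECT a.id
FROM A AS a
LEFT JOIN C AS c ON c.col2 = a.col2
WHERE a.col1 IN (SELECT b.col1 FROM B AS b)
  AND c.col2 IS NULL;
```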

Complex Query with Sub-expression
Query complexity – really high compile cost
Repeating sub-expressions (including CTE)
– Must be evaluated multiple times
Main Problem – Row estimate error propagation
Solution – Temp table when estimate is high, actual is low
More on AND/OR combinations:
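The temp table approach can be sketched as follows, using the ORDERS and LINEITEM tables from the earlier slides. The filter value and the temp table name are hypothetical; the idea is that the sub-expression is evaluated once and the rest of the query is compiled against its actual row count.

```sql
-- Step 1: materialize the sub-expression whose row estimate is unreliable.
-- The filter value 1 is hypothetical.
SELECT O_ORDERKEY, O_ORDERDATE
INTO #orders
FROM ORDERS
WHERE O_CUSTKEY = 1;

-- Step 2: the remaining query sees the temp table's real size and statistics,
-- so the estimate error does not propagate into later joins.
SELECT o.O_ORDERDATE, l.L_SHIPDATE, l.L_QUANTITY
FROM #orders AS o
INNER JOIN LINEITEM AS l ON l.L_ORDERKEY = o.O_ORDERKEY;

DROP TABLE #orders;
```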

Parallelism
Designed for 1998 era
– Cost Threshold for Parallelism: default 5
– Max Degree of Parallelism – instance level
– OPTION (MAXDOP n) – query level
Today – complex system – 32 cores
– Plan cost 5 query might run in 10ms?
– Some queries at DOP 4
– Others at DOP 16?
More on Parallelism:
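The three settings named above are changed as shown in this sketch. The numeric values are hypothetical examples, not recommendations from the presentation.

```sql
-- Instance level: both options are advanced options.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism', 50;  -- hypothetical value; default is 5
EXEC sp_configure 'max degree of parallelism', 8;        -- hypothetical value
RECONFIGURE;

-- Query level: the hint overrides the instance-wide max degree of parallelism.
SELECT COUNT(*), SUM(L_EXTENDEDPRICE)
FROM LINEITEM
OPTION (MAXDOP 4);
```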

Full-Text Search
Loop Join with FT as inner source
Full-text search potentially executed many times

Summary
Hardware today is really powerful
– Storage may not be – SAN vendor disconnect
Look for serious blunders first

Special Topics
Data type mismatch
Multiple Optional Search Arguments (SARG)
– Function on SARG
Parameter Sniffing versus Variables
Statistics related (big topic)
AND/OR
Complex Query with sub-expressions
Parallel Execution

Parallelism