Modern Performance - SQL Server Joe Chang yahoo.

About Joe
SQL Server consultant since 1999
Query Optimizer execution plan cost formulas (2002)
True cost structure of SQL plan operations (2003?)
Database with distribution statistics only, no data (2004)
Decoding statblob/stats_stream – writing your own statistics
Disk IO cost structure
Tools for system monitoring, execution plan analysis

Overview
General SQL Server Performance
Why is performance still important today?
– Brute force? Yes, but …
Special Topics – spectacular fails
Automating data collection
SQL Server Engine – what do developers/DBAs need to know?

Not in this session
A list of rules to be followed blindly, without consideration for the underlying reason or whether the rule actually applies in the current circumstance
DBA skill: cause and effect analysis & assessment

Common Themes?
Execution plan
– Very large (multiple orders of magnitude) error in row estimate
Single execute of a large operation
– Might still be tolerable
Multiple executes of large operations

select a.Header, a.CUSIP, a.SecNo, a.Security, a.Symbol, a.Split_rep, a.Sales_Person_Name,
  cast(sum(a.January) as float) as January,
  cast(sum(a.February) as float) as February,
  cast(sum(a.March) as float) as March,
  cast(sum(a.April) as float) as April,
  cast(sum(a.May) as float) as May,
  cast(sum(a.June) as float) as June,
  cast(sum(a.July) as float) as July,
  cast(sum(a.August) as float) as August,
  cast(sum(a.September) as float) as September,
  cast(sum(a.October) as float) as October,
  cast(sum(a.November) as float) as November,
  cast(sum(a.December) as float) as December,
  cast(sum(a.Total) as float) as Total
from (
  select cast(hdr.Header as varchar(100)) as Header,
    cast(AcctSec.CUSIP as varchar(100)) as CUSIP,
    cast(AcctSec.Sec_No as varchar(100)) as SecNo,
    cast(AcctSec.Sec_Desc1 as varchar(100)) as Security,
    cast(AcctSec.Symbol as varchar(100)) as Symbol,
    case when RefMonth.[MonthName] = 'January' then fct.Comm else 0 end as January,
    case when RefMonth.[MonthName] = 'February' then fct.Comm else 0 end as February,
    case when RefMonth.[MonthName] = 'March' then fct.Comm else 0 end as March,
    case when RefMonth.[MonthName] = 'April' then fct.Comm else 0 end as April,
    case when RefMonth.[MonthName] = 'May' then fct.Comm else 0 end as May,
    case when RefMonth.[MonthName] = 'June' then fct.Comm else 0 end as June,
    case when RefMonth.[MonthName] = 'July' then fct.Comm else 0 end as July,
    case when RefMonth.[MonthName] = 'August' then fct.Comm else 0 end as August,
    case when RefMonth.[MonthName] = 'September' then fct.Comm else 0 end as September,
    case when RefMonth.[MonthName] = 'October' then fct.Comm else 0 end as October,
    case when RefMonth.[MonthName] = 'November' then fct.Comm else 0 end as November,
    case when RefMonth.[MonthName] = 'December' then fct.Comm else 0 end as December,
    fct.Comm as Total, AcctEmp.split_rep, AcctEmp.Sales_Person_Name
  from PayoutSystemDW.[dbo].[PS_FactAccountSummary] fct
  join PayoutSystemDW.dbo.PS_DimensionRptBus RptBus on fct.DimRptBusID = RptBus.DimRptBusID
  join PayoutSystemDW.dbo.PS_DimensionHeader hdr on fct.DimHeaderID = hdr.DimHeaderID
  join PayoutSystemDW.dbo.PS_DimensionCurrency cur on fct.DimCurID = cur.DimCurID and cur.DimCurID = 1
  join PayoutSystemDW.dbo.PS_DimensionAcctEmp AcctEmp on fct.DimAcctEmpID = acctemp.DimAcctEmpID and AcctEmp.Empno = 8125 and AcctEmp.Split_rep in ('PB54')
  join PayoutSystemDW.dbo.PS_DimensionAcctSec AcctSec on fct.DimAcctSecID = AcctSec.DimAcctSecID
  join PayoutSystemDW.dbo.PS_DimensionRefBuySell bs on fct.DimRefBuySellID = bs.DimRefBuySellID
  join PayoutSystemDW.[dbo].[PS_DimensionAcctOrg] AcctOrg on fct.DimAcctOrgID = AcctOrg.DimAcctOrgID and AcctOrg.OrgCode in ('38C')
  join PayoutSystemDW.[dbo].[PS_DimensionAcctClt] as AcctClt on AcctClt.DimAcctCltID = AcctClt.DimAcctCltID and AcctClt.ClientName = 'BRACY DENNIS M'
  join PayoutSystemDW.dbo.PS_DimensionTradeInd ti on ti.DimTradeIndID = fct.DimTradeIndID and ti.[Trade_Ind_Year] = 2014
  join PayoutSystemDW.dbo.PS_DimensionRefMonth RefMonth on RefMonth.MonthID = ti.Trade_Ind_Month
  where RptBus.ReportID = 1
) a
group by a.Header, a.CUSIP, a.SecNo, a.Security, a.Symbol, a.Split_rep, a.Sales_Person_Name

select fct.Comm as Total, …
from FactAccountSummary fct
join DimensionRptBus RptBus on fct.DimRptBusID = RptBus.DimRptBusID
join DimensionCurrency cur on fct.DimCurID = cur.DimCurID
join DimensionRefBuySell bs on fct.DimRefBuySellID = bs.DimRefBuySellID
join DimensionAcctOrg AcctOrg on fct.DimAcctOrgID = AcctOrg.DimAcctOrgID
join DimensionAcctClt as AcctClt on AcctClt.DimAcctCltID = AcctClt.DimAcctCltID
Note the last join: AcctClt.DimAcctCltID is compared with itself, so AcctClt is never tied to the fact table.

CPU & Memory 2001 versus 2014
2014 – Xeon E7 v2 (Ivy Bridge), 15 cores, 3 QPI
– 4 x 15 = 60 cores
– 3TB (96 x 32GB), 24 DIMMs per socket
– 40 PCI-E gen3 lanes + x4 g2 / socket
2001 – 4 sockets, 4 cores
– Pentium III Xeon, 900MHz
– 4-8GB memory?
– Xeon MP
Each core today is more than 10x over Pentium III (700MHz?)
Memory price per DIMM (two price columns, first labeled 2013):
– __ GB: $191, $180
– 32GB: $794, $650
– 64GB: __, $4510
[Diagram: 2001 system with processors on a shared FSB to the MCH; 2014 four-socket system, each socket with cores, LLC, memory interfaces (MI), QPI and PCI-E, DMI 2 to PCH]

CPU & Memory 2001 versus 2012
2012 – Xeon E5 (Sandy Bridge), 8 cores, 2 QPI
– 4 x 8 = 32 cores total
– Westmere-EX: 1TB (64 x 16GB) (3 QPI)
– Sandy Bridge E5: 768GB (48 x 16GB) (2 QPI)
2001 – 4 sockets, 4 cores
– Pentium III Xeon, 900MHz
– 4-8GB memory?
– Xeon MP
Each core today is more than 10x over Pentium III (700MHz?)
Memory price per DIMM (two price columns, first labeled 2013):
– __ GB: $191, $180
– 32GB: $794, $650
– 64GB: __, $4510
[Diagram: 2001 system with processors on a shared FSB to the MCH; 2012 four-socket system, each socket with cores C0-C7, LLC, memory interface (MI), QPI and PCI-E, DMI 2]

Intel E5 & E7 v2 (Ivy Bridge), E3 v3
[Diagram labels: PCH, DMI x4, MC, GFX]

Processor – Core

Microprocessor Pipeline
Stages: Branch Predict (BP), Instruction Fetch (IF), Decode (ID), Register Allocate & Rename (RAT), Re-Ordering Buffer (ROB), Schedule (Sch), Execute (Exec), Flags, Retire
[Diagram: two instructions overlapping in the pipeline, 1st retire and 2nd retire]
3GHz: 0.33ns clock
5 ns from start to finish: 200MHz
Microprocessor (core) is a (multi-lane) assembly line
Each core is superscalar
Processor (socket) has multiple cores
System has multiple sockets

Micro-architecture Sandy-Bridge

Haswell (Xeon E5/7 v3)

CPU Access Times
Core – 3.33GHz, 1 CPU cycle = 0.3ns
L1 cache – 4 CPU clocks (1ns)
L2 cache – 12 CPU cycles (4ns?)
L3 cache – 29+ cycles
Local node memory
– 28 cycles + 49 ns (open page)
– 28 cycles + 56 ns (random page)
Remote node (1-hop) memory – ns
2-hop – ns+?
[Diagram labels: L1 I, L1 D, L2 Unified, L3 Slice, DRAM, Logical 0, Logical 1]

Latency Orders of Magnitude
Core – 3.33GHz, 1 CPU cycle = 0.3ns
L1 cache – 4 CPU clocks (1ns)
L2 cache – 12 CPU cycles (4ns?)
L3 cache – 29+ cycles
Local node memory
– 28 cycles + 49 ns (open page)
– 28 cycles + 56 ns (random page)
Remote node (1-hop) memory – ns
2-hop – ns+?
[Diagram labels: Core, L1 Cache, LLC, MC, GFX, DMI x4, PCH]
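The cycle counts on this slide convert to time as cycles divided by clock frequency. The sketch below applies that to the figures quoted above, assuming the 3.33GHz clock from the slide; the function name `cycles_to_ns` is illustrative, and the remote-node figures are left out because the transcript does not give them.

```python
def cycles_to_ns(cycles: float, ghz: float = 3.33) -> float:
    """Convert CPU cycles to nanoseconds at the given clock frequency in GHz."""
    return cycles / ghz

# Figures from the slide: L1 = 4 cycles, L2 = 12 cycles, L3 = 29+ cycles,
# local memory = 28 cycles + 49 ns (open page) or + 56 ns (random page).
latency_ns = {
    "L1": cycles_to_ns(4),
    "L2": cycles_to_ns(12),
    "L3": cycles_to_ns(29),
    "local DRAM (open page)": cycles_to_ns(28) + 49,
    "local DRAM (random page)": cycles_to_ns(28) + 56,
}

for level, ns in latency_ns.items():
    ratio = ns / latency_ns["L1"]
    print(f"{level:26s} {ns:6.1f} ns  ({ratio:5.1f}x L1)")
```

Even local memory comes out roughly 50 times slower than L1 under these numbers, which is the gap the slide title refers to.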

Westmere-EX 8-Socket System
[Diagram: eight sockets, each with cores C0-C9, LLC, two memory controllers (MC) and QPI links; IOH 0-3 with PCI-E x8 / x4; ESI to PCH; SMB]
Large server systems are very complicated
Software developed without consideration for system architecture will likely have severe problems
This applies to the OS, SQL Server and the application

Storage 2001 versus 2012/13
100 x 10K HDD, 125 IOPS each = 12.5K IOPS
IO bandwidth limited: 1.3GB/s (1/3 memory bandwidth)
SSDs, >10K+ IOPS each, 1M IOPS total possible
10-20GB/s+ IO bandwidth easy
6.4GB/s on each PCIe G3 x8
SAN vendors – questionable BW
[Diagram labels: QPI, 192 GB, PCIe x8, PCIe x4, IB, RAID, 10GbE, HDD, SSD; PCI, MCH, RAID, HDD]

SAN
[Diagram labels: SSD / 10K / 7.2K auto-tier pools, hot spares; Switch; SP A, SP B; 8 Gb FC; 24 GB; x4 SAS 2GB/s; HBA PCIe or 10Gb FCoE, 0.8 GB/s; Data 1-16, SSD 1-4, Log 1-4; Node 1, Node 2; SSD x8]

Performance Past, Present, Future
When will servers be so powerful that …
– Been saying this for a long time
Today – 10 to 100X overkill
– 32 cores in 2012, 60 cores in 2014
– Enough memory that IO is only sporadic
– Unlimited IOPS with SSD
What can go wrong?
Today's topic

SQL Performance
Natural keys with unique indexes, not SQL
The Execution Plan links all the elements of performance
Index tuning alone has limited value
Over-indexing can cause problems as well
Index and Statistics maintenance policy
1 logic may need more than one execution plan?
Compile cost versus execution cost?
Tables and SQL combined implement business logic
Plan cache bloat?
[Diagram labels: SQL; Tables, natural keys; Indexes; Execution Plan; Statistics & Compile parameters; Compile; Row estimate propagation errors; Storage Engine; Hardware; DOP; Memory; Parallel plans; Recompile; temp table / table variable; Query Optimizer; Index & Stats Maintenance; API Server Cursors: open, prepare, execute, close?; SET NOCOUNT; Information messages]

Factors to Consider
[Diagram labels: SQL, Tables, Indexes, Query Optimizer, Statistics, Compile Parameters, Storage Engine, Hardware, DOP, Memory]

Special Topics
Data type mismatch
Multiple Optional Search Arguments (SARG)
– Function on SARG
Parameter Sniffing versus Variables
Statistics related (big topic)
OR, AND/OR combinations
IN/NOT IN, EXISTS
Complex Query with sub-expressions
Parallel Execution
Not in order of priority

1a. Data type mismatch
nvarchar(25) = N'Customer# '
SELECT * FROM CUSTOMER WHERE C_NAME
SELECT * FROM CUSTOMER WHERE C_NAME =
Auto-parameter discovery?
Table column is varchar, parameter/variable is nvarchar
The comparison is done as nvarchar (higher type precedence), so the varchar column is implicitly converted: unable to use index seek

1b. Type Mismatch – Row Estimate
SELECT * FROM CUSTOMER WHERE C_NAME LIKE 'Customer# %'
SELECT * FROM CUSTOMER WHERE C_NAME LIKE N'Customer# %'
Row estimate error could have severe consequences in a complex query

SELECT TOP + Row Estimate Error
SELECT TOP 1000 [Document].[ArtifactID]
FROM [Document] (NOLOCK)
WHERE [Document].[AccessControlListID_D] IN (1, , )
AND EXISTS (
  SELECT [DocumentBatch].[BatchArtifactID]
  FROM [DocumentBatch] (NOLOCK)
  INNER JOIN [Batch] (NOLOCK) ON [Batch].ArtifactID = [DocumentBatch].[BatchArtifactID]
  WHERE [DocumentBatch].[DocumentArtifactID] = [Document].[ArtifactID]
  AND [Batch].[Name] LIKE N'%Value%' )
ORDER BY [Document].[ArtifactID]
Data type mismatch – results in a high row estimate
TOP clause – the optimizer expects it to be easy to find the first 1000 rows
In fact, there are few rows that match the SARG
Wrong plan for evaluating a large number of rows

MULTIPLE OPTIONAL SARG

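2. Multiple Optional SARG
int = 1
SELECT * FROM LINEITEM WHERE IS NULL OR L_ORDERKEY AND IS NULL OR L_PARTKEY AND IS NOT NULL IS NOT NULL)
Each predicate has the form (parameter IS NULL OR column = parameter), so a single statement covers every combination of supplied arguments.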

IF block
int = 1
IF IS NOT NULL)
  SELECT * FROM LINEITEM WHERE (L_ORDERKEY AND IS NULL OR L_PARTKEY
ELSE IF IS NOT NULL)
  SELECT * FROM LINEITEM WHERE (L_PARTKEY
Need to consider the impact of parameter sniffing
Consider the OPTIMIZE FOR hint
These are actually the stored procedure parameters

Dynamically Built Parameterized SQL
int = nvarchar(100) = N'/* Comment */ SELECT * FROM LINEITEM WHERE = int'
IF IS NOT NULL) + N' AND L_ORDERKEY
IF IS NOT NULL) + N' AND L_PARTKEY
IF block is easier for few options
Dynamically built parameterized SQL is better for many options
Consider a /* comment */ to help identify the source of the SQL
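The variable names on the slides above were lost in the transcript, so the following is only a sketch of the pattern, not the original code. It builds the statement from just the predicates that were supplied and keeps the values as parameters rather than concatenating them. The function name, the parameter names `order_key` and `part_key`, and the sample rows are all assumptions, and SQLite (via Python's `sqlite3`) stands in for SQL Server, where the same string would be handed to `sp_executesql`.

```python
import sqlite3

def build_lineitem_query(order_key=None, part_key=None):
    """Return (sql, params) containing only the predicates that were supplied."""
    if order_key is None and part_key is None:
        raise ValueError("at least one search argument is required")
    # Leading comment helps identify the source of the SQL in traces.
    sql = "/* build_lineitem_query */ SELECT * FROM LINEITEM WHERE 1 = 1"
    params = {}
    if order_key is not None:
        sql += " AND L_ORDERKEY = :order_key"
        params["order_key"] = order_key
    if part_key is not None:
        sql += " AND L_PARTKEY = :part_key"
        params["part_key"] = part_key
    return sql, params

# Hypothetical sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LINEITEM (L_ORDERKEY INTEGER, L_PARTKEY INTEGER, L_QUANTITY INTEGER)")
conn.executemany("INSERT INTO LINEITEM VALUES (?, ?, ?)",
                 [(1, 10, 5), (1, 20, 3), (2, 10, 7), (3, 30, 1)])

sql, params = build_lineitem_query(part_key=10)
print(sql)
print(sorted(conn.execute(sql, params).fetchall()))
```

Each distinct combination of arguments produces its own statement text, so each can get its own plan, which is the point of preferring this over the catch-all predicate when there are many options.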

2b. Function on column SARG
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE YEAR(L_SHIPDATE) = 1995 AND MONTH(L_SHIPDATE) = 1
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE BETWEEN ' ' AND ' '
int = 1
SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE AND
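The date literals and variables on this slide were stripped from the transcript. As a hedged illustration of the rewrite it describes, the sketch below turns a year and month into a half-open date range and checks that the range predicate on the bare column returns the same result as the function-on-column form. The helper `month_range` and the four sample rows are assumptions; SQLite's `strftime` stands in for T-SQL's `YEAR()` and `MONTH()`.

```python
import sqlite3
from datetime import date

def month_range(year: int, month: int):
    """Return (start, end) dates of the half-open range covering one calendar month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end

# Hypothetical sample rows around the month boundaries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LINEITEM (L_SHIPDATE TEXT, L_EXTENDEDPRICE REAL)")
conn.executemany("INSERT INTO LINEITEM VALUES (?, ?)", [
    ("1994-12-31", 10.0), ("1995-01-01", 20.0),
    ("1995-01-31", 30.0), ("1995-02-01", 40.0)])

# Function wrapped around the column: not usable as a search argument.
by_function = conn.execute(
    "SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM "
    "WHERE strftime('%Y', L_SHIPDATE) = '1995' AND strftime('%m', L_SHIPDATE) = '01'"
).fetchone()

# Bare column compared with a computed range: an index on L_SHIPDATE could be used.
start, end = month_range(1995, 1)
by_range = conn.execute(
    "SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM "
    "WHERE L_SHIPDATE >= ? AND L_SHIPDATE < ?",
    (start.isoformat(), end.isoformat())
).fetchone()

print(by_function, by_range)
```

The half-open form (`>=` start, `<` next month) is used instead of BETWEEN so the boundary stays correct if the column later carries a time component.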

Estimated versus Actual Plan - rows
Estimated Plan – 1 row???
Actual Plan – actual rows 77,356

3 Parameter Sniffing
-- first call, procedure compiles with these parameters
exec = = ' '
-- subsequent calls, procedure executes with original plan
exec = = ' '
Need different execution plans for narrow and wide range
Options:
1) OPTIMIZE FOR – one plan for all ranges
2) WITH RECOMPILE – compile on each execute
3) Main procedure calls 1 of 2 identical sub-procedures
– One sub-procedure is only called for narrow range
– Other called for wide range
Skewed data distributions also important
– Example: large & small customers
Assuming date data type

STATISTICS

4 Statistics
Auto-recompute points
Sampling strategy
– How much to sample - theory?
– Random pages versus random rows
– Histogram Equal and Range Rows
– Out of bounds, value does not exist
– etc.
Statistics Used by the Query Optimizer in SQL Server 2008, Eric N. Hanson and Yavor Angelov, Contributor: Lubor Kollar
Optimizing Your Query Plans with the SQL Server 2014 Cardinality Estimator, Joseph Sack

Statistics Structure
Stored (mostly) in binary field
Scalar values
Density Vector – limit 30, half in NC, half Cluster key
Histogram – up to 200 steps
Consider not blindly using IDENTITY on critical tables
– Example: large customers get low ID values, small customers get high ID values

Statistics Auto/Re-Compute
Automatically generated on query compile
Recompute at 6 rows, 500, every 20%? Has this changed?
2008 R2: trace flag 2371 – lowers the auto-recompute threshold for large tables

Statistics Sampling
Sampling theory
– True random sample
– Sample error: square root of N
– Relative error: 1/√N
SQL Server sampling
– Random pages
– But always first and last page???
– All rows in selected pages
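To put numbers on the sampling-theory bullets, the sketch below evaluates the √N sample error and the 1/√N relative error for a few sample sizes. It assumes a true random sample of N rows, which the slide contrasts with SQL Server's page-based sampling; the function name is illustrative.

```python
import math

def relative_error(sample_rows: int) -> float:
    """Expected relative sampling error, 1/sqrt(N), for a true random sample of N rows."""
    return 1.0 / math.sqrt(sample_rows)

for n in (100, 10_000, 1_000_000):
    print(f"N = {n:>9,}: sample error ~ {math.sqrt(n):6.0f} rows, "
          f"relative error ~ {relative_error(n):.2%}")
```

Because the error falls only with the square root, cutting the relative error by a factor of 10 takes 100 times as many sampled rows.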

Row Estimate Problems (at source)
Skewed data distribution
Out of bounds
Value does not exist
Row estimate errors at the source are classified under the statistics topic

Loop Join - Table Scan on Inner Source
Estimated row count out of the first 2 tables (at right) is zero or 1 rows. The most efficient join to the third table (without an index on the join column) is then a loop join with a scan. If the actual row count is 2 or more, a full scan is performed for each row from the outer source.
Default statistics rules may lead to serious ETL issues
Consider a custom strategy

Compile Parameter Not Exists
Main procedure has a cursor around view_Servers
First server in view_Servers is 'CAESIUM'
Cursor executes sub-procedure for each server
sql: SELECT MAX(ID) FROM TReplWS WHERE Hostname
But CAESIUM does not exist in TReplWS!

Good and Bad Plan?

SqlPlan Compile Parameters

<StmtSimple varchar(50) = ISNULL(MAX(id),0) FROM TReplWS WHERE Hostname StatementId="1" StatementCompId="43" StatementType="SELECT" StatementSubTreeCost=" " StatementEstRows="1" StatementOptmLevel="FULL" QueryHash="0x671D2B3E17E538F1" QueryPlanHash="0xEB64FB22C47E1CF2" StatementOptmEarlyAbortReason="GoodEnoughPlanFound">
<StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" />
<RelOp NodeId="0" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar" EstimateRows="1" EstimateIO="0" EstimateCPU="1e-007" AvgRowSize="15" EstimatedTotalSubtreeCost=" " Parallel="0" EstimateRebinds="0" EstimateRewinds="0">
Compile parameter values are at the bottom of the sqlplan file

AND – OR, IN / NOT IN, EXISTS / NOT EXISTS COMBINATIONS

5a Single Table OR
-- Single table
SELECT * FROM LINEITEM WHERE L_ORDERKEY = 1 OR L_PARTKEY =

5a Join 2 Tables, OR in SARG
-- subsequent calls, procedure executes with original plan
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY = OR O_CUSTKEY =

5a UNION (ALL) instead of OR
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE L_PARTKEY =
UNION (ALL)
SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY
FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY
WHERE O_CUSTKEY = AND (L_PARTKEY <> OR L_PARTKEY IS NULL)
-- Caution: select list should have keys to ensure correct rows
UNION removes duplicates (with Sort operation)
UNION ALL does not
-- Hugo Kornelis trick --
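The literal values were stripped from the slide, so the sketch below uses made-up keys (part 7, customer 100) and a handful of hypothetical rows to check the rewrite. It runs the OR form and the UNION ALL form side by side in SQLite and compares the results; the second branch carries the extra predicate `(L_PARTKEY <> :part OR L_PARTKEY IS NULL)` so rows matching both conditions are not returned twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ORDERS (O_ORDERKEY INTEGER PRIMARY KEY, O_CUSTKEY INTEGER, O_ORDERDATE TEXT);
CREATE TABLE LINEITEM (L_ORDERKEY INTEGER, L_PARTKEY INTEGER, L_SHIPDATE TEXT, L_QUANTITY INTEGER);
INSERT INTO ORDERS VALUES (1, 100, '1995-01-01'), (2, 200, '1995-01-02'), (3, 100, '1995-01-03');
INSERT INTO LINEITEM VALUES
  (1, 7, '1995-01-05', 1),     -- matches both predicates
  (1, 8, '1995-01-06', 2),     -- matches customer only
  (2, 7, '1995-01-07', 3),     -- matches part only
  (2, 9, '1995-01-08', 4),     -- matches neither
  (3, NULL, '1995-01-09', 5);  -- NULL part key, matches customer only
""")

SELECT_LIST = (
    "SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY "
    "FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY "
)

with_or = SELECT_LIST + "WHERE L_PARTKEY = :part OR O_CUSTKEY = :cust"
with_union_all = (
    SELECT_LIST + "WHERE L_PARTKEY = :part "
    "UNION ALL "
    + SELECT_LIST +
    "WHERE O_CUSTKEY = :cust AND (L_PARTKEY <> :part OR L_PARTKEY IS NULL)"
)

args = {"part": 7, "cust": 100}
# Sort by repr so rows containing NULL (None) can be ordered for comparison.
rows_or = sorted(conn.execute(with_or, args).fetchall(), key=repr)
rows_union = sorted(conn.execute(with_union_all, args).fetchall(), key=repr)
print(rows_or == rows_union, len(rows_or))
```

Dropping the `IS NULL` half of the extra predicate would silently lose the row with a NULL part key, since `NULL <> 7` is not true.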

5b AND/OR Combinations
Hash join is a good method to process many rows
– Requirement is an equality join condition
– AND/OR, IN / NOT IN, EXISTS / NOT EXISTS combinations
– Query optimizer may not be able to determine that an equality join condition exists
– Execution plan will use a loop join,
– and an attempt to force a hash join will be rejected
Re-write using UNION in place of OR, and LEFT JOIN in place of NOT IN
SELECT xx FROM A WHERE col1 IN (expr1) AND col2 NOT IN (expr2)
SELECT xx FROM A WHERE (expr1) AND (expr2 OR expr3)
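A minimal sketch of the NOT IN to LEFT JOIN rewrite, using SQLite and two hypothetical tables `A` and `B`. It assumes the compared column is declared NOT NULL on both sides; with NULLs in the subquery, NOT IN returns no rows and the two forms stop being equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (id INTEGER PRIMARY KEY, col2 INTEGER NOT NULL);
CREATE TABLE B (col2 INTEGER NOT NULL);
INSERT INTO A VALUES (1, 10), (2, 20), (3, 30), (4, 20);
INSERT INTO B VALUES (20), (20), (40);
""")

not_in = "SELECT id FROM A WHERE col2 NOT IN (SELECT col2 FROM B) ORDER BY id"

# Anti-join: keep only the rows of A for which the outer join found no match in B.
left_join = (
    "SELECT A.id FROM A LEFT JOIN B ON B.col2 = A.col2 "
    "WHERE B.col2 IS NULL ORDER BY A.id"
)

rows_not_in = conn.execute(not_in).fetchall()
rows_left_join = conn.execute(left_join).fetchall()
print(rows_not_in, rows_left_join)
```

The LEFT JOIN form exposes an explicit equality join condition, which is what the slide says a hash join requires.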

COMPLEX QUERIES

Complex Queries
High compile effort
– Many joins, many indexes
– Estimated plan cost correlation
Row estimation errors after multiple operations
Row estimate errors at the source are classified under the statistics topic

Complex Query with Sub-expression
Query complexity – really high compile cost
Repeating sub-expressions (including CTE)
– Must be evaluated multiple times
Main problem – row estimate error propagation
Solution/Strategy
– Get a good execution plan
– Temp table when estimate is high, actual is low
When the estimate is low and actual rows are high, need to balance temp table insert overhead versus plan benefit. Would a join hint work?

More Plan Details
Query joining 6 tables
Each table has too many indexes
Row estimate is high – plan cost is high
Query optimizer tries really, really hard to find a better plan
Actual row count is moderate, any plan works

Temp Table and Table Variable
Forget what other people have said
– Most is
Temp tables – subject to statistics auto/re-compile
Table variable – no statistics, assumes 1 row
Question: in each specific case, do the statistics and recompile help or not?
– Yes: temp table
– No: table variable
Is this still true?

Row Estimate Error after Join IO – synchronous when estimate rows is 25

Row Estimate 2

Parallelism
Designed for 1998 era
– Cost Threshold for Parallelism: default 5
– Max Degree of Parallelism – instance level
– OPTION (MAXDOP n) – query level
Today – complex system – 32 cores
– Plan cost 5 query might run in 10ms?
– Some queries at DOP 4
– Others at DOP 16?
Really need to rethink parallelism / NUMA strategies
Number of concurrently running queries x DOP less than number of logical/physical processors?
Tables with computed columns may inhibit parallelism?
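The rule of thumb on this slide (concurrent queries x DOP should stay within the processor count) is simple arithmetic. The sketch below works it through for the 32-core system mentioned above; it is only an illustration of the question the slide raises, and the function name is an assumption.

```python
def max_concurrent_queries(logical_processors: int, dop: int) -> int:
    """Largest number of queries at the given DOP that fit without oversubscribing cores."""
    if dop < 1:
        raise ValueError("dop must be at least 1")
    return logical_processors // dop

for dop in (1, 4, 16):
    print(f"DOP {dop:2d}: up to {max_concurrent_queries(32, dop)} concurrent queries on 32 cores")
```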

Parallel Execution – or not? Tables with computed columns using UDF prevent parallelism

Full-Text Search
Loop join with full-text (FT) as inner source
Full-text search potentially executed many times

varchar(max) stored in lob pages
Disk IO to lob pages is synchronous?
– Must access row to get 16 byte link?
– Feature request: index pointer to lob
SQL PASS 2013: Understanding Data Files at the Byte Level, Mark Rasmussen

Legacy API Server Cursors / Cursor Stored Procedures
– sp_prepare / sp_prepexec, sp_execute, sp_unprepare
– sp_cursoropen, sp_cursorfetch, sp_cursorclose
– sp_cursorprepare / sp_cursorprepexec, sp_cursorexecute, sp_cursorunprepare
Guess which is not called?
– Symptom: sp_reset_connection

Summary
Hardware today is really powerful
– Storage may not be – SAN vendor disconnect
Standard performance practice
– Top resource consumers, index usage
But also look for serious blunders
Kevin Boles – Common TSQL Mistakes

Special Topics
Data type mismatch
Multiple Optional Search Arguments (SARG)
– Function on SARG
Parameter Sniffing versus Variables
Statistics related (big topic)
AND/OR
Complex Query
Parallel Execution

SQL Server Edition Strategies
Enterprise Edition – per core licensing costs
– Old system strategy: 4 (or 2)-socket server, top processor, max memory
– Today: how many cores are necessary? 2-socket system, max memory (16GB DIMMs)
Is Standard Edition adequate?
– Low cost, but many important features disabled
BI edition – 16 cores
– Limited to 64GB for SQL Server process

New Features in SQL Server
2005
– Index included columns
– CLR
– Partitioning
2008
– Filtered index
– Compression
2012
– Column store (non-clustered)
2014
– Column store clustered
– Hekaton

GENERAL PERFORMANCE

SQL Performance General
Client-side architecture
– Connection pooling
– Stored procedures versus SQL, parameterized
Database Architecture
– Cluster key, primary key, natural keys, foreign keys
SQL
– Indexing
Indexes & Statistics Maintenance

Client-side Architecture
Connection pooling:
– Connection.Open, Execute, Connection.Close
– sp_reset_connection
Stored procedures – parameterized SQL
– Stored procedure name is short
– Parameterized SQL may not be
– Larger than 1 Ethernet packet? 2?, 8?

Database Architecture
Normalization
Cluster key
Primary key & other unique / natural keys
Foreign keys

Principles
[Diagram labels: Testing, Data, Server, Network, Storage]

Work in progress
[Diagram: processor socket layouts with cores, LLC, QPI, memory interfaces (MI), PCI-E and DMI 2]