Indexing
Joe Chang, Jchang6@yahoo.com, qdpma.com

About Joe
SQL Server consultant since 1999
Query Optimizer execution plan cost formulas (2002)
True cost structure of SQL plan operations (2003?)
Database with distribution statistics only, no data (2004)
Decoding statblob/stats_stream: write your own statistics
Disk IO cost structure
Tools for system monitoring, execution plan analysis

Freelance consultant since 1999, specializing in SQL Server performance. Reverse engineered the SQL Server query optimizer cost formulas (2001). Built a database with no data, but with the data distribution statistics from a production system. Automated index and execution plan cross-reference analysis (ExecStats).

Indexing is one of the foundations of databases and is taught in beginner-level books. Unfortunately, much of what is taught is not entirely correct or on a solid basis, so it is important to learn what is true. What is usually taught is that selectivity is most important. In fact, grouping is just as important.

Indexing
A fundamental topic, covered in most introductory SQL material.
The usual rule: the index key must be highly selective, or the index won't be used.
But that is not entirely correct.

TPC-C database schema
Examples are based on TPC-C tables.
Warehouse: w_id
District: d_w_id, d_id
Customer: c_w_id, c_d_id, c_id
Orders: o_w_id, o_d_id, o_c_id, o_id
History: h_c_w_id, h_date, h_c_d_id, h_c_id, h_amount
Order_line: ol_w_id, ol_d_id, ol_c_id, ol_o_id, ol_id
Row ratios: Warehouse to District 1:10, District to Customer 1:3000, Orders to Order_line 1:10

Nonclustered index details
CREATE CLUSTERED INDEX CX ON [Table] (Col1, Col2)
CREATE INDEX IX ON [Table] (Col3, Col4) INCLUDE (C5)
Explicit keys: Col3, Col4
Implicit keys: Col1, Col2
Full key: Col3, Col4, Col1, Col2
If one or more clustered index key columns are already part of the explicit nonclustered index key, only the remaining cluster key columns are added implicitly.
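The rule above for building the full nonclustered key can be sketched in a few lines of Python. This is only an illustration of the rule as the slide states it; the function name `full_index_key` is made up for the example and is not a SQL Server API.

```python
def full_index_key(explicit_key, cluster_key):
    """Explicit key columns, followed by the cluster key columns not already in the explicit key."""
    implicit = [col for col in cluster_key if col not in explicit_key]
    return list(explicit_key) + implicit


# The slide's example: explicit key (Col3, Col4), cluster key (Col1, Col2).
print(full_index_key(["Col3", "Col4"], ["Col1", "Col2"]))

# When Col1 is already in the explicit key, only Col2 is added implicitly.
print(full_index_key(["Col3", "Col1"], ["Col1", "Col2"]))
```

The first call yields the slide's full key (Col3, Col4, Col1, Col2); the second shows the case where a cluster key column is not repeated.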

Index Seek Examples
Clustered index seek
Nonclustered index seek, no key lookup
Nonclustered index seek + key lookup, for columns not in the nonclustered index
Table scan, when there is no suitable index (or when forced with a hint)

Index Selectivity – Why?
Plan cost 9.767, 3000 rows
Plan cost 91.17, rows
Plan cost 102.7 (IO: 101), pages, 1090MB
The IO portion of the plan cost is approximately 1/320 per key lookup row (random IO) and 1/1350 per page in a table scan. (See the Execution Plan Cost Formulas slide deck.)
The cost ratio of a key lookup row to a table scan page is 4.2:1 for the IO portion, and 3.5:1 with the CPU portion included.

Plan Cost: Key Lookup vs. Table Scan
Key lookup: IO portion approximately 1/320 per row (random)
Table scan: IO portion approximately 1/1350 per page
Ratio of key lookup row to table scan page: 4.2:1 (IO only), 3.5:1 (IO + CPU)
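The per-row and per-page figures above imply a break-even point between key lookups and a table scan. The following Python sketch works through that arithmetic. It is an IO-only model: it ignores the CPU portion and the optimizer's adjustments, and it assumes 8KB pages for the 1090MB table. The function names are made up for the example.

```python
ROWS_PER_IO_COST_UNIT = 320    # key lookup: about 1/320 per row (figure from the slide)
PAGES_PER_IO_COST_UNIT = 1350  # table scan: about 1/1350 per page (figure from the slide)


def key_lookup_io_cost(rows):
    """Approximate IO cost of key lookups for the given number of rows."""
    return rows / ROWS_PER_IO_COST_UNIT


def table_scan_io_cost(pages):
    """Approximate IO cost of scanning the given number of pages."""
    return pages / PAGES_PER_IO_COST_UNIT


def break_even_rows(pages):
    """Row count at which key lookup IO cost equals the table scan IO cost."""
    return pages * ROWS_PER_IO_COST_UNIT / PAGES_PER_IO_COST_UNIT


# Assumption: the 1090MB table is stored in 8KB pages.
pages = 1090 * 1024 // 8

print(key_lookup_io_cost(3000))    # IO cost of 3000 key lookups
print(table_scan_io_cost(pages))   # IO cost of scanning the whole table
print(break_even_rows(pages))      # rows beyond which the scan is cheaper (IO only)
print(PAGES_PER_IO_COST_UNIT / ROWS_PER_IO_COST_UNIT)  # the 4.2:1 ratio
```

Under this model, key lookups stop being cheaper than a scan once the row count exceeds roughly the page count divided by 4.2, which is why a moderately selective index can still lose to a table scan.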

Loop Joins – similar to key lookup
Customer2: clustered on c_id only (customers clustered on an identity-like key). Plan cost 91.10, rows
Customer3: clustered on warehouse, same as orders2. Plan cost 13.53, rows

Index Important Points
Selectivity is important.
But so is locality (grouping rows into common pages).
  Applicable when multiple tables have common grouping column(s)
  Impacts the choice of primary key and/or cluster key
Key lookup (IO portion) costs are roughly 1/320 per row (with adjustments) in a large table, unless the query optimizer knows the rows are in a limited number of pages.

Big Picture
The Execution Plan links all the elements of performance.
Diagram elements: SQL; Tables (natural keys); Indexes; Statistics & compile parameters; Compile; Row estimate propagation errors; Query Optimizer; Execution Plan; Recompile (temp table / table variable); Parallel plans, DOP, Memory; Storage Engine; Hardware; Index & Stats Maintenance; API Server Cursors (open, prepare, execute, close?); SET NOCOUNT; Information messages
Tables and SQL combined implement business logic
Natural keys with unique indexes, not SQL
Index and statistics maintenance policy
Logic may need more than one execution plan?
Compile cost versus execution cost?
Plan cache bloat?
Index tuning alone has limited value
Over-indexing can cause problems as well
The client app is also important

Indexing Objectives
There is no such thing as perfect.
Indexing is a set of trade-offs: what is more important?
  Insert/Update/Delete performance
  Select performance (& compile overhead)
  Maintenance?
Also need to consider statistics updates and compile parameters.

Topics
Primary Key, Cluster Key
Nonclustered indexes
  Included columns
  Filtered index
Columnstore: not covered here, see slide decks by Jimmy May
Special, also not covered here: XML, Spatial, Hash (memory-optimized tables)
Related: Partitioning
  Partition to distribute or concentrate

Identity, Primary Key, Cluster Key
These are three different things.
Primary Key: uniquely identifies a row/record
Identity/Row GUID: mechanism for generating a key
  Identity is useful, but should not always be used
  GUID: use only when there is absolutely no alternative
  Consider a natural key for dimension tables
Cluster Key: physical organization of the table
  Nonclustered indexes implicitly incorporate the cluster key columns

Clustered Index
Identity or other sequentially increasing value
  Always inserted into the last page
  In theory, no fragmentation in the clustered index (or in a nonclustered index that has such a key)
  B-tree will become unbalanced
Grouping
  Good for multi-row SELECT queries
  Gets fragmented with inserts

Common Grouping Option
Table A: a_id
Table B: cluster key a_id, b_id
Table C: a_id, b_id, c_id; cluster key a_id, b_id, c_id; unique nonclustered index on c_id
Table D: d_id (unique); cluster key c_id, d_id
If the cluster/primary key is the parent table key + a local key, does the local key need to be an identity?
Example: Orders and LineItem. The LineItem table key is OrderId + LineItem sequence.

Nonclustered Index
Key columns
Optional
  Include columns
  Filter condition
  WITH options
    Row/page compression
    Fill factor
Wish list, it would be nice if we could:
  Specify different fill factors for leaf and upper levels
  Rebuild only the upper levels, or only the leaf level

Index Write Overhead
Insert: write overhead always
Update: write overhead only when the modified column is part of the index
  The index row moves if a key column is updated
Delete: write overhead always
Takeaway: pay attention to insert/update/delete (IUD) frequency. If updates are frequent, which columns are updated?
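The write overhead rules above can be sketched as a small Python classifier. This follows the slide's rules only and is not how the storage engine is implemented; the function name and the returned labels are made up for the example, and the column names come from the earlier nonclustered index example (key Col3, Col4; include C5).

```python
def index_write_effect(operation, modified_columns, key_columns, include_columns=()):
    """Classify the write a nonclustered index sees for a single-row operation."""
    if operation in ("insert", "delete"):
        return "always written"
    if operation != "update":
        raise ValueError(f"unknown operation: {operation}")
    modified = set(modified_columns)
    if modified & set(key_columns):
        return "written, index row moves"   # a key column changed
    if modified & set(include_columns):
        return "written in place"           # only an included column changed
    return "not touched"                    # no indexed column changed


key, include = ["Col3", "Col4"], ["C5"]
print(index_write_effect("insert", [], key, include))
print(index_write_effect("update", ["Col4"], key, include))
print(index_write_effect("update", ["C5"], key, include))
print(index_write_effect("update", ["Col9"], key, include))  # Col9 is a hypothetical non-indexed column
```

Running a frequent UPDATE statement's column list through a check like this for every index on the table is one way to see which indexes pay for that update.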

Nonclustered Index Key Strategy
Query pattern:
  SELECT xxx
  FROM xxx
  WHERE selective search arguments (SARGs)
  AND less selective or non-selective SARGs
  GROUP BY xxx (optional)
  ORDER BY xxx (optional)
The index key should have the important selective SARGs, and possibly either the GROUP BY or the ORDER BY columns.
Less important SARGs can go in the INCLUDE list.

Include List
Including all (selected) columns negates the need for a key lookup, a major cost in the execution plan for multi-row queries.
Considerations
  Fat include list: almost another copy of the table?
  Update implications? Leave frequently updated columns out of the include list?
  More work when the updated column is in the key, less when it is in the include list
Options
  If a smaller include list can minimize the need for key lookups, this is good enough
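Whether an index removes the need for a key lookup comes down to a set comparison: every column the query touches must be in the index key, the include list, or the (implicit) cluster key. The Python sketch below illustrates that check. The function name is made up, and `Col6` is a hypothetical column that is in none of the lists.

```python
def columns_needing_lookup(query_columns, key_columns, include_columns=(), cluster_key=()):
    """Columns the query needs that the nonclustered index cannot supply."""
    available = set(key_columns) | set(include_columns) | set(cluster_key)
    return [col for col in query_columns if col not in available]


# Index IX: key (Col3, Col4), include (C5); cluster key (Col1, Col2).
missing = columns_needing_lookup(
    query_columns=["Col3", "Col1", "C5", "Col6"],
    key_columns=["Col3", "Col4"],
    include_columns=["C5"],
    cluster_key=["Col1", "Col2"],
)
print(missing)  # an empty list would mean the index covers the query
```

A short result list suggests a small addition to the include list; a long one suggests relying on key lookups rather than building a near copy of the table.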

Indexing Scenario
The query has a moderately selective equality SARG, and several additional WHERE clause conditions that are not amenable to an index seek but cumulatively reduce rows.
Sometimes, row reduction occurs after a join.
Many columns are needed (impractical to include all).
Option
  Index key on the important equality SARG
  Other arguments in the INCLUDE list
  Rely on key lookup for the remaining columns

B-tree
[Diagram: root page at the top, with index levels IL 2 and IL 3 below it]
Index depth: INDEXPROPERTY or sys.dm_db_index_physical_stats

Temporary Indexing
Permanent indexes are for common operations.
For maintenance or upgrade operations:
  Drop/disable indexes -> operation -> recreate
  Or create index -> operation -> drop

Partitioning
Can be used to concentrate active rows
  Example: date (year, month, day, etc.)
Can be used to distribute active rows over all partitions
  Example: GUID, hash, etc.
Partitioning trick: the partition key is not the clustered index lead key
  Example: cluster key OrderId, DateKey (partitioned on date)
  Query with OrderId only: index seek on all partitions
  Query on date only: scan of a single partition
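The partitioning trick can be illustrated with a toy Python model of how many partitions a query must visit. It assumes an equality predicate on the partition column that resolves to exactly one partition; the function name and the partition count of 36 are made up for the example.

```python
def partitions_touched(predicate_columns, partition_column, partition_count):
    """Partitions a query must visit: one if it filters on the partition column, otherwise all."""
    if partition_column in predicate_columns:
        return 1
    return partition_count


# Cluster key (OrderId, DateKey), partitioned on DateKey, 36 hypothetical partitions.
print(partitions_touched(["OrderId"], "DateKey", 36))             # a seek in every partition
print(partitions_touched(["DateKey"], "DateKey", 36))             # a scan of one partition
print(partitions_touched(["OrderId", "DateKey"], "DateKey", 36))  # a seek in one partition
```

The first case is cheap per partition because OrderId leads the cluster key, so each visit is a seek; the cost grows with the number of partitions rather than with table size.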

Summary
Both selectivity and grouping/locality are important.
  They affect key lookup cost; the alternative is a table scan
Indexing is trade-offs; there is no one rule for all cases.
  Consider insert/update/read and maintenance
The Missing Indexes DMV is not intelligent advice!
Extremely high performance requires verification.

Related
Statistics are recomputed at the first 6 rows modified, the first 500 rows, then every 20%.
Newer versions of SQL Server auto-recompute at a lower threshold (than 20%) for very large tables.
The default statistics sample is problematic with grouping.
What are the compile parameter values on the first execute after a statistics recompute?
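To show how much lower the newer threshold is for large tables, the Python sketch below compares the two rules. The formulas are an assumption taken from Microsoft's documentation as I understand it, not from the slide: the older rule is 500 + 20% of rows, and the newer rule (SQL Server 2016 at compatibility level 130 and later) is the smaller of that and sqrt(1000 * rows). The 6-row case on the slide is left out, and the function names are made up.

```python
import math


def legacy_threshold(rows):
    """Modifications that trigger an automatic statistics update under the older rule."""
    if rows <= 500:
        return 500
    return 500 + 0.20 * rows


def dynamic_threshold(rows):
    """Assumed newer rule: the smaller of the legacy threshold and sqrt(1000 * rows)."""
    if rows <= 500:
        return 500
    return min(legacy_threshold(rows), math.sqrt(1000 * rows))


for rows in (10_000, 1_000_000, 100_000_000):
    print(rows, legacy_threshold(rows), round(dynamic_threshold(rows)))
```

For small tables the two rules agree; for a table with 100 million rows the older rule waits for about 20 million modifications, while the assumed newer rule triggers far sooner.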
