How SQL Server Indexes Work Sharon F. Dooley sharond@voicenet.com

SQL Server Indexes
SQL Server indexes are based on B-trees
  – Special records called nodes that allow keyed access to data
  – Two kinds of nodes are special
      Root
      Leaf
[Diagram: a B-tree with the root node at the top, intermediate nodes below it, leaf nodes at the bottom, and the leaf nodes pointing to data pages]

SQL Server B-Tree Rules
Root and intermediate nodes point only to other nodes
Only leaf nodes point to data
The number of nodes between the root and any leaf is the same for all leaves
A node always contains between K/2 and K branches, where K is the branching factor
  – The branching factor is the maximum number of keys in a node
B-trees are always sorted
The tree is maintained during insertion, deletion, and updating so that these rules are met
  – When records are inserted or updated, nodes may split
  – When records are deleted, nodes may be collapsed

What Is a Node?
A page that contains key and pointer pairs
[Diagram: a page made up of repeated Key | Pointer pairs]

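Splitting a B-Tree Node
[Diagram: a three-level index over a table]
  Root (Level 0): Abby, Bob, Carol, Dave
  Node (Level 1): Abby, Ada, Andy, Ann
  Leaf (Level 2): Ada, Alan, Amanda, Amy
  Rows in the database (DB): Bob, Alan, Amanda, Carol, Amy, Dave, Ada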

Let's Add Alice
Step 1: Split the leaf node
[Diagram]
  Rows in the database: Bob, Alan, Amanda, Carol, Amy, Dave, Ada, Alice
  Leaf pages after the split: (Ada, Alan, Alice) and (Amanda, Amy)
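
The leaf split in Step 1 can be sketched in a few lines of Python. This is only an illustration of the idea, not SQL Server's actual page-split algorithm: the function name `insert_with_split`, the four-key page capacity, and the choice to leave the extra key on the left page are all assumptions picked to match the diagram.

```python
import bisect

def insert_with_split(page, key, capacity):
    """Insert key into a sorted page; split it in two if it overflows.

    Hypothetical helper: real pages are split by bytes used, not key count.
    """
    keys = list(page)
    bisect.insort(keys, key)          # keep the page in sorted order
    if len(keys) <= capacity:
        return [keys]                 # the key fit, no split needed
    middle = (len(keys) + 1) // 2     # left page keeps the extra key
    return [keys[:middle], keys[middle:]]

leaf = ["Ada", "Alan", "Amanda", "Amy"]
print(insert_with_split(leaf, "Alice", capacity=4))
# prints [['Ada', 'Alan', 'Alice'], ['Amanda', 'Amy']]
```

The new right-hand page starts with Amanda, so Amanda is the key that has to be added to the parent node, which is what forces the split at the next level up.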

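Adding Alice
Step 2: Split the next level up
[Diagram]
  Level 1 node after the split: (Abby, Ada, Amanda) and (Andy, Ann)
  Leaf pages: (Ada, Alan, Alice) and (Amanda, Amy)
  Rows in the database (DB): Bob, Alan, Amanda, Carol, Amy, Dave, Ada, Alice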

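Adding Alice (continued)
Split the root
[Diagram]
  Root after the split: (Abby, Andy, Bob) and (Carol, Dave)
  Next level: (Abby, Ada, Amanda) and (Andy, Ann)
  Leaf pages: (Ada, Alan, Alice) and (Amanda, Amy)
  Rows in the database (DB): Bob, Alan, Amanda, Carol, Amy, Dave, Ada, Alice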

Adding Alice (continued)
When the root splits, the tree grows another level
[Diagram]
  Root (Level 0): Abby, Carol
  Node (Level 1): (Abby, Andy, Bob) and (Carol, Dave)
  Node (Level 2): (Abby, Ada, Amanda) and (Andy, Ann)
  Leaf (Level 3): (Ada, Alan, Alice) and (Amanda, Amy)
  Rows in the database (DB): Bob, Alan, Amanda, Carol, Amy, Dave, Ada, Alice

Page splits cause fragmentation
Two types of fragmentation
  – Data pages in a clustered table
  – Index pages in all indexes
Fragmentation happens because these pages must be kept in order
Data page fragmentation happens when a new record must be added to a page that is full
  – Consider an Employee table with a clustered index on LastName, FirstName
  – A new employee, Peter Dent, is hired
[Diagram: one extent before the insert]
  Page 1 (full): Adams, Carol; Ally, Kent; Baccus, Mary; David, Sue; Dulles, Kelly; Edom, Mike
  Page 2: Farly, Lee; Frank, Joe; ...
  Page 3: Ollen, Carol; Oppus, Larry; ...

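Data Page Fragmentation
[Diagram: the extent after the insert]
  Page 1: Adams, Carol; Ally, Kent; Baccus, Mary; David, Sue; Dent, Peter
  Page 2: Farly, Lee; Frank, Joe; ...
  Page 3: Ollen, Carol; Oppus, Larry; ...
  New page, out of physical order: Dulles, Kelly; Edom, Mike; ...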

Index Fragmentation
Index page fragmentation occurs when a new key-pointer pair must be added to an index page that is full
  – Consider an Employee table with a nonclustered index on Social Security Number
  – Employee 048-12-9875 is added
[Diagram: one extent before the insert]
  Page 1 (full): 036-11-9987, pointer; 036-33-9874, pointer; 038-87-8373, pointer; 046-11-9987, pointer; 048-33-9874, pointer; 052-87-8373, pointer
  Page 2: 116-11-9987, pointer; 116-33-9874, pointer; ...
  Page 3: 124-11-9987, pointer; 124-33-9874, pointer; 125-87-8373, pointer

Index Fragmentation (continued)
[Diagram: the extent after the insert]
  Page 1: 036-11-9987, pointer; 036-33-9874, pointer; 038-87-8373, pointer; 046-11-9987, pointer; 048-12-9875, pointer
  Page 2: 116-11-9987, pointer; 116-33-9874, pointer; ...
  Page 3: 124-11-9987, pointer; 124-33-9874, pointer; 125-87-8373, pointer
  New page, out of physical order: 048-33-9874, pointer; 052-87-8373, pointer; ...

Studying Fragmentation in SQL Server 2000
To determine if there is fragmentation
  – In a clustered table or a nonclustered index

  DBCC SHOWCONTIG [ ( { table_name | table_id | view_name | view_id } [ , index_name | index_id ] ) ]

Sample output:

  DBCC SHOWCONTIG scanning 'Employees' table...
  Table: 'Employees' (1977058079); index ID: 1, database ID: 7
  TABLE level scan performed.
  - Pages Scanned................................: 707
  - Extents Scanned..............................: 90
  - Extent Switches..............................: 572
  - Avg. Pages per Extent........................: 7.9
  - Scan Density [Best Count:Actual Count].......: 15.53% [89:573]
  - Logical Scan Fragmentation..................: 39.18%
  - Extent Scan Fragmentation...................: 58.89%
  - Avg. Bytes Free per Page.....................: 4338.9
  - Avg. Page Density (full).....................: 46.39%
  DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Studying Fragmentation in SQL Server 2000 (continued)
Unless the table or index spans multiple files
  – Extent Switches and Extents Scanned should be approximately equal
Scan Density should be close to 100 percent
Avg. Page Density should be high and Avg. Bytes Free per Page should be low
Logical Scan Fragmentation and Extent Scan Fragmentation should be as close to 0 as possible
Clearly the Employees table is terribly fragmented!
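
As a rough check on the sample output, the Scan Density figure can be reproduced from Pages Scanned and Extent Switches. The sketch below assumes 8 pages per extent, and it assumes (as the sample output suggests, with 572 switches and an Actual Count of 573) that Actual Count is Extent Switches + 1. The function name `scan_density` is made up for this illustration.

```python
import math

PAGES_PER_EXTENT = 8  # a SQL Server extent is eight 8 KB pages

def scan_density(pages_scanned, extent_switches):
    """Return (best_count, actual_count, density_percent)."""
    # Best Count: extents needed if every page were packed contiguously.
    best_count = math.ceil(pages_scanned / PAGES_PER_EXTENT)
    # Actual Count: assumed here to be the number of extent switches plus one.
    actual_count = extent_switches + 1
    return best_count, actual_count, 100.0 * best_count / actual_count

best, actual, density = scan_density(pages_scanned=707, extent_switches=572)
print(best, actual, round(density, 2))
# prints 89 573 15.53
```

A table whose 707 pages sat in 89 contiguous extents would need only 88 switches, and the density would be 100 percent.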

Studying Fragmentation in SQL Server 2005 and 2008

  SELECT object_name(s.object_id), i.name, s.avg_fragmentation_in_percent
  FROM sys.dm_db_index_physical_stats
         (db_id('databasename'),
          object_id('tablename'),
          {indexid | NULL},
          {partitionnumber | NULL},
          {'LIMITED' | 'SAMPLED' | 'DETAILED' | NULL}) AS s
  INNER JOIN sys.indexes AS i
    ON s.object_id = i.object_id AND s.index_id = i.index_id

If NULL is supplied for the last argument, LIMITED is assumed
The avg_fragmentation_in_percent should be as close to 0 as possible

Studying Fragmentation in SQL Server 2005 and 2008 (continued)

  SELECT object_name(s.object_id), i.name, s.avg_fragmentation_in_percent
  FROM sys.dm_db_index_physical_stats
         (db_id('bigwind'), object_id('Employees'), NULL, NULL, NULL) AS s
  INNER JOIN sys.indexes AS i
    ON s.object_id = i.object_id AND s.index_id = i.index_id
  WHERE s.alloc_unit_type_desc = 'IN_ROW_DATA'

Results

  Table      Index         avg_fragmentation_in_percent
  Employees  LastName_IDx  0.685
  Employees  PK_Employees  3.0303
  Employees  City_IDX      0
  Employees  Region_IDX    3.922

Repairing Fragmentation
Repair index fragmentation by rebuilding the index
Rebuilding the clustered index repairs table fragmentation
DBCC DBREINDEX
    DBCC DBREINDEX (tablename [, indexname [, fillfactor]])
  – Can rebuild indexes that implement primary key and unique constraints
CREATE INDEX
  – DROP_EXISTING causes SQL Server to drop and re-create the index in a single step
      Faster than dropping with the DROP INDEX command and then re-creating
ALTER TABLE … ADD CONSTRAINT … PRIMARY KEY or UNIQUE

Repairing Fragmentation (continued)
DBCC INDEXDEFRAG does not
  – Hold long-term locks on the index
  – Do as thorough a job as the other methods
  – Allow specification of a fill factor
      Uses the fill factor from the last CREATE INDEX for this index

    DBCC INDEXDEFRAG ( { database_name | database_id | 0 }, { table_name | table_id }, { index_name | index_id } )

SQL Server 2005 and 2008
ALTER INDEX index_name ON table_name REORGANIZE
  – Same as DBCC INDEXDEFRAG
ALTER INDEX index_name ON table_name REBUILD
  – Same as DBCC DBREINDEX
  – Allows concurrent access if you add WITH (ONLINE = ON) to the ALTER INDEX command
  – The online rebuild uses the version store in tempdb

SQL Server Indexes
SQL Server indexes come in two flavors
  – Clustered indexes
      Database rows are stored in index key order
      The data pages are the leaf nodes of the index
  – Nonclustered indexes
      The leaf level is in index order but the data is not
      Leaf nodes contain pointers to rows
One clustered index per table
  – Choose wisely
  – A table should always have a clustered index
      Allows reorganization of the data pages
249 nonclustered indexes per table (999 in SQL Server 2008)

Clustered Index
[Diagram]
  Root: Abby, Bob, Carol, Dave
  Intermediate node: Abby, Ada, Andy, Ann
  Database and leaf node (rows stored in key order): Ada, Alan, Amanda, Amy

Nonclustered Index
[Diagram]
  Root: Abby, Bob, Carol, Dave
  Intermediate node: Abby, Ada, Andy, Ann
  Leaf node (in key order): Ada, Alan, Amanda, Amy
  Database (rows not in key order): Amy, Ada, Amanda, Alan

Clustered and Nonclustered Indexes Interact
Clustered index keys are always unique internally
  – If you don't specify UNIQUE when creating the index, SQL Server may add a uniqueifier to the index key
      Only used when there actually is a duplicate
      Adds 4 bytes to the key
The clustering key is used in nonclustered indexes
  – This allows SQL Server to go directly to the record from the nonclustered index
  – If there is no clustered index, a record identifier will be used instead

Leaf node of a clustered index on EmployeeID

  EmployeeID  LastName  FirstName
  1           Jones     John
  2           Smith     Mary
  3           Adams     Mark
  4           Douglas   Susan

Leaf node of a nonclustered index on LastName

  LastName  EmployeeID
  Adams     3
  Douglas   4
  Jones     1
  Smith     2

Clustered and Nonclustered Indexes Interact (continued)
Another reason to keep the clustering key small!
Consider the following query:

  SELECT LastName, FirstName FROM Employee WHERE LastName = 'Douglas'

When SQL Server uses the nonclustered index, it
  – Traverses the nonclustered index until it finds the desired key
  – Picks up the associated clustering key
  – Traverses the clustered index to find the data
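
The two-step lookup can be mimicked with two Python dictionaries, using the four employees from the previous slide. This is only a model of the data flow: dictionaries are hash tables rather than B-trees, and the names `clustered`, `nonclustered`, and `find_by_last_name` are invented for the illustration.

```python
# Clustered index on EmployeeID: the leaf level holds the full rows.
clustered = {
    1: ("Jones", "John"),
    2: ("Smith", "Mary"),
    3: ("Adams", "Mark"),
    4: ("Douglas", "Susan"),
}

# Nonclustered index on LastName: each leaf entry carries the clustering key.
nonclustered = {"Adams": 3, "Douglas": 4, "Jones": 1, "Smith": 2}

def find_by_last_name(last_name):
    """Return (LastName, FirstName) for last_name, or None if absent."""
    employee_id = nonclustered.get(last_name)   # first traversal
    if employee_id is None:
        return None
    return clustered[employee_id]               # second traversal

print(find_by_last_name("Douglas"))
# prints ('Douglas', 'Susan')
```

Every nonclustered entry stores a copy of the clustering key, so a wide clustering key makes every nonclustered index larger.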

Heaps and Chains
When you place a clustered index on a table, the pages are chained together in a doubly linked list
SQL Server can follow the pointers to move from page to page
When there is no clustered index, the table is called a heap
  – Data is located
      Through nonclustered indexes
      By scanning all the pages in the table

Indexes and Inserts
When there is no clustered index on a table, SQL Server uses the Page Free Space page to find a page with space for the new record
Inserts into tables with a clustered index can cause page splits or hotspots
  – A hotspot is a part of the database that is accessed especially heavily
  – Hotspots create contention problems
Clustered indexes on keys that arrive in random order cause page splits
Clustered indexes on keys that arrive in index order create hotspots
  – All inserts are at the end of the table
      Identities
      Dates
  – However, there will be no page splits or collapses

Indexes and Updates
When data is modified, indexes may have to be modified as well
Changing data in a table with no indexes
  – Data will be changed in place unless the update means that the row will no longer fit on the page
  – If the row won't fit, it will simply be moved to a new page
Changing a clustering key column
  – The row will be deleted from its original location
  – It will be inserted into the new location
  – All nonclustered indexes must be maintained
  – Exception: a change that doesn't affect the index order
      Changing Thompson to Thompsen will be done in place
      [Diagram: a page holding Thomas, Thompson, Tyne; Thompsen still sorts between Thomas and Tyne]
Avoid clustered indexes on columns that frequently change

Indexes and Updates (continued)
Changing any non-key column in a table with only nonclustered indexes
  – Data will be changed in place unless the row will no longer fit
  – If the row will no longer fit
      It will be moved to another page
      A forwarding pointer will be left on the original page that points to the new location
      Index pointers don't need to be updated
      An additional I/O will now be required
      There will never be more than one forwarding pointer, no matter how many times a row moves
To see whether updates have produced forwarding pointers, use

  SQL Server 2000:
    DBCC SHOWCONTIG('tablename') WITH TABLERESULTS

  SQL Server 2005 and 2008:
    SELECT forwarded_record_count
    FROM sys.dm_db_index_physical_stats
           (db_id('databasename'), object_id('tablename'), NULL, NULL, 'DETAILED')

  – forwarded_record_count is NULL in LIMITED mode, which is what a NULL last argument selects, so specify SAMPLED or DETAILED

Indexes and Deletes
Deleting from a heap
  – Row is physically deleted
  – Remaining rows on the page below the deleted record are not moved up at the time the delete happens
  – Page will be compacted when space is needed for another row on the page
Deleting from a leaf node
  – Data pages of a clustered index
  – Leaf nodes of a nonclustered index
  – Records may not be physically deleted at the time the delete is issued
  – Records may be marked as ghost records
      Used by the lock manager
      Not retrievable by users
  – A special SQL Server process cleans up the ghost records
      Won't clean up records that are part of an active transaction
      Doesn't compact the page

Indexes and Deletes (continued)
Deleting from a nonleaf node
  – No ghost records
  – Page is not compacted
When rows are deleted, both nonclustered and clustered indexes must be maintained
When the last row is deleted from a page (index or data), the page is deallocated and returned to the free space pool
  – Unless it is the only page in the table
  – A table always has at least one page, even if it is empty

What Are Index Statistics?
Metric used by the optimizer in determining whether or not an index is useful for a particular query
Stored in
  – SQL Server 2000: an image column named statblob in the sysindexes table
  – SQL Server 2005 and 2008: an internal and invisible table
Essentially a histogram
Statistics are kept for up to 200 steps

Selectivity
The statistics allow the optimizer to determine the selectivity of an index
  – A unique, single-column index is as selective as an index can be
      One index entry points to exactly one row
Another term for this is density
  – Density is the inverse of selectivity: 1 / (number of distinct key values)
      Density values range from 0 to 1
      A selective index has a density of 0.10 or less
      A unique, single-column index has a density of 1 / (number of rows), which approaches 0.0 as the table grows
When the index is composite, it becomes a little more complicated
  – SQL Server maintains detailed statistics only on the leftmost column
  – It does compute density for each column prefix
      Assume there is an index on (col1, col2, col3)
      Density is computed for
        – Col1
        – Col1 + Col2
        – Col1 + Col2 + Col3
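
To make the prefix densities concrete, here is a small Python sketch that computes 1 / (number of distinct values) for each leading column combination. The sample rows and the helper name `prefix_densities` are made up for illustration, and the sketch reads every row, whereas SQL Server may compute statistics from a sample.

```python
def prefix_densities(rows, columns):
    """Map each leading column prefix to 1 / (distinct value count).

    rows must be non-empty; columns is a tuple of column names.
    """
    result = {}
    for width in range(1, len(columns) + 1):
        prefix = columns[:width]
        distinct = {tuple(row[c] for c in prefix) for row in rows}
        result[prefix] = 1.0 / len(distinct)
    return result

# Hypothetical four-row table with an index on (col1, col2, col3).
rows = [
    {"col1": "A", "col2": 1, "col3": "x"},
    {"col1": "A", "col2": 1, "col3": "y"},
    {"col1": "A", "col2": 2, "col3": "x"},
    {"col1": "B", "col2": 1, "col3": "x"},
]
for prefix, density in prefix_densities(rows, ("col1", "col2", "col3")).items():
    print(prefix, density)
```

Here col1 alone has 2 distinct values (density 0.5), while the full three-column key is unique across the 4 rows (density 0.25). Each added column can only keep the density the same or lower it.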

Exploring Statistics
To see the index statistics, use
  – DBCC SHOW_STATISTICS ('tablename', {'indexname' | 'statisticsname'})
  – DBCC SHOW_STATISTICS ('Employees', 'EmployeeName_Idx')
Interpreting the step output

  RANGE_HI_KEY         Upper-bound value of a histogram step
  RANGE_ROWS           Number of rows from the sample that fall within a histogram step, excluding the upper bound
  EQ_ROWS              Number of rows from the sample that are equal in value to the upper bound of the histogram step
  DISTINCT_RANGE_ROWS  Number of distinct values within a histogram step, excluding the upper bound
  AVG_RANGE_ROWS       Average number of duplicate values within a histogram step, excluding the upper bound

Exploring Index Statistics (continued)

  Statistics for INDEX 'EmployeeName_IDX'.
  Updated             Rows   Rows Sampled  Steps  Density       Avg key length
  ------------------  -----  ------------  -----  ------------  --------------
  Jan 27 2002 7:00PM  10009  10009         200    1.3958309E-4  28.475372

  All density   Average Length  Columns
  ------------  --------------  -------------------------------
  1.4271443E-4  13.252672       LastName
  9.9940036E-5  24.475372       LastName, FirstName
  9.9910081E-5  28.475372       LastName, FirstName, EmployeeID

  Distribution steps:
  RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
  ------------  ----------  -------  -------------------  --------------
  Aaby          0.0         2.0      0                    0.0
  Abrahamson    59.0        2.0      43                   1.3409091
  ...
  Zuran         13.0        1.0      10                   1.3
  Zvonek        0.0         1.0      0                    0.0

The report for SQL Server 2005 and later is slightly different at the beginning

Statistics Maintenance
By default, SQL Server will automatically maintain the statistics
Index statistics are computed (or recomputed) when the index is created or rebuilt
SQL Server keeps track of the updates to a table
  – Each INSERT, UPDATE, or DELETE statement updates a counter in sysindexes named rowmodctr
      Note that TRUNCATE TABLE does not modify this counter
  – Whenever the statistics are recomputed, the counter is set back to zero
When you issue a query, the optimizer checks rowmodctr to see whether the statistics are up to date
  – If they are not, the statistics will be updated

Statistics Maintenance (continued)
Note that this may not always happen at the best time in a production system
  – Can turn off automatic update
  – Can manually update
  – In SQL Server 2005 and later, can set the AUTO_UPDATE_STATISTICS_ASYNC database option

  Table type  Empty condition  Threshold when empty      Threshold when not empty
  ----------  ---------------  ------------------------  -------------------------------------------------
  Permanent   < 500 rows       Number of changes >= 500  Number of changes >= 500 + (Number of rows * 20%)
  Temporary   < 6 rows         Number of changes >= 6    Number of changes >= 500

Example:
  – Assume that a table has 1,000 rows
  – The threshold would be 500 + (.20 * 1000)
  – You would expect to see the statistics automatically updated after about 700 modifications
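
The threshold rules can be written as a small Python function. It is a direct transcription of the figures on this slide; the function name `auto_update_threshold` and its parameters are made up for illustration and are not anything SQL Server exposes.

```python
def auto_update_threshold(row_count, temporary=False):
    """Number of changes after which statistics are considered stale."""
    if temporary:
        # Temporary tables: "empty" means fewer than 6 rows.
        return 6 if row_count < 6 else 500
    if row_count < 500:
        # Permanent tables: "empty" means fewer than 500 rows.
        return 500
    return 500 + row_count * 20 / 100   # 500 plus 20% of the rows

print(auto_update_threshold(1000))
# prints 700.0
```

For a 1,000,000-row permanent table the same rule gives 200,500 changes, so statistics on large tables can go stale long before an automatic update is triggered.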

Estimating Page Accesses
No index
  – Number of data pages in the table
Equality query using a unique index
  – Nonclustered index
      Number of index levels + 1 (if there is no clustered index)
      Number of nonclustered index levels + the number of levels in the clustered index (if there is a clustered index)
  – Clustered index
      Number of index levels

Estimating Page Accesses for a Clustered Index
Number of levels + number of data pages
  – Number of data pages = number of qualifying rows / rows per page
[Diagram: the clustered index tree shown earlier, in which the data pages are the leaf nodes]

Estimating Page Accesses for a Nonclustered Index
Number of levels + number of qualifying leaf pages + number of rows
or
Number of levels + number of qualifying leaf pages + (number of rows * number of clustered index levels)
  – Number of qualifying leaf pages = number of qualifying rows / rows per page
  – Assumes every row is on a different page
[Diagram: the nonclustered index tree shown earlier, in which the leaf nodes point to rows in the database]
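
The two estimates can be compared with a short Python sketch. The formulas are the ones on these slides; rounding partial pages up, the function names, and the sample numbers are assumptions made for illustration.

```python
import math

def clustered_reads(levels, qualifying_rows, rows_per_page):
    """Levels + data pages holding the qualifying rows."""
    return levels + math.ceil(qualifying_rows / rows_per_page)

def nonclustered_reads(levels, qualifying_rows, entries_per_leaf_page,
                       clustered_levels=None):
    """Levels + qualifying leaf pages + one lookup per qualifying row.

    clustered_levels=None models a heap (one page read per row);
    otherwise each row costs a full traversal of the clustered index.
    """
    leaf_pages = math.ceil(qualifying_rows / entries_per_leaf_page)
    per_row = 1 if clustered_levels is None else clustered_levels
    return levels + leaf_pages + qualifying_rows * per_row

# Hypothetical query returning 100 rows.
print(clustered_reads(levels=3, qualifying_rows=100, rows_per_page=50))
print(nonclustered_reads(levels=2, qualifying_rows=100,
                         entries_per_leaf_page=200))
print(nonclustered_reads(levels=2, qualifying_rows=100,
                         entries_per_leaf_page=200, clustered_levels=3))
```

With these numbers the clustered index needs 5 page reads, the nonclustered index on a heap needs 103, and the nonclustered index on top of a 3-level clustered index needs 303. The per-row term dominates, which is why the optimizer often prefers a scan to a nonclustered index for queries that return many rows.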

Covering Indexes
When a nonclustered index includes all the data requested in a query (both the items in the SELECT list and the WHERE clause), it is called a covering index
With a covering index, there is no need to access the actual data pages
  – Only the leaf nodes of the nonclustered index are accessed
Because the leaf node of a clustered index is the data itself, a clustered index covers all queries

Leaf node of a nonclustered index on LastName, FirstName, Birthdate

  LastName  FirstName  Birthdate   EmployeeID
  Adams     Mark       1/14/1956   3
  Douglas   Susan      12/12/1947  4
  Jones     John       4/15/1967   1
  Smith     Mary       7/14/1970   2

The last column is EmployeeID. Remember that the clustering key is always included in a nonclustered index.

Covering Indexes (continued)
This index covers the following queries:

  SELECT EmployeeID, LastName, FirstName, Birthdate FROM Employees WHERE Birthdate >= '1/12/1941'
  SELECT LastName FROM Employees WHERE EmployeeID = 7
  SELECT LastName, FirstName FROM Employees WHERE LastName BETWEEN 'A' AND 'C'

Remember that the number of accesses for a nonclustered index is the number of levels + the number of qualifying leaf pages + either the number of rows or (the number of rows * the number of levels in the clustered index)
  – A covering index eliminates the number of rows term from the equation
  – The optimizer is highly likely to use a nonclustered index because of this

Non-Key Index Columns
SQL Server 2005 and later allow you to include columns in a nonclustered index that are not part of the key
  – Allows the index to cover more queries
  – Included columns appear only in the leaf level of the index
  – Up to 1,023 additional columns
  – Can include data types that cannot be key columns
      Except text, ntext, and image data types
Syntax

  CREATE [ UNIQUE ] NONCLUSTERED INDEX index_name
    ON table_name ( column [ ASC | DESC ] [ ,...n ] )
    [ INCLUDE ( column_name [ ,...n ] ) ]

Example

  CREATE NONCLUSTERED INDEX NameRegion_IDX ON Employees(LastName) INCLUDE (Region)
