How SQL Server Indexes Work Sharon F. Dooley

SQL Server Indexes
– SQL Server indexes are based on B-trees
  – Special records called nodes allow keyed access to data
  – Two kinds of nodes are special: the root and the leaves
[Figure: a B-tree with a root node, intermediate nodes, and leaf nodes that point to data pages]

SQL Server B-Tree Rules
– Root and intermediate nodes point only to other nodes
– Only leaf nodes point to data
– The number of levels between the root and any leaf is the same for all leaves
– Every node except the root contains between K/2 and K branches, where K is the branching factor
  – The branching factor is the maximum number of keys a node can hold
– Keys in a B-tree are always kept in sorted order
– The tree is maintained during insertion, deletion, and updating so that these rules are met
  – When records are inserted or updated, nodes may split
  – When records are deleted, nodes may be collapsed

What Is a Node?
– A page that contains key and pointer pairs
[Figure: a node drawn as a list of Key | Pointer entries]

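Splitting a B-Tree Node
[Figure: a three-level tree in which every node holds four keys. Root (Level 0): Abby, Bob, Carol, Dave. Node (Level 1): Abby, Ada, Andy, Ann. Leaf (Level 2): Ada, Alan, Amanda, Amy, whose pointers lead to rows in the database (DB).]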

Let's Add Alice
Step 1: Split the leaf node
[Figure: the full leaf Ada, Alan, Amanda, Amy splits into two leaves, Ada, Alan, Alice and Amanda, Amy]

Adding Alice
Step 2: Split the next level up
[Figure: the new leaf needs an entry in the full Level 1 node, so that node splits into Abby, Ada, Amanda and Andy, Ann]

Adding Alice (continued)
Split the root
[Figure: the new Level 1 node needs an entry in the full root, so the root splits into Abby, Andy, Bob and Carol, Dave]

Adding Alice (continued)
When the root splits, the tree grows another level
[Figure: the four-level tree after the insert. Root (Level 0): Abby, Carol. Node (Level 1): Abby, Andy, Bob and Carol, Dave. Node (Level 2): Abby, Ada, Amanda and Andy, Ann. Leaf (Level 3): Ada, Alan, Alice and Amanda, Amy, pointing to rows in the database.]
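
To look at the levels of a real B-tree, a minimal sketch like the following can help. It assumes a scratch database on SQL Server 2008 or later; the table dbo.DemoBTree is hypothetical. It loads wide rows into a clustered index and asks sys.dm_db_index_physical_stats in DETAILED mode for one row per level.

```sql
-- Hypothetical demo table; run it in a scratch database.
IF OBJECT_ID('dbo.DemoBTree', 'U') IS NOT NULL
    DROP TABLE dbo.DemoBTree;

CREATE TABLE dbo.DemoBTree (
    Id      int       NOT NULL,
    Padding char(800) NOT NULL DEFAULT 'x',   -- wide rows, so few fit on each page
    CONSTRAINT PK_DemoBTree PRIMARY KEY CLUSTERED (Id)
);

-- Load 20,000 rows; at roughly 800 bytes per row that takes a couple of thousand leaf pages.
INSERT INTO dbo.DemoBTree (Id)
SELECT TOP (20000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- DETAILED mode returns one row per level of the clustered index (index_id 1).
-- The DMV numbers the leaf as level 0, the opposite of the slides,
-- so the row with the highest index_level is the root (a single page).
SELECT index_depth, index_level, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.DemoBTree'), 1, NULL, 'DETAILED')
WHERE alloc_unit_type_desc = 'IN_ROW_DATA'
ORDER BY index_level DESC;
```

Adding more rows and re-running the query is one way to watch index_depth grow when the root finally splits.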

Page splits cause fragmentation
– Two types of fragmentation
  – Data pages in a clustered table
  – Index pages in all indexes
– Fragmentation happens because these pages must be kept in key order, so a split moves some of a full page's rows to a newly allocated page that is usually not physically next to it
– Data page fragmentation happens when a new record must be added to a page that is full
  – Consider an Employee table with a clustered index on LastName, FirstName
  – A new employee, Peter Dent, is hired
[Figure: an extent of data pages in name order. The page that must receive Dent, Peter (Adams, Carol; Ally, Kent; Baccus, Mary; David, Sue; Dulles, Kelly; Edom, Mike) is full; later pages hold Farly, Lee; Frank, Joe; ... Ollen, Carol; Oppus, Larry; ...]

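Data Page Fragmentation
[Figure: after the split, Adams, Carol through David, Sue and the new Dent, Peter stay on the original page, while Dulles, Kelly and Edom, Mike move to a newly allocated page that sits out of sequence in the extent; Farly, Lee through Oppus, Larry are unchanged]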

Index Fragmentation
– Index page fragmentation occurs when a new key/pointer pair must be added to an index page that is full
  – Consider an Employee table with a nonclustered index on Social Security Number
  – A new employee is added
[Figure: a full index page of Social Security Number/pointer pairs in an extent, which must accept the new employee's entry]

Index Fragmentation (continued)
[Figure: after the split, the Social Security Number/pointer pairs are spread across the original index page and a newly allocated page in the extent]

Studying Fragmentation in SQL Server 2000
– To determine whether there is fragmentation in a clustered table or a nonclustered index, use

    DBCC SHOWCONTIG [ ( { table_id | table_name | view_id | view_name } [ , index_id | index_name ] ) ]

Sample output:

    DBCC SHOWCONTIG scanning 'Employees' table...
    Table: 'Employees' ( ); index ID: 1, database ID: 7
    TABLE level scan performed.
    - Pages Scanned :
    - Extents Scanned : 90
    - Extent Switches :
    - Avg. Pages per Extent :
    - Scan Density [Best Count:Actual Count] : 15.53% [89:573]
    - Logical Scan Fragmentation : 39.18%
    - Extent Scan Fragmentation : 58.89%
    - Avg. Bytes Free per Page :
    - Avg. Page Density (full) : 46.39%
    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

Studying Fragmentation in SQL Server 2000 (continued)
– Unless the table or index spans multiple files, Extent Switches and Extents Scanned should be approximately equal
– Scan Density should be close to 100 percent
– Avg. Page Density should be high and Avg. Bytes Free per Page should be low
– Logical Scan Fragmentation and Extent Scan Fragmentation should be as close to 0 as possible
– Clearly the Employees table is terribly fragmented: Scan Density is only 15.53%, Logical Scan Fragmentation is 39.18%, and pages are on average only 46.39% full!
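
Scan Density is the ratio of Best Count (the ideal number of extent changes if the pages were contiguous) to Actual Count (the extent changes the scan actually made), expressed as a percentage. The short sketch below just reproduces that arithmetic with the two counts from the sample output; no table is read.

```sql
-- Counts copied from the sample DBCC SHOWCONTIG output; only the arithmetic is shown.
DECLARE @best_count   int = 89,    -- ideal number of extent changes
        @actual_count int = 573;   -- extent changes actually made during the scan

-- Assigning to decimal(5, 2) rounds 15.5323... to two places.
DECLARE @scan_density decimal(5, 2) = 100.0 * @best_count / @actual_count;

SELECT @scan_density AS scan_density_pct;   -- 15.53
```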

Studying Fragmentation in SQL Server 2005 and 2008

    SELECT object_name(s.object_id), name, avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats
         (db_id('databasename'), object_id('tablename'),
          {indexid | NULL}, {partitionnumber | NULL},
          {'LIMITED' | 'SAMPLED' | 'DETAILED' | NULL}) as s
    INNER JOIN sys.indexes as i
      ON s.object_id = i.object_id AND s.index_id = i.index_id

– If NULL is supplied for the last argument, LIMITED is assumed
– The avg_fragmentation_in_percent should be as close to 0 as possible

Studying Fragmentation in SQL Server 2005 and 2008 (continued)

    SELECT object_name(s.object_id), name, avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats
         (db_id('bigwind'), object_id('Employees'), null, null, null) as s
    INNER JOIN sys.indexes as i
      ON s.object_id = i.object_id AND s.index_id = i.index_id
    WHERE alloc_unit_type_desc = 'IN_ROW_DATA'

Results

    Employees   LastName_IDx    0.685
    Employees   PK_Employees
    Employees   City_IDX        0
    Employees   Region_IDX      3.922

Repairing Fragmentation
– Repair index fragmentation by rebuilding the index
  – Rebuilding the clustered index repairs table fragmentation
– DBCC DBREINDEX

    DBCC DBREINDEX (tablename [, indexname [, fillfactor]])

  – Can rebuild indexes that implement primary key and unique constraints
– CREATE INDEX with the DROP_EXISTING option
  – DROP_EXISTING causes SQL Server to drop and re-create the index in a single step
  – Faster than dropping the index with the DROP INDEX command and then re-creating it
– ALTER TABLE … ADD CONSTRAINT … PRIMARY KEY or UNIQUE (re-creating the constraint rebuilds its index)

Repairing Fragmentation (continued)
– DBCC INDEXDEFRAG does not
  – Hold long-term locks on the index, so users can keep working while it runs
  – Do as thorough a job as the other methods
  – Allow specification of a fill factor; it uses the fill factor from the last CREATE INDEX for this index

    DBCC INDEXDEFRAG ( { database_name | database_id | 0 }, { table_name | table_id }, { index_name | index_id } )

– ALTER INDEX index_name ON table_name REORGANIZE
  – Same as DBCC INDEXDEFRAG
– ALTER INDEX index_name ON table_name REBUILD
  – Same as DBCC DBREINDEX
  – Allows concurrent access if you add WITH (ONLINE = ON) to the ALTER INDEX command
  – An online rebuild uses the version store in tempdb
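
Putting the DMV and ALTER INDEX together, a maintenance script might generate one suggested command per fragmented index. This is a hedged sketch: the 5% lower cut-off and the 30% REORGANIZE/REBUILD boundary are assumptions taken from commonly cited rules of thumb, not from these slides, and scanning a whole large database in LIMITED mode can still take a while.

```sql
IF OBJECT_ID('tempdb..#suggestions') IS NOT NULL
    DROP TABLE #suggestions;

SELECT OBJECT_NAME(s.object_id)      AS table_name,
       i.name                        AS index_name,
       s.avg_fragmentation_in_percent,
       N'ALTER INDEX ' + QUOTENAME(i.name)
         + N' ON ' + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id))
         + N'.' + QUOTENAME(OBJECT_NAME(s.object_id))
         + CASE WHEN s.avg_fragmentation_in_percent > 30   -- assumed boundary
                THEN N' REBUILD;'      -- heavy fragmentation: rebuild the index
                ELSE N' REORGANIZE;'   -- light fragmentation: defragment in place
           END                       AS suggested_command
INTO #suggestions
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS s
INNER JOIN sys.indexes AS i
    ON s.object_id = i.object_id AND s.index_id = i.index_id
WHERE s.alloc_unit_type_desc = 'IN_ROW_DATA'
  AND s.index_id > 0                        -- heaps (index_id 0) have no index to rebuild
  AND s.avg_fragmentation_in_percent >= 5;  -- assumed lower cut-off

-- Review the suggestions before running any of them.
SELECT table_name, index_name, avg_fragmentation_in_percent, suggested_command
FROM #suggestions
ORDER BY avg_fragmentation_in_percent DESC;
```

The script only prints commands; whether a REBUILD should also specify WITH (ONLINE = ON) or a fill factor is left to the reader.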

SQL Server Indexes
– SQL Server indexes come in two flavors
  – Clustered indexes
    – Table rows are kept in order on the index key
    – The data pages are the leaf nodes of the index
  – Nonclustered indexes
    – The leaf level is in index order, but the data is not
    – Leaf nodes contain pointers to rows
– One clustered index per table
  – Choose wisely
  – Every table should have a clustered index, because it allows reorganization of the data pages
– Up to 249 nonclustered indexes per table in SQL Server 2005 and earlier; SQL Server 2008 raised the limit to 999
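
A minimal sketch of the two flavors, assuming a scratch database on SQL Server 2005 or later; the table dbo.DemoIndexTypes and its index names are hypothetical. sys.indexes then shows which index is which.

```sql
IF OBJECT_ID('dbo.DemoIndexTypes', 'U') IS NOT NULL
    DROP TABLE dbo.DemoIndexTypes;

CREATE TABLE dbo.DemoIndexTypes (
    EmployeeID int         NOT NULL,
    LastName   varchar(40) NOT NULL,
    FirstName  varchar(40) NOT NULL
);

-- The one clustered index: the data pages become its leaf level.
CREATE UNIQUE CLUSTERED INDEX EmployeeID_CIX ON dbo.DemoIndexTypes (EmployeeID);

-- A nonclustered index: its leaf level holds LastName plus a pointer to the row.
CREATE NONCLUSTERED INDEX LastName_IDX ON dbo.DemoIndexTypes (LastName);

-- index_id 1 is the clustered index; nonclustered indexes are numbered from 2 up.
SELECT name, index_id, type_desc, is_unique
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.DemoIndexTypes')
ORDER BY index_id;
```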

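Clustered Index
[Figure: a clustered index whose root (Abby, Bob, Carol, Dave) points to intermediate nodes (Abby, Ada, Andy, Ann); the leaf level (Ada, Alan, Amanda, Amy) is the data pages themselves, labeled "Database and leaf node"]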

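Nonclustered Index
[Figure: a nonclustered index with the same root and intermediate nodes; its leaf level (Ada, Alan, Amanda, Amy) is in key order and points into the database, where the rows are stored in a different order (Amy, Ada, Amanda, Alan)]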

Clustered and Nonclustered Indexes Interact
– Clustered indexes are always unique
  – If you don't specify UNIQUE when creating one, SQL Server may add a uniqueifier to the index key
    – Only used when there actually is a duplicate
    – Adds 4 bytes to the key
– The clustering key is used in nonclustered indexes
  – It is the row locator: SQL Server finds the row by looking up the clustering key in the clustered index
  – If there is no clustered index, a row identifier (RID) is used instead

Leaf node of a clustered index on EmployeeID:

    1  Jones    John
    2  Smith    Mary
    3  Adams    Mark
    4  Douglas  Susan

Leaf node of a nonclustered index on LastName:

    Adams    3
    Douglas  4
    Jones    1
    Smith    2

Clustered and Nonclustered Indexes Interact (continued)
– Another reason to keep the clustering key small: it is stored in every nonclustered index row!
– Consider the following query:

    SELECT LastName, FirstName FROM Employee WHERE LastName = 'Douglas'

– When SQL Server uses the nonclustered index, it
  – Traverses the nonclustered index until it finds the desired key
  – Picks up the associated clustering key
  – Traverses the clustered index to find the data
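
The three steps can be observed in an execution plan. This sketch assumes SQL Server 2008 or later and a hypothetical table dbo.DemoLookup holding the four employees from the previous slide. The INDEX table hint forces the nonclustered index, because on a table this small the optimizer might otherwise simply scan the clustered index.

```sql
IF OBJECT_ID('dbo.DemoLookup', 'U') IS NOT NULL
    DROP TABLE dbo.DemoLookup;

CREATE TABLE dbo.DemoLookup (
    EmployeeID int         NOT NULL CONSTRAINT PK_DemoLookup PRIMARY KEY CLUSTERED,
    LastName   varchar(40) NOT NULL,
    FirstName  varchar(40) NOT NULL
);

INSERT INTO dbo.DemoLookup (EmployeeID, LastName, FirstName)
VALUES (1, 'Jones', 'John'), (2, 'Smith', 'Mary'),
       (3, 'Adams', 'Mark'), (4, 'Douglas', 'Susan');

-- Leaf rows of this index hold LastName plus the clustering key (EmployeeID).
CREATE NONCLUSTERED INDEX LastName_IDX ON dbo.DemoLookup (LastName);

SET STATISTICS IO ON;   -- logical reads appear in the Messages output

-- FirstName is not in LastName_IDX, so the plan should show an Index Seek on
-- LastName_IDX followed by a Key Lookup into PK_DemoLookup using EmployeeID.
SELECT LastName, FirstName
FROM dbo.DemoLookup WITH (INDEX (LastName_IDX))
WHERE LastName = 'Douglas';

SET STATISTICS IO OFF;
```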

Heaps and Chains
– When you place a clustered index on a table, the pages are chained together in a doubly linked list
  – SQL Server can follow the pointers to move from page to page
– When there is no clustered index, the table is called a heap
  – Data is located
    – Through nonclustered indexes
    – By scanning all the pages in the table
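
A heap shows up in sys.indexes as a row with index_id 0. The sketch below, assuming SQL Server 2005 SP2 or later (for OBJECT_SCHEMA_NAME), creates a hypothetical heap dbo.DemoHeap and then lists every heap among the user tables of the current database.

```sql
IF OBJECT_ID('dbo.DemoHeap', 'U') IS NOT NULL
    DROP TABLE dbo.DemoHeap;

-- No clustered index is created, so this table is a heap.
CREATE TABLE dbo.DemoHeap (Id int NOT NULL, Notes varchar(100) NULL);

-- Heaps appear in sys.indexes with index_id 0 (type_desc 'HEAP').
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS schema_name,
       OBJECT_NAME(i.object_id)        AS table_name
FROM sys.indexes AS i
INNER JOIN sys.tables AS t
    ON t.object_id = i.object_id
WHERE i.index_id = 0
ORDER BY schema_name, table_name;
```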

Indexes and Inserts
– When there is no clustered index on a table, SQL Server uses the Page Free Space (PFS) page to find a page with space for the new record
– Inserts into tables with a clustered index can cause page splits or hotspots
  – When one part of a table receives most of the activity, it is called a hotspot
  – Hotspots create contention problems
– Clustered indexes on keys that arrive in random order cause page splits
– Clustered indexes on keys that arrive in index order create hotspots
  – All inserts go to the end of the table (for example, identity values and dates)
  – However, there will be no page splits or collapses
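
The fragmentation half of this trade-off can be reproduced with two hypothetical tables, one keyed on NEWID() values (random order) and one on an IDENTITY column (index order). This sketch assumes SQL Server 2008 or later; it does not measure the contention side of a hotspot, only how the random key fragments the clustered index.

```sql
IF OBJECT_ID('dbo.DemoRandomKey', 'U') IS NOT NULL
    DROP TABLE dbo.DemoRandomKey;
IF OBJECT_ID('dbo.DemoSequentialKey', 'U') IS NOT NULL
    DROP TABLE dbo.DemoSequentialKey;

-- Keys arrive in random order, so new rows land on pages all over the index.
CREATE TABLE dbo.DemoRandomKey (
    Id      uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY CLUSTERED,
    Padding char(200)        NOT NULL DEFAULT 'x'
);

-- Keys arrive in index order, so every new row goes to the last page.
CREATE TABLE dbo.DemoSequentialKey (
    Id      int IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
    Padding char(200)          NOT NULL DEFAULT 'x'
);

SET NOCOUNT ON;

-- Insert one row at a time, as an application would; a single large
-- INSERT ... SELECT may be sorted into key order first, hiding the effect.
DECLARE @i int = 0;
BEGIN TRANSACTION;   -- one transaction only avoids a log flush per row
WHILE @i < 10000
BEGIN
    INSERT INTO dbo.DemoRandomKey DEFAULT VALUES;
    INSERT INTO dbo.DemoSequentialKey DEFAULT VALUES;
    SET @i += 1;
END;
COMMIT TRANSACTION;

-- Expect much higher avg_fragmentation_in_percent for the random key.
SELECT t.name AS table_name, s.avg_fragmentation_in_percent, s.page_count
FROM sys.tables AS t
CROSS APPLY sys.dm_db_index_physical_stats(DB_ID(), t.object_id, 1, NULL, 'LIMITED') AS s
WHERE t.object_id IN (OBJECT_ID('dbo.DemoRandomKey'), OBJECT_ID('dbo.DemoSequentialKey'));
```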

Indexes and Updates
– When data is modified, indexes may have to be modified as well
– Changing data in a table with no indexes
  – Data will be changed in place unless the update means that the row will no longer fit on the page
  – If the row won't fit, it will simply be moved to a new page
– Changing a clustering key column
  – The row will be deleted from its original location
  – It will be inserted into the new location
  – All nonclustered indexes must be maintained, because they store the clustering key
  – Exception: a change that doesn't affect the index order
    – Changing Thompson to Thompsen will be done in place, because Thompsen still sorts between its neighbors Thomas and Tyne
– Avoid clustered indexes on columns that frequently change

Indexes and Updates (continued)
– Changing any non-key column in a table with only nonclustered indexes
  – Data will be changed in place unless the row will no longer fit
  – If the row will no longer fit
    – It will be moved to another page
    – A forwarding pointer will be left on the original page that points to the new location
    – Index pointers don't need to be updated
    – An additional I/O will now be required to reach the row
  – There will never be more than one forwarding pointer, no matter how many times a row moves
– To see whether updates have produced forwarding pointers, use

    DBCC SHOWCONTIG ('tablename') WITH TABLERESULTS

  or, in SQL Server 2005 and later (forwarded_record_count is NULL in LIMITED mode, so ask for SAMPLED or DETAILED):

    SELECT forwarded_record_count
    FROM sys.dm_db_index_physical_stats
         (db_id('databasename'), object_id('tablename'), NULL, NULL, 'DETAILED')
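
To watch forwarding pointers appear, this sketch fills a hypothetical heap (dbo.DemoForwarding) with narrow rows, then widens every other row so it no longer fits where it is. It assumes a scratch database on SQL Server 2005 or later; the exact forwarded count will vary.

```sql
IF OBJECT_ID('dbo.DemoForwarding', 'U') IS NOT NULL
    DROP TABLE dbo.DemoForwarding;

-- No clustered index, so this table is a heap.
CREATE TABLE dbo.DemoForwarding (
    Id    int           NOT NULL,
    Notes varchar(4000) NOT NULL
);

-- Pack many narrow (about 100-byte) rows onto each page.
INSERT INTO dbo.DemoForwarding (Id, Notes)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), REPLICATE('a', 100)
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

-- Widen every other row; most no longer fit on their page, so SQL Server moves
-- them and leaves a forwarding pointer behind at the original location.
UPDATE dbo.DemoForwarding
SET Notes = REPLICATE('b', 3000)
WHERE Id % 2 = 0;

-- forwarded_record_count is only populated in SAMPLED or DETAILED mode.
-- index_id 0 refers to the heap itself.
SELECT forwarded_record_count, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.DemoForwarding'), 0, NULL, 'DETAILED')
WHERE alloc_unit_type_desc = 'IN_ROW_DATA';
```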

Indexes and Deletes
– Deleting from a heap
  – The row is physically deleted
  – Remaining rows below the deleted record on the page are not moved up at the time the delete happens
  – The page will be compacted when space is needed for another row on the page
– Deleting from a leaf node
  – Data pages of a clustered index
  – Leaf nodes of a nonclustered index
  – Records may not be physically deleted at the time the delete is issued
  – Records may be marked as ghost records
    – Used by the lock manager
    – Not retrievable by users
  – A special SQL Server process cleans up the ghost records
    – It won't clean up records that are part of an active transaction
    – It doesn't compact the page
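
Because the cleanup process skips records that belong to an active transaction, ghost records can be observed by deleting inside a transaction and checking ghost_record_count before rolling back. A minimal sketch, assuming a scratch database on SQL Server 2005 or later; dbo.DemoGhosts is hypothetical, and behavior under snapshot-based isolation settings may differ.

```sql
IF OBJECT_ID('dbo.DemoGhosts', 'U') IS NOT NULL
    DROP TABLE dbo.DemoGhosts;

CREATE TABLE dbo.DemoGhosts (
    Id      int       NOT NULL PRIMARY KEY CLUSTERED,
    Padding char(100) NOT NULL DEFAULT 'x'
);

INSERT INTO dbo.DemoGhosts (Id)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects AS a
CROSS JOIN sys.all_objects AS b;

BEGIN TRANSACTION;

-- Deleted leaf-level rows are marked as ghosts rather than removed at once.
DELETE FROM dbo.DemoGhosts WHERE Id <= 500;

-- While this transaction is open, ghost cleanup cannot remove them,
-- so ghost_record_count at the leaf (index_level 0) should be non-zero.
SELECT index_level, record_count, ghost_record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.DemoGhosts'), 1, NULL, 'DETAILED')
WHERE index_level = 0;

ROLLBACK TRANSACTION;   -- undo the delete; the ghosted rows become live again
```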

Indexes and Deletes (continued)
– Deleting from a nonleaf node
  – No ghost records
  – The page is not compacted
– When rows are deleted, both nonclustered and clustered indexes must be maintained
– When the last row is deleted from a page (index or data), the page is deallocated and returned to the free space pool
  – Unless it is the only page in the table
  – A table always has at least one page, even if it is empty

What Are Index Statistics?
– A metric used by the optimizer to determine whether an index is useful for a particular query
– Stored in
  – An image column named statblob in the sysindexes table (SQL Server 2000)
  – An internal, invisible table (SQL Server 2005 and later)
– Essentially a histogram
– The histogram holds at most 200 steps

Selectivity
– The statistics allow the optimizer to determine the selectivity of an index
  – A unique, single-column index is as selective as an index can be: one index entry points to exactly one row
– A related measure is density
  – Density is the inverse of selectivity: density = 1 / (number of distinct key values)
  – Density values range from 0 to 1; the lower the density, the more selective the index
  – A selective index has a density of 0.10 or less
  – A unique, single-column index on N rows has a density of 1/N, which is close to 0 for any sizable table
– When the index is composite, it becomes a little more complicated
  – SQL Server maintains detailed statistics (the histogram) only on the leftmost column
  – It does compute density for each left-based prefix of the key columns
    – Assume there is an index on (col1, col2, col3)
    – Density is computed for
      – col1
      – col1 + col2
      – col1 + col2 + col3
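
The prefix densities can be computed by hand as 1 / (number of distinct values of each prefix). This sketch, assuming SQL Server 2008 or later, uses four made-up rows in a temporary table shaped like an index on (LastName, FirstName, EmployeeID); the last prefix is unique, so its density is 1/N.

```sql
IF OBJECT_ID('tempdb..#DensityDemo') IS NOT NULL DROP TABLE #DensityDemo;
IF OBJECT_ID('tempdb..#Density') IS NOT NULL DROP TABLE #Density;

CREATE TABLE #DensityDemo (LastName varchar(40), FirstName varchar(40), EmployeeID int);

-- Hypothetical rows: 2 distinct last names, 3 distinct full names, 4 unique IDs.
INSERT INTO #DensityDemo (LastName, FirstName, EmployeeID)
VALUES ('Smith', 'John', 1), ('Smith', 'Mary', 2), ('Smith', 'Mary', 3), ('Jones', 'Ann', 4);

-- Density of each left-based prefix = 1 / (number of distinct prefix values).
SELECT
    1.0 / (SELECT COUNT(*) FROM (SELECT DISTINCT LastName FROM #DensityDemo) AS d)
        AS density_col1,             -- 1 / 2 = 0.5
    1.0 / (SELECT COUNT(*) FROM (SELECT DISTINCT LastName, FirstName FROM #DensityDemo) AS d)
        AS density_col1_col2,        -- 1 / 3, about 0.333
    1.0 / (SELECT COUNT(*) FROM (SELECT DISTINCT LastName, FirstName, EmployeeID FROM #DensityDemo) AS d)
        AS density_col1_col2_col3    -- 1 / 4 = 0.25 (unique, so 1/N)
INTO #Density;

SELECT * FROM #Density;
```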

Exploring Statistics
– To see the index statistics, use

    DBCC SHOW_STATISTICS ('tablename', {'indexname' | 'statisticsname'})
    DBCC SHOW_STATISTICS ('Employees', 'EmployeeName_Idx')

– Interpreting the step output
  – RANGE_HI_KEY: upper-bound value of a histogram step
  – RANGE_ROWS: number of rows from the sample that fall within a histogram step, excluding the upper bound
  – EQ_ROWS: number of rows from the sample that are equal in value to the upper bound of the histogram step
  – DISTINCT_RANGE_ROWS: number of distinct values within a histogram step, excluding the upper bound
  – AVG_RANGE_ROWS: average number of rows per distinct value within a histogram step, excluding the upper bound (RANGE_ROWS / DISTINCT_RANGE_ROWS)
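
In SQL Server 2005 and later, DBCC SHOW_STATISTICS also accepts WITH STAT_HEADER, WITH DENSITY_VECTOR, and WITH HISTOGRAM to return one part of the report at a time. The sketch below builds a hypothetical table, dbo.DemoStats, with 26 made-up last names. Because the clustering key is part of the nonclustered index, the density vector should list both LastName and LastName, EmployeeID, and with only 26 distinct names each should typically get its own histogram step.

```sql
IF OBJECT_ID('dbo.DemoStats', 'U') IS NOT NULL
    DROP TABLE dbo.DemoStats;

CREATE TABLE dbo.DemoStats (
    EmployeeID int         NOT NULL CONSTRAINT PK_DemoStats PRIMARY KEY CLUSTERED,
    LastName   varchar(40) NOT NULL
);

-- 5,000 rows spread over 26 made-up last names (Ason, Bson, ..., Zson).
INSERT INTO dbo.DemoStats (EmployeeID, LastName)
SELECT n, CHAR(65 + n % 26) + 'son'
FROM (SELECT TOP (5000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects AS a
      CROSS JOIN sys.all_objects AS b) AS nums;

-- Creating the index also builds its statistics from a full scan.
CREATE NONCLUSTERED INDEX LastName_IDX ON dbo.DemoStats (LastName);

DBCC SHOW_STATISTICS ('dbo.DemoStats', 'LastName_IDX') WITH STAT_HEADER;
DBCC SHOW_STATISTICS ('dbo.DemoStats', 'LastName_IDX') WITH DENSITY_VECTOR;
DBCC SHOW_STATISTICS ('dbo.DemoStats', 'LastName_IDX') WITH HISTOGRAM;
```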

Exploring Index Statistics (continued)
[Figure: sample DBCC SHOW_STATISTICS output for INDEX 'EmployeeName_IDX'. A header (Updated, Rows, Rows Sampled, Steps, Density, Avg key length) is followed by a density section listing All density and Average Length for the prefixes LastName; LastName, FirstName; and LastName, FirstName, EmployeeID; then the distribution steps (RANGE_HI_KEY, RANGE_ROWS, EQ_ROWS, DISTINCT_RANGE_ROWS, AVG_RANGE_ROWS), running from Aaby and Abrahamson to Zuran and Zvonek.]
– The report for SQL Server 2005 and later is slightly different at the beginning

Statistics Maintenance
– By default, SQL Server will automatically maintain the statistics
– Index statistics are computed (or recomputed) when the index is created or rebuilt
– SQL Server keeps track of the updates to a table
  – Each row affected by an INSERT, UPDATE, or DELETE statement increments a counter in sysindexes named rowmodctr
    – Note that TRUNCATE TABLE does not modify this counter
  – Whenever the statistics are recomputed, the counter is set back to zero
– When a query is compiled, the optimizer checks rowmodctr to see whether the statistics are up to date
  – If they are not, the statistics will be updated

Statistics Maintenance (continued)
– Note that this may not always happen at the best time in a production system
  – You can turn off automatic update
  – You can manually update
  – In SQL Server 2005 and later, you can set the AUTO_UPDATE_STATISTICS_ASYNC database option
– Example:
  – Assume that a permanent table has 1,000 rows
  – The threshold would be 500 + (0.20 * 1,000) = 700
  – You would expect to see the statistics automatically updated after about 700 modifications

    Table type   Empty condition   Threshold when empty   Threshold when not empty
    Permanent    < 500 rows        Changes >= 500         Changes >= 500 + (number of rows * 20%)
    Temporary    < 6 rows          Changes >= 6           Changes >= 500
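
The permanent-table row of the threshold table can be written as a small calculation. This is a sketch of the arithmetic only, with a hypothetical row count as input; it does not read any real table's counters, and it deliberately ignores the temporary-table rules.

```sql
-- Hypothetical input: the row count of a permanent (non-temporary) table.
DECLARE @rows bigint = 1000;

DECLARE @threshold bigint =
    CASE WHEN @rows < 500 THEN 500                   -- "empty" table
         ELSE 500 + CAST(0.20 * @rows AS bigint)     -- 500 + 20% of the rows
    END;

SELECT @rows AS table_rows, @threshold AS modifications_before_auto_update;  -- 1000, 700

-- To update statistics manually instead (YourTable/YourDb are placeholders):
--   UPDATE STATISTICS dbo.YourTable;
--   ALTER DATABASE YourDb SET AUTO_UPDATE_STATISTICS_ASYNC ON;
```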

Estimating Page Accesses
– No index
  – Number of data pages in the table
– Equality query using a unique index
  – Nonclustered index
    – Number of index levels + 1 (if there is no clustered index)
    – Number of nonclustered index levels + the number of levels in the clustered index (if there is a clustered index)
  – Clustered index
    – Number of index levels

Estimating Page Accesses for a Clustered Index
– Number of levels + number of data pages
  – Number of data pages = number of qualifying rows / rows per page
[Figure: the clustered index diagram repeated; its leaf level is the data pages]

Estimating Page Accesses for a Nonclustered Index
– Without a clustered index: number of levels + number of qualifying leaf pages + number of rows
– With a clustered index: number of levels + number of qualifying leaf pages + (number of rows * number of clustered index levels)
  – Number of qualifying leaf pages = number of qualifying rows / index rows per leaf page
  – Assumes every row is on a different page
[Figure: the nonclustered index diagram repeated; its leaf level points to rows stored in a different order]
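
Plugging numbers into the three range formulas shows how quickly the per-row lookups dominate. Every input below is hypothetical, and partial pages are rounded up with CEILING, which is an assumption the slides do not state.

```sql
-- Hypothetical inputs; none of these numbers come from the slides.
DECLARE @clustered_levels    int = 3,    -- levels in the clustered index
        @nonclustered_levels int = 3,    -- levels in the nonclustered index
        @qualifying_rows     int = 500,
        @rows_per_data_page  int = 100,
        @entries_per_leaf    int = 400;  -- nonclustered index rows per leaf page

-- Clustered index range: levels + qualifying data pages.
DECLARE @clustered_range int =
    @clustered_levels + CEILING(1.0 * @qualifying_rows / @rows_per_data_page);

-- Nonclustered index: levels + qualifying leaf pages + one access per row (heap)
-- or one clustered-index traversal per row (table with a clustered index).
DECLARE @nc_leaf_pages int = CEILING(1.0 * @qualifying_rows / @entries_per_leaf);
DECLARE @nc_on_heap int = @nonclustered_levels + @nc_leaf_pages + @qualifying_rows;
DECLARE @nc_on_clustered int =
    @nonclustered_levels + @nc_leaf_pages + @qualifying_rows * @clustered_levels;

SELECT @clustered_range AS clustered_index_range,     -- 3 + 5 = 8
       @nc_on_heap      AS nonclustered_on_heap,      -- 3 + 2 + 500 = 505
       @nc_on_clustered AS nonclustered_on_clustered; -- 3 + 2 + 1500 = 1505
```

Comparing these figures with the number of data pages in the table (the "no index" case) is essentially the choice the optimizer faces.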

Covering Indexes
– When a nonclustered index includes all the data requested in a query (both the items in the SELECT list and the WHERE clause), it is called a covering index
– With a covering index, there is no need to access the actual data pages
  – Only the leaf nodes of the nonclustered index are accessed
– Because the leaf node of a clustered index is the data itself, a clustered index covers all queries

Leaf node of a nonclustered index on LastName, FirstName, Birthdate:

    Adams    Mark   1/14/1956   3
    Douglas  Susan  12/12/1947  4
    Jones    John   4/15/1967   1
    Smith    Mary   7/14/1970   2

The last column is EmployeeID. Remember that the clustering key is always included in a nonclustered index.

Covering Indexes (continued)
– This index covers the following queries:

    SELECT EmployeeID, LastName, FirstName, Birthdate FROM Employees WHERE Birthdate >= '1/12/1941'
    SELECT LastName FROM Employees WHERE EmployeeID = 7
    SELECT LastName, FirstName FROM Employees WHERE LastName BETWEEN 'A' AND 'C'

– Remember that the number of accesses for a nonclustered index is the number of levels + the number of qualifying leaf pages + either the number of rows or (the number of rows * the number of levels in the clustered index)
  – A covering index eliminates the number-of-rows term from the equation
  – This makes the optimizer highly likely to use a covering nonclustered index
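
To compare a covered and an uncovered query, this sketch assumes SQL Server 2008 or later (for the date type) and a hypothetical table dbo.DemoCovering loaded with the four employees above. The Region column is an added assumption, present only so the second query is not covered. The INDEX hint keeps both queries on the same nonclustered index, so the only difference in the plans should be the Key Lookup.

```sql
IF OBJECT_ID('dbo.DemoCovering', 'U') IS NOT NULL
    DROP TABLE dbo.DemoCovering;

CREATE TABLE dbo.DemoCovering (
    EmployeeID int         NOT NULL CONSTRAINT PK_DemoCovering PRIMARY KEY CLUSTERED,
    LastName   varchar(40) NOT NULL,
    FirstName  varchar(40) NOT NULL,
    Birthdate  date        NOT NULL,
    Region     varchar(20) NULL      -- hypothetical column left out of the index
);

INSERT INTO dbo.DemoCovering (EmployeeID, LastName, FirstName, Birthdate)
VALUES (1, 'Jones',   'John',  '19670415'),
       (2, 'Smith',   'Mary',  '19700714'),
       (3, 'Adams',   'Mark',  '19560114'),
       (4, 'Douglas', 'Susan', '19471212');

-- Leaf rows hold LastName, FirstName, Birthdate plus the clustering key EmployeeID.
CREATE NONCLUSTERED INDEX NameBirth_IDX
    ON dbo.DemoCovering (LastName, FirstName, Birthdate);

SET STATISTICS IO ON;

-- Covered: every referenced column is in NameBirth_IDX, so no Key Lookup is needed.
SELECT LastName, FirstName
FROM dbo.DemoCovering WITH (INDEX (NameBirth_IDX))
WHERE LastName BETWEEN 'A' AND 'C';

-- Not covered: Region is not in the index, so each qualifying row needs a Key Lookup.
SELECT LastName, Region
FROM dbo.DemoCovering WITH (INDEX (NameBirth_IDX))
WHERE LastName BETWEEN 'A' AND 'C';

SET STATISTICS IO OFF;
```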

Non-Key Index Columns
– SQL Server 2005 and later allow you to include columns in a nonclustered index that are not part of the key
  – Allows the index to cover more queries
  – Included columns only appear in the leaf level of the index
  – Up to 1,023 additional columns
  – Can include data types that cannot be key columns, except the text, ntext, and image data types
– Syntax

    CREATE [ UNIQUE ] NONCLUSTERED INDEX index_name
    ON table_or_view_name ( column [ ASC | DESC ] [ ,...n ] )
    [ INCLUDE ( column_name [ ,...n ] ) ]

– Example

    CREATE NONCLUSTERED INDEX NameRegion_IDX ON Employees(LastName) INCLUDE (Region)
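
One way to confirm which columns are keys and which are merely included is sys.index_columns. This sketch, for SQL Server 2005 or later, recreates the slide's NameRegion_IDX on a hypothetical table dbo.DemoInclude; included columns are flagged with is_included_column = 1 and have key_ordinal 0.

```sql
IF OBJECT_ID('dbo.DemoInclude', 'U') IS NOT NULL
    DROP TABLE dbo.DemoInclude;

CREATE TABLE dbo.DemoInclude (
    EmployeeID int         NOT NULL CONSTRAINT PK_DemoInclude PRIMARY KEY CLUSTERED,
    LastName   varchar(40) NOT NULL,
    Region     varchar(20) NULL
);

-- LastName is the key; Region is stored only in the leaf level.
CREATE NONCLUSTERED INDEX NameRegion_IDX ON dbo.DemoInclude (LastName) INCLUDE (Region);

-- Key columns have key_ordinal 1, 2, ...; included columns show key_ordinal 0.
SELECT c.name AS column_name, ic.key_ordinal, ic.is_included_column
FROM sys.indexes AS i
INNER JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id AND ic.index_id = i.index_id
INNER JOIN sys.columns AS c
    ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.DemoInclude')
  AND i.name = N'NameRegion_IDX'
ORDER BY ic.index_column_id;

-- This query can be answered from NameRegion_IDX alone.
SELECT LastName, Region
FROM dbo.DemoInclude
WHERE LastName = 'Smith';
```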