April 13th, 2016. Introduction. About me: Duško Mirković, 7 years of experience.

April 13th, 2016

Introduction
- About me
  - Duško Mirković
  - dusko.mirkovic@schneider-electric-dms.com
  - 7 years of experience with SQL Server
- Topics
  - Quick recap of basic terminology
  - Key design
  - Index design

Quick Recap – Relational Model
- Entities modeled as tuples, attributes as tuple elements
- Tuple element order irrelevant
- Tuple order irrelevant
- Relation must have at least one key
- Relation may have one or more uniqueness constraints

Quick Recap – SQL Server
- Entities (relations) represented by tables, attributes (relation elements) represented by columns
- Table can have up to 249+1 indexes on SQL Server 2005 and 999+1 on SQL Server 2008 (nonclustered + clustered)
- SQL Server supports several different types of indexes
  - B+ tree, Columnstore, Hash, XML, Spatial, Full-Text index
- ISAM – B+ tree: clustered and nonclustered

Clustered vs. Nonclustered
- Clustered index leaf nodes contain table data
  - Like the contents of a book
- Nonclustered index leaf nodes contain only pointers to the heap or clustered index
  - Like an index of tables, figures or keywords at the end of the book

B+ Tree Index Structure
- Root level
  - Pg. 7: 17 → Pg. 5, 23 → Pg. 6
- Intermediate level
  - Pg. 5: 17 → Pg. 1, 20 → Pg. 2
  - Pg. 6: 23 → Pg. 3, 26 → Pg. 4
- Leaf level (each key followed by its row data)
  - Pg. 1: 17, 18, 19
  - Pg. 2: 20, 21, 22
  - Pg. 3: 23, 24, 25
  - Pg. 4: 26, 27, 28
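
To make the page layout on this slide concrete, here is a minimal Python sketch of an equality seek through those seven pages. It is only a model of the idea, not SQL Server's implementation: the dictionaries `NON_LEAF` and `LEAF` and the function `seek` are hypothetical names introduced for illustration, and each non-leaf entry is assumed to hold the lowest key of the child page it points to.

```python
# Non-leaf pages: page number -> list of (lowest key in child, child page number).
NON_LEAF = {
    7: [(17, 5), (23, 6)],   # root
    5: [(17, 1), (20, 2)],   # intermediate
    6: [(23, 3), (26, 4)],   # intermediate
}

# Leaf pages: page number -> keys stored on that page (row data omitted).
LEAF = {
    1: [17, 18, 19],
    2: [20, 21, 22],
    3: [23, 24, 25],
    4: [26, 27, 28],
}

ROOT = 7


def seek(key):
    """Walk from the root to a leaf; return (pages visited, whether key was found)."""
    page = ROOT
    path = [page]
    while page in NON_LEAF:
        entries = NON_LEAF[page]
        child = entries[0][1]            # default: leftmost child
        for low_key, child_page in entries:
            if key >= low_key:           # last entry whose low key is <= search key
                child = child_page
        page = child
        path.append(page)
    return path, key in LEAF[page]


if __name__ == "__main__":
    print(seek(24))   # visits the root, one intermediate page and one leaf page
```

Every lookup touches exactly one page per level, which is why the number of levels (three here) bounds the cost of a seek.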

How does this affect key design?
- Each key and unique constraint must have a backing index in SQL Server
- Each of them is a candidate for the clustered index
- Good clustering key is: unique, static, narrow, fixed width, not nullable, ever increasing

Not so good clustering key is
- Not unique
  - Introduces overhead for uniquifier values: 4 or 8 bytes per index entry for all nonclustered indexes, plus counting on insert to determine the uniquifier value
- Not static
  - For each key update, the reference must be updated in all indexes, and the row in the clustered index may be physically moved to another page

Not so good clustering key is
- Not narrow
  - Bloats both clustered and nonclustered indexes and increases I/O operations and memory requirements
- Not fixed width
  - Introduces overhead on all nonclustered indexes of 2 bytes for end of row + 2 bytes for each variable width column

Not so good clustering key is
- Nullable
  - Introduces overhead on all nonclustered indexes of minimum 3 bytes for one nullable column
- Not ever increasing
  - Causes high fragmentation and low page fullness, which can severely impact storage requirements, I/O operations and memory requirements
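
The effect of a key that is not ever increasing can be illustrated with a small simulation. The sketch below is a simplified model, not SQL Server's storage engine: it assumes a fixed, arbitrary `PAGE_CAPACITY` of 100 rows, distinct keys, a 50/50 split when a row must go into a full page, and a fresh page (no split) when the new key is larger than everything in the index. All names are hypothetical.

```python
import bisect
import random

PAGE_CAPACITY = 100  # rows per page; arbitrary value chosen for the model


def insert_all(keys):
    """Insert distinct keys one by one; return (page count, average fullness, mid-page splits)."""
    pages = [[]]                  # pages in key order, each a sorted list of keys
    lows = [float("-inf")]        # lows[i] = lowest key routed to pages[i]
    mid_splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key) - 1
        page = pages[i]
        if len(page) < PAGE_CAPACITY:
            bisect.insort(page, key)
            continue
        if i == len(pages) - 1 and key > page[-1]:
            # Append at the end of the index: start a new page, old page stays full.
            pages.append([key])
            lows.append(key)
            continue
        # Insert into the middle of a full page: move the upper half to a new page.
        half = PAGE_CAPACITY // 2
        right = page[half:]
        del page[half:]
        pages.insert(i + 1, right)
        lows.insert(i + 1, right[0])
        mid_splits += 1
        bisect.insort(right if key >= right[0] else page, key)
    fullness = sum(len(p) for p in pages) / (len(pages) * PAGE_CAPACITY)
    return len(pages), fullness, mid_splits


if __name__ == "__main__":
    sequential = list(range(20_000))
    shuffled = sequential[:]
    random.Random(42).shuffle(shuffled)
    print("ever increasing:", insert_all(sequential))
    print("random order:   ", insert_all(shuffled))
```

Under these assumptions the ever-increasing key fills every page completely with no mid-page splits, while the random key needs noticeably more pages for the same rows, which is the storage, I/O and memory cost the slide refers to.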

Typical Key Choices
- Natural keys
  - Simple
    - Always unique, everything else depends on the business rules and chosen data type
  - Composite
    - Similar to simple natural key but rarely narrow (since it contains several columns)
    - SQL Server supports up to 16 columns or 900 bytes for index key columns (whichever is smaller)

Typical Key Choices
- Surrogate keys
  - UUID/GUID
    - Unique, fixed width and usually static
    - Not so narrow (16 bytes) and not ever-increasing but fairly random
    - NEWSEQUENTIALID corrects this to some degree but has other issues that need to be considered
  - Identity, Sequence, Auto-Increment
    - Unique, static, fixed width, narrow and ever-increasing
    - Seems to be the perfect clustering key, but…

Example – Time Series Data
- Timeseries (SignalName nvarchar(446), Timestamp datetime2(7), Value real)
- Primary key and clustered index (SignalName, Timestamp)
- Data size: 88.65 GB
- 100,000 distinct signals, average name length 33 chars
  - e.g. com.example.app1.measurement1.xyz
- 1,000,000,000 total measurements
  - Around 10,000 for each signal

Example – Time Series Data
- Extracted signal name to separate table
  - (SignalID int identity(1,1), SignalName nvarchar(446))
  - Primary key and clustered index (SignalID)
  - Alternate key and nonclustered index (SignalName)
- Data size: 23.70 GB
  - ~ 4 times smaller
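
The two data sizes can be roughly reproduced with a back-of-the-envelope calculation. The sketch below is an estimate under stated assumptions, not a measurement: it uses the commonly documented SQL Server row layout (4-byte row header, 2-byte column count plus null bitmap, 2 bytes plus 2 bytes per variable-length column, 2-byte slot entry, 8,192-byte pages with a 96-byte header), counts leaf pages only, treats every signal name as exactly 33 characters, and assumes the slimmed-down table is (SignalID int, Timestamp datetime2(7), Value real). Function names are hypothetical.

```python
import math

PAGE_BYTES = 8192
PAGE_HEADER_BYTES = 96
ROW_HEADER_BYTES = 4
SLOT_BYTES = 2          # entry in the page's row offset array


def row_bytes(fixed_bytes, variable_bytes, column_count, variable_columns):
    """Approximate on-page size of one row, including per-row overhead."""
    null_bitmap = 2 + math.ceil(column_count / 8)
    variable_block = 0
    if variable_columns:
        variable_block = 2 + 2 * variable_columns + variable_bytes
    return ROW_HEADER_BYTES + fixed_bytes + null_bitmap + variable_block + SLOT_BYTES


def leaf_pages_and_gib(rows, bytes_per_row):
    """Leaf page count and size in GiB if every page is packed full."""
    rows_per_page = (PAGE_BYTES - PAGE_HEADER_BYTES) // bytes_per_row
    pages = math.ceil(rows / rows_per_page)
    return pages, pages * PAGE_BYTES / 2**30


ROWS = 1_000_000_000

# Before: nvarchar name (33 chars * 2 bytes) + datetime2(7) (8 bytes) + real (4 bytes).
wide_row = row_bytes(fixed_bytes=8 + 4, variable_bytes=33 * 2,
                     column_count=3, variable_columns=1)

# After: int (4 bytes) + datetime2(7) (8 bytes) + real (4 bytes), no variable columns.
narrow_row = row_bytes(fixed_bytes=4 + 8 + 4, variable_bytes=0,
                       column_count=3, variable_columns=0)

if __name__ == "__main__":
    print(wide_row, leaf_pages_and_gib(ROWS, wide_row))
    print(narrow_row, leaf_pages_and_gib(ROWS, narrow_row))
```

With these assumptions the rows come out at 91 and 25 bytes, giving roughly 87 GiB and 24 GiB, which is in the same range as the 88.65 GB and 23.70 GB on the slides. The narrow layout also yields about 3 million leaf pages.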

Example – Query Performance
- Trend one measurement for a day (let there be average ~10,000 per day)
- We execute:
    SELECT [Timestamp], [Value]
    FROM Timeseries
    WHERE SignalID = 70
      AND [Timestamp] BETWEEN '2016-01-14' AND '2016-01-15'
- SQL Server asks for 3+27 pages (8k each), which generates 60 (4k) I/O requests for HDD
- 10,000 rpm drive could return this in ~ 0.4 seconds

Example – Query Performance
- Check total measurement rate for one minute
- We execute:
    SELECT COUNT(*)
    FROM Timeseries
    WHERE [Timestamp] BETWEEN '2016-01-14 00:00:00' AND '2016-01-14 00:01:00'
- SQL Server asks for ~3M pages (8k each), which generates twice as many I/O requests for HDD
- 10,000 rpm drive could return this in ~ 11.5 hours?
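
Both timing figures in the query performance examples are consistent with a simple random-I/O model. The sketch below is an assumption-laden estimate, not a benchmark: it treats every 4k request as a fully random read, takes half a rotation at 10,000 rpm as the rotational latency, and assumes an average seek time of 3.9 ms (a value chosen here for illustration, not taken from the slides). Function names are hypothetical.

```python
def avg_random_io_seconds(rpm=10_000, avg_seek_seconds=0.0039):
    """Average cost of one random I/O: half a rotation plus an assumed average seek."""
    half_rotation = 60.0 / rpm / 2      # 3 ms at 10,000 rpm
    return half_rotation + avg_seek_seconds


def estimated_seconds(pages_8k, ios_per_page=2):
    """Each 8k page is read as two 4k requests, all assumed to be random."""
    return pages_8k * ios_per_page * avg_random_io_seconds()


if __name__ == "__main__":
    trend = estimated_seconds(3 + 27)            # seek on (SignalID, Timestamp)
    scan = estimated_seconds(3_000_000)          # filter on Timestamp only
    print(f"trend query: {trend:.2f} s")
    print(f"count query: {scan / 3600:.1f} h")
```

The model yields about 0.4 seconds for the 30-page trend query and about 11.5 hours for the 3M-page scan. Treating a scan as purely random I/O is deliberately pessimistic, since sequential reads and read-ahead would shorten it considerably, but the contrast between the two access paths remains.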

Choosing the Correct Index Type
- B+ tree – ISAM
- Columnstore (2012/2014) – fact data in data warehouses
- Full-Text (2005) – character-based columns
- XML (2005) – xml columns
- Spatial (2008) – geography/geometry
- Hash (2014) – memory optimized tables

B+ Tree Index Design
- Join and filter conditions
  - Columns used in many filters and/or joins are good candidates for index key prefix
- Column selectivity
  - Highly selective columns are good candidates for index key prefix
- Comparison operator
  - Favor columns that are compared for equality for index key prefix

Re-evaluate Typical Key Choices
- Natural keys
  - Simple
    - Often used for join and filtering conditions, highly selective, usually equality operator used
  - Composite
    - Similar to simple, additionally more range scans or prefix lookups (i.e. only part of the key used for lookup)
    - Column order may be subject to natural hierarchy
  - Both simple and composite usually support a lot of queries given that they follow the business logic

Re-evaluate Typical Key Choices
- Surrogate keys
  - UUID/GUID
    - Rarely used for filtering, almost exclusively used for joins, highly selective and almost always compared with equality operator
  - Identity, Sequence, Auto-Increment
    - Similar to UUID/GUID
  - All surrogate keys almost always require additional indexes since they have little or no connection to the business logic

Example – Time Series Data
- Which index would be the best for the given time series table?
  a) (SignalID, Timestamp)
  b) (Timestamp, SignalID)
  c) Both

Example – Time Series Data
- If we choose a
  - Queries that are filtering by signal (e.g. trend) will be very efficient
  - Inserts are spread across the whole index → very high fragmentation
- If we choose b
  - Opposite from a: filtering by a single measurement (signal) will cause a table scan, but fragmentation is lower
- If we choose c
  - Increased storage space, increased insert time (possibly up to 80%)
- What if we wanted to aggregate values per hour?
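
The trade-off between options a and b comes down to which rows sit next to each other in key order. The following sketch models each index as a sorted Python list of tuples and counts how many index entries a trend query has to touch. It is an illustration only: the data set (100 signals, 1,000 integer ticks standing in for `datetime2` values) and all names are hypothetical, and a real query optimizer has more options than this model shows.

```python
import bisect

SIGNALS = range(1, 101)        # 100 signal IDs
TICKS = range(0, 1000)         # integer ticks stand in for timestamps

# Option a: entries ordered by (SignalID, Timestamp).
index_a = sorted((s, t) for s in SIGNALS for t in TICKS)
# Option b: entries ordered by (Timestamp, SignalID).
index_b = sorted((t, s) for s in SIGNALS for t in TICKS)


def trend_on_a(signal, t_from, t_to):
    """Equality on the leading column, range on the second: one contiguous slice."""
    lo = bisect.bisect_left(index_a, (signal, t_from))
    hi = bisect.bisect_right(index_a, (signal, t_to))
    return hi - lo                                   # entries touched == rows returned


def trend_on_b(signal, t_from, t_to):
    """Range on the leading column, then filter every entry in it by signal."""
    lo = bisect.bisect_left(index_b, (t_from,))
    hi = bisect.bisect_right(index_b, (t_to, float("inf")))
    matches = sum(1 for _, s in index_b[lo:hi] if s == signal)
    return hi - lo, matches                          # (entries touched, rows returned)


if __name__ == "__main__":
    print(trend_on_a(70, 100, 199))
    print(trend_on_b(70, 100, 199))
```

In this model option a touches only the 100 entries it returns, while option b reads all 10,000 entries in the time range to return the same 100. The picture flips for a query that filters on time alone, where only option b offers a usable key prefix.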

Summary
- Key is a natural feature of the model (regardless of implementation platform)
- Index is a data structure supporting the use and enforcement of keys and uniqueness constraints
- Keys and indexes must be designed (not accidental or auto-generated), since they can greatly affect performance of the system

Summary
- Good clustering key is: unique, static, narrow, fixed-width, ever increasing
- Consider extracting complex or large keys into a separate table with an (integer) surrogate key
- Analyze queries to determine column order in an index
  - Which columns cover the most (expensive) queries
  - Highly selective columns first
  - Equality columns first
- Use included columns to prevent bookmark lookups

Where to get more info?
- Kimberly L. Tripp: http://www.sqlskills.com/blogs/kimberly/, @KimberlyLTripp
- Kendra Little: http://www.littlekendra.com/, @Kendra_Little
- And many, many more:
  - Paul Randal: http://www.sqlskills.com/blogs/paul/, @PaulRandal
  - Brent Ozar: http://www.brentozar.com, @BrentO
  - Jonathan Kehayias: http://www.sqlskills.com/blogs/jonathan/, @SQLPoolBoy
  - Jeremiah Peschka (previously http://www.brentozar.com), @peschkaj
  - Adam Machanic: http://sqlblog.com/blogs/adam_machanic/, @AdamMachanic
- MSDN Books Online
- SQL CAT Team: https://blogs.msdn.microsoft.com/sqlcat/
- SQL Server Video Archive: https://technet.microsoft.com/en-us/dn912438

Thank you
geekstone.org/conf2015
