APRIL 13th 2016. Introduction About me Duško Mirković 7 years of experience.


APRIL 13th 2016

Introduction
- About me
  - Duško Mirković
  - 7 years of experience with SQL Server
- Topics
  - Quick recap of basic terminology
  - Key design
  - Index design

Quick Recap – Relational Model
- Entities modeled as tuples, attributes as tuple elements
- Tuple element order irrelevant
- Tuple order irrelevant
- Relation must have at least one key
- Relation may have one or more uniqueness constraints

Quick Recap – SQL Server
- Entities (relations) represented by tables, attributes (relation elements) represented by columns
- Table can have up to 999+1 indexes (nonclustered + clustered) as of SQL Server 2008 (fewer on SQL Server 2005)
- SQL Server supports several different types of indexes
  - B+ tree, Columnstore, Hash, XML, Spatial, Full-Text index
- ISAM – B+ tree: clustered and nonclustered

Clustered vs. Nonclustered
- Clustered index leaf nodes contain the table data
  - Like the contents of a book
- Nonclustered index leaf nodes contain only pointers to the heap or clustered index
  - Like an index of tables, figures or keywords at the end of the book

B+ Tree Index Structure
[Figure: B+ tree diagram. Upper-level pages hold key ranges with pointers to lower-level pages (Pg. 6, Pg. 2, Pg. 4); leaf pages hold the rows in key order (rows 18 to 28 shown).]

How does this affect key design?
- Each key and unique constraint must have a backing index in SQL Server
- Each of them is a candidate for the clustered index
- A good clustering key is: unique, static, narrow, fixed width, not nullable, ever increasing

A not-so-good clustering key is
- Not unique
  - Introduces overhead for uniquifier values: 4 or 8 bytes per index entry in all nonclustered indexes, plus counting on insert to determine the uniquifier value
- Not static
  - For each key update, the reference must be updated in all indexes, and the row in the clustered index may be physically moved to another page

A not-so-good clustering key is
- Not narrow
  - Bloats both clustered and nonclustered indexes and increases I/O operations and memory requirements
- Not fixed width
  - Introduces overhead on all nonclustered indexes of 2 bytes for end of row + 2 bytes for each variable width column

A not-so-good clustering key is
- Nullable
  - Introduces overhead on all nonclustered indexes of a minimum of 3 bytes for one nullable column
- Not ever increasing
  - Causes high fragmentation and low page fullness, which can severely impact storage requirements, I/O operations and memory requirements
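The effect of a key that is not ever increasing can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not SQL Server's actual allocation logic: a hypothetical page capacity of 100 rows, a 50/50 split when a row lands on a full page in the middle of the index, and a fresh page when a row lands past the end of the last page. The names `PAGE_CAPACITY` and `simulate_inserts` are made up for this example.

```python
import bisect
import random

PAGE_CAPACITY = 100  # hypothetical rows per page, chosen only for illustration


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys one by one; return (page_count, mid_index_splits, avg_fullness)."""
    pages = [[]]   # pages in key order, each a sorted list of keys
    lows = []      # lows[i] is the smallest key on pages[i + 1]
    splits = 0
    for key in keys:
        i = bisect.bisect_right(lows, key)   # page whose key range covers the key
        page = pages[i]
        if len(page) == capacity:
            if i == len(pages) - 1 and key > page[-1]:
                # Insert past the end of the index: start a new page and
                # leave the full page untouched.
                pages.append([key])
                lows.append(key)
                continue
            # Insert into the middle of the index: split the full page 50/50.
            splits += 1
            half = capacity // 2
            new_page = page[half:]
            del page[half:]
            pages.insert(i + 1, new_page)
            lows.insert(i, new_page[0])
            if key >= new_page[0]:
                page = new_page
        bisect.insort(page, key)
    fullness = sum(len(p) for p in pages) / (len(pages) * capacity)
    return len(pages), splits, fullness


N = 50_000
sequential = simulate_inserts(range(N))      # identity-like, ever-increasing keys

shuffled = list(range(N))
random.Random(0).shuffle(shuffled)           # GUID-like, effectively random order
scattered = simulate_inserts(shuffled)

print("ever-increasing:", sequential)        # every page ends up completely full
print("random order:   ", scattered)         # many splits, pages only partly full
```

In this model the ever-increasing keys never trigger a mid-index split and leave every page full, while the random order produces thousands of splits and noticeably emptier pages, so the same rows occupy more pages.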

Typical Key Choices
- Natural keys
  - Simple
    - Always unique; everything else depends on the business rules and chosen data type
  - Composite
    - Similar to simple natural key but rarely narrow (since it contains several columns)
    - SQL Server supports up to 16 columns or 900 bytes for index key columns (whichever is smaller)

Typical Key Choices
- Surrogate keys
  - UUID/GUID
    - Unique, fixed width and usually static
    - Not so narrow (16 bytes) and not ever-increasing but fairly random
    - NEWSEQUENTIALID corrects this to some degree but has other issues that need to be considered
  - Identity, Sequence, Auto-Increment
    - Unique, static, fixed width, narrow and ever-increasing
    - Seems to be the perfect clustering key, but…

Example – Time Series Data
- Timeseries (SignalName nvarchar(446), Timestamp datetime2(7), Value real)
- Primary key and clustered index (SignalName, Timestamp)
- Data size: GB
- 100,000 distinct signals, average name length 33 chars
  - e.g. com.example.app1.measurement1.xyz
- 1,000,000,000 total measurements
  - Around 10,000 for each signal

Example – Time Series Data
- Extracted signal name to separate table (SignalID int identity(1,1), SignalName nvarchar(446))
- Primary key and clustered index (SignalID)
- Alternate key and nonclustered index (SignalName)
- Data size: GB
  - ~4 times smaller
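The "~4 times smaller" claim can be sanity-checked with a rough leaf-row estimate. The sketch below follows the leaf-row sizing rules as published in the SQL Server documentation (4-byte row header, null bitmap, variable-length block, 2-byte slot entry, 8,096 usable bytes per page); treat those constants as assumptions. It ignores non-leaf index levels, fill factor and the small signal lookup table, so it is an estimate only and not the data size from the slides. The helper names are hypothetical.

```python
import math

PAGE_DATA_BYTES = 8096   # assumed usable bytes on one 8 KB page
ROW_HEADER = 4           # assumed fixed per-row header
SLOT_ENTRY = 2           # assumed per-row entry in the page's slot array


def row_size(fixed_bytes, var_bytes, num_cols, num_var_cols):
    """Estimated bytes for one leaf row of the clustered index."""
    null_bitmap = 2 + (num_cols + 7) // 8
    # Variable-length block: column count + one offset per column + the data.
    var_block = (2 + 2 * num_var_cols + var_bytes) if num_var_cols else 0
    return fixed_bytes + var_block + null_bitmap + ROW_HEADER


def leaf_pages(rows, size):
    """Number of leaf pages needed when pages are packed full."""
    rows_per_page = PAGE_DATA_BYTES // (size + SLOT_ENTRY)
    return math.ceil(rows / rows_per_page)


ROWS = 1_000_000_000

# Original design: SignalName nvarchar (33 chars * 2 bytes), datetime2(7) = 8, real = 4
wide = row_size(fixed_bytes=8 + 4, var_bytes=33 * 2, num_cols=3, num_var_cols=1)
# Surrogate design: SignalID int = 4, datetime2(7) = 8, real = 4
narrow = row_size(fixed_bytes=4 + 8 + 4, var_bytes=0, num_cols=3, num_var_cols=0)

wide_pages = leaf_pages(ROWS, wide)
narrow_pages = leaf_pages(ROWS, narrow)

print(wide, narrow)                            # estimated bytes per row
print(wide_pages, narrow_pages)                # estimated leaf pages
print(round(wide_pages / narrow_pages, 2))     # size ratio between the designs
```

Under these assumptions the surrogate-key table needs roughly 3.1 million leaf pages, a bit under four times fewer than the wide-key design, which is in line with the slide's "~4 times smaller".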

Example – Query Performance
- Trend one measurement for a day (assume ~10,000 measurements per day on average)
- We execute:
  SELECT [Timestamp], [Value] FROM Timeseries WHERE SignalID = 70 AND [Timestamp] BETWEEN ' ' AND ' '
- SQL Server asks for 3+27 pages (8k each), which generates 60 (4k) I/O requests for the HDD
- A 10,000 rpm drive could return this in ~0.4 seconds

Example – Query Performance
- Check total measurement rate for one minute
- We execute:
  SELECT COUNT(*) FROM Timeseries WHERE [Timestamp] BETWEEN ' :00:00' AND ' :01:00'
- SQL Server asks for ~3M pages (8k each), which generates twice as many I/O requests for the HDD
- A 10,000 rpm drive could return this in ~11.5 hours?
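The two timings can be related with simple arithmetic. This sketch derives a per-request cost from the first query (60 requests in ~0.4 seconds) and applies it to the full scan. It assumes every 4k request pays the same random-access cost, which is a pessimistic model: a real scan would benefit from sequential reads and read-ahead, which is presumably why the slide ends with a question mark. All names here are made up for the example.

```python
IOS_PER_PAGE = 2              # one 8k page = two 4k requests, as on the slides
SMALL_QUERY_IOS = 60          # from the trend query
SMALL_QUERY_SECONDS = 0.4     # from the trend query

# Implied cost of one random 4k request on the 10,000 rpm drive (about 6.7 ms).
seconds_per_io = SMALL_QUERY_SECONDS / SMALL_QUERY_IOS


def scan_seconds(pages):
    """Time to read the given number of pages if every request is random."""
    return pages * IOS_PER_PAGE * seconds_per_io


full_scan_hours = scan_seconds(3_000_000) / 3600
print(round(seconds_per_io * 1000, 1), "ms per request")
print(round(full_scan_hours, 1), "hours for ~3M pages")
```

With these inputs the scan comes out at about 11 hours, the same order of magnitude as the ~11.5 hours on the slide; the slide's figure corresponds to a slightly higher per-request cost.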

Choosing the Correct Index Type
- B+ tree – ISAM
- Columnstore (2012/2014) – fact data in data warehouses
- Full-Text (2005) – character-based columns
- XML (2005) – xml columns
- Spatial (2008) – geography/geometry
- Hash (2014) – memory optimized tables

B+ Tree Index Design
- Join and filter conditions
  - Columns used in many filters and/or joins are good candidates for the index key prefix
- Column selectivity
  - Highly selective columns are good candidates for the index key prefix
- Comparison operator
  - Favor columns that are compared for equality for the index key prefix

Re-evaluate Typical Key Choices
- Natural keys
  - Simple
    - Often used for join and filtering conditions, highly selective, usually compared with the equality operator
  - Composite
    - Similar to simple, with additionally more range scans or prefix lookups (i.e. only part of the key is used for lookup)
    - Column order may be subject to natural hierarchy
  - Both simple and composite usually support a lot of queries, given that they follow the business logic

Re-evaluate Typical Key Choices
- Surrogate keys
  - UUID/GUID
    - Rarely used for filtering, almost exclusively used for joins, highly selective and almost always compared with the equality operator
  - Identity, Sequence, Auto-Increment
    - Similar to UUID/GUID
  - All surrogate keys almost always require additional indexes, since they have little or no connection to the business logic

Example – Time Series Data
- Which index would be the best for the given time series table?
  a) (SignalID, Timestamp)
  b) (Timestamp, SignalID)
  c) Both

Example – Time Series Data
- If we choose a
  - Queries that filter by signal (e.g. trend) will be very efficient
  - Inserts are spread across the whole index → very high fragmentation
- If we choose b
  - Opposite of a: filtering by measurement will cause a table scan, but fragmentation is lower
- If we choose c
  - Increased storage space, increased insert time (possibly up to 80%)
- What if we wanted to aggregate values per hour?
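The read-side trade-off between options a and b can be sketched by treating each index as a sorted list of key tuples and counting how many entries a query has to touch. This is a toy model with a hypothetical data set of 100 signals and 1,000 integer timestamps each; it stands in for a B+ tree range seek and says nothing about fragmentation or insert cost. The names `index_a`, `index_b` and `seek_range` are invented for the example.

```python
import bisect

SIGNALS, STAMPS = 100, 1000   # hypothetical miniature data set
rows = [(s, t) for s in range(SIGNALS) for t in range(STAMPS)]

index_a = sorted(rows)                       # option a: (SignalID, Timestamp)
index_b = sorted((t, s) for s, t in rows)    # option b: (Timestamp, SignalID)


def seek_range(index, low, high):
    """Number of entries read by a range seek between two key tuples (inclusive)."""
    start = bisect.bisect_left(index, low)
    stop = bisect.bisect_right(index, high)
    return stop - start


# Trend query: SignalID = 70 AND Timestamp BETWEEN 100 AND 199
trend_a = seek_range(index_a, (70, 100), (70, 199))
# Option b can only seek on the timestamp prefix; SignalID is filtered afterwards.
trend_b = seek_range(index_b, (100, 0), (199, SIGNALS - 1))

# Rate query: Timestamp BETWEEN 100 AND 199 across all signals
rate_a = len(index_a)   # timestamp is not a key prefix in option a: full scan
rate_b = seek_range(index_b, (100, 0), (199, SIGNALS - 1))

print("trend query reads:", trend_a, "(a) vs", trend_b, "(b)")
print("rate query reads: ", rate_a, "(a) vs", rate_b, "(b)")
```

In this model option a reads only the 100 entries the trend query needs but must scan everything for the rate query, while option b reads the same 10,000-entry time slice for both queries and throws most of it away for the trend.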

Summary
- A key is a natural feature of the model (regardless of implementation platform)
- An index is a data structure supporting the use and enforcement of keys and uniqueness constraints
- Keys and indexes must be designed (not accidental or auto-generated), since they can greatly affect the performance of the system

Summary
- A good clustering key is: unique, static, narrow, fixed-width, ever increasing
- Consider extracting complex or large keys into a separate table with an (integer) surrogate key
- Analyze queries to determine column order in an index
  - Which columns cover the most (expensive) queries
  - Highly selective columns first
  - Equality columns first
- Use included columns to prevent bookmark lookups

Where to get more info?
- Kimberly L. Tripp
- Kendra Little
- And many, many more:
  - Paul Randal
  - Brent Ozar
  - Jonathan Kehayias
  - Jeremiah Peschka (previously
  - Adam Machanic
  - MSDN Books Online
  - SQL CAT Team
  - SQL Server Video Archive

geekstone.org/conf2015 Thank you geekstone.org