The Baker’s Dozen Business Intelligence 13 Tips for the SQL Server Columnstore Index Kevin S. Goff Microsoft SQL Server MVP.

Kevin S. Goff – Brief Bio
- Developer/architect since 1987; Microsoft SQL Server MVP
- Columnist for CoDe Magazine since 2004: “The Baker’s Dozen” productivity series, 13 tips on a SQL/BI topic
- Wrote a book, collaborated on a 2nd book
- Frequent speaker at SQL Server community events and SQL Live!360 conferences
- My site/blog: (includes SQL/BI webcasts)
- Releasing some SQL/BI video courseware in Intro to Power BI for Office 365

Columnstore Index - Introduction
- Today: 13 topics for the Columnstore index
- New index in SQL Server 2012, enhanced in SQL 2014
- More than just an index: an in-memory compressed structure
- A real game-changer, one of the biggest features in the SQL database engine of all time
- Some companies upgraded to SQL 2012 just because of this feature
- Represents another example where Microsoft is devoting serious attention to the underlying database engine
  – Earlier versions of SQL Server (2005) focused largely on language and developer enhancements
  – SQL 2008 and 2012 have seen underlying database management/engine changes (Change Data Capture and Columnstore Index)

Columnstore Index - Introduction (continued)
- In SQL 2012, not everyone benefits from this
- Built more for data warehouse/data mart environments, and even then only certain ones (in 2012 it’s a read-only index, but that changes in 2014)
- For data warehousing environments, the columnstore index is one more reason why data warehouses/data marts should shape data into star-schema Fact/Dimension models with surrogate integer keys

Columnstore Index - Topics
1. Quick demonstration
2. Overview of the Columnstore index
3. Characteristics of the Columnstore index
4. Who benefits from this?
5. Columnstore indexes vs Rowstore index
6. Execution plan using the Columnstore index
7. Batch Mode Processing: new processing mode for the Columnstore index
8. Where the Columnstore index can’t directly be used
9. Selective vs non-selective queries: where the Columnstore index isn’t used
10. General usage rules
11. Restriction rules on the Columnstore index
12. Overall performance benchmarks
13. New features in SQL Server 2014

1 – Quick Demonstration
- Demo code…

2 – Introduction to Columnstore Index
- New relational, xVelocity memory-optimized database index in SQL Server 2012, “baked in” to the database engine
- xVelocity used to be called VertiPaq, found in PowerPivot going back to 2010
- More and more functionality in the DB engine (xVelocity, CDC)
- Potentially significant performance enhancements for data warehousing and data mart scenarios, a real game changer (not really for OLTP databases; we’ll see why later)
- Best for queries that scan/aggregate large sets of data
- My opinion? One of the coolest things ever in SQL Server
- In a regular index, the indexed data from each row is kept together on a single page, so the data in each column is spread across all pages of the index
- In a columnstore index, data from each column is kept together (pages stored adjacently), so each data page contains data from only a single column (compressed, more fits in memory, more efficient IO)

3 – Characteristics of Columnstore Index
- Highly compressed: exploits similarity of data within a column
  – Typical in data warehouse Fact table foreign keys
- IO statistics: dramatically reduces the number of logical reads
- Not stored in the standard buffer pool, but rather in a new optimized buffer pool cache with a new memory broker
- Smart IO and caching using an aggressive read-ahead strategy
- Part of Microsoft’s xVelocity technology: compression is a factor of 8 (and twice as efficient as page compression)
- Once created, the index is read-only (this changes in SQL 2014)
- Best for data warehouse/mart queries that scan/aggregate large amounts of data; might lower the need for OLAP aggregation
- Some queries might run 10x faster or more
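One way to observe the logical-read reduction is to switch on IO and time statistics before running an aggregate query. This is a minimal sketch, assuming the BigPurchaseOrderHeader table and columns that appear in the CREATE INDEX statement on the usage slide; the query is illustrative and is not the demo code from the session.

```sql
-- Report logical reads, read-ahead reads, and CPU/elapsed time in the Messages output
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Illustrative scan-and-aggregate query (not the original demo code)
SELECT VendorID,
       YEAR(OrderDate) AS OrderYear,
       SUM(TotalDue)   AS TotalDue,
       COUNT(*)        AS OrderCount
FROM BigPurchaseOrderHeader
GROUP BY VendorID, YEAR(OrderDate);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the same query before and after the columnstore index is created gives a side-by-side view of the read counts.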

4 – Who Benefits?
- Queries and reports against Data Warehouses/Data Marts (works best with Fact/Dimension tables modeled in a star schema)
- Loads from Data Warehouses/Marts into OLAP cubes (more so in SQL 2014)
- SSAS OLAP databases that use the ROLAP methodology or pass-through mode “might” benefit (more so in SQL 2014)
- The new Analysis Services Tabular Model uses the xVelocity engine
- Some companies took the release candidate and put it into production, simply for this feature (some case studies show queries that went from 17 minutes to 3 seconds!)
- If you want to see memory usage, there is a good blog entry from Joe D’Antoni (website is SQL Herald). He and other developers wrote a procedure to return the amount of memory used by the columnstore object pool: http://joeydantoni.com/ – columnstore-index-in-sql-server/

5 – Columnstore vs Rowstore Index
- A columnstore index stores each column in a separate set of pages (vs. storing multiple data rows per page using B-trees and key values)
- Only the columns a query needs are fetched
- Easier to compress redundant column data
- Uses xVelocity, found in PowerPivot
- Improved IO scan/buffer hit rates
- Segment elimination: each partition is broken into million-row segments with metadata for min/max values; a segment is not read if the query’s filter range falls outside the segment’s min/max values
- In reality, not “really” an index, more like a compressed “cube”

5 – Columnstore vs Rowstore Index (continued)
- Because a Fact table might contain millions of rows for a single CustomerFK or ProductFK, SQL Server can compress all the repeated surrogate keys to a single value
- Under the hood, SQL Server is not storing the values of 2, 3, etc.; it is storing a special vector, an offset value with respect to the prior value (for efficiency)
- SQL Server also uses segment elimination for rows not needed, so any query for the year 2011 can eliminate the segments for 2010 and 2012
- Bottom line: ALL SORTS of efficiency baked into the engine, but there’s even more!
- This is one more reason to shape data warehouses/marts into star-schema, Fact-Dimension models with surrogate keys
- Diagram: CustomerFK, ProductFK, and DateFK columns, each stored as a vector (a value that determines the position of one point in space relative to another)
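The per-segment metadata behind segment elimination can be inspected through the sys.column_store_segments catalog view. This is a sketch, assuming a columnstore index already exists on BigPurchaseOrderHeader (the table name used on the usage slide). Note that min_data_id and max_data_id may hold encoded values rather than the raw column values, depending on the encoding SQL Server chose for the segment.

```sql
-- One row per column per segment; row_count is the number of rows in the segment
SELECT s.column_id,
       s.segment_id,
       s.row_count,
       s.min_data_id,     -- lower bound used for segment elimination (may be encoded)
       s.max_data_id,     -- upper bound used for segment elimination (may be encoded)
       s.on_disk_size     -- compressed size in bytes
FROM sys.column_store_segments AS s
JOIN sys.partitions AS p
  ON p.hobt_id = s.hobt_id
WHERE p.object_id = OBJECT_ID('BigPurchaseOrderHeader')
ORDER BY s.column_id, s.segment_id;
```

Segments whose min/max ranges barely overlap for a frequently filtered column are the ones a query can skip.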

6 – Execution Plan with Columnstore
- Go back to Execution Plan
- Columnstore index was 5% of the batch
- Clustered index was 65% of the batch
- Covering index (which would have been the best approach prior to SQL Server 2012) was 35% of the batch
- Time statistics: 12x faster than covering index, 20x faster than clustered index

7 – Batch Mode Processing
- Go back to Execution Plan
- New processing model in SQL 2012
- Certain execution plan operators (Hash Join and Hash Aggregation in particular) use the new batch execution mode
- Reads rows in blocks of 1,000 in parallel, minimizes instructions per row
- Data moves in batches through query plan operators
- Big performance benefit over row-based execution
- Packets of about 1,000 rows are passed between operators, with column data represented as a vector (“vector-oriented processing”)
- Huge reduction in CPU usage, by a factor of 10 or more
- Batch mode takes advantage of advanced hardware architectures, processor cache, and RAM, and improves parallelism

8 – Where Columnstore can’t be used
- Demo code…
- Issue with OUTER JOIN: can’t use it directly against the table
  – Will “work”, but will use the slower row execution mode
  – Must pre-aggregate separately and then do the OUTER JOIN (will use batch mode)
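The pre-aggregate-then-join rewrite might look like the sketch below. It is not the session's demo code. BigPurchaseOrderHeader, VendorID, and TotalDue come from the usage slide; the Vendor dimension table and its Name column are hypothetical, and VendorID is assumed to be unique in Vendor.

```sql
-- Direct form: returns correct results, but in SQL Server 2012 the OUTER JOIN
-- against the fact table falls back to the slower row execution mode
SELECT v.VendorID, v.Name, SUM(poh.TotalDue) AS TotalDue
FROM Vendor AS v                              -- hypothetical dimension table
LEFT OUTER JOIN BigPurchaseOrderHeader AS poh
  ON poh.VendorID = v.VendorID
GROUP BY v.VendorID, v.Name;

-- Rewrite: aggregate the fact table on its own first (eligible for batch mode),
-- then OUTER JOIN the much smaller aggregated result to the dimension
WITH VendorTotals AS (
    SELECT VendorID, SUM(TotalDue) AS TotalDue
    FROM BigPurchaseOrderHeader
    GROUP BY VendorID
)
SELECT v.VendorID, v.Name, vt.TotalDue
FROM Vendor AS v
LEFT OUTER JOIN VendorTotals AS vt
  ON vt.VendorID = v.VendorID;
```

Both forms return NULL totals for vendors with no orders; only the shape of the plan differs.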

9 – Selective vs Non-Selective queries
- Demo code…
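The contrast can be sketched with two queries against the BigPurchaseOrderHeader table from the usage slide. This is an illustration under assumptions, not the session's demo code: the key value 12345 is made up, and a rowstore index on PurchaseOrderID is assumed to exist.

```sql
-- Non-selective: touches most rows and aggregates them, the case the columnstore index is built for
SELECT VendorID, SUM(TotalDue) AS TotalDue
FROM BigPurchaseOrderHeader
GROUP BY VendorID;

-- Selective: fetches one row by key with no aggregation;
-- a rowstore (B-tree) seek is normally the better access path here
SELECT PurchaseOrderID, VendorID, OrderDate, TotalDue
FROM BigPurchaseOrderHeader
WHERE PurchaseOrderID = 12345;               -- hypothetical key value

-- To compare plans, the optimizer can be told to leave the columnstore index out
SELECT VendorID, SUM(TotalDue) AS TotalDue
FROM BigPurchaseOrderHeader
GROUP BY VendorID
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
```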

10 – General Usage Syntax and Rules
- Syntax is simple: use the new COLUMNSTORE keyword
- One columnstore index per table; it cannot be clustered (in 2014 it can be clustered)
- Order of columns does not matter
- Include all columns from the table
- No INCLUDE clause, no ASC/DESC, no key columns
- General MS recommendation: if queries will frequently use a certain column in the predicate, create a clustered index on that column and then create the columnstore index
  – Even though the columnstore index isn’t “ordered” itself, you’ll get better segment elimination

    CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_BPO_ColumnStore]
    ON [BigPurchaseOrderHeader]
    (PurchaseOrderID, VendorID, OrderDate, ShipMethodID, Freight, TotalDue)
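The recommendation to order the data with a clustered index first can be sketched as follows. It assumes OrderDate is the column most often used in predicates and that the table does not already have a clustered index; the index name IX_BPO_OrderDate is hypothetical.

```sql
-- 1. Physically order the rows by the commonly filtered column
CREATE CLUSTERED INDEX IX_BPO_OrderDate        -- hypothetical index name
ON BigPurchaseOrderHeader (OrderDate);

-- 2. Build the columnstore index afterwards; segments then tend to cover
--    narrow OrderDate ranges, which helps segment elimination
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_BPO_ColumnStore
ON BigPurchaseOrderHeader
(PurchaseOrderID, VendorID, OrderDate, ShipMethodID, Freight, TotalDue);
```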

11 – Restrictions and Rules
- Cannot be clustered, cannot be created against a view
- Cannot act as a PK or FK, cannot include sparse columns
- Can’t work on tables with Change Data Capture/Change Tracking or FileStream, can’t participate in replication, and can’t be used when page/row compression exists
- Cannot be used with certain data types, such as binary, text/image, rowversion/timestamp, CLR data types (hierarchyID/spatial), or data types created with the MAX keyword, e.g. varchar(max)
- Cannot be used with UniqueIdentifier
- Cannot be used with decimal precision greater than 18
- Cannot be modified with ALTER; must be dropped and recreated
- It’s a read-only index: you cannot insert rows and expect the columnstore index to be maintained (changes in SQL 2014)
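Because the 2012 index is read-only, loading new rows means removing the index, loading, and building it again. This is a sketch under assumptions: BigPurchaseOrderHeader_NewRows is a hypothetical table holding the incoming rows, and PurchaseOrderID is assumed to be an IDENTITY column, so it is left out of the insert list.

```sql
-- 1. Drop the read-only columnstore index so the table accepts changes again
DROP INDEX IX_BPO_ColumnStore ON BigPurchaseOrderHeader;

-- 2. Load the new rows (source table name is hypothetical)
INSERT INTO BigPurchaseOrderHeader (VendorID, OrderDate, ShipMethodID, Freight, TotalDue)
SELECT VendorID, OrderDate, ShipMethodID, Freight, TotalDue
FROM BigPurchaseOrderHeader_NewRows;

-- 3. Recreate the index over the full table
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_BPO_ColumnStore
ON BigPurchaseOrderHeader
(PurchaseOrderID, VendorID, OrderDate, ShipMethodID, Freight, TotalDue);
```

SQL Server 2012 also allows ALTER INDEX ... DISABLE before the load and ALTER INDEX ... REBUILD afterwards, which keeps the index definition in place. For large tables, partition switching avoids rebuilding over all rows.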

11 – Restrictions and Rules (continued)
- Note: range partitioning is supported (use partitioning to load a table, index it with a columnstore index, and switch it in as the newest partition)
  – Partition by day, split the last partition
  – Load data into a staging table and then create the columnstore index
  – Switch it in (URL reference at end of slides for an example)
  – SQL Server 2012 permits 15,000 partitions per table
- Not optimized for certain statements (OUTER JOIN, UNION, NOT IN)
- Not optimized for certain scenarios (high selectivity, queries lacking any aggregations)
- Not optimized for a JOIN on a composite set of columns (truthfully, a join between a fact table and a dimension table should only be on one integer key)
- Best practice: always use integer keys for FKs
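The partition-switching steps above can be sketched roughly as follows. Every object name other than BigPurchaseOrderHeader and its columns is hypothetical: pfOrderDate is assumed to be a RANGE RIGHT partition function on OrderDate whose last boundary is 2012-04-16, psOrderDate its partition scheme, and BigPurchaseOrderHeader_Staging an empty table that matches the target's columns, indexes, and filegroup. OrderDate is assumed to be NOT NULL.

```sql
-- 1. Split the last (still empty) partition so it stays empty after the switch
ALTER PARTITION SCHEME psOrderDate NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pfOrderDate() SPLIT RANGE ('20120417');

-- 2. Load one day of rows into the staging table (source table is hypothetical)
INSERT INTO BigPurchaseOrderHeader_Staging
       (PurchaseOrderID, VendorID, OrderDate, ShipMethodID, Freight, TotalDue)
SELECT  PurchaseOrderID, VendorID, OrderDate, ShipMethodID, Freight, TotalDue
FROM    PurchaseOrderDailyFeed
WHERE   OrderDate >= '20120416' AND OrderDate < '20120417';

-- 3. Prove to SQL Server that the staged rows fit the target partition
ALTER TABLE BigPurchaseOrderHeader_Staging WITH CHECK
ADD CONSTRAINT CK_Staging_OrderDate
CHECK (OrderDate >= '20120416' AND OrderDate < '20120417');

-- 4. Build the columnstore index on the staging table only
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_BPO_Staging_ColumnStore
ON BigPurchaseOrderHeader_Staging
(PurchaseOrderID, VendorID, OrderDate, ShipMethodID, Freight, TotalDue);

-- 5. Switch the staged day into the matching partition (a metadata-only operation)
ALTER TABLE BigPurchaseOrderHeader_Staging
SWITCH TO BigPurchaseOrderHeader
PARTITION $PARTITION.pfOrderDate('20120416');
```

The index build touches only one day of rows, and the large table's columnstore index never has to be dropped.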

12 – General Benchmarks

Index                          CPU time (ms)   Total time (ms)   Logical reads   Read-ahead reads
Clustered index                4,337           3,899             27,631          0
Non-clustered covering index   2,246           2,393             21,334          8
Column Store index             (values not recoverable from the transcript; surviving fragment: ",18012,")

13 – Enhancements in SQL 2014
- New Clustered Columnstore Index (CCI)

    CREATE CLUSTERED COLUMNSTORE INDEX [IndexName] ON [TableName]

- No columns specified: the CCI “is” the data
  – It’s updateable, no longer a read-only index
  – Cannot have any non-clustered indexes
  – Cannot have key constraints
- So you have one of two options:
  – A non-clustered read-only columnstore index, plus as many non-clustered indexes for FK values as you need (2012 model)
  – A clustered read-write columnstore index, but no non-clustered indexes for specific FK values (2014 model)
- The 2014 CCI might perform better against highly selective queries than 2012 columnstore indexes did
- Support for more data types: basically all data types except CLR, varchar(max) and varbinary(max), XML, and spatial data types
- Additional archive compression on top of regular columnstore compression
- Some ask: what is the difference between this and Hekaton in-memory optimized tables?
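A short sketch of the 2014 model, assuming SQL Server 2014 or later. BigPurchaseOrderHeader_CCI and the index name are hypothetical; the copy is made with SELECT INTO so that it starts as a heap with no indexes or key constraints, which is what a clustered columnstore index requires in 2014.

```sql
-- Hypothetical heap copy of the table (SELECT INTO copies no indexes or constraints)
SELECT PurchaseOrderID, VendorID, OrderDate, ShipMethodID, Freight, TotalDue
INTO BigPurchaseOrderHeader_CCI
FROM BigPurchaseOrderHeader;

-- No column list: the clustered columnstore index is the table's storage
CREATE CLUSTERED COLUMNSTORE INDEX CCI_BigPurchaseOrderHeader
ON BigPurchaseOrderHeader_CCI;

-- Unlike the 2012 index, the table stays writable
UPDATE BigPurchaseOrderHeader_CCI
SET Freight = 0
WHERE PurchaseOrderID = 1;                    -- hypothetical key value

-- Optional: trade CPU for extra space savings with archive compression
ALTER INDEX CCI_BigPurchaseOrderHeader ON BigPurchaseOrderHeader_CCI
REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
```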

Recommended Links
- A great video on Columnstore Index from Tech-Ed 2013:
- Adding data to a table using Partition Switching: http://social.technet.microsoft.com/wiki/contents/articles/5069.add-data-to-a-table-with-a-columnstore-index-using-partition-switching.aspx
- Last year I did a 13-part series on new features in SQL 2012 for TechNet: http://blogs.technet.com/b/jweston/archive/2012/03/28/sql-2012-free-training-one-blog-post-13-webcast-recordings-mvp-kevin-goff.aspx
- I’ve written some articles in CoDe Magazine on SQL 2012
  – 2 part series on Columnstore index, T-SQL Features, and SSIS Features