The Five Ws of Columnstore Indexes


1 The Five Ws of Columnstore Indexes
Chicago | MAR

2 Agenda
- Who?
- What?
- Why?
- How?
- Where?
- When?
- Demos

3 Who? Eureka Dr. Seuss Whoville Whos Deco Trim via Amazon.com

4 Who is this guy?
John Eisbrener
- @johnedba
- john@dbatlas.com
- DBA: Default Blame Assignee
- DBA for over 10 years
- MSSQL, Oracle, Greenplum, Postgres
- Owner/Principal Consultant of a boutique consulting firm, DB Atlas

5 Who are you?
- DBAs
- Architects
- Developers
- Analysts
- Management
- Others

6 What? Knowyourmeme.com

7 What makes Columnstore Indexes Special?
- What is an Index?
- Key differences between Rowstore and Columnstore Indexes
- Key differences between Clustered and Nonclustered Columnstore Indexes

8 What is an Index?
- Structure that contains data or pointers to data
- Designed to search for data efficiently
- Designed to keep performing well as a database grows in size
- The type of index determines how data is stored on disk
- Highly customizable
Columnstore Unofficial Versions
- SQL 2012: Alpha
- SQL 2014: Beta
- SQL 2016: Version 1.0
- SQL 2017: Version 1.1

9 Difference between Rowstore and Columnstore Indexes
Rowstore Index                                | Columnstore Index
Row-wise format                               | Column-wise format
Compression is optional                       | Compression is required
Returns all columns defined within the index  | Returns only the columns needed
B+ Trees                                      | Header and Data

10 Row-wise vs Column-Wise Storage
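The difference between the two layouts can be sketched in a few lines of Python. This is only a toy in-memory illustration, not how SQL Server stores pages or segments; the table, column names, and the `to_columnar` helper are made up for the example.

```python
# Toy illustration only: real columnstore segments are compressed and stored on disk.
# The table and column names below are hypothetical.
rows = [
    {"order_id": 1, "region": "East", "amount": 10.0},
    {"order_id": 2, "region": "West", "amount": 25.5},
    {"order_id": 3, "region": "East", "amount": 7.25},
]


def to_columnar(row_list):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in row_list:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# Row-wise: summing one column still walks every full row.
row_total = sum(row["amount"] for row in rows)

# Column-wise: the same query touches only the "amount" column.
col_total = sum(columns["amount"])
```

Both totals are the same; the point is that the column-wise version never reads `order_id` or `region`.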

11 Clustered vs Nonclustered
Clustered Columnstore Index (CCI)      | Nonclustered Columnstore Index (NCCI)
One per table                          | One per table
Table is stored in column-wise format  | Sits on top of a Heap or Clustered (Rowstore) Index
Significant table compression          | Copy of data; uses more space
Cannot define a filter                 | Can define a filter

12 Why? Youtube.com

13 Why do Columnstore Indexes work so well?
- Importance of Compression
- Brief Overview of Dictionary-based algorithms
- Column Elimination
- Rowgroup Elimination

14 Importance of Compression
- Reduce limitations imposed by data storage
  - Disk
  - Memory
  - Throughput
- Proprietary compression algorithm
  - Dictionary based

15 Dictionary-Based Compression
- Lossless
- General approach
  - Build a dictionary of symbols (e.g. words, numbers, etc.)
  - Assign minimal binary codes to each symbol
  - Smaller binary codes are assigned to more common symbols
  - Replace raw data symbols with binary codes to reduce the size of the data
- Works best when symbols are homogeneous
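The general approach can be sketched as below. This is a minimal illustration of dictionary encoding, not SQL Server's proprietary algorithm: small integers stand in for the variable-length binary codes, and the function names and sample column are assumptions made for the example.

```python
from collections import Counter


def build_dictionary(values):
    """Map each distinct symbol to an integer code; common symbols get the smallest codes."""
    counts = Counter(values)
    # Sort by descending frequency, then by symbol so the result is deterministic.
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    return {symbol: code for code, symbol in enumerate(ranked)}


def encode(values, dictionary):
    """Replace every raw symbol with its code."""
    return [dictionary[v] for v in values]


def decode(codes, dictionary):
    """Invert the dictionary to recover the original symbols (lossless)."""
    reverse = {code: symbol for symbol, code in dictionary.items()}
    return [reverse[c] for c in codes]


column = ["East", "West", "East", "East", "North", "West", "East"]
dictionary = build_dictionary(column)
codes = encode(column, dictionary)
```

Decoding `codes` with the same dictionary returns the original column, which is what makes the scheme lossless. A column with few distinct, frequently repeated values needs a small dictionary and short codes.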

16 Dictionary-Based Compression

17 Column Elimination
- Return only those columns used within the query
- Better compression ratios for data being returned because data is homogeneous
- Column ordering in the (N)CCI index definition doesn’t matter; Column Elimination will happen regardless
- NCCI ordering is defined by the underlying Rowstore Indexes
- CCI ascending/descending order can be imposed by how the data is loaded
  - WITH (MAXDOP = 1)
  - Partitioning can also help

18 Rowgroup Elimination
- Also referred to as Segment Elimination
- If the Segment doesn’t contain values identified within the query predicate, the entire Rowgroup is eliminated
- Occurs prior to Column Elimination
- Not utilized for LOB-based, string-based, or binary datatypes
- Evaluation of the Segment Header
  - Stores Min/Max of values within Segment

19 Rowgroup Elimination Example
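A rough sketch of the idea, assuming each segment header exposes only a minimum and a maximum value. The `Segment` class, the `rowgroups_to_scan` function, and the integer date keys are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one column segment and its header."""
    rowgroup_id: int
    min_value: int
    max_value: int


def rowgroups_to_scan(segments, low, high):
    """Return ids of rowgroups whose [min, max] range overlaps the predicate range [low, high]."""
    return [
        s.rowgroup_id
        for s in segments
        if s.max_value >= low and s.min_value <= high
    ]


segments = [
    Segment(0, 20170101, 20170331),
    Segment(1, 20170401, 20170630),
    Segment(2, 20170701, 20170930),
]

# Predicate: date_key BETWEEN 20170501 AND 20170515
survivors = rowgroups_to_scan(segments, 20170501, 20170515)
```

Only rowgroup 1 survives here; the other two are skipped without reading their data. The ranges above do not overlap because the sample data is sorted, which is why load order matters for this optimization.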

20 How? How it’s Made

21 How do Columnstore Indexes work with changing data?
- Rowgroups
- DeltaStore
- Inserts
- Deletes and Updates
- Tuple Mover
- ColumnStore
- Batch Execution Mode

22 Rowgroups
- Buckets of up to 1 million rows (1,048,576)
- Can be in one of 3 states
  - Open
  - Closed
  - Compressed
- Open/Closed are stored in row-wise format
- Compressed is stored in column-wise format

23 DeltaStore

24 Tuple Mover
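The rowgroup lifecycle described on the preceding slides can be approximated with a toy simulation. Everything here is a simplification under stated assumptions: the class and method names are invented, the rowgroup limit is shrunk to 4 rows for readability, and deletes, updates, and bulk loads are left out.

```python
MAX_ROWS = 4  # tiny limit for the demo; the slides describe rowgroups of up to about 1 million rows


class RowGroup:
    def __init__(self):
        self.state = "OPEN"
        self.rows = []        # row-wise storage while OPEN or CLOSED
        self.columns = None   # column-wise storage once COMPRESSED


class ToyColumnstore:
    def __init__(self):
        self.rowgroups = [RowGroup()]

    def insert(self, row):
        """New rows land in the open delta rowgroup; a full rowgroup is closed."""
        current = self.rowgroups[-1]
        current.rows.append(row)
        if len(current.rows) == MAX_ROWS:
            current.state = "CLOSED"
            self.rowgroups.append(RowGroup())

    def tuple_mover(self):
        """Compress every CLOSED rowgroup into column-wise form."""
        for rg in self.rowgroups:
            if rg.state == "CLOSED":
                rg.columns = [list(col) for col in zip(*rg.rows)]
                rg.rows = []
                rg.state = "COMPRESSED"


store = ToyColumnstore()
for i in range(6):
    store.insert((i, i * 10))

states_before = [rg.state for rg in store.rowgroups]
store.tuple_mover()
states_after = [rg.state for rg in store.rowgroups]
```

After six inserts the first rowgroup is full and closed while the second is still open; running the tuple mover pivots only the closed one into columns.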

25 ColumnStore

26 Batch Execution Mode
- Introduced in SQL 2012 along with Columnstore Indexes
- Columnstore Index is required on the table
- Only usable by certain execution plan operators
  - Aggregates/Scans/Hash Matches/Window Aggregates
- Passes a batch of up to 900 rows between execution plan operators
- Basically a turbo button for execution plans
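One way to picture the benefit is to count how often an operator is invoked. The sketch below is only an analogy in Python, not the engine's implementation; the function names are made up, and the batch size of 900 is taken from the slide.

```python
def batches(rows, batch_size=900):
    """Yield successive lists of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def sum_row_mode(rows):
    """Row mode: the operator is invoked once per row."""
    total, calls = 0, 0
    for value in rows:
        total += value
        calls += 1
    return total, calls


def sum_batch_mode(rows, batch_size=900):
    """Batch mode: the operator is invoked once per batch of rows."""
    total, calls = 0, 0
    for batch in batches(rows, batch_size):
        total += sum(batch)
        calls += 1
    return total, calls


data = list(range(2000))
row_result = sum_row_mode(data)
batch_result = sum_batch_mode(data)
```

Both modes produce the same sum, but the batch version makes 3 operator calls for 2,000 rows instead of 2,000.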

27 Where? Lego.com

28 Where can you use Columnstore Indexes?
- Datatype Restrictions
- NCCI Restrictions
- Optimal Workloads
  - CCIs
  - NCCIs

29 Datatype Restrictions
Will not work with the following datatypes:
- ntext, text, and image
- nvarchar(max), varchar(max), and varbinary(max)
  - In SQL Server 2017, this restriction does not apply to CCIs
- rowversion (and timestamp)
- sql_variant
- CLR types (hierarchyid and spatial types)
- xml

30 NCCI Restrictions
- Cannot have more than 1024 columns
- Cannot be created on a view or indexed view
- Cannot include a sparse column
- Cannot be redefined by using the ALTER INDEX statement
  - Use CREATE INDEX WITH (DROP_EXISTING = ON)
- Cannot include large object (LOB) columns of type nvarchar(max), varchar(max), and varbinary(max)

31 Optimal Workloads - CCIs
- Traditional DWH Fact Tables
- Dimension Tables with over 1 million rows
- Insert-mostly workloads
  - History Table of a Temporal Table
  - Logging Tables
  - Updates/Deletes < 10% of all DML
- Create Nonclustered (Rowstore) Indexes on CCI
  - Improve query performance by avoiding full-table scans
- Large In-Memory OLTP tables

32 Optimal Workloads - NCCIs
- OLTP tables with more than 1 million rows
- Tables that may feed a large number of analytical/aggregate queries
  - Common tables feeding SSRS/Power BI Reports
- Tables that generate a high number of Scans
- Very wide tables for which Covering Indexes are not easy to create
- Tables that could benefit from being a CCI, but cannot be offline for a long period of time

33 Identify Candidate Tables
- Several scripts have been developed by the community
  - Niko Neugebauer (GitHub Library CISL)
  - Sunil Agarwal (Microsoft Blog Post)

34 When? Apple.com

35 When to use various Columnstore Features?
- Compression Delay
- Filtered NCCIs
- Maintenance Routines
- With other features in SQL Server

36 Compression Delay
- Keyword (COMPRESSION_DELAY) used to delay the Tuple Mover from moving a Closed Rowgroup to a Compressed Rowgroup
- Max value is 10080 minutes, or 7 days
- Helpful for frequently-updated “hot” data
- Closed Rowgroups can still be updated/deleted; the data becomes immutable only once a Rowgroup is compressed
- Compressing a Closed Rowgroup requires system resources, and you may want these operations to run off-hours
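The timing rule can be expressed as a small check. This is a sketch of the concept for disk-based tables, not the Tuple Mover's actual scheduling logic; the function name and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

MAX_COMPRESSION_DELAY_MINUTES = 10080  # 7 days * 24 hours * 60 minutes


def eligible_for_compression(closed_at, now, delay_minutes):
    """True once a closed rowgroup has waited at least delay_minutes."""
    if not 0 <= delay_minutes <= MAX_COMPRESSION_DELAY_MINUTES:
        raise ValueError("delay must be between 0 and 10080 minutes")
    return now - closed_at >= timedelta(minutes=delay_minutes)


closed_at = datetime(2017, 3, 1, 12, 0)
too_early = eligible_for_compression(closed_at, closed_at + timedelta(minutes=59), 60)
on_time = eligible_for_compression(closed_at, closed_at + timedelta(minutes=60), 60)
```

With a 60-minute delay, a rowgroup closed at noon stays in row-wise form, and therefore stays cheap to update, until 1 PM.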

37 Filtered NCCIs
- Use when Compression Delay isn’t long enough
- Query Engine will use what it can from the Filtered NCCI and pull the remaining data from the Rowstore Index
- Must redefine using CREATE NONCLUSTERED COLUMNSTORE INDEX WITH (DROP_EXISTING = ON)
- Requires specific SET options

38 Maintenance Routines
Reorganize
- Physically removes rows from a rowgroup when 10% or more of the rows have been logically deleted
- Combines one or more compressed rowgroups to increase rows per rowgroup up to the maximum of 1,048,576 rows
- Manually compresses any Closed Rowgroups
- Compresses all Closed AND Open Rowgroups when using the WITH (COMPRESS_ALL_ROW_GROUPS = ON) option
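The two thresholds behind Reorganize can be written as simple predicates. This is a sketch of the rules as stated on the slide, not the engine's actual merge policy; the function names are hypothetical, and the rowgroup maximum is written as 2 ** 20 (1,048,576 rows).

```python
MAX_ROWGROUP_ROWS = 2 ** 20  # 1,048,576 rows, the rowgroup maximum


def needs_cleanup(total_rows, deleted_rows):
    """True when at least 10% of the rowgroup's rows are logically deleted."""
    # Integer arithmetic avoids floating-point rounding at the 10% boundary.
    return total_rows > 0 and deleted_rows * 10 >= total_rows


def can_merge(live_rows_a, live_rows_b):
    """True when two compressed rowgroups fit together in a single rowgroup."""
    return live_rows_a + live_rows_b <= MAX_ROWGROUP_ROWS


at_threshold = needs_cleanup(1000, 100)
below_threshold = needs_cleanup(1000, 99)
fits = can_merge(500_000, 548_576)
too_big = can_merge(600_000, 500_000)
```

A rowgroup with exactly 10% deleted rows qualifies for cleanup, and two rowgroups merge only if their combined live rows stay within the maximum.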

39 Maintenance Routines (Continued)
Rebuild
- Re-compresses all data into the columnstore
- Historically (e.g. SQL 2014 and 2012) used to be the only way to reduce fragmentation
- Locks the table during the rebuild operation
  - SQL 2017 introduces ONLINE rebuilds for NCCIs only
- Will be used primarily when there is a lot of fragmentation within the Compressed Rowgroups

40 Other features that work well with Columnstore Indexes
- Temporal Tables
  - CCI on History Table
- Availability Groups with Read-Only Replicas
  - Point your reports there!
- Partitioned Tables

41 Demos OhMaGif.com

