1
TechEd 2013 12/2/2018 7:32 AM © 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
2
What’s New for Columnstore Indexes and Batch Mode Processing
DBI-B322, Igor Stanko
3
Agenda
- Trends in the Data Warehousing Space
- How Does Columnstore Work?
- What’s New in Columnstore?
- Demo
- In Summary
4
Trends in the Data Warehousing Space: Understanding the Opportunity
- DW systems continue to grow at a fast pace, so scalability is a key concern: systems grow from tens of TBs, to hundreds of TBs, to PBs.
- Performance at scale: the ability to analyze massive amounts of data while offering interactive response times.
- Data warehousing for the masses: drive down the price per TB.
"Data warehousing has shifted almost entirely towards the appliance model due to the speed of the balanced appliance and the scalability of scale-out (MPP) solutions." (Jim Kobielus, Forrester Research)
Source: TDWI Report: Next Generation DW
Columnstore packaged into an appliance delivers this.
5
Agenda
- Trends in the Data Warehousing Space
- How Does Columnstore Work?
- What’s New in Columnstore?
- Demo
- In Summary
6
Columnstore Refresher
How is it different?
- Row store: data stored as rows.
- Columnstore: data stored as columns (C1, C2, C3, C4, C5).
Benefits:
- Improved compression: data from the same domain compresses better.
- Reduced I/O: fetch only the columns needed.
- Improved performance: more data fits in memory …
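The sketch below is a toy Python model of the two layouts. It assumes a three-column table with made-up sample values. A query that references two of the three columns must read every value in a row store, but only the referenced columns in a columnstore. `to_columns`, `values_read_row_store`, and `values_read_column_store` are hypothetical helpers for illustration, not SQL Server APIs.

```python
from typing import Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[Tuple], names: Sequence[str]) -> Dict[str, List]:
    """Pivot row tuples into one list of values per column (columnar layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def values_read_row_store(rows: Sequence[Tuple]) -> int:
    # A row store fetches whole rows, so every column of every row is read.
    return sum(len(row) for row in rows)


def values_read_column_store(columns: Dict[str, List], needed: Sequence[str]) -> int:
    # A column store fetches only the columns the query references.
    return sum(len(columns[name]) for name in needed)


# Hypothetical sample rows: (OrderDateKey, ProductKey, SalesAmount)
names = ["OrderDateKey", "ProductKey", "SalesAmount"]
rows = [(106, 1, 30.0), (103, 4, 17.0), (109, 3, 20.0)]
columns = to_columns(rows, names)
print(values_read_row_store(rows))                                       # 9
print(values_read_column_store(columns, ["ProductKey", "SalesAmount"]))  # 6
```

The gap widens with table width: the row store always reads all columns, while the columnstore cost depends only on the columns a query touches.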
7
Columnstore Terminology
[Figure: a table with columns C1 to C6, divided horizontally into row groups and vertically into column segments]
- Column segment: contains values from one column for a set of rows.
- Row group: the segments for the same set of rows comprise a row group.
- Segments are compressed, and each segment is stored in a separate LOB.
- A segment is the unit of transfer between disk and memory.
8
Columnstore Index Example
[Figure: sample sales table with columns OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, SalesAmount]
9
1. Horizontally Partition (create Row Groups)
[Figure: the sample table split horizontally into row groups of ~1M rows each; every row group keeps all six columns]
10
2. Vertically Partition (create Segments)
[Figure: each row group split vertically into one segment per column: OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, SalesAmount]
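Steps 1 and 2 can be sketched in Python. This is a simplification that assumes rows arrive in a fixed order and are cut into consecutive groups. A real index build may form row groups differently, for example in parallel or under memory pressure. `build_row_groups` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

# SQL Server caps a compressed row group at 1,048,576 (2**20) rows: the "~1M rows" above.
MAX_ROWS_PER_ROW_GROUP = 2 ** 20


def build_row_groups(
    rows: Sequence[Tuple],
    names: Sequence[str],
    max_rows: int = MAX_ROWS_PER_ROW_GROUP,
) -> List[Dict[str, List]]:
    """Step 1: split rows into row groups. Step 2: split each group into column segments."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    row_groups = []
    for start in range(0, len(rows), max_rows):
        group = rows[start:start + max_rows]  # horizontal partition (row group)
        segments = {
            name: [row[i] for row in group]  # vertical partition (one segment per column)
            for i, name in enumerate(names)
        }
        row_groups.append(segments)
    return row_groups


# Usage with a tiny row-group size so the split is visible.
rows = [(100 + i, i % 3) for i in range(10)]
groups = build_row_groups(rows, ["OrderDateKey", "ProductKey"], max_rows=4)
print([len(g["OrderDateKey"]) for g in groups])  # [4, 4, 2]
```

Each dictionary in the result is one row group, and each list inside it is one segment. That segment is the unit that is later compressed and stored on its own.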
11
3. Compress Each Segment
[Figure: the per-column segments of both row groups after compression, at different sizes]
Some segments will compress more than others.
*Encoding and reordering not shown
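The slide leaves encoding out, so the following is an assumption-labeled sketch of two generic encodings that column stores commonly use: dictionary encoding and run-length encoding (RLE). It is not SQL Server's internal segment format. It shows why some segments compress more than others. A low-cardinality column such as RegionKey collapses into a few runs, while a column of distinct amounts does not. The sample values and helper names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def dictionary_encode(values: List[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Map each distinct value to a small integer id; return (dictionary, ids)."""
    dictionary: List[Hashable] = []
    id_of: Dict[Hashable, int] = {}
    encoded: List[int] = []
    for v in values:
        if v not in id_of:
            id_of[v] = len(dictionary)
            dictionary.append(v)
        encoded.append(id_of[v])
    return dictionary, encoded


def run_length_encode(values: List) -> List[Tuple[object, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs: List[Tuple[object, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


region = [1, 1, 1, 1, 2, 2]                    # low-cardinality segment: few runs
amount = [30.0, 17.0, 20.0, 25.0, 14.0, 10.0]  # all distinct: nothing to collapse
print(len(run_length_encode(region)), len(run_length_encode(amount)))  # 2 6
```

Reordering matters for the same reason. `run_length_encode([1, 2, 1, 2])` yields 4 runs, but the same values sorted yield 2, which is why reordering rows within a segment can improve compression.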
12
4. Read the Data: Column Elimination and Segment Elimination
Query: SELECT ProductKey, SUM(SalesAmount) FROM SalesTable WHERE OrderDateKey < … GROUP BY ProductKey
[Figure: the segments of both row groups, annotated "Column Elimination" and "Segment Elimination"]
- Column elimination: only the OrderDateKey, ProductKey, and SalesAmount segments are read; StoreKey, RegionKey, and Quantity are skipped.
- Segment elimination: a row group whose OrderDateKey segment (judged by its min/max metadata) cannot contain a qualifying value is skipped entirely.
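Here is a sketch of both eliminations in Python. It assumes each segment carries min/max metadata recorded at build time; SQL Server keeps per-segment min/max values for this purpose. It also assumes a concrete date bound, because the slide's predicate value is not shown. `RowGroup` and `sales_by_product` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class RowGroup:
    segments: Dict[str, List]                    # column name -> values in this row group
    ranges: Dict[str, Tuple] = field(init=False)

    def __post_init__(self) -> None:
        # Per-segment min/max metadata, recorded once when the row group is built.
        self.ranges = {col: (min(v), max(v)) for col, v in self.segments.items()}


def sales_by_product(groups: List[RowGroup], date_bound: int) -> Tuple[Dict, int]:
    """SELECT ProductKey, SUM(SalesAmount) ... WHERE OrderDateKey < date_bound
    GROUP BY ProductKey. Returns (totals, number_of_row_groups_scanned)."""
    totals: Dict = {}
    scanned = 0
    for rg in groups:
        min_date, _ = rg.ranges["OrderDateKey"]
        if min_date >= date_bound:
            continue  # segment elimination: no row in this group can qualify
        scanned += 1
        # Column elimination: only these three segments are ever touched.
        dates = rg.segments["OrderDateKey"]
        products = rg.segments["ProductKey"]
        amounts = rg.segments["SalesAmount"]
        for d, p, a in zip(dates, products, amounts):
            if d < date_bound:
                totals[p] = totals.get(p, 0) + a
    return totals, scanned


g1 = RowGroup({"OrderDateKey": [101, 102], "ProductKey": [1, 2],
               "SalesAmount": [10.0, 20.0], "StoreKey": [1, 1]})
g2 = RowGroup({"OrderDateKey": [110, 111], "ProductKey": [1, 1],
               "SalesAmount": [5.0, 5.0], "StoreKey": [2, 2]})
print(sales_by_product([g1, g2], date_bound=105))  # ({1: 10.0, 2: 20.0}, 1)
```

The second row group is never opened because its minimum OrderDateKey already fails the predicate, and StoreKey is never read in either group.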
13
Multi-Row Batch: Batch Processing
Motivation:
- Columnstore significantly reduces the I/O required.
- Once I/O is reduced, CPU usage becomes the major bottleneck.
- Batch processing reduces CPU usage.
Functionality:
- Instead of moving individual rows between iterators, move a set of rows called a batch, usually ~900 rows at a time.
- Batches are organized in columnar form, with an extra vector indicating qualifying rows.
- The batch object is moved from iterator to iterator, so the number of function calls per row processed drops by a few orders of magnitude.
- Many operations can be implemented without copying data, with only slight modifications to the batch.
[Figure: batch object made of column vectors C1, C2, C3 plus a bitmap of qualifying rows]
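The following is a minimal Python sketch of the batch idea under simplifying assumptions. A batch is a dictionary of column vectors plus a list of qualifying flags, and operators are generators that hand whole batches to each other. `Batch`, `scan_batches`, `filter_less_than`, and `sum_column` are hypothetical names. SQL Server's real batch format and operator interfaces are internal.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List

BATCH_SIZE = 900  # "usually ~900 rows at a time"


@dataclass
class Batch:
    columns: Dict[str, List]  # columnar layout: one vector per column
    qualifying: List[bool]    # one flag per row; False means filtered out


def scan_batches(segments: Dict[str, List], size: int = BATCH_SIZE) -> Iterator[Batch]:
    """Cut column segments into batches (Python slicing copies; an engine need not)."""
    n = len(next(iter(segments.values())))
    for start in range(0, n, size):
        cols = {name: vals[start:start + size] for name, vals in segments.items()}
        yield Batch(cols, [True] * min(size, n - start))


def filter_less_than(batches: Iterable[Batch], column: str, bound) -> Iterator[Batch]:
    """Filter operator: clears flags in the qualifying vector; column data is not copied."""
    for batch in batches:
        vals = batch.columns[column]
        for i, v in enumerate(vals):
            if batch.qualifying[i] and not v < bound:
                batch.qualifying[i] = False
        yield batch


def sum_column(batches: Iterable[Batch], column: str):
    """Aggregate operator: pulls one batch at a time and sums only qualifying rows."""
    total = 0
    for batch in batches:
        total += sum(v for v, q in zip(batch.columns[column], batch.qualifying) if q)
    return total


segments = {"OrderDateKey": list(range(100, 2100)), "SalesAmount": [1.0] * 2000}
plan = sum_column(filter_less_than(scan_batches(segments), "OrderDateKey", 1100), "SalesAmount")
print(plan)  # 1000.0
```

With 2,000 rows, the filter and aggregate exchange three batch objects instead of 2,000 individual rows. That reduction in per-row calls is the saving the slide describes.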
14
Columnstore Benefits
- Improved compression: data from the same domain compresses better.
- Reduced I/O: fetch only the columns needed.
- Improved performance: more data fits in memory, plus batch processing.
15
Agenda
- Trends in the Data Warehousing Space
- How Does Columnstore Work?
- What’s New in Columnstore?
- Demo
- In Summary
16
Columnstore.Next: Motivation
SQL Server 2012 columnstore functionality:
- Nonclustered columnstore indexes.
- Improved compression compared to ROW/PAGE compression.
- Improved query performance.
Gaps:
- No DML support, so no updates (data refresh).
- Only secondary (nonclustered) columnstore indexes supported.
- Poor memory management: resource governor was not honored during index build/rebuild or at run time.
- No batch hash join spilling.
- Limited data type support.
- Limited set of supported batch operations.
Goals for the new columnstore functionality:
- Competitive load performance and efficient index creation.
- Leading compression ratios and competitive query performance.
- Functional parity with row store.
17
Clustered Columnstore Index
[Chart: space used, showing 91% savings]
** Space used = table space + index space
Why is the clustered index important?
- It saves space.
- It simplifies management: there are no secondary indexes to maintain.
- Columnstore (and the clustered columnstore index) will be the PREFERRED storage engine for DW scenarios. We encourage users either to move existing tables to CCI or to start using CCI for new tables.
- Additional data types are supported (including high-precision decimal, binary, varbinary, etc.).
18
Updatable Columnstore Index
A table consists of a column store and a row store (the delta store). DML operations (update, delete, insert) leverage the delta store.
- INSERT: values always land in the delta store.
- DELETE: a logical operation; data is physically removed only after a REBUILD operation is performed.
- UPDATE: a DELETE followed by an INSERT.
- BULK INSERT: if the batch is smaller than 100K rows, inserts go into the delta store; otherwise they go directly into the columnstore.
- SELECT: unifies data from the column and row stores through an internal UNION operation.
The "tuple mover" converts data into columnar format once a row group in the delta store is full (~1M rows). The REORGANIZE statement forces the tuple mover to start.
[Figure: delta (row) store with columns C1 to C6, converted by the tuple mover into the column store with columns C1 to C6]
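Below is a toy Python model of this flow. It assumes tiny row groups and uses plain Python lists as stand-ins for compressed segments. `UpdatableColumnstore` and its methods are hypothetical; the real delta store, delete tracking, and tuple mover are internal to SQL Server. In this sketch, rows still in the delta store are dropped directly, and only rows in compressed row groups receive a logical delete mark. Treat that split as a modeling choice.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Set, Tuple

Row = Tuple[int, int]
Predicate = Callable[[Row], bool]


@dataclass
class UpdatableColumnstore:
    row_group_size: int = 2 ** 20                               # ~1M rows per row group
    compressed: List[List[Row]] = field(default_factory=list)   # stand-in for columnar row groups
    delta: List[Row] = field(default_factory=list)              # row-store delta
    deleted: Set[Tuple[int, int]] = field(default_factory=set)  # (row group, offset) marks

    def insert(self, row: Row) -> None:
        self.delta.append(row)  # INSERT always lands in the delta store

    def delete_where(self, pred: Predicate) -> None:
        # Compressed rows only get a logical delete mark; a rebuild would reclaim them.
        for g, group in enumerate(self.compressed):
            for i, row in enumerate(group):
                if pred(row):
                    self.deleted.add((g, i))
        # Modeling choice: rows still in the delta store are simply dropped.
        self.delta = [r for r in self.delta if not pred(r)]

    def update_where(self, pred: Predicate, change: Callable[[Row], Row]) -> None:
        matches = [r for r in self.scan() if pred(r)]
        self.delete_where(pred)     # UPDATE = DELETE ...
        for r in matches:
            self.insert(change(r))  # ... followed by INSERT

    def tuple_mover(self) -> None:
        # Move each full row group's worth of delta rows into "columnar" storage.
        while len(self.delta) >= self.row_group_size:
            self.compressed.append(self.delta[:self.row_group_size])
            self.delta = self.delta[self.row_group_size:]

    def scan(self) -> Iterator[Row]:
        # SELECT: internal UNION of live compressed rows and delta rows.
        for g, group in enumerate(self.compressed):
            for i, row in enumerate(group):
                if (g, i) not in self.deleted:
                    yield row
        yield from self.delta


cs = UpdatableColumnstore(row_group_size=3)
for k in range(5):
    cs.insert((k, k * 10))
cs.tuple_mover()                      # 3 rows compressed, 2 stay in the delta store
cs.delete_where(lambda r: r[0] == 1)  # logical delete in the compressed group
cs.update_where(lambda r: r[0] == 4, lambda r: (r[0], 99))
print(sorted(cs.scan()))  # [(0, 0), (2, 20), (3, 30), (4, 99)]
```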
19
Memory Sensitive Columnstore Index
Streaming functionality for columnstore utilities (build, rebuild, load):
- Columnstore segments are built in memory.
- Memory consumption adjusts under memory pressure (e.g., during data load or index build/rebuild).
- The same memory grant and reservation process is used by the different operations (build/rebuild/load).
Run-time memory management:
- Batch mode spilling has been implemented, so there is no need to go back to row mode execution when spilling.
Available memory can affect columnstore segment quality:
- The ideal segment size is 1M rows.
- The number of segments (columns in the table) drives memory requirements.
- The product always attempts to create ideal segments by reserving “enough” memory.
- Under memory pressure, DOP is reduced first, followed by segment size reduction.
- In PDW, available memory equates to the resource governor settings on the compute nodes.
20
Columnstore That Improves Performance
- Batch hash join spilling implemented.
- Mixed-mode (row and batch) query execution: the presence of row operators does not prevent other operators from executing in batch mode.
- Additional batch operators:
  - joins (inner, outer)
  - partial aggregates with and without GROUP BY (local aggregation); global aggregation is not in batch mode
  - UNION ALL operator
Notes:
- Distinct aggregates and UNION operators continue to be executed in row mode.
- No changes to PDW query processing: Q tables are still present, and they are built using row store.
21
More Performance Results
22
Columnstore with Competitive Compression …
Table compression options: DATA_COMPRESSION = { NONE | ROW | PAGE | COLUMNSTORE | COLUMNSTORE_ARCHIVE }
COLUMNSTORE compression:
- Default compression when creating a table with a clustered columnstore index.
- Typical customer workloads get 5-7x compression ratios.

Workload     Compression ratio**
TPC-H        3.1x
TPC-DS       2.8x
Customer 1   3.9x
Customer 2   4.3x
** Compression measured against the raw data file.

ARCHIVAL compression (COLUMNSTORE_ARCHIVE):
- Enables an additional 30% compression for the whole table and/or chosen partitions.
- A table or partition can go back and forth between COLUMNSTORE and COLUMNSTORE_ARCHIVE compression.
- sys.partitions exposes the compression setting (3 = COLUMNSTORE, 4 = COLUMNSTORE_ARCHIVE).
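The ratios and the archival option reduce to simple arithmetic. This minimal sketch assumes a hypothetical 1,000 GB raw data set and a 4x ratio in the range of the table above. It also reads "additional 30%" as a further 30% reduction of the COLUMNSTORE size, since the slide does not state the baseline precisely. The code map mirrors the documented values of `sys.partitions.data_compression`; the helper names are hypothetical.

```python
# sys.partitions.data_compression codes (the slide lists 3 and 4).
COMPRESSION_CODES = {0: "NONE", 1: "ROW", 2: "PAGE", 3: "COLUMNSTORE", 4: "COLUMNSTORE_ARCHIVE"}


def compressed_size(raw_size: float, ratio: float) -> float:
    """Size after compression, where ratio = raw size / compressed size."""
    return raw_size / ratio


def archive_size(columnstore_size: float, extra_savings: float = 0.30) -> float:
    """Assumed reading of "additional 30%": shrink the COLUMNSTORE size by 30%."""
    return columnstore_size * (1.0 - extra_savings)


raw_gb = 1000.0                        # hypothetical raw data size
cs_gb = compressed_size(raw_gb, 4.0)   # size at a 4x COLUMNSTORE ratio
arch_gb = archive_size(cs_gb)          # size after switching to COLUMNSTORE_ARCHIVE
print(cs_gb, round(arch_gb, 1), round(raw_gb / arch_gb, 2))  # 250.0 175.0 5.71
```

Under this reading, archival compression turns a 4x ratio into roughly 5.7x overall.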
23
Demo
24
Agenda
- Trends in the Data Warehousing Space
- How Does Columnstore Work?
- What’s New in Columnstore?
- Demo
- Optimizing Database and Index Design
- In Summary
25
Do we need nonclustered column stores?
Yes, if you need constraints or triggers on the table:
- Creating the CCI will fail if there is a B-tree index enforcing a key constraint.
- Instead, create the table with a clustered index and an NCCI.
- You won't be able to update the table.
No, if constraints aren't needed:
- Create the table and add a CCI.
- No other indexes to worry about!
- You can insert, update, and delete in the table.
- Consistent fast query performance.
Recommended methods for loading into a table with an NCCI:
- Disable the index, update the data, and rebuild, or
- Use partition switching, or
- Use a delta table and UNION ALL.
26
Other Indexes and CCI
- There won't be other indexes needed with a CCI, which saves space and maintenance work.
- There really isn't much need for other indexes with an NCCI, either (maybe the clustered index).
Partitioning
- Partitioning works with both CCI and NCCI.
- It is good for managing the lifecycle of data, such as aging off old data, especially for NCCI, where deletes aren't possible.
- Consider the COLUMNSTORE_ARCHIVE option if disk space is critical.
27
Design out strings from columnstores
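- Joining on string columns is slow.
- Factor strings out to dimensions; it's generally good DW design practice anyway.
Dimension and fact tables:
Fact table keyed on a string (columns Date, LicenseNum, Measure):
  LicenseNum XYZ123, Measure 100
  LicenseNum ABC777, Measure 200
Fact table keyed on an integer (columns Date, LicenseId, Measure):
  LicenseId 1, Measure 100
  LicenseId 2, Measure 200
Dimension table (columns LicenseId, LicenseNum):
  1  XYZ123
  2  ABC777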
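This small Python sketch reproduces the before/after tables by assigning integer surrogate keys in first-seen order. First-seen ordering is an assumption for illustration; a real ETL process would typically look keys up in an existing dimension. `factor_out_strings` is a hypothetical helper, and the Date column is omitted.

```python
from typing import Dict, List, Tuple


def factor_out_strings(fact_rows: List[Tuple[str, float]]):
    """Replace a string key with an integer surrogate key.

    fact_rows: (LicenseNum, Measure) pairs (Date omitted for brevity).
    Returns ((LicenseId, Measure) fact rows, {LicenseId: LicenseNum} dimension).
    """
    id_of: Dict[str, int] = {}
    new_rows: List[Tuple[int, float]] = []
    for license_num, measure in fact_rows:
        if license_num not in id_of:
            id_of[license_num] = len(id_of) + 1  # surrogate keys start at 1
        new_rows.append((id_of[license_num], measure))
    dimension = {lid: num for num, lid in id_of.items()}
    return new_rows, dimension


facts, license_dim = factor_out_strings([("XYZ123", 100), ("ABC777", 200)])
print(facts)        # [(1, 100), (2, 200)]
print(license_dim)  # {1: 'XYZ123', 2: 'ABC777'}
```

Joins and grouping then run on the integer LicenseId, and the string is fetched from the small dimension table only when a query actually needs it.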
28
Making the Move to CCIs for Existing Tables
Simple approach (best done when users aren't querying):
1. Drop indexes and constraints.
2. Create the clustered columnstore index.
If you run a 24/7 operation and can't manage a window for the update:
1. Create a view over the fact table that redirects to the existing table.
2. Create a new table with a clustered columnstore index.
3. Copy all data to the new table.
4. When the new table is up to date with all recent additions, change the view to redirect to the new table.
29
Evaluate this session Scan this QR code to evaluate this session.
31
Windows 2012 Storage Spaces