Accelerating Database Performance Using Compression
Joseph D’Antoni, Solutions Architect, Anexinet
October 15-18, 2013 | Charlotte, NC
About
Twitter
Joedantoni.wordpress.com (blog)
Bit.ly/SQLCompression (presentation resources)
Overview
- Compression: Tell Me More?
- Deduplication: What Is That?
- What Data Should I Compress?
- Columnstore: How Is It Different?
Compression: The Early Years
Why Compression Now?
What is (Data) Compression?
Deduplication
- Specialized compression that eliminates duplicate copies of repeating data
- Real example: in VMware, you may have 10 copies of Windows running on the same physical machine. Memory blocks (common DLLs, for example) may be deduplicated.
- Other uses: Exchange attachments, backup appliances
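A minimal sketch of block-level deduplication, assuming fixed-size 4 KB blocks and SHA-256 fingerprints (hypervisors and backup appliances use their own chunking and hashing schemes). Each distinct block is stored once, and the data itself becomes an ordered list of references. The function names and the "VM" data are hypothetical.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks, keeping one copy of each distinct block."""
    store = {}       # fingerprint -> block contents, stored once
    references = []  # fingerprints, in order, that rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)
        references.append(fingerprint)
    return store, references


def rehydrate(store, references) -> bytes:
    """Rebuild the original bytes from the ordered block references."""
    return b"".join(store[fp] for fp in references)


# Ten "VMs": each has the same three OS blocks plus one block of its own.
os_blocks = b"".join(bytes([i]) * 4096 for i in range(3))
disk = b"".join(os_blocks + f"vm{i}".encode().ljust(4096, b"\0") for i in range(10))
store, refs = deduplicate(disk)
print(len(refs), len(store))  # 40 13
assert rehydrate(store, refs) == disk
```

Forty logical blocks shrink to 13 physical ones because the shared OS blocks are kept only once.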
Compression Benefits
So What Are the Benefits of Compression?
- Faster performance on SELECTs: less I/O is required to return data
- Better space utilization on disk
- More rows in memory
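A back-of-the-envelope sketch of why smaller rows mean less I/O and more rows in memory. It assumes 8 KB pages with roughly 8,060 bytes available for row data and ignores row overhead, the slot array, and fill factor, so the numbers are illustrative only. The 50% size reduction is also an assumption, not a measured result.

```python
import math

PAGE_DATA_BYTES = 8060  # simplifying assumption: usable bytes per 8 KB page


def pages_needed(row_count: int, avg_row_bytes: int) -> int:
    """Pages needed for row_count rows of avg_row_bytes each (a row never spans pages)."""
    rows_per_page = PAGE_DATA_BYTES // avg_row_bytes
    return math.ceil(row_count / rows_per_page)


uncompressed = pages_needed(1_000_000, 200)
compressed = pages_needed(1_000_000, 100)  # assume compression halves the average row size
print(uncompressed, compressed)  # 25000 12500
```

Half as many pages means a full scan reads half as much from disk, and the same buffer pool memory caches twice as many rows.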
Expenses of Compression
- Moderately slower single-row updates and inserts
- More expensive (slower) bulk updates and inserts
SQL Server Compression Types
- Row Compression
- Page Compression
  - Prefix Compression
  - Dictionary Compression
- Backup Compression
Row Compression
Stores fixed-length data types as if they were variable-length data types
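A conceptual sketch of that idea, not SQL Server's actual record format. An integer declared as 4 bytes is stored in only the bytes its value needs, zero takes no bytes (Books Online notes that NULL and 0 take no bytes under row compression), and the trailing spaces that pad a fixed-length character value are not stored. The sketch assumes single-byte characters, ignores per-row and per-column metadata, and uses hypothetical function names.

```python
def min_int_bytes(value: int) -> int:
    """Fewest bytes that hold value as a signed two's-complement integer (0 needs none)."""
    if value == 0:
        return 0
    n = 1
    while not -(1 << (8 * n - 1)) <= value < (1 << (8 * n - 1)):
        n += 1
    return n


def storage_bytes(int_values, char_values, int_width=4, char_width=20):
    """Compare fixed-width storage (int, char(20)) with variable-width, row-compression-style storage."""
    fixed = len(int_values) * int_width + len(char_values) * char_width
    compressed = sum(min_int_bytes(v) for v in int_values)
    # char(n) values are padded with spaces to n characters; the padding is not stored
    compressed += sum(len(s.rstrip(" ")) for s in char_values)
    return fixed, compressed


ints = [0, 42, 300, 70000]
names = ["Smith".ljust(20), "Jones".ljust(20)]
print(storage_bytes(ints, names))  # (56, 16)
```

This is also why columns whose values really do need every allocated byte compress poorly: `min_int_bytes` returns the full width for them.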
Page Compression
Figures: page before compression; page after prefix compression; page after dictionary compression (images courtesy of SQL Server Books Online)
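A minimal sketch of the two page-compression steps, assuming string columns and character (not byte) prefixes. SQL Server's real page format is different. Prefix compression picks an anchor value per column and stores each value as (characters shared with the anchor, remaining suffix). This sketch finds the anchor by brute force. Dictionary compression then replaces any encoded value that repeats anywhere on the page, in any column, with a reference to a page-level dictionary. All function names and sample values are hypothetical.

```python
from collections import Counter


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(column):
    """Prefix step: pick the value that shares the most characters with the column, then encode against it."""
    anchor = max(column, key=lambda cand: sum(shared_prefix_len(cand, v) for v in column))
    encoded = []
    for value in column:
        n = shared_prefix_len(anchor, value)
        encoded.append((n, value[n:]))
    return anchor, encoded


def compress_page(rows):
    """Prefix-encode each column, then build one page-level dictionary of repeated encoded values."""
    anchors, encoded_columns = [], []
    for column in zip(*rows):
        anchor, encoded = prefix_encode(column)
        anchors.append(anchor)
        encoded_columns.append(encoded)
    encoded_rows = list(zip(*encoded_columns))
    # Dictionary step: an encoded value seen more than once, in any column, goes in the dictionary.
    counts = Counter(cell for row in encoded_rows for cell in row)
    dictionary = [cell for cell, count in counts.items() if count > 1]
    slot = {cell: i for i, cell in enumerate(dictionary)}
    stored = [tuple(("dict", slot[c]) if c in slot else ("raw", c) for c in row) for row in encoded_rows]
    return anchors, dictionary, stored


def decompress_page(anchors, dictionary, stored):
    """Reverse both steps: resolve dictionary references, then re-attach each column's prefix."""
    rows = []
    for row in stored:
        values = []
        for anchor, (kind, payload) in zip(anchors, row):
            match_len, suffix = dictionary[payload] if kind == "dict" else payload
            values.append(anchor[:match_len] + suffix)
        rows.append(tuple(values))
    return rows


page = [("aaabb", "aaaab", "abcd"),
        ("aaabccc", "bbbb", "abcd"),
        ("aaaccc", "aaaacc", "bbbb")]
anchors, dictionary, stored = compress_page(page)
print(anchors)     # ['aaabccc', 'aaaacc', 'abcd']
print(dictionary)  # [(4, 'b'), (4, ''), (0, 'bbbb')]
assert decompress_page(anchors, dictionary, stored) == page
```

The entry (4, 'b') is reused by two different columns. After prefix encoding, values that differed on the page can look identical, and that is what the dictionary step exploits.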
Partitioning and Compression
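Backup Compression
- Creating compressed backups is supported in Standard edition and above starting with SQL Server 2008 R2 (Enterprise only in SQL Server 2008); every edition from SQL Server 2008 on can restore a compressed backup
- Always use backup compression (even when your storage team says no)
- By default, SQL Server pre-allocates the backup file at a predefined percentage of the database size instead of growing it to the final compressed size
- Trace Flag 3042 bypasses this pre-allocation, so the file grows only as needed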
How Does Compression Work?
- The Storage Engine compresses and decompresses data
- No other parts of SQL Server need to understand compression
- Application code doesn't need to change
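A toy model of that division of labor, using zlib as a stand-in for SQL Server's own compression formats. Only the hypothetical ToyStorageEngine class knows that data is compressed. The hypothetical application_code function reads and writes plain bytes and behaves the same whether or not compression is enabled.

```python
import zlib


class ToyStorageEngine:
    """Compresses on write and decompresses on read; callers never see compressed bytes."""

    def __init__(self, compress: bool = True):
        self.compress = compress
        self._pages = {}  # key -> bytes as stored "on disk"

    def write(self, key: str, data: bytes) -> None:
        self._pages[key] = zlib.compress(data) if self.compress else data

    def read(self, key: str) -> bytes:
        stored = self._pages[key]
        return zlib.decompress(stored) if self.compress else stored

    def stored_bytes(self) -> int:
        return sum(len(v) for v in self._pages.values())


def application_code(engine) -> bytes:
    # Identical whether or not the engine compresses.
    engine.write("order:1", b"Charlotte, NC " * 100)
    return engine.read("order:1")


plain, packed = ToyStorageEngine(compress=False), ToyStorageEngine(compress=True)
assert application_code(plain) == application_code(packed)
print(plain.stored_bytes(), packed.stored_bytes())
```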
So What Objects Should We Compress?
- SQL Server data compression gives a great deal of flexibility
- You can compress tables, indexes, and individual partitions
- You can use a different compression method for each
- How to decide?
Deciding What to Compress
1. Start with space savings
2. Check update percentage
3. Check scan percentage
4. Map it out and decide
Space Savings
Estimate savings with sp_estimate_data_compression_savings
What won't compress well:
- Columns with numeric or fixed-length character data types where most values require all the bytes allocated for the data type
- Data with little repetition
- Repeating data with non-repeating prefixes
- Data stored out of row
- FILESTREAM data
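The output of sp_estimate_data_compression_savings includes size_with_current_compression_setting(KB) and size_with_requested_compression_setting(KB). This sketch shows the arithmetic that turns those two numbers into a savings percentage like the ROW and PAGE savings columns in the white paper example. The function name and the sizes are hypothetical.

```python
def savings_percent(current_kb: float, requested_kb: float) -> float:
    """Percent of space saved by moving from the current to the requested compression setting."""
    if current_kb <= 0:
        raise ValueError("current size must be positive")
    return round(100.0 * (1.0 - requested_kb / current_kb), 2)


# Hypothetical estimate: 10,240 KB today, 2,048 KB with PAGE compression.
print(savings_percent(10_240, 2_048))  # 80.0
```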
Application Workloads
Microsoft recommendations:
- Page compression has higher overhead than row compression, so evaluate carefully where to use it
- If row compression results in space savings and the system can accommodate a 10 percent increase in CPU usage, all data should be row-compressed
Example (source: Microsoft Compression White Paper)
Table   | Name        | ROW savings % | PAGE savings % | Scans (S) % | Updates (U) % | Decision | Notes
Table 1 | Employees   | 80            | 90             | 3.80        | 57.27         | ROW      | Low S, very high U; ROW savings close to PAGE
Table 2 | HR          | 15            | 89             | 92.46       | 0             | PAGE     | Very high S
Table 3 | Salary      | 30            | 81             | 27.14       | 4.17          | ROW      | Low S
Table 4 | Vendors     | 38            | 83             | 89.16       | 10.54         | ROW      | High U
Table 5 | Sales Order | 21            | 87             | 0.00        | 0             | PAGE     | Append-only table
Table 1: Deciding what to compress
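The four-step decision can be sketched as a small function. The thresholds (10% updates, 75% scans, PAGE savings within 10 points of ROW) are hypothetical values chosen so that the sketch reproduces the decisions in Table 1. They are not rules stated by the white paper. Tune them to your own workload and CPU headroom.

```python
def choose_compression(row_savings, page_savings, scan_pct, update_pct, append_only=False,
                       max_update_pct=10.0, min_scan_pct=75.0, close_enough_pct=10.0):
    """Return "NONE", "ROW" or "PAGE" for one table or index (all inputs are percentages)."""
    if row_savings <= 0 and page_savings <= 0:
        return "NONE"  # nothing to gain
    if append_only:
        return "PAGE"  # rows are written once and rarely touched again
    if page_savings - row_savings <= close_enough_pct:
        return "ROW"   # PAGE costs more CPU for little extra space
    if update_pct <= max_update_pct and scan_pct >= min_scan_pct:
        return "PAGE"  # mostly scanned, rarely updated
    return "ROW"


# The five tables from the example: (name, ROW %, PAGE %, S %, U %, append-only)
examples = [
    ("Employees", 80, 90, 3.80, 57.27, False),
    ("HR", 15, 89, 92.46, 0, False),
    ("Salary", 30, 81, 27.14, 4.17, False),
    ("Vendors", 38, 83, 89.16, 10.54, False),
    ("Sales Order", 21, 87, 0.00, 0, True),
]
for name, row, page, s, u, append in examples:
    print(name, choose_compression(row, page, s, u, append_only=append))
```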
What Happens When We Compress an Object?
- Tables and indexes are rebuilt using ALTER TABLE…REBUILD and ALTER INDEX…REBUILD
- Same mechanism as rebuilding an index
- Requires workspace, CPU, and I/O
- Free workspace is required in the user database, the transaction log, and tempdb
How and When to Compress Data
- Online vs. offline
- Concurrent vs. serial
- Order of compressing: start small and work up
- SORT_IN_TEMPDB
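This sketch turns a compression plan into rebuild statements, smallest objects first, so they can be run one at a time. The CompressionTarget class, the helper functions, and the table and index names are hypothetical. The generated statements use the documented DATA_COMPRESSION, ONLINE, and SORT_IN_TEMPDB rebuild options. Check them against your SQL Server version and edition before running anything.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CompressionTarget:
    schema: str
    table: str
    index: Optional[str]  # None = the table itself (heap), rebuilt with ALTER TABLE
    size_mb: int
    compression: str      # "ROW" or "PAGE"


def rebuild_statement(t: CompressionTarget, online: bool = False, sort_in_tempdb: bool = False) -> str:
    """Build the T-SQL rebuild statement that applies the requested compression."""
    options = [f"DATA_COMPRESSION = {t.compression}"]
    if online:
        options.append("ONLINE = ON")  # online rebuilds are an Enterprise edition feature in 2012/2014
    if t.index is None:
        return f"ALTER TABLE [{t.schema}].[{t.table}] REBUILD WITH ({', '.join(options)});"
    if sort_in_tempdb:
        options.append("SORT_IN_TEMPDB = ON")  # intermediate sort results go to tempdb, not the user database
    return f"ALTER INDEX [{t.index}] ON [{t.schema}].[{t.table}] REBUILD WITH ({', '.join(options)});"


def compression_script(targets: List[CompressionTarget], **options) -> List[str]:
    """Order rebuilds smallest first; run the resulting statements serially."""
    return [rebuild_statement(t, **options) for t in sorted(targets, key=lambda t: t.size_mb)]


targets = [
    CompressionTarget("dbo", "SalesOrder", "PK_SalesOrder", 5000, "PAGE"),
    CompressionTarget("dbo", "Vendors", None, 20, "ROW"),
    CompressionTarget("dbo", "Employees", "IX_Employees_Name", 150, "ROW"),
]
for statement in compression_script(targets, online=True, sort_in_tempdb=True):
    print(statement)
```

Starting with small objects keeps early rebuilds short and lets you watch log, tempdb, and CPU usage before you reach the largest tables.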
How SQL Server Manages Inserts and Updates with Compression
Heap
- ROW compression: the newly inserted row is row-compressed.
- PAGE compression: the newly inserted row is page-compressed:
  - if the new row goes to an existing page with page compression
  - if the new row is inserted through BULK INSERT with TABLOCK
  - if the new row is inserted through INSERT INTO … WITH (TABLOCK) SELECT … FROM
  Otherwise, the row is row-compressed.*
Clustered index
- ROW compression: the newly inserted row is row-compressed.
- PAGE compression: the newly inserted row is page-compressed if it goes to an existing page with page compression. Otherwise, it is row-compressed until the page fills up; page compression is attempted before a page split.**
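The rules for a newly inserted row can be written as a small decision function. This is a hypothetical model of the rules as listed for heaps and clustered indexes, not SQL Server code. The page-compression attempt before a page split is noted in a comment but not modeled.

```python
def new_row_compression(organization, setting, lands_on_page_compressed_page=False,
                        bulk_insert_tablock=False, insert_select_tablock=False):
    """How a newly inserted row is stored, per the listed rules for heaps and clustered indexes."""
    if organization not in ("HEAP", "CLUSTERED"):
        raise ValueError("organization must be 'HEAP' or 'CLUSTERED'")
    if setting in ("NONE", "ROW"):
        return setting  # ROW tables always row-compress new rows
    if setting != "PAGE":
        raise ValueError("setting must be 'NONE', 'ROW' or 'PAGE'")
    if lands_on_page_compressed_page:
        return "PAGE"
    if organization == "HEAP" and (bulk_insert_tablock or insert_select_tablock):
        return "PAGE"  # BULK INSERT with TABLOCK, or INSERT INTO ... WITH (TABLOCK) SELECT
    # Otherwise the row starts out row-compressed. For a clustered index, SQL Server
    # attempts page compression when the page fills, before splitting it (not modeled here).
    return "ROW"


print(new_row_compression("HEAP", "PAGE", bulk_insert_tablock=True))  # PAGE
print(new_row_compression("CLUSTERED", "PAGE"))                       # ROW
```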
What Happens to SQL Server’s Underlying Data Structures?
Columns: Table compression | Transaction log | Mapping index for rebuilding the clustered index | Sort pages for queries | Version store (with SI or RCSI isolation level)
Values: ROW NONE ROW PAGE ROW NONE ROW
What is Columnstore?
Columnstore Architecture
- Column segment: contains values from one column for multiple rows
- Row group: segments that contain the same set of rows make up a row group
- Segments are compressed
- Each segment is stored as its own LOB
- The segment is the unit of transfer from disk into memory
Figure: row group and column segment
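A toy model of this layout, using JSON plus zlib as stand-ins for SQL Server's segment encodings. Rows are cut into row groups, each column of each group becomes its own compressed segment, and scanning one column decompresses only that column's segments. SQL Server row groups hold up to 1,048,576 rows. The tiny row group size here is only for the demo, and the function names are hypothetical.

```python
import json
import zlib


def build_columnstore(rows, columns, rowgroup_size):
    """Cut rows into row groups; store each column of each group as a separately compressed segment."""
    rowgroups = []
    for start in range(0, len(rows), rowgroup_size):
        group = rows[start:start + rowgroup_size]
        # One segment per column: that column's values for every row in this group.
        rowgroups.append({
            col: zlib.compress(json.dumps([row[col] for row in group]).encode("utf-8"))
            for col in columns
        })
    return rowgroups


def scan_column(rowgroups, column):
    """Read one column: only that column's segments are decompressed; the others are never touched."""
    for segments in rowgroups:
        yield from json.loads(zlib.decompress(segments[column]).decode("utf-8"))


rows = [{"region": "East" if i % 2 else "West", "amount": i} for i in range(10)]
store = build_columnstore(rows, ["region", "amount"], rowgroup_size=4)
print(len(store), sum(len(g) for g in store))  # 3 6
print(sum(scan_column(store, "amount")))       # 45
```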
Columnstore Limitations
- Not updateable (2012)
- Limited data types
- Can only be a nonclustered index (2012)
- No computed columns
- No sparse columns
- No indexed views
- One per table
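Columnstore in SQL Server 2014
- Fewer data type limitations
- Updateable (clustered columnstore indexes; nonclustered columnstore indexes remain read-only)
- Can be a clustered index
- New archival compression mode (COLUMNSTORE_ARCHIVE)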
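Columnstore Updates (2014)
- New rows are collected in a row-store delta store until a row group fills (1,048,576 rows)
- Updates are handled as a delete plus an insert into the delta store
- The tuple mover then compresses closed row groups into the columnstore index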
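A toy model of the delta store and tuple mover, again using JSON plus zlib as stand-ins for the real segment format. Inserts land in an open delta row group. When it reaches rowgroup_max_rows it is closed, and a separate tuple_mover step compresses closed groups into column segments. Queries read both compressed segments and delta rows. The class is hypothetical. rowgroup_max_rows defaults to SQL Server's 1,048,576-row row group size, but the demo uses 3. Deletes, updates, and the delete bitmap are not modeled.

```python
import json
import zlib


class ToyColumnstore:
    """Inserts go to an open delta row group; full ones are closed and later compressed by a tuple mover."""

    def __init__(self, rowgroup_max_rows=1_048_576):
        self.rowgroup_max_rows = rowgroup_max_rows
        self.open_delta = []     # row-store delta row group accepting inserts
        self.closed_deltas = []  # full delta row groups waiting for the tuple mover
        self.compressed = []     # compressed row groups: column name -> segment bytes

    def insert(self, row):
        self.open_delta.append(row)
        if len(self.open_delta) >= self.rowgroup_max_rows:
            self.closed_deltas.append(self.open_delta)  # close the full row group
            self.open_delta = []

    def tuple_mover(self):
        """Background task: compress every closed delta row group into column segments."""
        while self.closed_deltas:
            group = self.closed_deltas.pop(0)
            self.compressed.append({
                col: zlib.compress(json.dumps([r[col] for r in group]).encode("utf-8"))
                for col in group[0]
            })

    def scan(self, column):
        """Queries read compressed segments plus any rows still in delta row groups."""
        for segments in self.compressed:
            yield from json.loads(zlib.decompress(segments[column]).decode("utf-8"))
        for group in self.closed_deltas + [self.open_delta]:
            for row in group:
                yield row[column]


cs = ToyColumnstore(rowgroup_max_rows=3)  # tiny threshold for the demo only
for i in range(7):
    cs.insert({"id": i})
print(len(cs.closed_deltas), len(cs.open_delta), len(cs.compressed))  # 2 1 0
cs.tuple_mover()
print(len(cs.closed_deltas), len(cs.open_delta), len(cs.compressed))  # 0 1 2
print(list(cs.scan("id")))  # [0, 1, 2, 3, 4, 5, 6]
```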
Compression Demo
Questions
Contact
Twitter
Joedantoni.wordpress.com (blog)
Bit.ly/SQLCompression (presentation resources)