Squeeze Into Some Free Gains

Data Compression
Slides and scripts available: https://github.com/jpomfret/demos/tree/master/DataCompression
@jpomfret

About Me
Jess Pomfret
SQL Server DBA at Westfield Insurance, Ohio
Open source contributor (dbatools, dbachecks, SqlServerDsc)
Passionate about SQL Server, PowerShell & proper football
jpomfret7@gmail.com
@jpomfret

Agenda
- Advantages and disadvantages of data compression
- What you can compress
- Types of compression
- What you should compress
- How you can compress
- Performance implications
- Wizardry

SQL Server Editions
Data compression availability:
- Enterprise Edition: SQL Server 2008 R2, 2012, 2014, 2016, 2017 and 2019
- Standard Edition: SQL Server 2016 SP1 and later (2016 SP1+, 2017, 2019)
- Azure SQL Database and Azure SQL Database Managed Instance

Data compression has been around since SQL Server 2008, so why is it important now? Because from SQL Server 2016 SP1 it is no longer limited to Enterprise Edition. In SQL Server 2019, sp_estimate_data_compression_savings also supports COLUMNSTORE and COLUMNSTORE_ARCHIVAL as options.

Advantages of Data Compression
- Reduces database size
- More data per page
- Improved performance for I/O-intensive workloads
- More rows in memory

Disadvantages of Data Compression
- CPU cost of compressing and decompressing (the Microsoft whitepaper puts it at 10% or less for ROW compression)
- Slightly slower single-row inserts and updates
- Slower bulk inserts and updates

Data is decompressed when it is needed for filtering, joining, sorting or returning query results, or when the application updates it. We'll talk more about the performance costs later on.

Disadvantages of Data Compression (continued)
- Enterprise-level feature*
- You can't restore a compressed database to a Standard Edition instance
- The failure occurs when SQL Server attempts to bring the database online, after however long the restore took!
- Check for Enterprise features with sys.dm_db_persisted_sku_features

*Data compression is an Enterprise-level feature if you're on a version older than SQL Server 2016 SP1. On those versions, once you use data compression on an Enterprise system you can no longer restore or attach that database to a Standard Edition instance. That could be a problem if, say, you refresh test from production, but I'm sure no one does that.

It's also important to note that if you try to restore a backup that includes Enterprise-level features to a Standard Edition server, it won't warn you before you wait 5 hours for the restore. It works away and everything looks good; then, when the restore is basically complete and SQL Server tries to bring the database online, it fails: "Database cannot be started in this edition…"
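A minimal Python sketch of the pre-restore check described above, assuming you have already read the feature_name values from sys.dm_db_persisted_sku_features in the source database. The helper name, the (major version, service pack) tuple format and the edition strings are illustrative assumptions, and the sketch only reasons about the "Compression" feature, not other Enterprise-only features.

```python
# Hypothetical helper: would the "Compression" SKU feature stop this database
# from coming online on the target instance?
# target_version is (major_version, service_pack), e.g. (13, 1) = SQL Server 2016 SP1.

FULL_FEATURE_EDITIONS = {"Enterprise", "Developer"}
FIRST_VERSION_FOR_ALL_EDITIONS = (13, 1)  # SQL Server 2016 SP1


def compression_blocks_restore(persisted_features, target_edition, target_version):
    """Return True if data compression would make the restored database fail to start."""
    if "Compression" not in persisted_features:
        return False  # no compressed objects, nothing to worry about
    if target_edition in FULL_FEATURE_EDITIONS:
        return False  # these editions include data compression on every version
    # Before 2016 SP1, data compression was limited to Enterprise-class editions.
    return tuple(target_version) < FIRST_VERSION_FOR_ALL_EDITIONS


print(compression_blocks_restore(["Compression"], "Standard", (12, 0)))  # True
print(compression_blocks_restore(["Compression"], "Standard", (14, 0)))  # False
```

Running a check like this before kicking off a long restore is the point: it fails in seconds instead of after hours.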

What Can You Compress?
- Tables, stored as either a heap or a clustered index
- Nonclustered indexes
- Indexed views
- Individual partitions

If you add a nonclustered index, it defaults to NONE, no matter what the current compression setting on the table is. If you add a clustered index to a heap, it inherits the heap's compression setting by default.

If you have a table partitioned by month, where the current month sees a lot of read and write activity, it might not make sense to compress that partition. It might make sense, however, to compress the older months, which are used only for reads or reporting.

Compression and encryption: compress first, then encrypt (TDE). Compression is applied to the unencrypted data, which is then encrypted. Application-level encryption can be used, but the effect of compression is lessened because encrypted data is more unique. TDE plus backup compression is the same deal: pages are backed up encrypted, so the data is more unique and the compression gains are smaller.

Row/page compression plus backup compression (https://jesspomfret.com/double-compression/): with no data compression, backup compression gives a 78% reduction in the size of the backup file; with row-compressed data we get a 70% saving, and with page-compressed data we still get a 60% reduction.
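A small Python sketch of the partition-by-month idea above: it generates one ALTER TABLE ... REBUILD PARTITION statement per cold partition and skips the busy ones. The table name, the partition numbering and the choice of PAGE for cold data are hypothetical; the generated statements use the standard REBUILD PARTITION = n WITH (DATA_COMPRESSION = ...) syntax.

```python
def partition_compression_statements(table, partition_numbers, hot_partitions, setting="PAGE"):
    """Build one ALTER TABLE ... REBUILD PARTITION statement per cold partition."""
    statements = []
    for number in partition_numbers:
        if number in hot_partitions:
            continue  # busy partitions keep their current setting (e.g. NONE)
        statements.append(
            f"ALTER TABLE {table} REBUILD PARTITION = {number} "
            f"WITH (DATA_COMPRESSION = {setting});"
        )
    return statements


# Hypothetical: twelve monthly partitions; partition 12 is the current, write-heavy month.
for statement in partition_compression_statements("[dbo].[Orders]", range(1, 13), {12}):
    print(statement)
```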

What Can't Be Compressed?
- System tables
- Memory-optimized tables
- Tables with sparse columns
- Tables whose maximum row size plus the compression overhead exceeds the maximum row size (8060 bytes)

A table cannot be enabled for compression when the maximum row size plus the compression overhead exceeds 8060 bytes. The row-size check is performed when an object is first compressed, and again whenever rows are inserted or updated. Compression enforces two rules: an update to a fixed-length type must always succeed, and disabling data compression must always succeed. So even if the compressed row fits, an update will fail if the row would not fit once uncompressed.

Data compression is not supported for memory-optimized tables. Because of their size, large-value data types are sometimes stored separately from the normal row data, on special-purpose pages. Data compression is not available for data stored separately.
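The two row-size rules can be sketched as simple predicates. This is a simplified Python model: the 8060-byte limit comes from the slide, but the overhead figure passed in is a hypothetical placeholder, since the real per-row compression overhead depends on the table definition.

```python
MAX_ROW_SIZE = 8060  # bytes


def can_enable_compression(max_row_size, compression_overhead):
    """Compression can be enabled only if the defined max row size plus overhead fits."""
    return max_row_size + compression_overhead <= MAX_ROW_SIZE


def update_succeeds(uncompressed_row_size):
    """Disabling compression must always succeed, so an updated row is judged by its
    uncompressed size even when its compressed form would fit."""
    return uncompressed_row_size <= MAX_ROW_SIZE


print(can_enable_compression(8000, 100))  # False (100 is a hypothetical overhead)
print(update_succeeds(8061))              # False
```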

Compression Types
- Row
- Page
- Columnstore
- Columnstore archival
- Backup compression

Backup compression also came out in SQL Server 2008; columnstore arrived in SQL Server 2012.

Row Compression
Changes the physical storage format of the data, using variable-length storage for fixed-length data types:
- smallint, int, bigint: only use the bytes needed (uncompressed, smallint takes 2 bytes, int 4 bytes and bigint 8 bytes)
- datetime, datetime2: use an integer date representation with two 4-byte integers
- char: trailing spaces removed
- bit: metadata overhead brings this to 4 bits
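To make "only uses the bytes needed" concrete, here is an approximate Python model: an integer needs the smallest number of signed bytes that can hold it, and a char value needs only its length without trailing spaces. It is a simplification, not SQL Server's exact on-disk encoding; it ignores the per-column metadata that row compression adds and any special handling of zero or NULL.

```python
def int_bytes_row_compressed(value):
    """Smallest number of bytes that holds value as a signed (two's-complement) integer.
    Simplification: ignores row-compression metadata and special handling of 0/NULL."""
    magnitude = value if value >= 0 else ~value  # ~value maps -n to n - 1
    return magnitude.bit_length() // 8 + 1


def char_bytes_row_compressed(value):
    """char(n) under row compression: trailing spaces are not stored."""
    return len(value.rstrip(" "))


print(int_bytes_row_compressed(3))                    # 1 byte instead of 8 for a bigint
print(int_bytes_row_compressed(70_000))               # 3
print(char_bytes_row_compressed("Akron" + " " * 45))  # 5 instead of the full char(50)
```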

Row Compression Example
Column types: EmployeeID bigint, FirstName char(100), LastName char(100), Address1 char(250), City char(50)
EmployeeID | FirstName | LastName | Address1 | City
1 | Alex | Young | 2 Sand Run | Akron
2 | Richard | Young | 77 High St. | Akron
3 | Alexis | Young | 1 First Ave. | Richfield

Our database designer is not a fan of variable-length columns, so there is a lot of whitespace. Data isn't stored exactly like this on pages, but when fixed-length columns hold values that don't use all the allotted room, you get a lot of white space on your pages. That matters because SQL Server reads pages into memory, and you want to use your memory as efficiently as you can. We also have a local family business here, so there is lots of repeating data in LastName and City.

After row compression:
- EmployeeID (bigint) was taking up 8 bytes; 1, 2 and 3 each fit in one byte, so that's all they get.
- The char columns lose all their trailing white space; they only get the bytes they need.
1 | Alex | Young | 2 Sand Run | Akron
2 | Richard | Young | 77 High St. | Akron
3 | Alexis | Young | 1 First Ave. | Richfield
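A rough Python tally of what row compression does to the first Employee row. It assumes FirstName and LastName are both char(100), which is consistent with the pminlen of 512 that DBCC PAGE reports for this table later in the talk (4-byte row header + 508 bytes of fixed-length data). Only column data is counted; the row header, null bitmap and row-compression metadata are ignored, so real savings will be somewhat smaller.

```python
# Declared widths in bytes; char(100) for both name columns is an assumption.
FIXED_WIDTHS = {"EmployeeID": 8, "FirstName": 100, "LastName": 100,
                "Address1": 250, "City": 50}


def int_bytes(value):
    """Smallest number of signed bytes that holds value (approximation)."""
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() // 8 + 1


def data_bytes(row, row_compressed):
    """Bytes of column data only (header, null bitmap and compression metadata ignored)."""
    total = 0
    for column, width in FIXED_WIDTHS.items():
        value = row[column]
        if not row_compressed:
            total += width                    # fixed-length: always the full width
        elif isinstance(value, int):
            total += int_bytes(value)         # bigint shrinks to the bytes it needs
        else:
            total += len(value.rstrip(" "))   # char: trailing spaces dropped
    return total


row1 = {"EmployeeID": 1, "FirstName": "Alex", "LastName": "Young",
        "Address1": "2 Sand Run", "City": "Akron"}
print(data_bytes(row1, row_compressed=False))  # 508
print(data_bytes(row1, row_compressed=True))   # 25
```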

Page Compression
Page compression is applied in three steps:
1. Row compression
2. Prefix compression
3. Dictionary compression

Prefix and dictionary compression are type agnostic: they can replace duplicate values from any data type.

Page Compression Step 1 – Row Compression
Starting from the Employee table (EmployeeID bigint, FirstName char(100), LastName char(100), Address1 char(250), City char(50)), row compression removes the unused bytes:
EmployeeID | FirstName | LastName | Address1 | City
1 | Alex | Young | 2 Sand Run | Akron
2 | Richard | Young | 77 High St. | Akron
3 | Alexis | Young | 1 First Ave. | Richfield

Page Compression Step 2 – Prefix Compression
Prefix compression looks for a common value at the beginning of each column that could reduce storage. That value is stored on the page in an anchor record, directly after the page header, in the Compression Information (CI) structure. Repeated values are then replaced with a pointer to the CI.
CI anchor record: FirstName = Alexis, LastName = Young, City = Akron
EmployeeID | FirstName | LastName | Address1 | City
1 | [4] | [empty] | 2 Sand Run | [empty]
2 | [0Richard] | [empty] | 77 High St. | [empty]
3 | [empty] | [empty] | 1 First Ave. | [0Richfield]

[4] means "the first 4 characters of the anchor value" (Alex), [0Richard] means "no characters from the anchor, followed by Richard", and [empty] means the value matches the anchor.
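The prefix step can be sketched in Python for a single column, using the slide's notation for the output. The anchor-selection rule here (pick the value that shares the most leading characters with the rest of the column) is a hypothetical heuristic chosen for illustration; SQL Server's internal rule is not described in this talk.

```python
def common_prefix_len(a, b):
    """Number of leading characters a and b share."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def prefix_compress(column_values):
    """Return (anchor, encoded values) in the slide notation:
    [empty] = same as the anchor, [nXYZ] = first n anchor characters + XYZ."""
    # Hypothetical anchor choice: the value sharing the most prefix characters
    # with the whole column.
    anchor = max(column_values,
                 key=lambda cand: sum(common_prefix_len(cand, v) for v in column_values))
    encoded = []
    for value in column_values:
        if value == anchor:
            encoded.append("[empty]")
        else:
            n = common_prefix_len(anchor, value)
            encoded.append(f"[{n}{value[n:]}]")
    return anchor, encoded


print(prefix_compress(["Alex", "Richard", "Alexis"]))
# ('Alexis', ['[4]', '[0Richard]', '[empty]'])
```

For the FirstName column this reproduces the anchor and encoded values shown on the slide.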

Page Compression Step 3 – Dictionary Compression
The final step of PAGE compression is dictionary compression. It looks for repeated values anywhere on the page and stores them in the Compression Information (CI). Dictionary compression is not restricted to one data type.
CI: anchor record (Alexis, Young, Akron) and dictionary (entry 0: Richfield)
EmployeeID | FirstName | LastName | Address1 | City
1 | [4] | [empty] | 2 Sand Run | [empty]
2 | (0[4ard]) | [empty] | 77 High St. | [empty]
3 | [empty] | [empty] | 1 First Ave. | (0)

(0) is a reference to dictionary entry 0.
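A simplified Python sketch of the dictionary step: any value that appears more than once anywhere on the page, in any column, is stored once and each occurrence is replaced by (i), the index of its dictionary entry. The sample page is hypothetical, and this model ignores the byte-level cost/benefit decisions the real engine makes.

```python
from collections import Counter


def dictionary_compress(page_rows):
    """Store values that repeat anywhere on the page once, and replace each
    occurrence with (i), the index of its dictionary entry."""
    counts = Counter(cell for row in page_rows for cell in row)
    # Dictionary order = first appearance on the page (Counter keeps insertion order).
    dictionary = [value for value, count in counts.items() if count > 1]
    position = {value: i for i, value in enumerate(dictionary)}
    encoded = [[f"({position[cell]})" if cell in position else cell for cell in row]
               for row in page_rows]
    return dictionary, encoded


# Hypothetical page: "Akron" repeats across two different columns.
page = [["Sand Run", "Akron"], ["High St.", "Akron"], ["Akron", "Richfield"]]
print(dictionary_compress(page))
# (['Akron'], [['Sand Run', '(0)'], ['High St.', '(0)'], ['(0)', 'Richfield']])
```

Because matching is done on the stored values rather than per column, the same entry serves every column, which is what "not restricted to one data type" means in practice.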

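Internals – Let me show you!
-- Find the pages in the Employee table (index ID 1 = clustered index)
DBCC IND ('CompressTest', 'employee', 1);
-- Trace flag 3604 sends DBCC output to the client (Messages tab) instead of the error log
DBCC TRACEON (3604);
GO
-- Dump page 416 in file 1, print option 3 (header plus per-row detail)
DBCC PAGE ('CompressTest', 1, 416, 3);
-- In the page header:
-- pminlen: size of the fixed-length portion of each record = 512
-- m_slotCnt: number of records on the page = 3
-- m_freeCnt: bytes of free space on the page = 6545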
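The header values above can be reproduced with a little arithmetic. This Python sketch assumes an uncompressed page holding only fixed-length columns (no variable-length data, no row-versioning tag) and uses the standard record layout: a 4-byte row header, the fixed-length data, then a null bitmap made of a 2-byte column count plus one bit per column; each record also costs a 2-byte slot-array entry, and the page header takes 96 of the 8192 bytes.

```python
PAGE_BYTES = 8192
PAGE_HEADER_BYTES = 96
ROW_HEADER_BYTES = 4  # status bits A and B (1 byte each) + 2-byte offset to end of fixed data
SLOT_BYTES = 2        # one slot-array entry per record


def pminlen(fixed_widths):
    """Row header plus all fixed-length column data."""
    return ROW_HEADER_BYTES + sum(fixed_widths)


def record_bytes(fixed_widths):
    column_count = len(fixed_widths)
    null_bitmap = 2 + (column_count + 7) // 8  # 2-byte column count + 1 bit per column
    return pminlen(fixed_widths) + null_bitmap


def free_bytes(fixed_widths, row_count):
    used = row_count * (record_bytes(fixed_widths) + SLOT_BYTES)
    return PAGE_BYTES - PAGE_HEADER_BYTES - used


# Employee: bigint, char(100), char(100), char(250), char(50)
employee = [8, 100, 100, 250, 50]
print(pminlen(employee))        # 512
print(free_bytes(employee, 3))  # 6545
```

Both numbers match the DBCC PAGE output, which is a handy sanity check that the page really is uncompressed before you compare it with the compressed version.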

What Should You Compress?
- Numeric or fixed-length columns where most values don't use all the allocated bytes
- Nullable columns with a lot of NULL values
- Repeating data values
- Based on workload: a low percentage of updates and a high percentage of scans both point to PAGE compression

So we've talked about what you can compress and the different types of compression you can use, but what should you compress?

Tiger Team (SQL Server Engineering): https://blogs.msdn.microsoft.com/blogdoezequiel/2011/01/03/the-sql-swiss-army-knife-6-evaluating-compression-gains/
The compression gains script can be downloaded from the Tiger Team GitHub repo. It uses the index usage DMVs to calculate the percentage of updates and scans, so the longer your server has been up, the more reliable its results:
- Percent update: update operations as a percentage of total operations on the object
- Percent scan: scan operations as a percentage of total operations on the object
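The workload rule of thumb can be written as a tiny Python decision function. The operation counts would come from the usage DMVs, as the Tiger Team script does; the 20% update and 75% scan thresholds below are illustrative assumptions, not values taken from that script or from Microsoft guidance.

```python
def suggest_compression(update_ops, scan_ops, total_ops,
                        max_update_pct=20.0, min_scan_pct=75.0):
    """Rough heuristic: few updates and mostly scans favour PAGE compression.
    The threshold defaults are illustrative assumptions, not official values."""
    if total_ops == 0:
        return "no usage data yet"
    update_pct = 100.0 * update_ops / total_ops
    scan_pct = 100.0 * scan_ops / total_ops
    if update_pct <= max_update_pct and scan_pct >= min_scan_pct:
        return "PAGE"
    return "ROW"  # more update-heavy objects: evaluate ROW (or leave NONE)


print(suggest_compression(update_ops=50, scan_ops=900, total_ops=1000))   # PAGE
print(suggest_compression(update_ops=600, scan_ops=300, total_ops=1000))  # ROW
```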

Demo
- What should I compress?
  - sp_estimate_data_compression_savings
  - Tiger Team: Evaluate Compression Gains (https://github.com/Microsoft/tigertoolbox/tree/master/Evaluate-Compression-Gains)
- How to apply compression
  - T-SQL
  - SQL Server Management Studio

At the time of this presentation, sp_estimate_data_compression_savings didn't work in Azure. After this presentation in Cleveland, Erin tweeted about this and Kalen Delaney built one for Azure: https://www.sqlserverinternals.com/blog/2018/6/6/creating-my-own-spestimatedatacompressionsavings
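sp_estimate_data_compression_savings reports sizes rather than percentages, so a quick conversion helps when comparing objects. This Python sketch takes the values from its size_with_current_compression_setting(KB) and size_with_requested_compression_setting(KB) columns; the numbers in the usage line are hypothetical.

```python
def estimated_savings_pct(current_kb, requested_kb):
    """Percent saved, from size_with_current_compression_setting(KB) and
    size_with_requested_compression_setting(KB)."""
    if current_kb == 0:
        return 0.0
    return round(100.0 * (current_kb - requested_kb) / current_kb, 1)


# Hypothetical numbers for one index.
print(estimated_savings_pct(current_kb=1024, requested_kb=384))  # 62.5
```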

T-SQL – Apply Compression
-- Apply compression to the clustered index (rebuilding the table rebuilds its heap or clustered index)
ALTER TABLE [Sales].[SalesOrderDetail] REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = ROW);
-- Apply compression to the nonclustered index
ALTER INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail] REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = ROW);

Compression Performance Impacts
- CPU usage increases to compress and decompress data
- Disk I/O decreases, since objects require fewer pages
Demo
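Why fewer pages means less I/O can be shown with a back-of-the-envelope Python calculation. It assumes 8096 usable bytes per page (8192 minus the 96-byte header) plus a 2-byte slot entry per record, and ignores fill factor and the 8060-byte row limit. The 515-byte record is the uncompressed Employee record size implied by the DBCC PAGE numbers earlier; the 60-byte compressed record is a hypothetical figure.

```python
import math

USABLE_PAGE_BYTES = 8096  # 8 KB page minus the 96-byte page header
SLOT_BYTES = 2            # slot-array entry per record


def pages_needed(row_count, avg_record_bytes):
    """Pages required to store row_count records of the given average size."""
    rows_per_page = USABLE_PAGE_BYTES // (avg_record_bytes + SLOT_BYTES)
    return math.ceil(row_count / rows_per_page)


print(pages_needed(1_000_000, 515))  # 66667 (15 rows per page)
print(pages_needed(1_000_000, 60))   # 7693  (130 rows per page, hypothetical record size)
```

Every page that no longer has to be read from disk or held in the buffer pool is the I/O saving; the CPU cost is paid when the rows on those pages are decompressed.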

Compress a Whole Database?
- Build T-SQL scripts from sys.objects etc.
- Create a cursor that loops through applying compression

What about compressing multiple databases? Or even multiple databases across multiple servers?

So we have covered most of our agenda: advantages and disadvantages of data compression, what you can compress, types of compression, what you should compress, how you can compress, and performance implications. Time for some wizardry.
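The "build T-SQL scripts" approach boils down to turning a list of objects into ALTER statements. In this Python sketch the object list is passed in directly (in practice it would come from querying sys.objects and sys.indexes), and the helper names are hypothetical; the statements it emits follow the same syntax as the T-SQL slide.

```python
def quote_name(name):
    """Bracket-quote an identifier, doubling any ] (like T-SQL QUOTENAME)."""
    return "[" + name.replace("]", "]]") + "]"


def build_compression_script(objects, setting="PAGE"):
    """objects: (schema, table, index_name) tuples; index_name None means the
    table itself (its heap or clustered index)."""
    statements = []
    for schema, table, index_name in objects:
        target = f"{quote_name(schema)}.{quote_name(table)}"
        if index_name is None:
            statements.append(
                f"ALTER TABLE {target} REBUILD PARTITION = ALL "
                f"WITH (DATA_COMPRESSION = {setting});")
        else:
            statements.append(
                f"ALTER INDEX {quote_name(index_name)} ON {target} REBUILD PARTITION = ALL "
                f"WITH (DATA_COMPRESSION = {setting});")
    return "\n".join(statements)


print(build_compression_script([
    ("Sales", "SalesOrderDetail", None),
    ("Sales", "SalesOrderDetail", "IX_SalesOrderDetail_ProductID"),
]))
```

Generating the script first, rather than executing inside a cursor, lets you review it, which matters because every statement is a full rebuild.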

dbatools
- Open-source PowerShell module
- Hosted on GitHub
- "Sort of like a command-line SQL Server Management Studio"
- Over 500 functions
http://dbatools.io
https://github.com/sqlcollaborative/dbatools
https://jesspomfret.com/t-sql-tuesday-101/

dbatools & Data Compression
> Get-Command -Module dbatools -Name *Compression*
Get-DbaDbCompression
Set-DbaDbCompression
Test-DbaDbCompression

Two cmdlets you need to know to get started: Get-Command and Get-Help.
Cmdlets use Verb-Noun names, with verbs drawn from an approved list (https://msdn.microsoft.com/en-us/library/ms714428(v=vs.85).aspx); run Get-Verb to see it.

> Get-Help Get-DbaDbCompression
NAME
    Get-DbaDbCompression
SYNOPSIS
    Gets tables and indexes size and current compression settings.
SYNTAX
    Get-DbaDbCompression [-SqlInstance] <DbaInstanceParameter[]> …
DESCRIPTION
    This function gets the current size and compression for all objects in the specified database(s), if no database is specified it will return all objects in all user databases.
REMARKS
    To see the examples, type: "get-help Get-DbaDbCompression -examples".
    For more information, type: "get-help Get-DbaDbCompression -detailed".
    For technical information, type: "get-help Get-DbaDbCompression -full".

I also have a blog post on this function: http://jesspomfret.com/t-sql-tuesday-104-code-you-cant-live-without/

Compression with dbatools
Demo
$results = Test-DbaDbCompression `
    -SqlInstance localhost\sql2016 `
    -Database AdventureWorks2016
Set-DbaDbCompression `
    -SqlInstance localhost\sql2016 `
    -Database AdventureWorks2016 `
    -InputObject $results

Demo
docker pull jpomfret7/datacompression:demo
docker run -e ACCEPT_EULA=Y -e SA_PASSWORD=$SaPwd -p 1433:1433 -d jpomfret7/datacompression:demo
https://jesspomfret.com/data-compression-containers/

Tell Me More
- Data Compression (Books Online): https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/data-compression
- Data Compression Whitepaper: https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008/dd894051(v=sql.100)
- TigerTeam – Evaluate Compression Gains: https://github.com/Microsoft/tigertoolbox/tree/master/Evaluate-Compression-Gains
- dbatools: http://dbatools.io and https://github.com/sqlcollaborative/dbatools
- Enlarging the AdventureWorks sample databases: https://www.sqlskills.com/blogs/jonathan/enlarging-the-adventureworks-sample-databases/

Any Questions?
Jess Pomfret
jpomfret7@gmail.com
JessPomfret.com