Managing Very Large Databases with SQL Server


1 Managing Very Large Databases with SQL Server
Tips and best practices for DBAs

2 Please Visit Sponsors and Enter Raffles

3 Who Am I?
- Manager, DBA @ ChannelAdvisor
- Microsoft Certified Master
- Seven years working with SQL Server
- Live in Raleigh, North Carolina

4 Agenda
- Very Large Databases
- Maintenance: backups & restores, indexes, integrity checks, statistics
- Partitioning
- SQL 2016
- High Availability & Disaster Recovery

5 What constitutes a VLDB?
So what are they?
- Hundreds of MBs in the '80s
- Hundreds of GBs in the '90s
- Hundreds of TBs in the '00s?
- Hundreds of PBs today?
In practice, a database is "very large" when default practices may change due to its size.

6 VLDBs in SQL Server
A working definition: > 1 TB, or more than one billion rows.
- SQL Server is a good choice at this scale
- Need to think more about maintenance, and definitely need to think more about recovery
- This is where Enterprise Edition (EE) delivers value
- Size hardware appropriately
- Know VLDB management techniques before your database becomes a VLDB

7 SQL VLDBs @ ChannelAdvisor
- 88 TB production data
- 111 production SQL instances
- 7.2 TB largest instance
- 4.2 TB largest database

8 Before you have a VLDB
- Consider "scale out" options: can data reside in multiple instances / databases / tables? Scaling out is hard too!
- Compress your data (but not SharePoint); estimate the savings with sp_estimate_data_compression_savings (example below)
- Consider archival / retention: sliding-window partitioning, Stretch Database; it may be more cost-effective to run fewer EE instances
- For archival, incentivize developers rather than punish them; archival must be genuinely hard, or Stretch DB would not exist
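A minimal example of estimating PAGE compression savings; the table name dbo.Orders is hypothetical:

EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'Orders',   -- hypothetical table
    @index_id         = NULL,       -- NULL = all indexes
    @partition_number = NULL,       -- NULL = all partitions
    @data_compression = 'PAGE';     -- or 'ROW' / 'NONE'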

9 Backups
Backups are a likely initial pain point; consider the implications of long backup windows.
- Have a restore strategy - difficulty is not an excuse: "The database is very big so we stopped taking backups." – sqlservercentral.com
- Track backup times and test restores
- Long backup times impact RPO/RTO, especially if a backup fails
- Rebuilding from source may be an option, but make sure you can recreate the schema (a DACPAC may work)
- Use backup compression

10 Backup Strategy
- Differential backups
- Filegroup backups; read-only filegroups
- Be cautious of "two-step" processes: try to avoid backing up locally and then copying to a share - back up directly to a CIFS share instead (though this may be unavoidable in the public cloud)
- Stripe backups across multiple files (sketch below)
- Use backup compression; database compression does not negatively impact backup compression
- "File" backups are not recommended, due to restoration difficulties
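A sketch of a striped, compressed, checksummed backup taken directly to a share; database name and paths are hypothetical:

BACKUP DATABASE MyVLDB
TO  DISK = '\\backupserver\sqlbackups\MyVLDB_1.bak',
    DISK = '\\backupserver\sqlbackups\MyVLDB_2.bak',
    DISK = '\\backupserver\sqlbackups\MyVLDB_3.bak',
    DISK = '\\backupserver\sqlbackups\MyVLDB_4.bak'
WITH COMPRESSION, CHECKSUM, STATS = 5;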

11 Optimizing backups
Best practices do not negate testing.
- Test backing up to NUL: this determines your theoretical throughput (sketch below)
- Our typical parameters: MAXTRANSFERSIZE = 4 MB; BUFFERCOUNT = 200 (watch the memory tolerance: transfer size x buffer count = 800 MB, allocated outside the buffer pool); BLOCKSIZE = 64 KB (discuss with your storage team)
- Writer threads = number of backup files (a single writer thread per backup file)
- Reader threads = number of mount points / drives (a single reader thread per drive/mount point)
- Ideal is a 1:1 relationship with CPU cores (but consider any network bottleneck)
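A sketch of a NUL throughput test using the parameters above; the database name is hypothetical, and COPY_ONLY keeps the test from disturbing the differential base:

BACKUP DATABASE MyVLDB
TO DISK = 'NUL'
WITH COPY_ONLY,
     MAXTRANSFERSIZE = 4194304,  -- 4 MB
     BUFFERCOUNT     = 200,      -- 200 * 4 MB = 800 MB outside the buffer pool
     BLOCKSIZE       = 65536,    -- 64 KB
     STATS = 5;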

12 Restore Considerations
What do we typically need to restore? You are far more likely to need to recover accidentally deleted or incorrectly updated data than to restore an entire database to a point in time, overwriting existing data. In many enterprises it is better to fix the broken thing than to start afresh, and it is not uncommon to keep a log-shipped "delayed" copy of the data.
Alternate restore options:
- Object-level recovery
- SQL database snapshots
- SAN snapshots
- Online piecemeal restore

13 Object-Level Recovery
- Third-party tools only: recover objects directly from backups, or run SQL queries/scripts against the backup
- Useful for recovering from "oops" deletion events, which should figure in your BC/DR plans
An illustrative call against a hypothetical vendor procedure (the procedure and parameter names vary by tool and are assumptions here):

EXEC <vendor_object_recovery_proc>
    @BackupFile  = 'C:\MSSQL\Backup\MyDB_Backup.BAK',
    @Query       = 'SELECT * FROM dbo.Customers WHERE City=''London''',
    @TargetTable = 'dbo.CustomersInLondon',
    @Database    = 'MyDB';

14 SQL Database Snapshots
- Static copy of the source database; copy-on-write: before a page is modified, the original page is copied to the snapshot
- Transactionally consistent, read-only, static view of the source database at the point of creation
- Useful validation option; could be taken before deployments
- Leverages NTFS sparse files: instant creation and minimal space required (initially), but the snapshot must reside on the same instance
- By default CHECKDB uses snapshots; the same process can be undertaken manually. On highly fragmented NTFS volumes CHECKDB may fail with error 665, but a manual snapshot may still work
- You "revert" to a snapshot to roll the database back
- Not compatible with In-Memory OLTP (2014)

15 SQL Database Snapshots (continued)
- Snapshots do not replace backups
- Can be useful for QA/Dev testing
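A minimal sketch of creating and reverting to a snapshot; all names and paths are hypothetical, and NAME must match the logical name of each data file in the source database:

-- Create the snapshot (instant; the .ss file is an NTFS sparse file)
CREATE DATABASE MyDB_PreDeploy ON
    (NAME = MyDB_Data, FILENAME = 'D:\Snapshots\MyDB_PreDeploy.ss')
AS SNAPSHOT OF MyDB;

-- Revert the source database to the snapshot (other snapshots must be dropped first)
RESTORE DATABASE MyDB FROM DATABASE_SNAPSHOT = 'MyDB_PreDeploy';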

16 SAN snapshots
- "Application-aware" snapshots are supported (SQL VSS Writer vs. Windows VSS Writer); an entry is written to the msdb backup history tables
- Seconds to back up and restore; usually equivalent to a full backup
- File layout / recovery model restrictions; all LUNs need to be snapped at the same time
- Recovery plans need a SAN admin: this is not SQL Server (do you have friends on the storage team?)
- Impact is virtually unnoticeable even under heavy load (Zerto), beyond a freeze-I/O warning
- May be useful for "refreshing" QA/Dev
- Vendors vary; SnapManager (NetApp) is the only one I am aware of that integrates log backups. Delphix is slightly different.

17 Online Piecemeal Restore
- Enterprise feature (the "online" part); 2005+, like everything in this talk unless a newer version is called out (I like to pretend SQL 2000 doesn't exist because I never worked with it)
- Brings the database online while some filegroups remain offline (recovery pending); restore some filegroups while others stay available
- Must restore the PRIMARY and any In-Memory filegroups first
- Integrates with partitioning: the filegroup with the latest data can be brought online before anything else - valuable for a VLDB with a lot of archive data
- Typical sequence (sketched below): restore PRIMARY and any immediately required filegroups; take a tail-log backup (do not use the NORECOVERY option); restore the log sequence including the tail-log backup; restore archive filegroups and their log sequences (for read-only filegroups, no log sequence is required)
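One possible ordering of that sequence, assuming a hypothetical MyVLDB with a small PRIMARY, a FG_Current filegroup for live data, and a read-only FG_Archive (the tail of the log must be backed up before restoring over the live database):

-- 1. Tail-log backup of the damaged database
BACKUP LOG MyVLDB TO DISK = '\\backups\MyVLDB_tail.trn';

-- 2. Bring PRIMARY and the current filegroup back first
RESTORE DATABASE MyVLDB
    FILEGROUP = 'PRIMARY', FILEGROUP = 'FG_Current'
    FROM DISK = '\\backups\MyVLDB_full.bak'
    WITH PARTIAL, NORECOVERY;

-- 3. Roll forward and open the restored filegroups
RESTORE LOG MyVLDB FROM DISK = '\\backups\MyVLDB_log1.trn' WITH NORECOVERY;
RESTORE LOG MyVLDB FROM DISK = '\\backups\MyVLDB_tail.trn' WITH RECOVERY;
-- Database is now online; FG_Archive remains recovery pending

-- 4. Restore the archive filegroup while users work (read-only: no log sequence needed)
RESTORE DATABASE MyVLDB FILEGROUP = 'FG_Archive'
    FROM DISK = '\\backups\MyVLDB_archive_fg.bak' WITH RECOVERY;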

18 Restore strategy
Design for restores. ABT: Always Be Testing.
- Keep the PRIMARY filegroup small: no user objects in PRIMARY (enforceable with Policy-Based Management or a DDL trigger - sketch below); note that PRIMARY may still grow in 2016 due to the Query Store
- Filegroup design should align to recovery needs; keep related objects in the same filegroups
- Fully automate restores if possible, script generation at a minimum; PowerShell cmdlets are available
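A sketch of a DDL trigger enforcing the "no user objects in PRIMARY" rule for new tables; the trigger name is arbitrary and the sketch is simplified (it does not handle tables created on partition schemes):

CREATE TRIGGER ddl_block_primary_fg
ON DATABASE
FOR CREATE_TABLE
AS
BEGIN
    DECLARE @evt XML = EVENTDATA();
    DECLARE @obj NVARCHAR(517) =
        QUOTENAME(@evt.value('(/EVENT_INSTANCE/SchemaName)[1]', 'nvarchar(128)')) + N'.' +
        QUOTENAME(@evt.value('(/EVENT_INSTANCE/ObjectName)[1]', 'nvarchar(128)'));

    -- The new table already exists inside this transaction; check where its data landed
    IF EXISTS (SELECT 1
               FROM sys.indexes AS i
               JOIN sys.filegroups AS fg ON fg.data_space_id = i.data_space_id
               WHERE i.object_id = OBJECT_ID(@obj)
                 AND i.index_id IN (0, 1)       -- heap or clustered index = base data
                 AND fg.name = N'PRIMARY')
    BEGIN
        RAISERROR('User tables may not be created on the PRIMARY filegroup.', 16, 1);
        ROLLBACK;                               -- undo the CREATE TABLE
    END
END;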

19 Page Level Restores
They rock! Recover from page corruption without downtime.
- Can you wait? Do you have disk space? Do you have Enterprise Edition (required for online page restore)?
- Automatic page repair (Mirroring / Availability Groups); automatic page repair events should be monitored
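A sketch of restoring a single corrupt page online; the file:page ID and paths are hypothetical, and msdb.dbo.suspect_pages tells you which pages are damaged:

-- Which pages are suspect?
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;

-- Restore just the damaged page from the last full backup
RESTORE DATABASE MyVLDB PAGE = '1:57'
    FROM DISK = '\\backups\MyVLDB_full.bak' WITH NORECOVERY;

-- Apply subsequent log backups, then a tail-log backup, and recover
RESTORE LOG MyVLDB FROM DISK = '\\backups\MyVLDB_log1.trn' WITH NORECOVERY;
BACKUP LOG MyVLDB TO DISK = '\\backups\MyVLDB_tail.trn';
RESTORE LOG MyVLDB FROM DISK = '\\backups\MyVLDB_tail.trn' WITH RECOVERY;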

20 Demo Time

21 Corruption Detection
Integrity checks are expensive, and running them on a VLDB is difficult when the system must be available 24x7.
- Better detection exists outside of DBCC: checksums on pages, page scans for ECC/checksum violations, checksums on tempdb (2008+)
- The combination of page checksums and backups taken WITH CHECKSUM can detect errors outside the page header (sketch below)
- Consistency checks are still needed: 2005+ includes data purity and indexed-view integrity in the standard check; SQL 2016 adds further enhancements
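A sketch wiring both pieces together; the database name is hypothetical:

-- Checksum every page on write (the default for databases created on 2005+)
ALTER DATABASE MyVLDB SET PAGE_VERIFY CHECKSUM;

-- Verify existing page checksums as pages are read during the backup
BACKUP DATABASE MyVLDB
    TO DISK = '\\backups\MyVLDB.bak'
    WITH CHECKSUM, COMPRESSION;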

22 Optimizing CHECKDB
- May need the PHYSICAL_ONLY option, but note it does not examine logical corruption; SQL 2016 may be "fast enough" even with logical checks
- Align checks with partitioning: DBCC CHECKFILEGROUP
- Run individual checks: DBCC CHECKALLOC, DBCC CHECKCATALOG, DBCC CHECKTABLE (examples below)
- Due to the internal snapshot, CHECKDB can be intensive under write-heavy workloads
- Use Resource Governor to limit CHECKDB's memory to 1 GB on systems with more than 256 GB RAM for improved performance
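A sketch of spreading the work out (for example, one piece per night); database, filegroup, and table names are hypothetical:

-- Physical-only pass (no logical checks)
DBCC CHECKDB (MyVLDB) WITH PHYSICAL_ONLY, NO_INFOMSGS;

-- Or divide and conquer, aligned with filegroups and tables
USE MyVLDB;
DBCC CHECKALLOC WITH NO_INFOMSGS;
DBCC CHECKCATALOG WITH NO_INFOMSGS;
DBCC CHECKFILEGROUP (N'FG_Current') WITH NO_INFOMSGS;
DBCC CHECKTABLE (N'dbo.Orders') WITH NO_INFOMSGS;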

23 Offload CHECKDB
Restore the database to a new location and run:
DBCC CHECKDB (MegaBigVLDB) WITH NO_INFOMSGS, EXTENDED_LOGICAL_CHECKS, DATA_PURITY;
- Advantage: tests both your backups and your integrity
- Can be combined with SAN snapshots
- May double the amount of storage required
- Possible false positive: catching corruption introduced by the new storage
- Running CHECKDB against an AG secondary replica is not the same - unless you take your backups on that secondary, in which case it is fine

24 Index maintenance
- Fragmentation still matters regardless of storage: read-ahead, page splits causing low page density, bloated memory, wasted disk space
- Index rebuilds will be expensive: online rebuilds need a LOT of tempdb, transaction log, and data file space for sorting
- Online index rebuilds by partition (SQL EE only), with wait-priority options in SQL 2014 - see the sketch below
- Indexes can also be reorganized online by partition
- ETL inserts may cause fragmentation
- Filtered indexes can be useful
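A sketch of a 2014-style partition-level online rebuild with wait priority; the index, table, and partition number are hypothetical:

ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD PARTITION = 42
WITH (ONLINE = ON (WAIT_AT_LOW_PRIORITY
        (MAX_DURATION = 5 MINUTES, ABORT_AFTER_WAIT = SELF)));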

25 Statistics maintenance
- VLDBs may require manual maintenance processes (Ola Hallengren's scripts work well)
- The index-rebuild "free lunch" is not always available: 2012+ uses dynamic sampling for online rebuilds of partitioned indexes
- Consider asynchronous auto-update of statistics
- SQL 2014 adds incremental statistics: update statistics on one partition and merge them into the existing histogram. For large partitioned indexes, data distributions are difficult to represent accurately with a 200-step histogram
- SQL Server has filtered statistics
- SQL 2016 makes TF 2371 behavior the default
- Monitor row modifications and only update statistics when rows have actually changed (sys.dm_db_stats_properties DMF) - sketched below
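A sketch combining incremental statistics (2014+) with modification tracking; object names are hypothetical, and sys.dm_db_stats_properties requires 2008 R2 SP2 / 2012 SP1 or later:

-- Allow per-partition statistics (2014+)
ALTER DATABASE MyVLDB SET AUTO_CREATE_STATISTICS ON (INCREMENTAL = ON);

-- Update only the partition that changed, then merge into the histogram
UPDATE STATISTICS dbo.Orders (IX_Orders_OrderDate)
    WITH RESAMPLE ON PARTITIONS (42);

-- Decide whether an update is even needed
SELECT s.name, sp.last_updated, sp.rows, sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Orders');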

26 Partitioning
- EE only; divides a large table into multiple B-trees
- Enables sliding-window archival/ETL (see the sketch below)
- Limitations: cannot use a composite partitioning key; difficult to change schema; tricky to keep all indexes aligned
- Partition elimination can be hit-or-miss, so do not think of partitioning as a performance feature
- Do not ignore the possibility of partitioned views (or combine partitioned views and partitioned tables)
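A sketch of one sliding-window step, assuming a hypothetical monthly RANGE RIGHT partition function on a date column and an archive table with identical schema on the same filegroup as partition 1:

-- Switch the oldest partition out (metadata-only, near-instant)
ALTER TABLE dbo.Orders SWITCH PARTITION 1 TO dbo.Orders_Archive;

-- Remove the now-empty boundary at the old end
ALTER PARTITION FUNCTION pf_OrderMonth() MERGE RANGE ('2017-01-01');

-- Add a new boundary at the new end for incoming data
ALTER PARTITION SCHEME ps_OrderMonth NEXT USED [FG_2018];
ALTER PARTITION FUNCTION pf_OrderMonth() SPLIT RANGE ('2018-10-01');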

27 Trace flags to consider
- 2371** - dynamically adjusts the row-modification threshold for statistics maintenance
- 2562 - optimizes CHECKDB WITH PHYSICAL_ONLY
- 2549 - optimizes CHECKDB when there is one file per drive
- 610 - allows minimally logged inserts into a non-empty B-tree
- -E (startup) - alters the proportional fill algorithm to allocate 64 extents per file before switching
- 834 - enables large pages
- 1117** - expands all files in a filegroup together when expansion is triggered
- 1118** - removes mixed extents so all extents become uniform extents
- My defaults (2014): 3226, 1118**, 1117**, 4199**
- My defaults (2016): 3226
* YMMV, ** Not needed in 2016
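Enable globally at runtime with DBCC TRACEON, or persist them as -T startup parameters; the flags shown are from the list above:

-- Runtime, global (-1); does not survive a restart
DBCC TRACEON (2371, 3226, -1);
DBCC TRACESTATUS (-1);   -- verify what is enabled

-- Persistent: add startup parameters via SQL Server Configuration Manager, e.g.
--   -T3226 -T1117 -T1118 -T4199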

28 HA / DR
VLDBs may have more stringent requirements for HA/DR, and problems may take longer to resolve. Test recovery times, and understand your options:
- Failover clustering (AlwaysOn FCI)
- Log shipping - offsite / DR
- Mirroring - automatic page repair
- Replication - can initialize from backup
- AlwaysOn Availability Groups (2012+) - offload reporting and backups; also a migration option for upgrades

29 SQL 2016
- "Enhanced" CHECKDB: can ignore persisted computed columns, filtered indexes, and UDT columns
- Truncate by partition
- GZIP compression via the COMPRESS() function
- Batch mode columnstore
- Stretch Database
Sketches of truncate-by-partition and COMPRESS follow.
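Both are 2016+ syntax; the object names are hypothetical, and Body is assumed to be varbinary(max):

-- Truncate only the specified partitions
TRUNCATE TABLE dbo.Orders WITH (PARTITIONS (1 TO 3, 7));

-- GZIP-compress a value on the way in; DECOMPRESS reverses it
INSERT dbo.Documents (Id, Body)
VALUES (1, COMPRESS(N'A very long document body ...'));

SELECT Id, CAST(DECOMPRESS(Body) AS NVARCHAR(MAX)) AS Body
FROM dbo.Documents;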

30 Coming up Next
Awesome raffle prizes; free beer and BBQ from 17:00

