Optimizing SQL Server and Databases for large Fact Tables


1 Optimizing SQL Server and Databases for large Fact Tables
=tg= Thomas Grohser, NTT Data
SQL Server MVP, SQL Server Performance Engineering
SQL Saturday #513, July 30th 2016, Albany, NY

2 select * from =tg= where topic =
- SQL 4.21 – First SQL Server ever used (1994)
- SQL 6.0 – First log shipping with failover
- SQL 6.5 – First SQL Server cluster (NT 4.0 + Wolfpack)
- SQL 7.0 – 2+ billion rows / month in a single table
- SQL 2000 – 938 days with 100% availability
- SQL 2000 IA64 – First SQL Server on Itanium IA64
- SQL 2005 IA64 – First OLTP long-distance database mirroring
- SQL 2008 IA64 – First replication into mirrored databases
- SQL 2008R2 IA64 / x64 – First 256 CPUs & > STMT/sec; first scale out > STMT/sec; first time 1.2+ trillion rows in a table
- SQL 2012 – > transactions per second; > 1.3 trillion rows in a table
- SQL 2014 – > transactions per second; fully automated deploy and management
- SQL 2016 – AlwaysOn automatic HA and DR; crossed the PB mark in storage
- SQL vNext – Can't wait to push the limits even further

=tg= Thomas Grohser, NTT DATA – Senior Director Technical Solutions Architecture
- Focus on SQL Server security, performance engineering, infrastructure and architecture
- New papers coming 2016
- Close relationship with SQLCAT (SQL Server Customer Advisory Team), SCAN (SQL Server Customer Advisory Network), TAP (Technology Adoption Program), and the product teams in Redmond
- Active PASS member and PASS Summit speaker
- 22 years with SQL Server

3 NTT DATA Overview
- 20,000 professionals – Optimizing balanced global delivery
- $1.6B – Annual revenues with history of above-market growth
- Long-term relationships – >1,000 clients; mid-market to large enterprise
- Delivery excellence – Enabled by process maturity, tools and accelerators
- Flexible engagement – Spans consulting, staffing, managed services, outsourcing, and cloud
- Industry expertise – Driving depth in select industry verticals

Why NTT DATA for MS Services:
- NTT DATA is a Microsoft Gold Certified Partner
- We cover the entire MS stack, from applications to infrastructure to the cloud
- Proven track record with MS solutions delivered over the past 20 years

4 Agenda
- Defining the issue/problem
- Looking at the tools
- Using the right tools
- Q&A

ATTENTION: Important information may be displayed on any slide at any time, without warning!

5 Definition of a large fact table
A moving, individual target over time:
- 2001: for me, big was > 1 billion rows, > 90 GB
- 2011: for me, big was > 1.3 trillion rows, > 250 TB
- 2016: ??? 10 PB ???

6 Size matters not! Having the right tools in place, and knowing how to use them to handle the data, is the solution.

7 The Problem
- Trying to run 30 reports on a big fact table, each of which needs to scan the whole table
- The data is ready at 5 a.m.; the reports need to be ready by 9 a.m.
- The baseline: each report takes about 2 hours to finish
- That is 60 hours of serial work squeezed into a 4-hour window

8 Tools (good news for people with SA / Software Assurance)
- Hardware (server, storage)
- SQL Server (Standard, (BI), Enterprise)
- Clever configuration
- Clever query scheduling

9 Hardware “The grade of steel”

10 CPU is not the limit
- On a modern CPU, each core can process about 500 MB/s
- How many cores do we have in a commodity server? 4-22 cores per socket (that's 4 more since April 2016), 1-8 sockets
- That's 4 to 176 cores, or ~2 to ~88 GB per second, or ~7 to ~300 TB per hour
- CPU capacity is rarely a bottleneck

11 Understanding how SQL Server scans data
- SQL Server reads the data page by page
- SQL Server may perform read-ahead, dynamically adjusting the read-ahead size per table
- Standard Edition: up to 128 pages; Enterprise Edition: up to 512 pages
- That's up to 1 MB (Std) or 4 MB (Ent) per read
- Read ahead as much as possible. Why? Reading 4 MB takes about as long as reading 8 KB
- So let's help SQL Server do it
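Read-ahead activity can be observed directly with SET STATISTICS IO; a minimal sketch, where `dbo.FactSales` stands in for a hypothetical fact table:

```sql
-- Observe read-ahead during a scan (dbo.FactSales is a hypothetical table name):
SET STATISTICS IO ON;

SELECT COUNT(*) FROM dbo.FactSales;
-- The Messages output then reports logical reads, physical reads,
-- and "read-ahead reads" for the scan; a high read-ahead count
-- indicates the large sequential reads we want.
```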

12 Read-ahead happens if the next pages needed are stored contiguously on disk. This becomes a problem when 2 or more tables grow at the same time, because their newly allocated extents interleave.

13 Multiple Data Files
[Diagram: extent allocation patterns across data files, e.g. extents 1-3-5-7-9-… in one file and 2-4-6-8-… in another, illustrating how two tables growing at the same time interleave their extents across the files.]
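A minimal sketch of adding a second data file so allocations round-robin across files; the database name, file name, path, and sizes below are all hypothetical:

```sql
-- Add a second data file; SQL Server then spreads extent allocations
-- across the files (proportional fill). All names/paths are hypothetical.
ALTER DATABASE BigFacts
ADD FILE (
    NAME = 'BigFacts_Data2',
    FILENAME = 'F:\Data\BigFacts_Data2.ndf',
    SIZE = 100GB,
    FILEGROWTH = 4GB
);
```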

14 Multiple File Groups
Place tables that grow at the same time into separate filegroups (FG1, FG2), so each table's extents stay contiguous within its own files.
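A hedged sketch of the filegroup layout, assuming the same hypothetical BigFacts database and two fast-growing fact tables (all names and paths invented for illustration):

```sql
-- One filegroup per fast-growing table, each with its own data file.
ALTER DATABASE BigFacts ADD FILEGROUP FG1;
ALTER DATABASE BigFacts ADD FILEGROUP FG2;

ALTER DATABASE BigFacts
ADD FILE (NAME = 'FG1_Data1', FILENAME = 'E:\Data\FG1_Data1.ndf', SIZE = 100GB)
TO FILEGROUP FG1;

ALTER DATABASE BigFacts
ADD FILE (NAME = 'FG2_Data1', FILENAME = 'F:\Data\FG2_Data1.ndf', SIZE = 100GB)
TO FILEGROUP FG2;

-- Each fact table then grows inside its own filegroup, keeping its
-- extents contiguous:
CREATE TABLE dbo.FactOrders (OrderID bigint NOT NULL, Amount money NOT NULL) ON FG1;
CREATE TABLE dbo.FactClicks (ClickID bigint NOT NULL, Url varchar(400) NOT NULL) ON FG2;
```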

15 SQL Server Startup Options
- -E can be your friend if you have large tables
- -E allocates 64 extents at a time, that is 4 MB at a time for each table instead of 64 KB
- The cost: every table is at least 4 MB (including all the ones in tempdb!)
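The startup parameters an instance was configured with can be checked from T-SQL; a small sketch using the `sys.dm_server_registry` DMV (requires VIEW SERVER STATE):

```sql
-- List the configured startup parameters; look for an 'SQLArg' entry
-- whose value is '-E'.
SELECT value_name, value_data
FROM sys.dm_server_registry
WHERE value_name LIKE N'SQLArg%';
```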

16 Multiple Data Files Revisited

17 IO and Storage Path

18 Read speed factor – Direct Attached
[Diagram: relative read-speed factors for direct-attached storage; a single drive is the 1X baseline, RAID 1 reads at 1-2X, and RAID 5 configurations range roughly from 0.25X to 4X depending on layout.]

19 Read speed factor – SAN
- On a SAN, the paths to the array are most likely the limiting factor
- Ensure there are enough paths to the array
- Try disabling the read cache if possible (most of the time this makes it faster)
[Diagram: relative read-speed factors of SAN configurations, 1X to 2X.]

20 Understand the path to the drives
[Diagram: the I/O path from server to drives for DAS and SAN: HBA, switch, fiber channel ports, cache, controllers/processors, RAID controller, SSDs, NVRAM.]

21 IO Bottlenecks
- Rotating disks (10-160 MB/sec) ~ 0.1 GB/s
- Disk interface / SSD (3-12 Gb/sec) ~ GB/s
- RAID controller (1-8 GB/sec) ~ GB/s
- Ethernet (1 or 10 Gb/sec) ~ GB/s
- Fiber Channel (2-16 Gb/sec) ~ GB/s
- Host bus adapter (2-32 Gb/sec) ~ GB/s
- PCIe bus ( GB/sec) ~ GB/s
- System (4-16 PCIe buses) ~ GB/s

22 Schema and Indexes

23 Choose the clustered index key wisely
- If you have a lot of queries that range scan: WHERE value BETWEEN x AND y
- Multiple dates in a table (e.g. order, ship, delivery date, …): which one to choose? None.
- Put the clustered index on a unique ID and maintain a helper table: Date, DateType, MinID, MaxID
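The helper-table idea above can be sketched as follows; table and column names (`dbo.FactSales`, `dbo.FactSalesDateRange`, etc.) are hypothetical stand-ins, assuming rows arrive in roughly date order so each date maps to a contiguous ID range:

```sql
-- Clustered index on a surrogate ID, plus a helper table mapping each
-- (date type, date) to the ID range it covers.
CREATE TABLE dbo.FactSales (
    SalesID   bigint IDENTITY(1,1) NOT NULL,
    OrderDate date  NOT NULL,
    ShipDate  date  NOT NULL,
    Amount    money NOT NULL,
    CONSTRAINT PK_FactSales PRIMARY KEY CLUSTERED (SalesID)
);

CREATE TABLE dbo.FactSalesDateRange (
    DateType  varchar(10) NOT NULL,   -- e.g. 'Order' or 'Ship'
    DateValue date   NOT NULL,
    MinID     bigint NOT NULL,
    MaxID     bigint NOT NULL,
    CONSTRAINT PK_FactSalesDateRange PRIMARY KEY (DateType, DateValue)
);

-- A date-range query becomes a narrow seek on the helper table followed by
-- a contiguous range scan on the clustered key:
SELECT f.SalesID, f.OrderDate, f.Amount
FROM dbo.FactSalesDateRange AS r
JOIN dbo.FactSales AS f
  ON f.SalesID BETWEEN r.MinID AND r.MaxID
WHERE r.DateType = 'Order'
  AND r.DateValue BETWEEN '2016-01-01' AND '2016-01-31';
```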

24 Table Partitioning
A great tool for making database maintenance easier, but it does not buy us much performance and can actually slow us down. It might still be needed to spread data across multiple filegroups.
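A minimal sketch of using partitioning to spread data across filegroups, assuming the hypothetical FG1/FG2 filegroups and invented object names:

```sql
-- Partition by month; boundaries and filegroups are illustrative only.
CREATE PARTITION FUNCTION pfMonth (date)
AS RANGE RIGHT FOR VALUES ('2016-01-01', '2016-02-01', '2016-03-01');

-- 3 boundaries -> 4 partitions; filegroups may repeat.
CREATE PARTITION SCHEME psMonth
AS PARTITION pfMonth TO (FG1, FG2, FG1, FG2);

CREATE TABLE dbo.FactSalesPartitioned (
    SalesID   bigint NOT NULL,
    OrderDate date   NOT NULL,
    Amount    money  NOT NULL
) ON psMonth (OrderDate);
```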

25 Row and Page Compression
- ROW compression: almost no overhead; can save several unused bytes in each row. Remember: 1 byte less on 1 billion rows is 1 GB.
- PAGE compression: some overhead; can save a lot on repeating patterns (same values within a page). Newly inserted data is not compressed!
- Never compress lookup data.
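A sketch of estimating and then applying compression; `dbo.FactSales` and `PK_FactSales` are hypothetical names:

```sql
-- Estimate the savings before committing to a rebuild:
EXEC sp_estimate_data_compression_savings
     @schema_name = 'dbo', @object_name = 'FactSales',
     @index_id = NULL, @partition_number = NULL,
     @data_compression = 'PAGE';

-- Apply ROW compression to the whole table:
ALTER TABLE dbo.FactSales REBUILD WITH (DATA_COMPRESSION = ROW);

-- Or PAGE compression per index / partition:
-- ALTER INDEX PK_FactSales ON dbo.FactSales REBUILD PARTITION = ALL
--     WITH (DATA_COMPRESSION = PAGE);
```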

26 Merry-Go-Round (Piggyback) Scan
- Enterprise Edition only
- Invoked automatically: a second query (Query 2) can piggyback on a scan already in progress for Query 1, sharing one pass over the table
- With planning, much better results

27 Column Store Index
- With SQL 2016 finally fully usable (updateable without workarounds, can be the clustered index)
- ~40% faster than before
- Awesome compression ratios
- Even better results if a lot of queries only require a few columns of the fact table
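A hedged sketch of the SQL 2016 usage described above, assuming a hypothetical fact table stored as a heap (a table's existing clustered index would first have to be dropped or converted with DROP_EXISTING):

```sql
-- The columnstore becomes the clustered index itself and stays updateable:
CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
ON dbo.FactSales;

-- Queries touching only a few columns read only those column segments:
SELECT OrderDate, SUM(Amount) AS TotalAmount
FROM dbo.FactSales
GROUP BY OrderDate;
```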

28 THANK YOU! and may the force be with you…
Questions?

