Big Data Working with Terabytes in SQL Server

1 Big Data Working with Terabytes in SQL Server
Andrew Novick

2 Agenda
What’s Big?
Concerns: ETL/Load Performance, Query Performance, Backup/Restore Performance
Architecture
Solutions

3 Introduction Andrew Novick – Novick Software, Inc.
Business application consulting: SQL Server, .NET
Books: Transact-SQL UDFs, SQL 2000 XML Distilled

4 SQL Pass 2008 November – Seattle

5 What’s big?

6

7 What’s Big?
Hundreds of gigabytes up to tens of terabytes
100,000,000 rows and up to hundreds of billions of rows

8 Big Scenarios Data Warehouse
Very Large OLTP databases (usually with reporting functions)

9 Big Hardware
Multi-core: 8 to 64 cores
RAM: 16 GB to 256 GB
SANs or direct-attached RAID
64-bit SQL Server

10 Concerns

11 What me worry?

12 Concerns
Load Speed (ETL)
Query Speed
Data Management: Backup / Restore; DBCC CHECKDB, remove fragmentation

13 Architecture
What do we have to work with?

14 SQL Server Storage Architecture
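[Diagram: tables (Table1, Table2) are stored in filegroups (FileGroupA, FileGroupB); filegroups are made up of files (FileA1, FileB1, FileB2); files live on the logical disk system, the Windows drives (C:, D:, E:); the drives map to disks in the physical I/O subsystem.]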
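The layering in the storage diagram can be sketched in T-SQL roughly as follows. The database name `SalesDW`, the folder paths, and the file sizes are assumptions made up for illustration; only the table, filegroup, and file names come from the slide. The folders must already exist for the script to run.

```sql
-- Hypothetical database name, paths, and sizes; adjust to your own drives.
CREATE DATABASE SalesDW
ON PRIMARY
    (NAME = SalesDW_Primary, FILENAME = 'C:\SQLData\SalesDW_Primary.mdf', SIZE = 100MB),
FILEGROUP FileGroupA
    (NAME = FileA1, FILENAME = 'D:\SQLData\FileA1.ndf', SIZE = 1GB),
FILEGROUP FileGroupB
    -- Two files on two drives: SQL Server spreads the filegroup's data across both.
    (NAME = FileB1, FILENAME = 'D:\SQLData\FileB1.ndf', SIZE = 1GB),
    (NAME = FileB2, FILENAME = 'E:\SQLData\FileB2.ndf', SIZE = 1GB)
LOG ON
    (NAME = SalesDW_Log, FILENAME = 'C:\SQLLogs\SalesDW_Log.ldf', SIZE = 1GB);
GO

USE SalesDW;
GO

-- A table is placed on a filegroup, never directly on a file or drive.
CREATE TABLE dbo.Table1 (Id int NOT NULL PRIMARY KEY, Payload varchar(100) NULL) ON FileGroupA;
CREATE TABLE dbo.Table2 (Id int NOT NULL PRIMARY KEY, Payload varchar(100) NULL) ON FileGroupB;
GO
```

The point of the indirection is that data placement can be changed at the file level (more files, different drives) without touching table definitions.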

15 Solutions

16 Solution to what?
Load Speed (ETL)
Query Speed
Data Management: Backup / Restore; DBCC CHECKDB, remove fragmentation

17 Solutions
Use multiple FileGroups/Files
Spread data to maximize resource use
Sliding Window if there is a time dimension
Partitioned Tables and/or Views
ETL: insert into empty, unindexed tables
Use READ_ONLY FileGroups to minimize maintenance needs

18 I/O Performance
Little has changed in 50 years
Watch out for bottlenecks in the I/O path
Memory reduces the need for I/O
Disks can only do so many I/O operations per second
The more disk heads you have, the higher the I/O throughput

19 At 3 PM on the 1st of the month: Where do you want your data to be?

20 Spread to as many disk resources as possible.

21 Sliding Window
[Diagram: an "Always There Data" block next to a row of "Temporal Data" blocks, one per time period; the only period label that survives is 2008-01.]

22 Read_Only FileGroups
ALTER DATABASE <database> MODIFY FILEGROUP <filegroup> READ_ONLY
Require only one backup
Don’t require page or row locks
Don’t require maintenance
Before SQL 2008, the ALTER requires exclusive access to the database
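A minimal sketch of marking a filegroup read-only and taking its single backup might look like this. `SalesDW`, `FG_2007`, and the backup path are hypothetical names. In the documented syntax the filegroup form of the statement takes `READ_ONLY` directly after the filegroup name. The switch to single-user mode is one way to get the exclusive access the slide mentions for versions before SQL Server 2008.

```sql
USE master;
GO

-- Hypothetical database and filegroup names.
-- Get exclusive access (needed before SQL Server 2008); open transactions are rolled back.
ALTER DATABASE SalesDW SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- Mark the filegroup that holds closed-out data as read-only.
ALTER DATABASE SalesDW MODIFY FILEGROUP FG_2007 READ_ONLY;

-- Let other connections back in.
ALTER DATABASE SalesDW SET MULTI_USER;
GO

-- Confirm which filegroups are now read-only.
SELECT name, is_read_only
FROM SalesDW.sys.filegroups;

-- The one backup the read-only filegroup needs (hypothetical path).
BACKUP DATABASE SalesDW
    FILEGROUP = N'FG_2007'
    TO DISK = N'X:\Backup\SalesDW_FG_2007.bak';
GO
```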

23 Concern - Load Performance (ETL)
4-hour maximum window for any load
Loading into large indexed tables takes unacceptably long.
Example: a 2 million row insert into a 400 million row table with 10 indexes took 12 hours.

24 Concern – Query Performance
Users have little patience
Data warehouse queries:
Frequent small-to-medium queries to support the UI
Less frequent large queries on fact tables, which may access tens of GB

25 Fact Table Queries
Concentrated on a time period: most recent, or a year ago
May go against the full table to get year-against-year

26 Dimension Table Queries
Smaller than fact table queries
Sometimes involve millions of rows
Frequent: they support the UI

27 Partitioning

28 Partitioned Views
Available in SQL Server Standard
Created like any view
Check constraints tell SQL Server which data is in which table
CREATE VIEW Fact AS SELECT * FROM Fact_ UNION ALL SELECT * FROM Fact_
ALTER TABLE Fact_ ADD CONSTRAINT CK_FACT_ _Date CHECK (FactDate >= ' ' AND FactDate < ' ')

29 Partitioned View - 2
Looks to a query like any table or view
Can take advantage of parallel execution
Limited to 256 tables
Can cross servers (performance warning)
SELECT FactDate, … FROM Fact WHERE CustID= AND FactDate = ' '
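The partitioned-view pattern can be sketched end to end as below. The monthly table names, the `Amount` column, the date ranges, and the customer number are assumptions for illustration, because the transcript lost the original suffixes and literal values. The column names `FactDate` and `CustID` come from the slide.

```sql
-- Hypothetical member tables, one per month. The CHECK constraint on FactDate
-- is what tells the optimizer which rows each table can contain.
CREATE TABLE dbo.Fact_2008_01 (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL,
    CONSTRAINT PK_Fact_2008_01 PRIMARY KEY (FactDate, CustID),
    CONSTRAINT CK_Fact_2008_01_Date
        CHECK (FactDate >= '20080101' AND FactDate < '20080201')
);

CREATE TABLE dbo.Fact_2008_02 (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL,
    CONSTRAINT PK_Fact_2008_02 PRIMARY KEY (FactDate, CustID),
    CONSTRAINT CK_Fact_2008_02_Date
        CHECK (FactDate >= '20080201' AND FactDate < '20080301')
);
GO

-- The view is an ordinary UNION ALL over the member tables.
CREATE VIEW dbo.Fact
AS
SELECT FactDate, CustID, Amount FROM dbo.Fact_2008_01
UNION ALL
SELECT FactDate, CustID, Amount FROM dbo.Fact_2008_02;
GO

-- A query with a constant on the partitioning column: the January constraint
-- is the only one the literal can satisfy, so only that table needs to be read.
SELECT FactDate, CustID, Amount
FROM dbo.Fact
WHERE CustID = 42
  AND FactDate = '20080115';
GO
```

Comparing the execution plan of this query with one that has no `FactDate` predicate is a simple way to see which member tables are actually touched.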

30 Partitioned View SQL Server Storage View Fact Table1 Table2
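[Diagram: the view Fact shown above its member tables, alongside Table1 and Table2. Table1 and Table2 are stored in FileGroupA and FileGroupB (files FileA1, FileB1, FileB2); the view's member tables are stored in filegroups FGF1 to FGF4 (files F1 to F4). All files live on the logical disk system, the Windows drives (C:, D:, E:), which maps to disks in the physical I/O subsystem.]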

31 Partition Elimination
The query compiler can eliminate partitions from consideration in the plan.
Partition elimination happens at query compile time.
Values matching the partitioning column must be constants to allow partition elimination.

32 Demo 1 – Partitioned Views

33 Partitioned Tables
SQL Server Enterprise
SQL Server 2005 and above
Require a non-null partitioning column
Check constraints tell SQL Server what data is in each partition
All tables are partitioned!

34 Partitioned Tables 2
Partition Function: defines how to split data
Partition Scheme: defines where to store each range of data
CREATE PARTITION FUNCTION Fact_PF(smalldatetime) AS RANGE RIGHT FOR VALUES (' ', ' ')
CREATE PARTITION SCHEME Fact_PF AS PARTITION Fact_PF TO (PRIMARY, FG_ , FG_ )
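A small runnable sketch of the function, scheme, and table chain follows. The names `Demo_PF`, `Demo_PS`, and `dbo.FactDemo`, the boundary dates, and the sample rows are all assumptions. To keep the script free of extra setup, every partition is mapped to `[PRIMARY]`; a real design would list separate filegroups in the scheme.

```sql
-- Partition function: HOW to split. Two boundaries give three partitions.
-- RANGE RIGHT means each boundary value belongs to the partition on its right.
CREATE PARTITION FUNCTION Demo_PF (smalldatetime)
AS RANGE RIGHT FOR VALUES ('20080101', '20080201');
GO

-- Partition scheme: WHERE each range is stored.
-- ALL TO ([PRIMARY]) is a simplification for this sketch.
CREATE PARTITION SCHEME Demo_PS
AS PARTITION Demo_PF ALL TO ([PRIMARY]);
GO

-- The table is created on the scheme, naming the partitioning column.
CREATE TABLE dbo.FactDemo (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
) ON Demo_PS (FactDate);
GO

INSERT INTO dbo.FactDemo (FactDate, CustID, Amount) VALUES ('20071215', 1, 10.00);
INSERT INTO dbo.FactDemo (FactDate, CustID, Amount) VALUES ('20080115', 2, 20.00);
INSERT INTO dbo.FactDemo (FactDate, CustID, Amount) VALUES ('20080215', 3, 30.00);
GO

-- $PARTITION reports which partition a value maps to.
-- With the boundaries above, the three rows land in partitions 1, 2, and 3.
SELECT $PARTITION.Demo_PF(FactDate) AS PartitionNumber,
       COUNT(*)                     AS RowsInPartition
FROM dbo.FactDemo
GROUP BY $PARTITION.Demo_PF(FactDate)
ORDER BY PartitionNumber;
GO
```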

35 Partitioned Table SQL Server Storage Table Fact Table1 Table2
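[Diagram: the partitioned table Fact shown alongside Table1 and Table2. The partitions of Fact ($PARTITION = 1 to 4) are spread across filegroups FGF1 to FGF4 (files F1 to F4); Table1 and Table2 are stored in FileGroupA and FileGroupB (files FileA1, FileB1, FileB2). All files live on the logical disk system, the Windows drives (C:, D:, E:), which maps to disks in the physical I/O subsystem.]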

36 Demo 2 – Partitioned Tables

37 Partitioning Goals
Adequate import speed
Maximize query performance: make use of all available resources
Data management: migrate data to cheaper resources; delete old data easily
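One common way to "delete old data easily" with a partitioned table is the sliding-window sequence sketched below: switch the oldest partition out, merge away its boundary, then split in a boundary for the next period. This is an illustration under assumptions, not the presenter's script; `Slide_PF`, `Slide_PS`, `dbo.FactSlide`, and the dates are hypothetical, and everything sits on `[PRIMARY]` for simplicity.

```sql
-- Setup (hypothetical): partitions 1..4 are
-- < Jan 2008, Jan 2008, Feb 2008, >= Mar 2008.
CREATE PARTITION FUNCTION Slide_PF (smalldatetime)
AS RANGE RIGHT FOR VALUES ('20080101', '20080201', '20080301');
GO
CREATE PARTITION SCHEME Slide_PS
AS PARTITION Slide_PF ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.FactSlide (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
) ON Slide_PS (FactDate);

-- Empty table with identical structure, on the same filegroup as the
-- partition being removed; it receives the old data.
CREATE TABLE dbo.FactSlide_Old (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
) ON [PRIMARY];
GO

-- 1. Switch the oldest populated partition (January) out. Metadata only, no row movement.
ALTER TABLE dbo.FactSlide SWITCH PARTITION 2 TO dbo.FactSlide_Old;

-- 2. Remove the boundary of the now-empty partition.
ALTER PARTITION FUNCTION Slide_PF() MERGE RANGE ('20080101');

-- 3. Name the filegroup for the next partition, then add the new boundary.
ALTER PARTITION SCHEME Slide_PS NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION Slide_PF() SPLIT RANGE ('20080401');

-- 4. Archive or simply drop the switched-out data.
DROP TABLE dbo.FactSlide_Old;
GO
```

Merging and splitting are cheapest when the partitions involved are empty, which is why the switch-out comes first.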

38 Achieving Load Speed
Insert into empty tables
Index and add foreign keys after the insert
Add the slices to Partitioned Views or Partitioned Tables
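For a partitioned table, the load pattern above might be sketched like this: fill an empty, unindexed staging table, index it afterwards, then switch it into the target as a new slice. All object names and dates are hypothetical, and the single-row `INSERT` stands in for whatever bulk-load mechanism is actually used.

```sql
-- Target (hypothetical): partition 3 holds FactDate >= 2008-02-01.
CREATE PARTITION FUNCTION Load_PF (smalldatetime)
AS RANGE RIGHT FOR VALUES ('20080101', '20080201');
GO
CREATE PARTITION SCHEME Load_PS
AS PARTITION Load_PF ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.FactLoad (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
) ON Load_PS (FactDate);
CREATE CLUSTERED INDEX CIX_FactLoad
    ON dbo.FactLoad (FactDate, CustID) ON Load_PS (FactDate);
GO

-- 1. Empty, unindexed staging table on the same filegroup as the target partition.
CREATE TABLE dbo.FactLoad_Stage (
    FactDate smalldatetime NOT NULL,
    CustID   int           NOT NULL,
    Amount   money         NOT NULL
) ON [PRIMARY];
GO

-- 2. Load while there are no indexes to maintain (stand-in for a bulk load).
INSERT INTO dbo.FactLoad_Stage (FactDate, CustID, Amount)
VALUES ('20080215', 42, 99.00);

-- 3. Index after the load, matching the target's clustered index.
CREATE CLUSTERED INDEX CIX_FactLoad_Stage
    ON dbo.FactLoad_Stage (FactDate, CustID) ON [PRIMARY];

-- 4. A trusted check constraint proves every row fits the target partition.
ALTER TABLE dbo.FactLoad_Stage WITH CHECK
    ADD CONSTRAINT CK_FactLoad_Stage_Date CHECK (FactDate >= '20080201');

-- 5. Add the slice: a metadata-only switch into the (empty) target partition.
ALTER TABLE dbo.FactLoad_Stage SWITCH TO dbo.FactLoad PARTITION 3;
GO
```

With a partitioned view the last step would instead be an `ALTER VIEW` that adds the new table to the `UNION ALL`.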

39 Achieving Query Speed
Eliminate access to partitions during query compile
All disk resources should be used: parallel access
All available memory should be used
All available CPUs should be used: parallel query

40 Solution
Partition at a sufficiently high grain
Spread dimension data to all usable disks
Separate data and index FileGroups
Multiple files per FileGroup
Spread fact data by partition key to all usable disks
Rotate file locations to maximize dispersion

41 Concern – Data Management (Backup)
Let’s say you have a 10 TB database. Now back that up.

42 Backup Calculation
10 TB = 10,000 GB
Typical backup speed: low end 1 GB per minute, high end 10 GB per minute
At 10 GB/minute, 10,000 GB takes 1,000 minutes (almost 17 hours). Who’s got that many minutes?
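The arithmetic behind the slide can be checked with a throwaway query. The 10,000 GB size and the two rates are the slide's figures; the variable and column names are made up for the sketch.

```sql
-- Database size from the slide: 10 TB, taken as 10,000 GB.
DECLARE @DatabaseSizeGB decimal(12, 2);
SET @DatabaseSizeGB = 10000;

-- Backup time at the slide's low-end and high-end rates.
SELECT r.RateGBPerMinute,
       @DatabaseSizeGB / r.RateGBPerMinute        AS BackupMinutes,
       @DatabaseSizeGB / r.RateGBPerMinute / 60.0 AS BackupHours
FROM (
    SELECT CAST(1 AS decimal(12, 2)) AS RateGBPerMinute
    UNION ALL
    SELECT CAST(10 AS decimal(12, 2))
) AS r;
-- 1 GB/minute:  10,000 minutes, roughly 167 hours (about a week)
-- 10 GB/minute:  1,000 minutes, roughly 17 hours
```

Even the high-end figure is longer than most maintenance windows, which is the motivation for backing up less data rather than only backing up faster.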

43 Achieving Backup Performance
Backup less!
Maintain data in a READ_ONLY state
Compress backups

44 Partial Backup
Partial Base: backs up read_write filegroups
Partial Differential: differential backup of read_write filegroups
BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS …..
BACKUP DATABASE <db name> READ_WRITE_FILEGROUPS WITH DIFFERENTIAL ….
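Filled in with hypothetical names, the two partial-backup statements might look like the sketch below. `SalesDW` and the backup paths are assumptions. `COMPRESSION` is the SQL Server 2008 backup compression option and should be dropped on versions or editions that do not support it.

```sql
-- Partial base: the primary filegroup plus all read-write filegroups.
-- Read-only filegroups are skipped; they were backed up once when frozen.
BACKUP DATABASE SalesDW
    READ_WRITE_FILEGROUPS
    TO DISK = N'X:\Backup\SalesDW_partial.bak'
    WITH INIT, COMPRESSION;
GO

-- Partial differential: only what changed in those filegroups
-- since the partial base backup above.
BACKUP DATABASE SalesDW
    READ_WRITE_FILEGROUPS
    TO DISK = N'X:\Backup\SalesDW_partial_diff.bak'
    WITH DIFFERENTIAL, INIT, COMPRESSION;
GO
```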

45 Maintenance Operations
Maintain only READ_WRITE data
DBCC CHECKFILEGROUP
ALTER INDEX … REBUILD PARTITION = <partition number>
ALTER INDEX … REORGANIZE PARTITION = <partition number>
Avoid SHRINK
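As a rough sketch, maintenance limited to the read-write part of a database could look like this. The database `SalesDW`, the filegroup `FG_Current`, the index `CIX_Fact` on `dbo.Fact`, and partition number 3 are all hypothetical; the script assumes they already exist.

```sql
USE SalesDW;
GO

-- Consistency check for one filegroup instead of a full DBCC CHECKDB.
DBCC CHECKFILEGROUP (N'FG_Current') WITH NO_INFOMSGS;
GO

-- Defragment a single partition of a partitioned index.
-- Pick one of the two depending on how fragmented the partition is:

-- Full rebuild of just that partition.
ALTER INDEX CIX_Fact ON dbo.Fact REBUILD PARTITION = 3;

-- Lighter, always-online reorganization of just that partition.
ALTER INDEX CIX_Fact ON dbo.Fact REORGANIZE PARTITION = 3;
GO
```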

46 SQL Server 2008 – What’s New
Row, page, and backup compression
Filtered indexes
Optimization for star joins
MERGE T-SQL DML
Resource Governor
Fewer operations require exclusive access to the database

47 New England Visual Basic Pro
Focused on VB.Net development
MS Waltham – MPR C
1st Thursday - 6:15 to 8:30 PM
Sept 4 – Jim O’Neil – ASP.Net Dynamic Data
Sept 25 – Chris Hammond – DotNetNuke
Oct 2 – Kathleen Dollard – XML Literals in VB 9
Nov 6 – Joe Stagner – Stupid Hacker Tricks and How 2 Defend
Feb 5 ’09 – Joe Hill – Novell – Mono/VB/etc….

48 Thanks for Coming Andrew Novick anovick@NovickSoftware.com

