Antonio Abalos Castillo

How to load your data faster and safer using Change Tracking in SQL Server

Thank you to our sponsors!

Agenda
- Why faster data loads?
- What is Change Tracking?
- Design overview
- Demo/implementation
- Extra hints

Why faster data loads?
Corporations load and replicate data in a variety of ways, and these approaches often share the same problems:
- They become unreliable or miss data over time
- They rely on unsupported ways of identifying incremental data
- They are difficult to maintain
- They are not optimal at identifying updated data
- They need extra programming effort
- They do not follow standards

Why faster data loads? Benefits of using this approach
- No programming overhead at the source
  - No need for timestamps, row GUIDs or any other programming artifact
  - Change Tracking is transparent to applications
  - Maintenance cost is 0
- Very low performance impact on the source database
- Multiple target systems can get data from the same source DB using this approach
- We get only the latest version of each row relative to our last loaded version; all intermediate row states are skipped
- Running the delta more often will decrease the execution time
- MERGE is the fastest data loading method (SCD remains a bad example)
- Minimally logged operations will help performance (maybe more than you think)

What is Change Tracking?
- Change Tracking is a lightweight solution that provides an efficient change tracking mechanism for applications
- Available since SQL Server 2008
- Requires Standard edition of SQL Server or higher
- Lightweight: the incremental performance overhead of using Change Tracking on a table is similar to the overhead incurred when an index is created for a table and needs to be maintained

Each insert/update/delete in each tracked table is recorded with:
- The primary key (ID) columns of the table
- [Optional] The columns that were updated
Changes are accumulated and reported by SQL Server relative to the last version we retrieved

Enable Change Tracking
Database level:
    ALTER DATABASE AdventureWorks2012
    SET CHANGE_TRACKING = ON
    (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
For each audited table:
    ALTER TABLE dbo.SalesOrderDetail
    ENABLE CHANGE_TRACKING
    WITH (TRACK_COLUMNS_UPDATED = ON);

Get changes from Change Tracking
Get current version:
    SELECT CHANGE_TRACKING_CURRENT_VERSION();
Get minimum valid version:
    SELECT CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.Sales'));
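The two version functions above are what make automatic delta/full load detection possible. The following is a minimal sketch, assuming the last loaded version has been saved somewhere by the ETL process (the value 82 is only an example) and that dbo.Sales is the tracked table:

```sql
-- Assumption: @last_ver was stored by the previous successful load.
DECLARE @last_ver BIGINT = 82;
DECLARE @min_ver  BIGINT = CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.Sales'));

IF @min_ver IS NULL
    -- The function returns NULL when Change Tracking is not enabled on the table.
    PRINT 'Change Tracking is not enabled on dbo.Sales.';
ELSE IF @last_ver < @min_ver
    -- Changes older than the minimum valid version have been cleaned up.
    PRINT 'Saved version is too old: run a full load.';
ELSE
    PRINT 'Saved version is still valid: run a delta load.';
```

In a real package the PRINT statements would be replaced by whatever branches the ETL tool offers; the comparison itself is the important part.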

Get changes from Change Tracking
Get changes for one table:
    DECLARE @last_ver BIGINT = 82;
    SELECT CT.SalesID, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS
    FROM CHANGETABLE(CHANGES dbo.Sales, @last_ver) AS CT;
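CHANGETABLE returns only keys and change metadata, so a delta extract usually joins back to the source table to pick up the current column values. A sketch under stated assumptions: the Amount column is hypothetical and stands in for any data column of dbo.Sales, and column tracking (TRACK_COLUMNS_UPDATED = ON) is enabled on the table.

```sql
DECLARE @last_ver BIGINT = 82;  -- version saved by the previous load (example value)

SELECT
    CT.SalesID,
    CT.SYS_CHANGE_OPERATION,     -- 'I' = insert, 'U' = update, 'D' = delete
    CT.SYS_CHANGE_VERSION,
    S.Amount,                    -- hypothetical column; NULL when the row was deleted
    -- Meaningful for updates: 1 when the Amount column is flagged in the change mask.
    CHANGE_TRACKING_IS_COLUMN_IN_MASK(
        COLUMNPROPERTY(OBJECT_ID('dbo.Sales'), 'Amount', 'ColumnId'),
        CT.SYS_CHANGE_COLUMNS) AS AmountChanged
FROM CHANGETABLE(CHANGES dbo.Sales, @last_ver) AS CT
LEFT JOIN dbo.Sales AS S
    ON S.SalesID = CT.SalesID;   -- LEFT JOIN keeps deleted keys in the result
```

The LEFT JOIN is a deliberate choice: an INNER JOIN would silently drop deleted rows, which no longer exist in the source table.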

Design overview
- Source: Change Tracking enabled, isolation aware
- ETL: minimally logged operations, automatic delta/full load detection
- Target: staging area, MERGE delta over target data

Design overview: requirements
- SQL Server source database with Change Tracking enabled
- Integration Services
- MERGE statements (SQL Server 2008+)
- Other data sources: Change Data Capture (Oracle)
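The "MERGE delta over target data" step in the design could look roughly like the sketch below. All object names are assumptions made for illustration: stg.SalesDelta is a hypothetical staging table holding the extracted delta (including the SYS_CHANGE_OPERATION column), dbo.Sales is the target table, and Amount is a hypothetical data column.

```sql
MERGE dbo.Sales AS T
USING stg.SalesDelta AS D
    ON T.SalesID = D.SalesID
-- Rows deleted at the source are removed from the target.
WHEN MATCHED AND D.SYS_CHANGE_OPERATION = 'D' THEN
    DELETE
-- Any other matched row is brought up to the latest source values.
WHEN MATCHED THEN
    UPDATE SET T.Amount = D.Amount
-- New rows are inserted, unless they were already deleted again at the source.
WHEN NOT MATCHED BY TARGET AND D.SYS_CHANGE_OPERATION <> 'D' THEN
    INSERT (SalesID, Amount)
    VALUES (D.SalesID, D.Amount);
```

Treating every non-deleted matched row as an update, regardless of whether it was reported as 'I' or 'U', keeps the statement safe to rerun after a failed load.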

Demo
Demo scenario (Windows Azure VNET):
- Server A: SQL Server, source database, Change Tracking
- Server B: SQL Server, target database, logging
- SSIS data flow

Extra hints - Best practices
- Transaction isolation strategy
  - Enable SNAPSHOT isolation in the source database
  - Or create a database snapshot of the source
- Index maintenance jobs can break big transactions at the source
- Watch out for complex data flows that may need to be broken down into simpler ones
  - A one-to-one copy of the source table is best, but this is not always possible
  - How do we deal with deleted rows? (joining tables)
- Do we need to track changes in columns?
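The snapshot isolation hint can be sketched as follows. This is an illustration rather than the demo code: it assumes the AdventureWorks2012 source database and the dbo.Sales table used in the earlier slides, and the value 82 is only an example of a previously saved version.

```sql
-- One-time setup on the source database.
ALTER DATABASE AdventureWorks2012 SET ALLOW_SNAPSHOT_ISOLATION ON;
GO

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    DECLARE @last_ver BIGINT = 82;  -- version saved by the previous load (example value)
    -- Capture the version to save for the next run inside the same snapshot.
    DECLARE @new_ver BIGINT = CHANGE_TRACKING_CURRENT_VERSION();

    SELECT CT.SalesID, CT.SYS_CHANGE_OPERATION
    FROM CHANGETABLE(CHANGES dbo.Sales, @last_ver) AS CT;

    -- After a successful load, @new_ver would be persisted as the new @last_ver.
COMMIT TRANSACTION;
```

Reading the current version and the changes inside one snapshot transaction means both reflect the same point in time, so changes committed while the extract is running are picked up by the next run instead of being lost.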

Extra hints - Trick list
- Use trace flag 610 (carefully)
- Use TABLOCK in the destination
- Use the ORDER hint in the destination
- Increase Data Flow Task (DFT) memory
- Increase the DFT number of rows
- Run parallel tasks
- See The Data Loading Performance Guide
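In the deck these hints are applied in the SSIS destination; a plain T-SQL equivalent of the first two might look like the sketch below. The table names stg.SalesExtract and stg.SalesDelta and the Amount column are hypothetical, and the ORDER hint is left out because it belongs to the bulk-load options rather than to INSERT ... SELECT.

```sql
-- Enables trace flag 610 for the current session only; requires elevated
-- permissions and should be tested carefully before any wider use.
DBCC TRACEON (610);

-- TABLOCK takes a table-level lock on the destination, one of the
-- prerequisites for minimally logged inserts.
INSERT INTO stg.SalesDelta WITH (TABLOCK) (SalesID, Amount, SYS_CHANGE_OPERATION)
SELECT SalesID, Amount, SYS_CHANGE_OPERATION
FROM stg.SalesExtract;
```

Whether the insert is actually minimally logged also depends on the recovery model and on the structure of the destination table, so it is worth verifying against The Data Loading Performance Guide for the version in use.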

Extra hints - Other tricks
- Put databases in the SIMPLE recovery model
- Change torn page detection to NONE
- Create a DATA filegroup and set it as DEFAULT
- Create as many files as CPUs in each filegroup (depends on storage)
- Place the log file and the data files on different disks
- Consider using heaps for fast-load processes
- Consider using partitioned tables for regular tables
- Increase parallelism
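Several of these settings are plain ALTER DATABASE statements. A sketch, assuming a hypothetical target database named TargetDW and a hypothetical file path; the page verification change trades corruption detection for speed, so it is a judgment call rather than a default:

```sql
-- SIMPLE recovery model: a prerequisite for many minimally logged operations.
ALTER DATABASE TargetDW SET RECOVERY SIMPLE;

-- Turns off page verification (torn page detection / checksum).
ALTER DATABASE TargetDW SET PAGE_VERIFY NONE;

-- Create a DATA filegroup, add a file to it, then make it the default.
ALTER DATABASE TargetDW ADD FILEGROUP [DATA];

ALTER DATABASE TargetDW
ADD FILE (
    NAME = N'TargetDW_Data1',
    FILENAME = N'D:\SQLData\TargetDW_Data1.ndf',  -- hypothetical path
    SIZE = 1GB
) TO FILEGROUP [DATA];

ALTER DATABASE TargetDW MODIFY FILEGROUP [DATA] DEFAULT;
```

The ADD FILE step would be repeated once per file when following the "as many files as CPUs" hint, ideally spreading the files across the available storage.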

Extra hints - Security considerations
Catalog views:
- sys.change_tracking_databases
- sys.change_tracking_tables
Permissions:
- SELECT permission on at least the primary key columns of the change-tracked table being queried
- VIEW CHANGE TRACKING permission on the table for which changes are being obtained
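The catalog views and permissions above can be exercised as in the following sketch. The database user etl_reader is hypothetical, and dbo.Sales with its SalesID key is taken from the earlier slides.

```sql
-- Which databases have Change Tracking enabled, and with what retention?
SELECT d.name, ctd.is_auto_cleanup_on, ctd.retention_period, ctd.retention_period_units_desc
FROM sys.change_tracking_databases AS ctd
JOIN sys.databases AS d
    ON d.database_id = ctd.database_id;

-- Which tables in the current database are tracked?
SELECT OBJECT_SCHEMA_NAME(ctt.object_id) AS schema_name,
       OBJECT_NAME(ctt.object_id)        AS table_name,
       ctt.is_track_columns_updated_on,
       ctt.min_valid_version
FROM sys.change_tracking_tables AS ctt;

-- Minimum permissions for an account that only reads changes.
GRANT SELECT ON OBJECT::dbo.Sales (SalesID) TO etl_reader;
GRANT VIEW CHANGE TRACKING ON OBJECT::dbo.Sales TO etl_reader;
```

An account that also joins back to the table for column values needs SELECT on those columns as well, not only on the primary key.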

Extra hints - Change Tracking vs. Change Data Capture
                              Change data capture (CDC)   Change tracking (CT)
Tracked changes
  DML changes                 Yes                         Yes
Tracked information
  Historical data             Yes                         No
  Whether column was changed  Yes                         Yes
  DML type                    Yes                         Yes
- CDC collects historical values, and therefore much more data than CT
- With CT you have no idea how many updates were made to a row, nor the values that were updated

Other references
- Brent Ozar’s guide to Change Tracking
- Good guide for implementing a data load using Change Tracking
