Design for Flexibility and Performance - ETL Patterns with SSIS and Beyond

Presentation transcript:

And without further ado, here is Daniel with "Using SSIS to Prepare Data for Analytics".
Daniel Cai, Principal Developer, KingswaySoft

Daniel Cai, Principal Developer @kingswaysoft
At KingswaySoft, I worked in the product management role of identifying and defining solutions that help solve some of the most challenging integration scenarios using SSIS as the ETL platform.
https://twitter.com/danielcai
https://www.linkedin.com/in/danielcai
https://www.kingswaysoft.com/blog
daniel.cai@kingswaysoft.com
Disclaimer: I use some premium components we have developed at KingswaySoft to show you the much simplified design patterns.
Disclaimer: Speaking is not my greatest strength.

Survey
What are your typical day-to-day ETL development challenges?
  The sheer amount of data?
  Poor data quality, including duplicates?
  Constantly changing data schemas?
  Referential integrity?
  What else?

Load data incrementally

Incremental Changes - Reading
For the best performance, it is important to read only those records that have changed in the source system before pushing them into the ETL pipeline.
CDC
  Use the CDC component to capture changes in the source system.
  Requires change data capture support by the database platform.
  An enterprise feature, which means higher license cost.
  Depends on the database log reader.
Timestamp Fields
  Use a timestamp field in the source system to detect changes.
  Save the current timestamp value after each execution.
  SELECT * FROM Customer WHERE LastModifiedDate > @LastRunTime AND LastModifiedDate <= @CurrentRunTime
Lookup + HashValues
  Generate a hash value for each input row and compare it with the hash values in the destination table to detect changes.
  Use the Lookup component (or an equivalent component) to detect new rows.
  Be mindful of hash collisions; a strong hash is preferable.
Diff Detector
  Use the third-party Diff Detector component to compare two inputs and find the differences between them.
  A total of 4 outputs are available: Added Rows, Deleted Rows, Changed Rows, Unchanged Rows.
  Comes with much greater flexibility and supports any data source type in the SSIS pipeline.
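The timestamp approach above can be sketched outside SSIS as follows. This is a minimal illustration, not the presenter's implementation: it uses Python's built-in sqlite3 module as a stand-in for the source database, and the EtlWatermark table, the run_once helper, and the sample rows are all assumptions made for the example. Only the Customer table and the LastModifiedDate window come from the slide.

```python
import sqlite3

def read_changes(conn, last_run_time, current_run_time):
    """Return Customer rows modified in the window (last_run_time, current_run_time]."""
    return conn.execute(
        "SELECT Id, Name, LastModifiedDate FROM Customer "
        "WHERE LastModifiedDate > ? AND LastModifiedDate <= ? "
        "ORDER BY Id",
        (last_run_time, current_run_time),
    ).fetchall()

def run_once(conn, current_run_time):
    """One ETL execution: read the window, then save the new watermark."""
    (last_run_time,) = conn.execute("SELECT LastRunTime FROM EtlWatermark").fetchone()
    rows = read_changes(conn, last_run_time, current_run_time)
    # Persist the upper bound so the next run starts where this one ended.
    conn.execute("UPDATE EtlWatermark SET LastRunTime = ?", (current_run_time,))
    conn.commit()
    return rows

# Hypothetical source data; ISO-8601 strings sort in chronological order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT, LastModifiedDate TEXT)")
conn.execute("CREATE TABLE EtlWatermark (LastRunTime TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)", [
    (1, "Ann", "2024-01-01T08:00:00"),
    (2, "Bob", "2024-01-02T09:30:00"),
    (3, "Cy", "2024-01-03T10:15:00"),
])
conn.execute("INSERT INTO EtlWatermark VALUES ('2024-01-01T12:00:00')")

first = run_once(conn, "2024-01-03T00:00:00")   # picks up Bob only
second = run_once(conn, "2024-01-04T00:00:00")  # picks up Cy only
```

Using an exclusive lower bound and an inclusive upper bound means consecutive windows neither overlap nor leave gaps, so a row is read exactly once per modification.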

Demos - Incremental Reading
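As a rough illustration of the hash-comparison idea, the sketch below classifies rows into the same four outputs the slide lists for the Diff Detector (added, deleted, changed, unchanged). It is plain Python, not the KingswaySoft component; row_hash, diff_rows, and the sample rows are hypothetical names and data. SHA-256 is used here as one example of a strong hash.

```python
import hashlib

def row_hash(row, columns):
    """Hash the compared columns of a row; the separator avoids 'ab'+'c' == 'a'+'bc'."""
    payload = "\x1f".join(str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def diff_rows(source, destination, key, columns):
    """Classify source rows against destination rows by key and hash."""
    dest_hashes = {r[key]: row_hash(r, columns) for r in destination}
    result = {"added": [], "changed": [], "unchanged": [], "deleted": []}
    seen = set()
    for r in source:
        k = r[key]
        seen.add(k)
        if k not in dest_hashes:
            result["added"].append(k)
        elif dest_hashes[k] != row_hash(r, columns):
            result["changed"].append(k)
        else:
            result["unchanged"].append(k)
    # Keys present in the destination but missing from the source were deleted.
    result["deleted"] = [k for k in dest_hashes if k not in seen]
    return result

source = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Lima"},    # city changed
    {"id": 4, "name": "Dee", "city": "Rome"},    # new row
]
destination = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Quito"},
    {"id": 3, "name": "Cy", "city": "Bern"},     # no longer in source
]
outputs = diff_rows(source, destination, key="id", columns=["name", "city"])
```

In a real pipeline the destination hashes would usually be stored in a column and fetched by a lookup rather than recomputed on every run.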

Incremental Changes - Writing
Upsert (Update/Insert) is typically the most efficient way of synchronizing databases: it writes new rows and updates existing rows in the same operation. There are 3 main methods for performing an upsert in SSIS.
Lookup Transform
  Requires developers to set up lookups to separate new rows, updated rows, and unchanged rows.
  Then either use the OLE DB Command to update records (performance can be very slow), or load a staging table and use the MERGE command for better performance.
Custom Script
  Develop a custom script to implement your own upsert strategy.
  Use a staging database table.
  Leverage the SQL MERGE command (if supported) through an OLE DB connection.
  Can be time consuming to implement.
Upsert Write Action
  Available in third-party destination components.
  These components handle the work for you: they check whether the row already exists and perform the necessary write action, all within 1 component.
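The staging-table pattern mentioned above can be sketched as follows. This is an assumption-laden stand-in rather than SSIS or T-SQL: it runs against SQLite through Python's sqlite3 module, which has no MERGE command, so the merge is expressed as one set-based UPDATE followed by one set-based INSERT. The Customer_Staging table and upsert_from_staging function are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Customer_Staging (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

def upsert_from_staging(conn, rows):
    """Bulk-load rows into staging, then update and insert in two set-based statements."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM Customer_Staging")
        conn.executemany("INSERT INTO Customer_Staging VALUES (?, ?)", rows)
        # Update only rows that exist in the target and actually differ.
        updated = conn.execute(
            "UPDATE Customer SET Name = "
            "(SELECT s.Name FROM Customer_Staging s WHERE s.Id = Customer.Id) "
            "WHERE EXISTS (SELECT 1 FROM Customer_Staging s "
            "WHERE s.Id = Customer.Id AND s.Name <> Customer.Name)"
        ).rowcount
        # Insert rows whose key is not yet in the target.
        inserted = conn.execute(
            "INSERT INTO Customer (Id, Name) "
            "SELECT s.Id, s.Name FROM Customer_Staging s "
            "WHERE NOT EXISTS (SELECT 1 FROM Customer c WHERE c.Id = s.Id)"
        ).rowcount
    return updated, inserted

result = upsert_from_staging(conn, [(1, "Ann"), (2, "Robert"), (3, "Cy")])
final = conn.execute("SELECT Id, Name FROM Customer ORDER BY Id").fetchall()
```

The point of the staging table is that the database does the matching in bulk, which avoids the row-by-row round trips that make the OLE DB Command approach slow.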

Demos - Incremental Writing

Duplicate Detection

Duplicate Detection
Fuzzy Lookup/Grouping Component
  Use the Fuzzy Lookup component to find duplicates.
  Duplicates are determined based on key_in, key_out values.
  Cumbersome configuration.
  The Fuzzy Lookup component is limited to OLE DB connections.
3rd Party Component - Duplicate Detector
  The KingswaySoft Duplicate Detector component enables some advanced duplicate detection scenarios.
  Additional matching support, such as FirstName, phone number, zip code, address, etc.
  Much better usability.
  Supports anything that comes through the SSIS pipeline.
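To make the idea of fuzzy duplicate matching concrete, here is a small sketch using Python's standard difflib module. It does not reproduce the matching logic of the Fuzzy Lookup, Fuzzy Grouping, or Duplicate Detector components; the normalize, similarity, and find_duplicates helpers, the 0.85 threshold, and the sample rows are all assumptions for illustration.

```python
from difflib import SequenceMatcher

def normalize(value):
    """Lower-case and keep only letters and digits, so formatting noise is ignored."""
    return "".join(ch for ch in value.lower() if ch.isalnum())

def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def find_duplicates(rows, fields, threshold=0.85):
    """Return id pairs whose weakest field similarity still meets the threshold."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            score = min(similarity(rows[i][f], rows[j][f]) for f in fields)
            if score >= threshold:
                pairs.append((rows[i]["id"], rows[j]["id"]))
    return pairs

rows = [
    {"id": 1, "name": "Jon Smith", "phone": "(555) 010-1234"},
    {"id": 2, "name": "John Smith", "phone": "555-010-1234"},
    {"id": 3, "name": "Mary Jones", "phone": "555-010-9999"},
]
duplicates = find_duplicates(rows, ["name", "phone"])
```

Comparing every pair is quadratic in the number of rows, so a real implementation would typically group candidates first (for example by zip code) before scoring them.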

Demos – Duplicate Detection

Writing in Parallel

Writing in Parallel
Usage Scenarios
  When writing is the bottleneck.
What to watch for?
  There are only 5 active buffers available per data flow.
  Plan for proper buffer sizing.
  If a change is ever needed, you have to go through each destination component after each branch.
What to use?
  Balanced Data Distributor
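The general idea of spreading buffers across several writers can be sketched in plain Python. This is only a loose analogy for what a component like the Balanced Data Distributor does inside an SSIS data flow, not a description of its internals: the round-robin assignment, the thread pool, and the in-memory lists standing in for destinations are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_buffers(rows, buffer_size):
    """Cut the row stream into fixed-size buffers."""
    return [rows[i:i + buffer_size] for i in range(0, len(rows), buffer_size)]

def write_buffers(buffers, writer_id, sink):
    """Write every buffer assigned to this writer; sink.extend stands in for a bulk insert."""
    written = 0
    for buf in buffers:
        sink.extend(buf)
        written += len(buf)
    return writer_id, written

def parallel_write(rows, writers=4, buffer_size=100):
    """Assign buffers to writers round-robin and run the writers concurrently."""
    buffers = split_into_buffers(rows, buffer_size)
    lanes = [buffers[i::writers] for i in range(writers)]
    sinks = [[] for _ in range(writers)]  # one private destination per writer
    with ThreadPoolExecutor(max_workers=writers) as pool:
        futures = [
            pool.submit(write_buffers, lanes[i], i, sinks[i])
            for i in range(writers)
        ]
        counts = dict(f.result() for f in futures)
    return counts, sinks

counts, sinks = parallel_write(list(range(1000)), writers=4, buffer_size=100)
```

Each writer owns its own sink, so no locking is needed; the trade-off noted on the slide still applies, since every branch has its own destination that must be configured and maintained separately.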

Demos – Writing in Parallel

Resource Links on CDC
https://www.mattmasson.com/2011/12/cdc-in-ssis-for-sql-server-2012-2/
https://www.youtube.com/watch?v=vrk5QVEPfJY