SSIS Exploring Scalability, Performance and Deployment Vinod Kumar & Srinivas Sampath MVP – SQL Server.



Presentation Scope
- A high-level view
  - Design considerations
  - How to measure performance
  - Performance implications of architecture
  - Manageability aspects of SSIS
  - Deployment tips
- Out of scope
  - Prescriptive guidance for specific situations

Agenda
- Buffers and Memory
- OVAL Concept
- Detailed Component-Specific Notes
- Manageability Features
- Deployment Considerations

Introduction

SSIS Life Cycle Tools
- Design the SSIS package
  - Business Intelligence Development Studio (Visual Studio)
  - Migration wizard for pre-SQL Server 2005 packages
  - Version control integration (VSS)
- Deployment/Execution
  - Deployment Utility to copy packages
  - Command-line execution (dtexec.exe and dtexecui.exe)
  - Flexible configuration options
- Supportability
  - Rich per-package logging
  - SQL Management Studio for monitoring running packages and organizing stored packages
  - Checkpoint restartability

SSIS Tools (diagram): how BI Studio, the Import/Export Wizard, the Deployment Installer (file set), the SSIS Service, Management Studio, dtexec.exe, dtexecui.exe and dtutil.exe relate to SSIS packages (execution, viewing running packages, import\export, deployment).

Deep dive into Performance

Buffers and Memory
- Buffers are based on design-time metadata
  - The width of a row determines the size of the buffer
  - Smaller rows = more rows in memory = greater efficiency
- Memory copies are expensive!
  - A buffer might have placeholder columns filled by downstream components
  - Pointer magic where possible
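To make the row-width point concrete, here is a rough Python sketch of how many rows fit in one buffer. It assumes the data flow defaults of 10 MB for DefaultBufferSize and 10,000 rows for DefaultBufferMaxRows; the function name and the simple divide-and-cap rule are illustrative simplifications, not the engine's actual sizing algorithm.

```python
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024  # bytes; assumed DefaultBufferSize default
DEFAULT_BUFFER_MAX_ROWS = 10_000        # assumed DefaultBufferMaxRows default


def rows_per_buffer(row_width_bytes,
                    buffer_size=DEFAULT_BUFFER_SIZE,
                    max_rows=DEFAULT_BUFFER_MAX_ROWS):
    """Estimate rows per buffer: limited by bytes and by the row cap (simplified)."""
    if row_width_bytes <= 0:
        raise ValueError("row width must be positive")
    return min(max_rows, buffer_size // row_width_bytes)


# Hypothetical row widths: a narrow row hits the row cap, a wide row hits the byte limit.
for width in (200, 4000):
    print(width, rows_per_buffer(width))
```

Under these assumptions a 200-byte row is capped by the row limit, while a 4,000-byte row fits far fewer rows per buffer, which is why dropping or narrowing columns pays off.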

Component Types
- Row based (synchronous outputs)
  - Logically works at a row level
  - Buffer reused
  - Examples: Data Conversion, Derived Column
- Partially blocking (asynchronous outputs)
  - May logically work at a row level
  - Data copied to new buffers
  - Examples: Merge, Merge Join, Union All
- Blocking (asynchronous outputs)
  - Needs all input buffers before producing any output rows
  - Data copied to new buffers
  - Examples: Aggregate, Sort
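The three component types can be loosely illustrated with Python generators. This is only an analogy under stated assumptions: rows are plain dicts, the column names are made up, and `dict(row)` stands in for copying data into new buffers.

```python
import heapq


def derived_column(rows):
    """Row based (synchronous): each row is modified in place and passed on."""
    for row in rows:
        row["total"] = row["qty"] * row["price"]  # same row object, "buffer reused"
        yield row


def merge_sorted(left, right, key):
    """Partially blocking: emits rows as soon as ordering allows, in new row objects."""
    for row in heapq.merge(left, right, key=key):
        yield dict(row)  # copy, standing in for "data copied to new buffers"


def sort_rows(rows, key):
    """Blocking: must consume all input before producing any output."""
    buffered = list(rows)
    buffered.sort(key=key)
    for row in buffered:
        yield dict(row)


rows = [{"qty": 2, "price": 3.0}, {"qty": 1, "price": 1.5}]
for row in sort_rows(derived_column(rows), key=lambda r: r["total"]):
    print(row)
```

The synchronous step never holds more than one row, while the sort cannot yield anything until its input is exhausted.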

CPU Utilization
- Execution tree
  - Starts from a source or an async output
  - Ends at a destination or an input that has no sync outputs
- Each execution tree can get a worker thread
- MaxEngineThreads to control parallelism
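The execution tree rule above can be sketched as a small graph walk. This is a simplified model, not the engine's planner: the component names, the four `kinds` labels and the function itself are assumptions made for illustration.

```python
def execution_trees(kinds, edges):
    """Group components into execution trees.

    kinds: dict of component name -> "source", "sync", "async" or "destination".
    edges: list of (upstream, downstream) pairs.
    Returns a dict mapping each tree's starting component to the components it feeds.
    """
    downstream = {}
    for up, down in edges:
        downstream.setdefault(up, []).append(down)

    trees = {}
    for root, kind in kinds.items():
        if kind not in ("source", "async"):
            continue  # only sources and async outputs start a tree
        members = []
        queue = list(downstream.get(root, []))
        while queue:
            node = queue.pop(0)
            members.append(node)
            if kinds[node] == "sync":
                # a sync output passes the same buffers on, so the tree continues
                queue.extend(downstream.get(node, []))
        trees[root] = members
    return trees


kinds = {"flat_file": "source", "derived_column": "sync",
         "sort": "async", "sql_dest": "destination"}
edges = [("flat_file", "derived_column"), ("derived_column", "sort"),
         ("sort", "sql_dest")]
print(execution_trees(kinds, edges))
```

In this hypothetical pipeline the Sort splits the flow into two trees, so up to two worker threads could be used.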

Performance Strategy
Use OVAL to identify the factors affecting data integration performance:
- Operations: What logic should be applied to the data?
- Volume: How much data must be processed?
- Application: Which app is best suited to these operations on this volume of data? For example, use SQL Server or SSIS for sorting data?
- Location: Where should the app run? For example, on a shared server, or on a standalone machine?

An OVAL Example: Loading a Text File
- Simple scenario, interesting performance considerations!
- Text file on Server 1
- SQL Server on Server 2

Operations
- Understand all operations performed
  1. Open a transaction on SQL Server
  2. Read data from the text file
  3. Load data into the SSIS data flow
  4. Load the data into SQL Server
  5. Commit the transaction
- Beware of hidden operations
  - Data conversion in either step 3 or 4

Operations - Sharpen
- File source
  - Unnecessary data type conversions
  - ‘FastParse’ in Flat File Source
  - Unnecessary operations: e.g., converting from text to datetime, then from datetime to date
- Reduce database operations
  - Database logging
  - Commit size
  - Fast Load
  - Table lock
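As a rough illustration of the conversion and commit-size points, the sketch below loads a small text file with one conversion per value and a configurable number of rows per commit. It uses Python's built-in `sqlite3` and `csv` modules purely as stand-ins for SQL Server and the flat file source; the `sales` table, its columns and the sample data are hypothetical.

```python
import csv
import io
import sqlite3


def load_text(conn, text_file, commit_size):
    """Load 'name,qty' lines into the hypothetical sales table; return the commit count."""
    commits = 0
    pending = 0
    for name, qty in csv.reader(text_file):
        # The only conversion: text -> integer, done once, straight to the target type.
        conn.execute("INSERT INTO sales (name, qty) VALUES (?, ?)", (name, int(qty)))
        pending += 1
        if pending == commit_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:
        conn.commit()  # commit the final partial batch
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, qty INTEGER)")
sample = io.StringIO("widget,3\ngadget,5\nsprocket,8\n")
print(load_text(conn, sample, commit_size=2))
```

A larger commit size means fewer commits for the same rows, at the cost of more uncommitted work being held open at once.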

Volume
- Reduce where possible
  - Don’t push unneeded columns
  - Conditional split for filtering rows
- Do not parse or convert columns unnecessarily
  - In a fixed-width format you can combine adjacent unneeded columns into one
  - Leave unneeded columns as strings

Volume - Sharpen
- Use appropriate data types
  - An integer in the range takes 2 bytes as an integer, 3 bytes as a string, but 4 bytes as a real
  - Suggest Types in the flat file connection manager UI
- Use parallelism
  - If loading multiple files, can they be loaded in parallel?
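The data type sizes quoted above can be checked with Python's `struct` module. The value 123 is a hypothetical three-digit example chosen for illustration, since the slide's actual range did not survive in the transcript; the format codes model a 2-byte integer and a 4-byte real.

```python
import struct


def encoded_sizes(value):
    """Return the byte size of one value as a 2-byte integer, ASCII text, and a 4-byte real."""
    return {
        "smallint": len(struct.pack("<h", value)),         # 16-bit signed integer
        "string": len(str(value).encode("ascii")),         # one byte per digit
        "real": len(struct.pack("<f", float(value))),      # 32-bit float
    }


print(encoded_sizes(123))  # {'smallint': 2, 'string': 3, 'real': 4}
```

Multiplied across millions of rows, the choice of type changes how many rows fit in each buffer.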

Application
- Is SSIS right for this?
  - The overhead of starting up an SSIS package may offset any performance gain over BCP for small data sets
  - Is BCP good enough?
  - Is the greater manageability and control of SSIS needed?
- Bulk Insert Task vs. Data Flow

Location
- Consider the following configuration:
  - Text file on Server 1
  - SQL Server on Server 2
- Where should SSIS run? (Licensing issues aside)

Location Considerations
- SSIS on Server 1
  - Competes with apps for resources
  - Will data conversion on Server 1 reduce or increase the volume of data transferred across the network?
  - Cannot use the fast SSIS SQL Server Destination
- SSIS on Server 2
  - Competes with SQL Server for resources
  - Will pulling the text across the network for conversion be expensive?
  - Also consider transferring the file unparsed to Server 2 and reading it locally from there
  - Can use the fast SSIS SQL Server Destination

Measuring Performance
- OVAL does not provide prescriptive guidance
  - Too many variables
- Improve performance by applying OVAL and measuring
  - SSIS logging
  - Performance counters
  - SQL Server Profiler: for extract queries, lookups and loading

Parallelism
- Focus on the critical path
- Utilize available resources
- Memory constrained; reader and CPU constrained
- Let it rip!
- Optimize the slowest

Moving Ahead

Manageability Features
- Logging and Log Providers
- Checkpoint Restartability
- Precedence Constraints
- Configurations
- SSIS Service

Logging and Log Providers
- Log entries are a blend of status and result messages
- Can select which ‘details’ to log per control flow object within each package (e.g. OnError, OnWarning, OnPreExecute)
- Can select which fields to log (e.g. computer, operator, ExecutionID…)
- Can define multiple log providers (SQL, text file, Windows Event Log…) per package

Checkpointing (diagram): package loads, checkpoint file created; Data Flow Task runs, checkpoint written; Send Mail Task runs; package completes, checkpoint file deleted.
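The checkpoint life cycle in the diagram can be sketched in a few lines of Python. This is an analogy rather than the SSIS checkpoint file format: the JSON file, the task names and the `run_package` function are all assumptions for illustration.

```python
import json
import os


def _write(path, completed):
    with open(path, "w") as handle:
        json.dump(completed, handle)


def run_package(tasks, checkpoint_path):
    """Run (name, callable) tasks in order, skipping tasks a previous failed run completed."""
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            completed = json.load(handle)   # restart: resume after the last good task
    else:
        completed = []
        _write(checkpoint_path, completed)  # checkpoint file created
    ran = []
    for name, action in tasks:
        if name in completed:
            continue
        action()                            # an exception here leaves the file in place
        completed.append(name)
        ran.append(name)
        _write(checkpoint_path, completed)  # write checkpoint
    os.remove(checkpoint_path)              # package completed: checkpoint file deleted
    return ran
```

If the hypothetical Send Mail step fails, the file survives and the next run skips the Data Flow step instead of reloading the data.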

Configurations
- ‘Feed’ changes into a package and alter execution without editing the package directly (e.g. file name to load)
- The ‘feed’ can be sourced from a SQL table, XML file, registry key, OS environment variable, or a parent package
- You can apply one or many configuration sets per package, from a mix of sources

Configuration Scenario (diagram)
- Package handoff between the machines where packages are being designed, tested and executed: Dev, Test, Production
- Multiple configurations, one per environment: Dev DB, Test DB, Prod DB
- The configuration updates the package on load with DB locations (and mail server, file share locations…)
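The idea of feeding values into a package at load time can be sketched as follows. This is a hedged analogy: a dict stands in for the package's properties, JSON stands in for an XML configuration file, and the property names, environment variable name and helper functions are all hypothetical.

```python
import json
import os


def from_environment(variable, prop, environ=None):
    """Build a configuration from an OS environment variable, if it is set."""
    environ = os.environ if environ is None else environ
    return {prop: environ[variable]} if variable in environ else {}


def from_json(text):
    """Build a configuration from a JSON document (stand-in for an XML file)."""
    return json.loads(text)


def apply_configurations(package, configurations):
    """Return a copy of the package with each configuration applied in order."""
    configured = dict(package)
    for configuration in configurations:
        configured.update(configuration)  # later sources override earlier ones
    return configured


design_time = {"db_server": "DEVDB", "mail_server": "devmail"}
test_run = apply_configurations(design_time, [
    from_json('{"db_server": "TESTDB"}'),
    from_environment("SSIS_MAIL", "mail_server", environ={"SSIS_MAIL": "testmail"}),
])
print(test_run)
```

The package definition itself is untouched; only the values applied on load differ between Dev, Test and Production.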

Precedence Constraints
- Direct flow from object to object; basically, ‘when do I move on?’
- Success, Failure, Completion, or one of those plus an expression (condition)
- Diagram: a Dataflow Task linked to SendMail Tasks by Success, Completion, Failure, and Success & expression constraints
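The "when do I move on" rule can be expressed as a small predicate. This sketch models only the combination named on the slide (an outcome plus an expression, both required); the function name, the string labels and the `row_count` variable are assumptions for illustration.

```python
def constraint_met(kind, upstream_result, expression=None, variables=None):
    """Decide whether the downstream task may run.

    kind: "success", "failure" or "completion".
    upstream_result: "success" or "failure".
    expression: optional callable taking a dict of variables and returning a bool.
    """
    if upstream_result not in ("success", "failure"):
        raise ValueError("unknown result: %r" % upstream_result)
    if kind == "completion":
        outcome_ok = True                     # runs whatever the outcome was
    elif kind in ("success", "failure"):
        outcome_ok = upstream_result == kind
    else:
        raise ValueError("unknown constraint: %r" % kind)
    if expression is None:
        return outcome_ok
    return outcome_ok and bool(expression(variables or {}))


# Hypothetical rule: send mail only if the data flow succeeded and loaded rows.
loaded_rows = lambda v: v.get("row_count", 0) > 0
print(constraint_met("success", "success", loaded_rows, {"row_count": 120}))
print(constraint_met("success", "success", loaded_rows, {"row_count": 0}))
```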

Manageability Demo

Deployment Flow
Tools to organize and ‘copy’ packages and supporting files
- BI Studio: design package; add configurations; add miscellaneous files; set project deployment properties; build
- User: copy/move deployment folder\files
- Installation Wizard: choose destination (SQL, file system); modify protection level; choose location of supporting files; change configurations; execute
- SQL Agent: create desired agent jobs

SQL Management Studio
- Utilizes the SSIS service
- Allows monitoring of currently executing packages
- Maintains the stored package structure
- Ad hoc package execution

Deployment Demo

Some More Tips
- Lookup
- Aggregate
- Sort
- Swapping

Performance of Lookups
- The reference set
  - Restrict to only those columns you actually use
  - Restrict rows with WHERE if possible
- The lookup cache
  - Caching can improve performance
  - Full cache: when the reference set will fit comfortably in memory
  - Partial: build a cache as the input records are matched; useful for duplicate keys in the input, such as SKUs
  - None: reference set doesn’t fit in memory and partial cache has no advantage
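The three cache modes can be contrasted with a small Python sketch. It is a conceptual model only: `query_one` is a hypothetical function standing in for a per-key query against the reference table, and the SKU data is made up.

```python
def full_cache_lookup(reference_rows):
    """Full cache: load the whole reference set once, then answer from memory."""
    cache = dict(reference_rows)
    return cache.get


def partial_cache_lookup(query_one):
    """Partial cache: query on first sight of a key, reuse the answer afterwards."""
    cache = {}

    def lookup(key):
        if key not in cache:
            cache[key] = query_one(key)
        return cache[key]

    return lookup


def no_cache_lookup(query_one):
    """No cache: every input row costs one query against the reference set."""
    return query_one


reference = {"A1": "Widget", "B2": "Gadget"}
queries = []


def query_one(sku):
    queries.append(sku)  # record each round trip to the reference set
    return reference.get(sku)


lookup = partial_cache_lookup(query_one)
names = [lookup(sku) for sku in ["A1", "A1", "B2", "A1"]]
print(names, len(queries))
```

With duplicate SKUs in the input, the partial cache issues one query per distinct key rather than one per row.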

Performance of Aggregate
- Majority of work happens in the ProcessInput call
  - This is on the thread in the previous execution tree!
- Memory requirements depend on how ‘deep’ the aggregations are
- Can reuse buckets if one aggregation can be derived from another
- Use when memory is limited, single threaded operation

Performance of Sort
- ProcessInput hangs on to the incoming data
- PrimeOutput does the sort and is the expensive part
- Sort needs all data to be in memory
- Sort can have unpredictable CPU requirements
- Merging is single threaded
- Stock Sort component will be good enough for most users
- Third party (“fastest sort in the world”) available if you really need it

Swapping Buffers
- Happens when physical memory is not available
- Each buffer gets written out to one file
- Multiple paths can be specified for swapping buffers
  - BufferTempStoragePath property on the Pipeline
- Do everything in your power to avoid swapping
  - Otherwise, performance is really unpredictable
- Options: 64-bit, out-of-process execution, serializing operations

SSIS: Summary
- Fast!
  - Data flows process large volumes of data efficiently, even through complex operations
  - Exceptional price/performance on multi-core
- Feature rich
  - Many pre-built adapters and transformations reduce hand coding
  - Extensible object model enables specialized custom or scripted components
  - Highly productive visual environment speeds development and debugging
  - Integral part of a complete BI stack (IS-AS-RS)
- Beyond ETL
  - Enables integration of XML, RSS and Web Services data
  - Data cleansing features enable “difficult” data to be handled during loading
  - Data and text mining allow “smart” handling of data for imputation of incomplete data, conditional processing of potential problems, or smart escalation of issues such as fraud detection

Your Feedback is Important! Please Fill Out the feedback form

Questions !!!

Links & Resources
- Vinod Kumar, MVP - SQL Server, Intel Technology India Pvt. Ltd.
- SQL Server Integration Services public site: ehouse/SSIS/default.aspx
- SQL Server Business Intelligence public site: on/bi/default.asp
- SSIS MVPs community site
- Newsgroups: microsoft.private.sqlserver2005.dts
- Srinivas Sampath, MVP - SQL Server, www32.brinkster.com/srisamp, SCT Software Solutions

© 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.