2 Overview of SSIS performance Troubleshooting methods Performance tips.

Presentation transcript:

2 Overview of SSIS performance Troubleshooting methods Performance tips

3 Business intelligence consultant Partner, Linchpin People SQL Server MVP TimMitchell.net

4 Questions Slide deck 4

5

6 Two most common questions about SSIS package executions: Did it complete successfully? How long did it run?

7 Why is ETL performance important? Getting data to the right people in a timely manner Load/maintenance window Potentially overlapping ETL cycles The most important concern: bragging rights!

8

9 Key questions: Is performance really a concern? Is it really an SSIS issue? Which task(s) or component(s) are causing the bottleneck?

10 Is this really an SSIS issue? Test independently, outside SSIS Compare results to SSIS

11 Where in SSIS is the bottleneck? Logging is critical Package logging (legacy logging) Catalog logging (SQL Server 2012 and later)

12
SELECT execution_id, package_name, task_name, subcomponent_name, phase,
       MIN(start_time) AS [Start_Time],
       MAX(end_time) AS [End_Time]
FROM [catalog].[execution_component_phases]
WHERE execution_id = (SELECT MAX(execution_id) FROM [catalog].[execution_data_statistics])
GROUP BY execution_id, package_name, task_name, subcomponent_name, phase
ORDER BY [Start_Time]
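Once the phase rows from a query like the one on slide 12 are exported, a few lines of code can rank components by time spent. The following is a minimal sketch, not part of the original deck: the function name `slowest_components`, the tuple layout, and the sample rows are all hypothetical, and it sums phase durations per component rather than reproducing the MIN/MAX grouping of the query.

```python
from collections import defaultdict
from datetime import datetime


def slowest_components(rows, top=3):
    """Rank data flow components by total seconds spent across all phases.

    rows: iterable of (task_name, subcomponent_name, start_time, end_time)
    tuples, for example exported from catalog.execution_component_phases.
    """
    totals = defaultdict(float)
    for task, component, start, end in rows:
        totals[(task, component)] += (end - start).total_seconds()
    # Longest-running components first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Hypothetical sample data, invented for illustration only.
sample_rows = [
    ("Load Sales", "OLE DB Source",
     datetime(2024, 1, 1, 2, 0, 0), datetime(2024, 1, 1, 2, 0, 40)),
    ("Load Sales", "Sort",
     datetime(2024, 1, 1, 2, 0, 5), datetime(2024, 1, 1, 2, 4, 5)),
    ("Load Sales", "OLE DB Destination",
     datetime(2024, 1, 1, 2, 4, 5), datetime(2024, 1, 1, 2, 5, 0)),
]

if __name__ == "__main__":
    for (task, component), seconds in slowest_components(sample_rows):
        print(f"{task} / {component}: {seconds:.0f}s")
```

In this invented sample the Sort component dominates, which is the kind of signal that points to a blocking transformation as the bottleneck.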

13 Brute force troubleshooting: Isolation by elimination Disable tasks Remove components

14 Monitor system metrics Disk IO Memory CPU Network

15

16 Many performance problems in SSIS aren’t SSIS problems Sources and destination issues are often to blame

17 Improper data retrieval queries Index issues Network speed/latency Disk I/O

18 Directs the query to return the first rows as quickly as possible
SELECT FirstName, LastName, Address, City, State, Zip
FROM dbo.People
OPTION (FAST 10000)
Not intended to improve the overall performance of the query

19 Useful for packages that spend a lot of time processing data in data flow Query time from SQL Server source Processing time in SSIS package Load time to destination Without OPTION (FAST )

20 Useful for packages that spend a lot of time processing data in data flow Processing time in SSIS package Load time to destination Query time from SQL Server source Using OPTION (FAST )

21

22 Know the blocking properties of transformations Nonblocking Partially blocking Fully blocking

23 Nonblocking – no holding buffers Row count Derived column transformation Conditional split

24 Partially blocking – some buffers can be held Merge Join Lookup Union All

25 Fully blocking – everything stops until all data is received Sort Aggregate Fuzzy grouping/lookup
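The difference between nonblocking and fully blocking transformations can be illustrated with Python generators. This is only an analogy, not how the SSIS pipeline is implemented: the function names and row layout are hypothetical, and partially blocking behavior is not modeled.

```python
def traced_source(rows, log):
    """Yield rows one at a time and record each read in `log`."""
    for row in rows:
        log.append(("read", row["id"]))
        yield row


def derived_column(rows):
    """Nonblocking: each output row is emitted as soon as its input row arrives."""
    for row in rows:
        yield {**row, "total": row["qty"] * row["price"]}


def sort_rows(rows, key):
    """Fully blocking: all input must be consumed before the first row is emitted."""
    yield from sorted(rows, key=lambda r: r[key])


# Hypothetical input rows.
data = [
    {"id": 3, "qty": 1, "price": 2.0},
    {"id": 1, "qty": 2, "price": 5.0},
    {"id": 2, "qty": 4, "price": 1.5},
]

if __name__ == "__main__":
    log = []
    first = next(derived_column(traced_source(data, log)))
    print("derived column read", len(log), "row(s) before emitting")

    log = []
    first = next(sort_rows(traced_source(data, log), "id"))
    print("sort read", len(log), "row(s) before emitting")
```

The streaming step needs one input row to produce one output row, while the sort has to hold the whole input, which is why fully blocking transformations drive memory use.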

26 Partially or fully blocking transforms are not evil! Pick the right tool for every job.

27 Demo – Blocking Transformations

28 Some transformations are often slow by nature Slowly Changing Dimension wizard OleDB Command

29 Useful for certain scenarios, but should not be considered go-to tools for transforming data

30 Table list = SELECT * FROM… Can result in unnecessary columns A narrow buffer is a happy buffer

31 Use the query window to select from table, specifying only the required columns Writing the query can be a reminder to apply filtering via WHERE clause, if appropriate

32 Data flow setting to prevent the allocation of memory for unused columns Note that you’ll still get a warning from the SSIS engine

33 When dealing with relational data, consider transforming in source Let the database engine do what it already does well

34 Sorting Lookups/joins Aggregates

35 MaxConcurrentExecutables: package-level setting to specify how many executables can be running at once Default = -1, which means number of logical processors + 2
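The default rule on slide 35 can be written out as a small calculation. This is a sketch under stated assumptions: the function name is hypothetical, `os.cpu_count()` stands in for the logical processor count SSIS would see, and rejecting zero or other negative values is an assumption rather than documented engine behavior.

```python
import os


def effective_max_concurrent_executables(setting=-1, logical_processors=None):
    """Return how many executables may run at once for a given setting.

    A setting of -1 means "number of logical processors + 2".
    """
    if logical_processors is None:
        # Assumption: use the local machine's logical processor count.
        logical_processors = os.cpu_count() or 1
    if setting == -1:
        return logical_processors + 2
    if setting < 1:
        # Assumption: any other non-positive value is treated as invalid here.
        raise ValueError("setting must be -1 or a positive integer")
    return setting


if __name__ == "__main__":
    print(effective_max_concurrent_executables(-1, logical_processors=4))
    print(effective_max_concurrent_executables(8, logical_processors=4))
```

On a 4-processor machine the default allows 6 concurrent executables, so a package with many independent tasks may benefit from an explicit higher value, subject to testing.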

36 For machines with few logical processors or potentially many concurrent executables, consider increasing this value

37 Demo – Concurrent Executables

38 Many operations in SSIS are done serially For nondependent operations, consider allowing processes to run in parallel

39 Dependent on machine configuration, network environment, etc. Can actually *hurt* performance Testing, testing, testing!

40 Sometimes a non-SSIS solution will perform better than SSIS If all you have is a hammer…

41 Some operations are better suited for T-SQL or other tools: MERGE upsert INSERT…SELECT Third party components External applications (via Execute Process Task)

42 Buffers spooled = writing to physical disk Usually indicates memory pressure Keep an eye on PerfMon counters for SSIS: Buffers Spooled

43

44 Any value above zero should be investigated Keep it in memory!

45 Data flow task values: DefaultBufferSize DefaultBufferMaxRows Ideally, use a small number of large buffers Generally, max number of active buffers = 5

46 Buffer size calculation: Row size * est num of rows / DefaultBufferMaxRows
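A related back-of-the-envelope estimate is how many rows fit in one buffer and how many buffers a load needs. The sketch below is a simplification and not the slide's formula: it assumes a buffer holds at most DefaultBufferMaxRows rows and at most DefaultBufferSize bytes, uses the commonly cited defaults of 10 MB and 10,000 rows, and ignores any other limits the engine applies. The function names are hypothetical.

```python
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024   # bytes; assumed default (10 MB)
DEFAULT_BUFFER_MAX_ROWS = 10_000         # rows; assumed default


def rows_per_buffer(row_bytes,
                    buffer_size=DEFAULT_BUFFER_SIZE,
                    max_rows=DEFAULT_BUFFER_MAX_ROWS):
    """Estimate rows per buffer: capped by both the row limit and the byte limit."""
    if row_bytes <= 0:
        raise ValueError("row_bytes must be positive")
    return max(1, min(max_rows, buffer_size // row_bytes))


def buffers_needed(total_rows, row_bytes,
                   buffer_size=DEFAULT_BUFFER_SIZE,
                   max_rows=DEFAULT_BUFFER_MAX_ROWS):
    """Estimate how many buffers are filled over the whole load."""
    per_buffer = rows_per_buffer(row_bytes, buffer_size, max_rows)
    return -(-total_rows // per_buffer)  # ceiling division


if __name__ == "__main__":
    print(rows_per_buffer(100))             # narrow rows: the row limit applies
    print(rows_per_buffer(4096))            # wide rows: the byte limit applies
    print(buffers_needed(1_000_000, 4096))
```

With narrow rows the row cap is the constraint, so raising DefaultBufferMaxRows yields larger buffers; with wide rows the byte cap is the constraint, so DefaultBufferSize is the setting to revisit.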

47 Maximum Insert Commit Size (MICS): OLE DB Destination setting Controls how buffers are committed to the destination database

48 MICS (Maximum Insert Commit Size) compared with buffer size:
MICS > buffer size: Setting is ignored. One commit is issued for every buffer.
MICS = 0: The entire batch is committed in one big batch.
MICS < buffer size: Commit is issued every time MICS rows are sent. Commits are also issued at the end of each buffer.
Source: Data Loading Performance Guide
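The three cases on slide 48 can be simulated to see how many commits a load would issue. This is an illustrative model of the rules as stated on the slide, not a reproduction of the OLE DB Destination's internals; the function name and arguments are hypothetical.

```python
def commit_sizes(total_rows, buffer_rows, mics):
    """Return the row count of each commit for a load of `total_rows`.

    buffer_rows: rows per SSIS buffer.
    mics: maximum insert commit size; 0 means one commit for the whole load.
    """
    if buffer_rows <= 0 or mics < 0 or total_rows < 0:
        raise ValueError("buffer_rows must be positive; mics and total_rows non-negative")
    if mics == 0:
        return [total_rows] if total_rows else []

    commits = []
    remaining = total_rows
    while remaining > 0:
        in_buffer = min(buffer_rows, remaining)
        remaining -= in_buffer
        # Commit every `mics` rows, and again at the end of each buffer.
        # When mics >= buffer size this is simply one commit per buffer.
        while in_buffer > 0:
            chunk = min(mics, in_buffer)
            commits.append(chunk)
            in_buffer -= chunk
    return commits


if __name__ == "__main__":
    print(commit_sizes(25_000, 10_000, 0))
    print(commit_sizes(25_000, 10_000, 50_000))
    print(commit_sizes(25_000, 10_000, 4_000))
```

Note how a commit size smaller than the buffer produces uneven commits, because the end of each buffer forces an extra, smaller commit.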

49 Note that this does NOT impact the size of the buffer in SSIS

50 In most cases, a small number of large commits is preferable to a large number of small commits

51 Larger commit sizes = fewer commits Good: Potentially less database overhead Potentially less index fragmentation Bad: Potential for log file or TempDB pressure/growth

52 Plan for no paging of buffers to disk. But…. Build for it if (when?) it happens BLOBTempStoragePath BufferTempStoragePath Fast disks if possible

53 Lookup cache modes Full (default) Partial None

54 Full cache: Small- to medium-size lookup table Large set of data to be validated against the lookup table Expect multiple “hits” per result

55 Partial cache: Large lookup table AND reasonable number of rows from the main source Expect multiple “hits” per result

56 No cache: Small lookup table Expect only one “hit” per result
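The trade-off between the three cache modes comes down to how many queries reach the lookup table. The sketch below counts them under simplifying assumptions: full cache loads the reference table with a single query up front, partial cache queries once per distinct key and never evicts, and no cache queries once per incoming row. The function name is hypothetical and real SSIS behavior has more nuance (cache size limits, for example).

```python
def count_reference_queries(source_keys, mode):
    """Estimate how many queries hit the lookup table for a stream of keys."""
    keys = list(source_keys)
    if mode == "full":
        # One query preloads the entire reference table before rows flow.
        return 1
    if mode == "partial":
        # Assumption: the cache is large enough that no entry is evicted,
        # so only the first occurrence of each key causes a query.
        return len(set(keys))
    if mode == "none":
        # Every incoming row triggers its own query.
        return len(keys)
    raise ValueError("mode must be 'full', 'partial', or 'none'")


if __name__ == "__main__":
    incoming = [1, 2, 1, 3, 1, 2]  # hypothetical lookup keys from the source
    for mode in ("full", "partial", "none"):
        print(mode, count_reference_queries(incoming, mode))
```

The count alone does not settle the choice: the single full-cache query reads the whole reference table, which is why that mode suits small to medium lookup tables.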

57 Demo – Lookups and caching

58 Strategically schedule packages to avoid contention with other packages, external processes, etc. Consider a workpile pattern in SSIS

59 Resource contention is a key player in SSIS performance issues Other SSIS or ETL processes External processes

60 SQL-to-SQL operations perform well Can’t do direct SQL to SQL with nonrelational or external data

61 Staging the data can allow for faster, direct SQL operations Remember: Let the database engine do what it does well. Updates in particular
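The staged update pattern can be sketched end to end with an in-memory database. This example uses Python's `sqlite3` module purely as a stand-in for SQL Server; the table names `people` and `staging`, the columns, and the helper functions are hypothetical, and the correlated-subquery UPDATE is chosen for portability rather than being the deck's demo code.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a small hypothetical target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, city TEXT)")
    conn.executemany(
        "INSERT INTO people (id, city) VALUES (?, ?)",
        [(1, "Dallas"), (2, "Austin"), (3, "Plano")],
    )
    conn.commit()
    return conn


def staged_update(conn, changes):
    """Bulk-load changes into a staging table, then apply one set-based UPDATE."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, city TEXT)")
    # Step 1: land the incoming rows in staging (the fast, insert-only part).
    cur.executemany("INSERT INTO staging (id, city) VALUES (?, ?)", changes)
    # Step 2: let the database engine update all matching rows in one statement.
    cur.execute(
        """
        UPDATE people
        SET city = (SELECT s.city FROM staging AS s WHERE s.id = people.id)
        WHERE id IN (SELECT id FROM staging)
        """
    )
    updated = cur.rowcount
    cur.execute("DROP TABLE staging")
    conn.commit()
    return updated


if __name__ == "__main__":
    db = make_demo_db()
    print(staged_update(db, [(1, "Houston"), (3, "Frisco")]))
    print(db.execute("SELECT id, city FROM people ORDER BY id").fetchall())
```

The contrast is with a row-by-row approach, where each changed row would issue its own UPDATE statement against the destination.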

62 Demo – Staged Update

63 Additional Resources Rob Farley demonstrates FAST hint: Microsoft data loading performance guide:

64 TimMitchell.net/Newsletter