1
A Survival Guide Tim Mitchell
Real-World SSIS A Survival Guide Tim Mitchell
2
What we’ll cover today Lessons I’ve learned the hard way
Methodologies to solve real problems in SSIS Tools to help out Solutions for SQL 2012 as well as earlier versions Demos
3
What we won’t cover No intro to SSIS Books Online
4
Housekeeping Presentation materials Lunch / breaks
5
Housekeeping Let’s keep it informal Ask questions
6
PSA: Community Survival is easier in groups Local user groups
Events (SQL Saturday, SQL Bits, PASS Summit) Online communities Twitter (#sqlhelp)
7
About me Business intelligence consultant
Group Principal, Linchpin People SQL Server MVP TimMitchell.net
9
Texas Dictionary Whole mess: Bountiful amounts of something, usually referring to excess More than one way to skin a cat: A pet-unfriendly phrase to indicate that there are usually multiple ways to solve the same problem Y’all: A subgroup of the current group All y’all: The whole of the current group
10
Texas Dictionary I tell you what: A statement of strong belief in the preceding statement. May also be used to agree with someone else’s statement Bless [his/her/your] heart: A polite way to call someone an imbecile Dadgummit: A (generally) socially acceptable replacement for other, less socially acceptable words
11
Texas Dictionary Plumb: In a complete state of something. See also: flat out Yonder: Not here.
12
survival (noun) The state or fact of continuing to live or exist, typically in spite of an accident, ordeal, or difficult circumstances. Reference: Dictionary.com (
13
survival (noun) Survival is simply the state of existing. It’s just a small step above being dead. -- Me Photo credit: Elvis Ripley ( Used under Creative Commons license.
14
Elements of Survival The dangers: The elements Predators
Foolishness of fellow survivors The unexpected
15
Elements of Survival The dangers: Dirty data
Complex or poorly defined ETL requirements Unexpected metadata changes Unstable sources/destinations Project managers
16
Elements of Survival Means of survival:
Common sense of self preservation Tools Leaning on others Learning from others’ mistakes
17
Elements of Survival Means of survival: Best practices Consistency
Document Tools (buy/build) Community
18
Survival Tip #1: Plan to Fail
19
Planning to Fail
20
Planning to Fail Data failures: Missing or offline sources
Changed metadata Partial loads Validation issues Unexpected domain values
21
When (not if) it happens…
22
Planning to Fail Planning for failure in the wild:
Build your shelter before it rains Layers Leaves Bread crumbs
23
Planning to Fail Planning for failure, the ETL way: Be a pessimist!
Fail gracefully Capture error/warning data on failure Build for restartability (where appropriate)
24
Failing Gracefully
25
Planning to Fail Why graceful failure?
Avoid leaving affected systems in an inconsistent state Avoid repeating wholesale operations Timely notifications to allow proper response from dev/admin staff
26
Planning to Fail Graceful failures in SSIS Control flow:
Event handlers Precedence constraints Data flow: Error row redirection Lookup failure redirection Conditional split
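The idea behind error row redirection can be sketched outside SSIS. The Python below is only an illustration, not SSIS code: the function name `redirect_errors` and the `id`/`amount` columns are hypothetical. Rows that fail conversion are diverted to an error list with the reason attached, while the good rows keep flowing.

```python
from typing import Iterable, Iterator


def redirect_errors(rows: Iterable[dict], errors: list) -> Iterator[dict]:
    """Yield rows that convert cleanly; divert failures to `errors` instead of raising."""
    for row in rows:
        try:
            yield {"id": int(row["id"]), "amount": float(row["amount"])}
        except (KeyError, ValueError, TypeError) as exc:
            # Keep the original row and the reason so it can be triaged later.
            errors.append({"row": row, "error": repr(exc)})


source = [
    {"id": "1", "amount": "10.50"},   # clean
    {"id": "two", "amount": "3.00"},  # non-numeric id
    {"id": "3"},                      # missing amount column
]
error_rows = []
good_rows = list(redirect_errors(source, error_rows))
```

One bad row no longer stops the whole load, and the captured error detail is what makes the failure actionable afterward.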
27
Planning to Fail Graceful failures in SSIS Restartability
SSIS Checkpoints SSIS transactions Both methods have shortcomings Custom restartability can be an option
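As a rough illustration of what custom restartability can look like, here is a minimal sketch in Python. It is an assumption-laden example, not the SSIS checkpoint mechanism: the checkpoint file format, the `run_steps` helper, and the step names are all hypothetical. Each completed step is recorded, and a rerun skips anything already done.

```python
import json
import os
import tempfile


def run_steps(steps, checkpoint_path):
    """Run named steps in order, skipping any already recorded as done."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            done = json.load(fh)
    for name, action in steps:
        if name in done:
            continue
        action()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        with open(checkpoint_path, "w") as fh:
            json.dump(done, fh)
    os.remove(checkpoint_path)  # clean finish: the next run starts from the top


calls = []
state = {"fail_load": True}


def extract():
    calls.append("extract")


def load():
    if state["fail_load"]:
        raise RuntimeError("destination offline")
    calls.append("load")


steps = [("extract", extract), ("load", load)]
checkpoint = os.path.join(tempfile.mkdtemp(), "etl_checkpoint.json")

try:
    run_steps(steps, checkpoint)   # first run fails at "load"
except RuntimeError:
    pass

state["fail_load"] = False
run_steps(steps, checkpoint)       # rerun resumes at "load"; "extract" is skipped
```

The same pattern applies when the checkpoint lives in a database table rather than a file; what matters is that progress is persisted after every unit of work.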
28
Planning to Fail Natural failures Simply stop processing on error
Default behavior In some cases, can be the right pattern
29
Demo Designing for failure
30
Survival Tip #2: Take Notes
31
Take Notes What to note? Trails, paths, and shortcuts Water sources
Hazards Enemy positions Weather and wildlife patterns Sunrise/sunset time
32
Take Notes What to note? Success and failure of operations Row counts
Run times Validation information Warnings
33
Take Notes Why? Know what to expect Plan for growth Cover your assets
34
Take Notes It’s all about the log. SSIS logging SQL Server log
Custom logging
35
Take Notes SSIS Package Logging It’s already there Easy to start
Flexible events and destinations Can be unwieldy
36
Take Notes SSIS Catalog Logging Version 2012 only Easiest to configure
Design time or runtime Least flexible
37
Take Notes Custom Logging Roll your own Most difficult to set up
Infinitely flexible
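A roll-your-own logger does not need to be elaborate. The sketch below is an illustration in Python rather than SSIS; the `logged` helper and the in-memory `run_log` list are hypothetical stand-ins for a log table. It records the items listed earlier: success or failure, row counts, and run times.

```python
import time
from contextlib import contextmanager

run_log = []  # stand-in for a log table


@contextmanager
def logged(operation):
    """Record status, row count, and elapsed time for one operation."""
    entry = {"operation": operation, "status": "running", "rows": 0}
    started = time.perf_counter()
    try:
        yield entry
        entry["status"] = "succeeded"
    except Exception as exc:
        entry["status"] = "failed"
        entry["error"] = str(exc)
        raise
    finally:
        # Written on success and failure alike, so every run leaves a trace.
        entry["seconds"] = time.perf_counter() - started
        run_log.append(entry)


with logged("load_customers") as entry:
    entry["rows"] = len(["a", "b", "c"])  # pretend three rows were loaded

try:
    with logged("load_orders"):
        raise ValueError("bad date in row 7")
except ValueError:
    pass
```

Because the entry is written in the `finally` clause, failures are logged with the same shape as successes, which makes trend queries over run times and row counts straightforward.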
38
Take Notes Server/engine logging SQL Engine error log DMVs
Third party tools Windows log PerfMon
39
Demo Take Notes
40
Survival Tip #3: Perform at your best
41
Perform at your Best
42
Perform at your Best Soldier up! Recognize and avoid quicksand
React appropriately when you’re stuck Know your environment
43
Perform at your Best Soldier up!
Isolate and eliminate the things that slow you down Recognize design patterns that are detrimental to performance Look *outside* SSIS (gasp!)
44
Perform at your Best It’s not just SSIS
The majority of SSIS performance problems have nothing to do with SSIS Limitations on sources and destinations
45
Perform at your Best It’s not just SSIS
Don’t just ‘pass the buck’, but do consider other factors: SQL engine configuration Disk configuration Network speed/latency Physical machine capabilities
46
Perform at your Best It’s not just SSIS
Proper query techniques for relational sources Effective indexing for sources and destinations Using OPTION (FAST <n>)
47
Perform at your Best Streamline your data flows
Transformations matter! Know the blocking properties of transformations
48
Perform at your Best Streamline your data flows
Nonblocking transforms do not hold buffers Derived Column Conditional Split Row Count
49
Perform at your Best Streamline your data flows
Partially blocking transforms will queue up buffers as needed Merge Join Lookup Union All
50
Perform at your Best Streamline your data flows
Fully blocking transforms will not pass any data through until all of the data has been buffered at that transformation Sort Aggregate
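The difference between nonblocking and fully blocking behavior can be seen with plain Python generators. This is an analogy only, not how the SSIS pipeline is implemented, and the function names are made up for illustration. The trace shows how many source rows must be read before the first output row appears.

```python
def source(n, trace):
    """Emit n rows in descending order, recording each read."""
    for i in range(n, 0, -1):
        trace.append(f"read {i}")
        yield i


def derived_column(rows):
    """Nonblocking: each row is passed on as soon as it arrives."""
    for r in rows:
        yield r * 10


def sort_rows(rows):
    """Fully blocking: every row must be consumed before the first is emitted."""
    buffered = sorted(rows)
    yield from buffered


streaming_trace = []
first_streamed = next(derived_column(source(3, streaming_trace)))

blocking_trace = []
first_sorted = next(sort_rows(source(3, blocking_trace)))
```

The streaming path produces its first row after a single read, while the sort has to read all three rows first. With millions of rows, that is the difference between a steady flow and a large in-memory buffer.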
51
Perform at your Best Streamline your data flows
Be aware of memory use! LOB (large object) columns will always spool to disk rather than staying in memory. [N]VARCHAR(MAX) Memory buffers may spill over to disk
52
Perform at your Best Streamline your data flows Manage your sources
Don’t use table drop down list – specify your query including only the necessary columns Be mindful of indexes when writing data retrieval queries
53
Perform at your Best Streamline your data flows
Manage your destinations Use FAST LOAD for SQL Server destinations Index management (drop?)
54
Perform at your Best Go Parallel!
Parallel operations can yield faster data flows
55
Demo Parallel data flow
56
Perform at your Best Streamline your data flows Using lookups
Pay attention to lookup cache mode Full cache Partial cache No cache
57
Perform at your Best Streamline your data flows Using lookups
Two-phase lookup strategy: Commonly accessed data in full cache Remaining data in a subsequent partial cache
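The two-phase strategy can be sketched as a preloaded dictionary backed by an on-demand cache. This Python is an illustration under stated assumptions: the reference data, the `query_reference` function, and the choice of which keys count as "commonly accessed" are all hypothetical.

```python
from functools import lru_cache

reference_table = {"TX": "Texas", "OK": "Oklahoma", "NM": "New Mexico", "VT": "Vermont"}
queries = []  # records every trip to the (simulated) reference source


def query_reference(code):
    """Stand-in for a round trip to the reference database."""
    queries.append(code)
    return reference_table.get(code)


# Phase 1: commonly accessed keys, loaded up front (full cache).
full_cache = {code: reference_table[code] for code in ("TX", "OK")}


# Phase 2: everything else is fetched on first use and then remembered (partial cache).
@lru_cache(maxsize=128)
def partial_cache_lookup(code):
    return query_reference(code)


def lookup(code):
    if code in full_cache:
        return full_cache[code]
    return partial_cache_lookup(code)


results = [lookup(c) for c in ["TX", "VT", "TX", "VT", "ZZ"]]
```

Only the two uncommon keys reach the source, and each does so once. The hot keys never leave memory, and the preload stays small because it does not try to hold the whole reference set.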
58
Perform at your Best Streamline your data flows Using lookups
Cache connection manager Allow reuse of lookup information across data flows
59
Demo Lookups
60
Survival Tip #4: Clean it up
61
Clean it up The greatest danger is in the elements
Chances are that unsanitary conditions will kill you before a predator does Infection Spoiled food or water
62
Clean it up In ETL, the greatest dangers often lie in the small things
Like an infection, bad data can fester for a while until it’s too late Caught early, problems with dirty data are more easily solved
63
What is dirty data? Types of dirty data: Data type mismatches
Domain violations Semantic violations Technical errors Simple inaccuracies
64
What is dirty data? Data type mismatches
Non-numeric data in numeric fields Decimal data in integer fields Incorrect precision / rounding Truncation
65
What is dirty data? Domain violations Invalid dates
Incorrect addresses Semantic violations Data outside of a reasonable range (such as a person’s age in the thousands of years) Inconsistent use of NULL, blanks, and zeroes
66
What is dirty data? Technical errors Improperly formatted dates
Out-of-alignment flat files Too many/too few delimiters
67
What is dirty data? Simple inaccuracies Misspellings Duplications
Improper formatting (addresses, phone numbers) Case
68
What causes dirty data?
69
What is dirty data? Causes of dirty data: Internal:
Unvalidated user input Lack of proper database constraints and/or application logic External: Importing bad data from other systems (the remainder of the discussion and our demos focus on this segment) ETL errors
70
Now What?
71
Clean it up Test your cleansing logic in stage/test/QA first
Cleanse directly in production Don’t cleanse at all
72
Clean it up What to do with unresolvable bad data? Delete
Update to NULL or unknown member Mark as suspect Write to triage Stop the ETL
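Several of these options can coexist in one routine. The sketch below is illustrative Python, not SSIS; the `age` and `state` columns and the 0 to 120 range are assumptions chosen for the example. It updates blanks to an unknown member, marks out-of-range values as suspect, and sends unparseable rows to triage.

```python
def cleanse(row):
    """Return (disposition, row); disposition is 'clean', 'suspect', or 'triage'."""
    out = dict(row)
    # Update to unknown member: a blank or missing state becomes "Unknown".
    if not (out.get("state") or "").strip():
        out["state"] = "Unknown"
    # Write to triage: a data type mismatch cannot be fixed automatically.
    try:
        out["age"] = int(out["age"])
    except (KeyError, ValueError, TypeError):
        return "triage", row
    # Mark as suspect: the type is valid but the value is not reasonable.
    if not 0 <= out["age"] <= 120:
        return "suspect", out
    return "clean", out


rows = [
    {"age": "34", "state": "TX"},
    {"age": "3400", "state": ""},
    {"age": "n/a", "state": "OK"},
]
dispositions = [cleanse(r) for r in rows]
```

Triage rows are returned untouched so that whoever reviews them sees exactly what arrived from the source.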
73
Data Cleansing in SSIS
74
Data Cleansing in SSIS Tools of the trade Native SSIS components
POTS (Plain Old Transact-SQL) SQL Server DQS
75
Data Cleansing in SSIS SSIS Native Components
A versatile approach with more transformation options A much better choice when data cleansing operations involve multiple and/or non-SQL Server data sources Extensible through custom code Third party add-ons
76
Data Cleansing in SSIS SSIS Native Components
Precision tools include Lookup Transformation, Merge Join Flexible/inexact cleansing through Conditional Split, Derived Column transformation, fuzzy tools
77
Data Cleansing in SSIS Transact-SQL
Fast, simple, effective way to do some cleanup operations Requires no additional software or configuration Extensible through the use of UDFs or CLR functions
78
Data Cleansing in SSIS Data Quality Services
A tool specifically designed for data cleansing Has its own client interface, or can be used within SSIS for cleansing operations Limited set of operations in SSIS
79
Demo Data Cleansing
80
Survival Tip #5: The Swiss Army Knife
81
Swiss Army Knife When unexpected situations arise, an all-purpose tool can literally be a lifesaver. Cut up small firewood Can opener Make a game trap
82
Swiss Army Knife Scripting and coding tools SSIS Expressions
Script task/script component PowerShell
83
Swiss Army Knife SSIS Expressions Built into SSIS
Can be used in most any component or task No extra moving parts required Useful for declarative statements
84
Swiss Army Knife Pros: Easy to get started – just start expressing yourself Ubiquity Relatively easy to use
85
Swiss Army Knife Cons: Syntax is <polite> unique </polite>
Complex expressions are difficult Troubleshooting
86
Swiss Army Knife SSIS Scripting .NET Framework VB.NET or C#
Can use existing external assemblies
87
Swiss Army Knife Pros: Swiss Army knife of SSIS
Works great for operations where native SSIS tasks/components can’t easily accomplish goal Does not require in-depth programming knowledge
88
Swiss Army Knife Cons: Does require some familiarity with programming or scripting Not as simple as native components Performance (sometimes)
89
Swiss Army Knife Script Task Used in the Control Flow Variety of uses:
Interact with OS Filesystem operations (archiving) Manipulate SSIS variables Call external programs
90
Swiss Army Knife Script Component Data Flow pane
Data flow/manipulation Used for: Data manipulation in the pipeline that can’t be accomplished otherwise Advanced branching logic Shred unconventional input files Create custom output files
91
Swiss Army Knife Script Component Synchronous or asynchronous Types
Source Transformation Destination
92
Swiss Army Knife Semi-structured files Nonlinear files
Multiple lines of text per output row Varying number of columns Dissimilar data types “Record Type” format
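To make the "record type" format concrete, here is a minimal parsing sketch in Python. In SSIS this logic would typically live in a script component. The pipe delimiter, the `H`/`D` type codes, and the column layout are assumptions invented for the example. Header lines open a new output row and the detail lines that follow attach to it.

```python
def parse_record_type_file(lines):
    """Group 'H' header lines with the 'D' detail lines that follow them."""
    orders = []
    current = None
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("|")
        record_type = fields[0]
        if record_type == "H":
            current = {"order_id": fields[1], "customer": fields[2], "items": []}
            orders.append(current)
        elif record_type == "D":
            if current is None:
                raise ValueError(f"line {line_no}: detail row before any header")
            current["items"].append({"sku": fields[1], "qty": int(fields[2])})
        else:
            raise ValueError(f"line {line_no}: unknown record type {record_type!r}")
    return orders


sample = ["H|1001|Acme", "D|WIDGET|2", "D|GADGET|1", "H|1002|Globex", "D|WIDGET|5"]
orders = parse_record_type_file(sample)
```

Each record type has its own column count and data types, which is why a single fixed column layout cannot describe the file.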
93
Other Scripting Uses Wait for a file or connection to be available
Set and enforce thresholds for maximum execution time Custom logging Custom notifications Cross-package variable sharing ?????
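The first two uses combine naturally: wait for a file, but only up to a time limit. The following is a sketch in Python rather than script task code; the `wait_for_file` helper, the file names, and the polling interval are hypothetical.

```python
import os
import tempfile
import time


def wait_for_file(path, timeout_seconds, poll_seconds=0.05):
    """Poll until `path` exists; return False if the time limit passes first."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


workdir = tempfile.mkdtemp()
present = os.path.join(workdir, "extract.csv")
open(present, "w").close()                      # this file has already arrived
missing = os.path.join(workdir, "never_arrives.csv")

found = wait_for_file(present, timeout_seconds=1)
timed_out = not wait_for_file(missing, timeout_seconds=0.2)
```

Returning a flag instead of raising lets the caller decide whether a missing file is a failure, a warning, or simply a reason to skip the load.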
94
Expressions and scripting
Demo Expressions and scripting
95
Know what’s coming next
Survival Tip #6: Know what’s coming next
96
Know what’s coming next
Survivors keep an eye on what to expect in the days/months/years ahead Weather forecasts Changing of seasons Wildlife patterns
97
Know what’s coming next
Know the technical/business landscape New versions of software Emerging design patterns
98
What’s new in SSIS for SQL Server 2012
99
Logging Changes Back in the day…
Logging configured at the package level Inconsistent Difficult to add logging afterward
100
Logging Changes … and now:
Logging is configured at the server level (SSIS catalog) Can be added, changed, or removed at runtime
101
Logging Changes … and now: Logging levels:
Basic Performance Verbose None Native row count logging (and everyone said “Amen”) Logs to table in SSISDB
102
Logging Changes … and now: Built-in reports Included with SSMS
Detail and aggregate data ETL Head-to-Head: T-SQL vs. SSIS
103
Undo/Redo When I was your age… Package changes are immediate
Undo = close without saving
104
Undo/Redo … and now Full support of Undo and Redo in the designer
105
Package Parameters Prior Versions:
Sharing of values between packages required the inheritance of parent package variables Parent packages had no knowledge of expected variables in child packages
106
Package Parameters Prior Versions:
There was no practical way to configure variables as required (other than failing the package)
107
Package Parameters SQL Server 2012: Package parameters!
Required or optional Accessible through the Execute Package Task in parent packages
108
Package Parameters
109
DQS and SSIS Then: Data quality routines were everywhere, but also completely manual No standard means of implementation
110
DQS and SSIS Now: SSIS has a transformation to leverage DQS (also new) for data cleansing operations Consumes reusable knowledge base data for reliable, consistent cleansing
111
DQS and SSIS
112
Flat File Improvements
Old school: Irregularly shaped flat files could not be natively processed in SSIS Scripting was usually required to process
113
Flat File Improvements
New school: New flat file connection allows native processing of files with missing columns
114
Flat File Improvements
115
Shared Data Sources In days of yore:
“Shared” connections meant configuring a connection in each package, and using package configs for the connection string Still requires setting up and maintaining connections at the package level
116
Shared Data Sources Here and now:
Native shared connections allow SSIS projects to use connections common to the entire project Package-level connections still supported
117
Shared Data Sources
118
Script Component Debugging
Remember when: MessageBox.Show()
119
Script Component Debugging
… and now: Integrated debugging in the script component Step through code line by line to find issues and test
120
Script Component Debugging
Demo Script Component Debugging
121
Name-based metadata mapping
Then: Changing upstream components often causes runtime errors in downstream components The longest 4-letter word in the English language: VS_NEEDSNEWMETADATA
122
Name-based metadata mapping
Now: Metadata mapping is based on name Easier to remap upstream components
123
Name-based metadata mapping
Demo Name-based metadata mapping
124
CDC in SSIS The old: CDC (Change Data Capture) was present in the DB engine, but required manual T-SQL coding to implement
125
CDC in SSIS The new: SSIS now has a new task and new components to handle CDC processing CDC Control Task – metadata (start/end initial load, etc.) CDC Source – retrieve CDC data CDC Splitter – break apart results
126
Environments Environments replace configurations
Collections of related values (ex: Production connection strings, Dev connection strings, etc.) Multiple environments can be associated with each project or package Specify for automated job, or easily choose at runtime
127
10+. Designer Improvements
Package annotations In prior versions, annotations were difficult SSIS 2012 improvements
128
Designer Improvements
Sort packages by name Sometimes it’s the little things that matter
129
Designer Improvements
Simplified data viewer
130
Designer Improvements
Universal status indicators
131
Designer Improvements
Variable management Scope default Expression management Static values vs. expression Expression indicator
132
Survival Tip #7: Have a bag of tricks
133
Have a bag of tricks Be lazy! Code once, reuse many
Create a portable system for reusing familiar patterns Database? Documentation?
134
Have a bag of tricks Be lazy! ETL Framework
Managed execution for multipackage ETL processes Restartability, consolidated error handling and logging
135
Have a bag of tricks Be lazy! Custom SSIS components
Create custom components for commonly used design patterns Parameterized script packages may substitute in SQL 2012
136
Have a bag of tricks Be lazy! Third party tools BIDS Helper
SSIS Reporting Pack SQL Sentry Plan Explorer Brent Ozar’s SQLBlitz
137
Have a bag of tricks Be lazy! Biml
Business Intelligence Markup Language Package generation tool Included with BIDS Helper (free)
138
Biml package generation
Demo Biml package generation
139
Questions? Comments? Standing ovation?
140
Thanks! TimMitchell.net @Tim_Mitchell tdmitch@gmail.com