
1 Real-World SSIS: A Survival Guide Tim Mitchell

2 What we’ll cover today Lessons I’ve learned the hard way
Methodologies to solve real problems in SSIS Tools to help out Solutions for SQL 2012 as well as earlier versions Demos

3 What we won’t cover No intro to SSIS Books Online

4 Housekeeping Presentation materials Lunch / breaks

5 Housekeeping Let’s keep it informal Ask questions

6 PSA: Community Survival is easier in groups Local user groups
Events (SQL Saturday, SQL Bits, PASS Summit) Online communities Twitter (#sqlhelp)

7 About me Business intelligence consultant
Group Principal, Linchpin People SQL Server MVP TimMitchell.net

8

9 Texas Dictionary Whole mess: Bountiful amounts of something, usually referring to excess More than one way to skin a cat: A pet-unfriendly phrase to indicate that there are usually multiple ways to solve the same problem Y’all: A subgroup of the current group All y’all: The whole of the current group

10 Texas Dictionary I tell you what: A statement of strong belief in the preceding statement. May also be used to agree with someone else’s statement Bless [his/her/your] heart: A polite way to call someone an imbecile Dadgummit: A (generally) socially acceptable replacement for other, less socially acceptable words

11 Texas Dictionary Plumb: In a complete state of something. See also: flat out Yonder: Not here.

12 survival (noun) The state or fact of continuing to live or exist, typically in spite of an accident, ordeal, or difficult circumstances. Reference: Dictionary.com (http://dictionary.reference.com/browse/survival)

13 survival (noun) Survival is simply the state of existing. It’s just a small step above being dead. -- Me Photo credit: Elvis Ripley (http://www.flickr.com/photos/elvisripley/ /). Used under Creative Commons license.

14 Elements of Survival The dangers: The elements Predators
Foolishness of fellow survivors The unexpected

15 Elements of Survival The dangers: Dirty data
Complex or poorly defined ETL requirements Unexpected metadata changes Unstable sources/destinations Project managers

16 Elements of Survival Means of survival:
Common sense of self preservation Tools Leaning on others Learning from others’ mistakes

17 Elements of Survival Means of survival: Best practices Consistency
Document Tools (buy/build) Community

18 Survival Tip #1: Plan to Fail

19 Planning to Fail

20 Planning to Fail Data failures: Missing or offline sources
Changed metadata Partial loads Validation issues Unexpected domain values

21 When X If it happens…

22 Planning to Fail Planning for failure in the wild:
Build your shelter before it rains Layers Leaves Bread crumbs

23 Planning to Fail Planning for failure, the ETL way: Be a pessimist!
Fail gracefully Capture error/warning data on failure Build for restartability (where appropriate)

24 Failing Gracefully

25 Planning to Fail Why graceful failure?
Avoid leaving affected systems in an inconsistent state Avoid repeating wholesale operations Timely notifications to allow proper response from dev/admin staff

26 Planning to Fail Graceful failures in SSIS
Control flow: Event handlers, Precedence constraints
Data flow: Error row redirection, Lookup failure redirection, Conditional split
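
The idea behind error row redirection can be sketched outside SSIS. The following is a minimal Python illustration, not SSIS code: rows that fail a transformation go to a separate error output with a description, and the load keeps going. The names `redirect_errors`, `parse_order`, and the sample rows are hypothetical.

```python
def redirect_errors(rows, transform):
    """Apply transform to each row; collect failures instead of aborting."""
    good, errors = [], []
    for row in rows:
        try:
            good.append(transform(row))
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the original row plus the reason, so it can be triaged later.
            errors.append({"row": row, "error": f"{type(exc).__name__}: {exc}"})
    return good, errors


def parse_order(row):
    """Hypothetical transformation: cast text fields to typed values."""
    return {"id": int(row["id"]), "amount": float(row["amount"])}


source_rows = [
    {"id": "1", "amount": "19.99"},
    {"id": "two", "amount": "5.00"},   # non-numeric id
    {"id": "3"},                       # missing amount
]

good_rows, error_rows = redirect_errors(source_rows, parse_order)
```

In a real package the error output would typically be written to a triage table or file rather than kept in memory.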

27 Planning to Fail Graceful failures in SSIS Restartability
SSIS Checkpoints SSIS transactions Both methods have shortcomings Custom restartability can be an option
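
One possible shape for custom restartability, sketched in Python rather than SSIS: record each completed step in a small state file, and skip recorded steps on the next run. This is an assumption about what "custom restartability" could look like, not the presenter's implementation; `run_with_restart`, the step names, and the state file are all hypothetical.

```python
import json
import os
import tempfile


def run_with_restart(steps, state_path):
    """Run (name, action) steps in order, skipping steps already recorded as done."""
    completed = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            completed = json.load(f)
    executed = []
    for name, action in steps:
        if name in completed:
            continue
        action()  # if this raises, the state file still lists the earlier steps
        completed.append(name)
        with open(state_path, "w") as f:
            json.dump(completed, f)
        executed.append(name)
    if os.path.exists(state_path):
        os.remove(state_path)  # full success: the next run starts from the top
    return executed


calls = {"extract": 0, "load": 0}


def extract():
    calls["extract"] += 1


def load():
    calls["load"] += 1
    if calls["load"] == 1:
        raise RuntimeError("destination offline")  # simulated first-run failure


steps = [("extract", extract), ("load", load)]
state_file = os.path.join(tempfile.mkdtemp(), "etl_state.json")

try:
    run_with_restart(steps, state_file)
except RuntimeError:
    pass  # first run fails at "load"; "extract" is recorded as done

second_run = run_with_restart(steps, state_file)
```

The second run resumes at the failed step, so the expensive extract is not repeated.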

28 Planning to Fail Natural failures Simply stop processing on error
Default behavior In some cases, can be the right pattern

29 Demo Designing for failure

30 Survival Tip #2: Take Notes

31 Take Notes What to note? Trails, paths, and shortcuts Water sources
Hazards Enemy positions Weather and wildlife patterns Sunrise/sunset time

32 Take Notes What to note? Success and failure of operations Row counts
Run times Validation information Warnings

33 Take Notes Why? Know what to expect Plan for growth Cover your assets

34 Take Notes It’s all about the log. SSIS logging SQL Server log
Custom logging

35 Take Notes SSIS Package Logging It’s already there Easy to start
Flexible events and destinations Can be unwieldy

36 Take Notes SSIS Catalog Logging Version 2012 only Easiest to configure
Design time or runtime Least flexible

37 Take Notes Custom Logging Roll your own Most difficult to set up
Infinitely flexible
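
A roll-your-own log might look something like the sketch below. It is a Python illustration using an in-memory SQLite table as a stand-in for a logging database; the `etl_log` table, its columns, and `logged_task` are assumptions for the example, not part of SSIS.

```python
import sqlite3
import time
from contextlib import contextmanager

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE etl_log ("
    "task TEXT, status TEXT, row_count INTEGER, seconds REAL, message TEXT)"
)


@contextmanager
def logged_task(conn, task):
    """Record success or failure, row count, and run time for one task."""
    stats = {"rows": 0}
    start = time.perf_counter()
    try:
        yield stats
    except Exception as exc:
        conn.execute(
            "INSERT INTO etl_log VALUES (?, ?, ?, ?, ?)",
            (task, "failed", stats["rows"], time.perf_counter() - start, str(exc)),
        )
        raise
    else:
        conn.execute(
            "INSERT INTO etl_log VALUES (?, ?, ?, ?, ?)",
            (task, "succeeded", stats["rows"], time.perf_counter() - start, None),
        )


with logged_task(conn, "load_customers") as stats:
    stats["rows"] = 250  # the task reports how many rows it handled

try:
    with logged_task(conn, "load_orders") as stats:
        stats["rows"] = 10
        raise ValueError("bad date in row 11")
except ValueError:
    pass

log_rows = conn.execute(
    "SELECT task, status, row_count, message FROM etl_log ORDER BY rowid"
).fetchall()
```

Keeping run times and row counts in one table makes it easier to spot growth trends and unusual runs.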

38 Take Notes Server/engine logging SQL Engine error log DMVs
Third party tools Windows log PerfMon

39 Demo Take Notes

40 Survival Tip #3: Perform at your best

41 Perform at your Best

42 Perform at your Best Soldier up! Recognize and avoid quicksand
React appropriately when you’re stuck Know your environment

43 Perform at your Best Soldier up!
Isolate and eliminate the things that slow you down Recognize design patterns that are detrimental to performance Look *outside* SSIS (gasp!)

44 Perform at your Best It’s not just SSIS
The majority of SSIS performance problems have nothing to do with SSIS Limitations on sources and destinations

45 Perform at your Best It’s not just SSIS
Don’t just ‘pass the buck’, but do consider other factors: SQL engine configuration Disk configuration Network speed/latency Physical machine capabilities

46 Perform at your Best It’s not just SSIS
Proper query techniques for relational sources Effective indexing for sources and destinations Using OPTION (FAST <n>)

47 Perform at your Best Streamline your data flows
Transformations matter! Know the blocking properties of transformations

48 Perform at your Best Streamline your data flows
Nonblocking transforms do not hold buffers Derived Column Conditional Split Row Count

49 Perform at your Best Streamline your data flows
Partially blocking transforms will queue up buffers as needed Merge Join Lookup Union All

50 Perform at your Best Streamline your data flows
Fully blocking transforms will not pass any data through until all of the data has been buffered at that transformation Sort Aggregate
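
The difference between nonblocking and fully blocking transforms can be illustrated with Python generators. This is only an analogy, not how the SSIS pipeline is implemented: the streaming step emits its first row after reading one source row, while the sort must read everything first. All function names here are hypothetical.

```python
def source(n, trace):
    """Yield n rows in descending order, recording each read in trace."""
    for value in range(n, 0, -1):
        trace.append(f"read {value}")
        yield value


def derived_column(rows):
    """Nonblocking: each row is passed along as soon as it arrives."""
    for row in rows:
        yield row * 10


def sort_rows(rows):
    """Fully blocking: nothing is emitted until every row has been buffered."""
    buffered = list(rows)
    buffered.sort()
    yield from buffered


streaming_trace = []
first_streamed = next(derived_column(source(3, streaming_trace)))
reads_before_first_streamed_row = len(streaming_trace)

blocking_trace = []
first_sorted = next(sort_rows(source(3, blocking_trace)))
reads_before_first_sorted_row = len(blocking_trace)
```

With three source rows, the streaming step needs one read before producing output and the sort needs all three, which is why sorts and aggregates dominate memory use on large inputs.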

51 Perform at your Best Streamline your data flows
Be aware of memory use! LOB (large object) columns will always spool to disk rather than staying in memory. [N]VARCHAR(MAX) Memory buffers may spill over to disk

52 Perform at your Best Streamline your data flows Manage your sources
Don’t use table drop down list – specify your query including only the necessary columns Be mindful of indexes when writing data retrieval queries

53 Perform at your Best Streamline your data flows
Manage your destinations Use FAST LOAD for SQL Server destinations Index management (drop?)

54 Perform at your Best Go Parallel!
Parallel operations can yield faster data flows

55 Demo Parallel data flow

56 Perform at your Best Streamline your data flows Using lookups
Pay attention to lookup cache mode Full cache Partial cache No cache

57 Perform at your Best Streamline your data flows Using lookups
Two-phase lookup strategy: Commonly accessed data in full cache Remaining data in a subsequent partial cache
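
A rough Python sketch of the two-phase idea follows. It assumes a preloaded dictionary plays the role of the full cache and an LRU cache plays the role of the partial cache; the reference data, `query_reference`, and the counter are made up for illustration and stand in for a real database query.

```python
from functools import lru_cache

# Stand-in for a reference table that would normally live in a database.
reference_table = {
    "TX": "Texas", "OK": "Oklahoma", "LA": "Louisiana",
    "NM": "New Mexico", "AR": "Arkansas",
}
stats = {"queries": 0}


def query_reference(code):
    """Simulated per-row database lookup; counts how often it is called."""
    stats["queries"] += 1
    return reference_table.get(code)


# Phase 1: commonly accessed keys, loaded once up front (full cache).
full_cache = {code: reference_table[code] for code in ("TX", "OK")}

# Phase 2: everything else, queried on demand and remembered (partial cache).
partial_lookup = lru_cache(maxsize=1000)(query_reference)


def two_phase_lookup(code):
    if code in full_cache:
        return full_cache[code]
    return partial_lookup(code)


results = [two_phase_lookup(c) for c in ["TX", "NM", "TX", "NM", "OK", "ZZ"]]
```

Here only the two uncommon codes reach the simulated database, once each; the common codes never leave the preloaded cache.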

58 Perform at your Best Streamline your data flows Using lookups
Cache connection manager Allow reuse of lookup information across data flows

59 Demo Lookups

60 Survival Tip #4: Clean it up

61 Clean it up The greatest danger is in the elements
Chances are that unsanitary conditions will kill you before a predator does Infection Spoiled food or water

62 Clean it up In ETL, the greatest dangers often lie in the small things
Like an infection, bad data can fester for a while until it’s too late Caught early, problems with dirty data are more easily solved

63 What is dirty data? Types of dirty data: Data type mismatches
Domain violations Semantic violations Technical errors Simple inaccuracies

64 What is dirty data? Data type mismatches
Non-numeric data in numeric fields Decimal data in integer fields Incorrect precision / rounding Truncation

65 What is dirty data? Domain violations Invalid dates
Incorrect addresses Semantic violations Data outside of a reasonable range (such as a person’s age in the thousands of years) Inconsistent use of NULL, blanks, and zeroes
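
To make the categories concrete, here is a small Python sketch that tags a row with the kinds of problems described so far. The column names, the 0 to 130 age range, and the date format are assumptions chosen for the example.

```python
from datetime import datetime


def find_issues(row):
    """Return a list of dirty-data findings for one row."""
    issues = []
    try:
        age = int(row["age"])
    except (TypeError, ValueError):
        issues.append("type mismatch: age is not an integer")
    else:
        if not 0 <= age <= 130:  # assumed "reasonable" range
            issues.append("semantic violation: age out of range")
    try:
        datetime.strptime(row["signup_date"], "%Y-%m-%d")
    except (TypeError, ValueError):
        issues.append("domain violation: invalid signup_date")
    return issues


clean = find_issues({"age": "34", "signup_date": "2012-02-29"})
out_of_range = find_issues({"age": "3400", "signup_date": "2012-02-30"})
decimal_age = find_issues({"age": "12.5", "signup_date": "2012-01-01"})
```

Tagging rows instead of rejecting them outright leaves the decision (delete, mark as suspect, write to triage) to a later step.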

66 What is dirty data? Technical errors Improperly formatted dates
Out-of-alignment flat files Too many/too few delimiters

67 What is dirty data? Simple inaccuracies Misspellings Duplications
Improper formatting (addresses, phone numbers) Case

68 What causes dirty data?

69 What is dirty data? Causes of dirty data: Internal:
Unvalidated user input Lack of proper database constraints and/or application logic External: ETL errors Import bad data from other systems (the remainder of the discussion and our demos focus on this segment)

70 Now What?

71 Clean it up Test your cleansing logic in stage/test/QA first
Cleanse directly in production Don’t cleanse at all

72 Clean it up What to do with unresolvable bad data? Delete
Update to NULL or unknown member Mark as suspect Write to triage Stop the ETL

73 Data Cleansing in SSIS

74 Data Cleansing in SSIS Tools of the trade Native SSIS components
POTS (Plain Old Transact-SQL) SQL Server DQS

75 Data Cleansing in SSIS SSIS Native Components
A versatile approach with more transformation options A much better choice when data cleansing operations involve multiple and/or non-SQL Server data sources Extensible through custom code Third party add-ons

76 Data Cleansing in SSIS SSIS Native Components
Precision tools include Lookup Transformation, Merge Join Flexible/inexact cleansing through Conditional Split, Derived Columns transformation, fuzzy tools

77 Data Cleansing in SSIS Transact-SQL
Fast, simple, effective way to do some cleanup operations Requires no additional software or configuration Extensible through the use of UDFs or CLR functions

78 Data Cleansing in SSIS Data Quality Services
A tool specifically designed for data cleansing Has its own client interface, or can be used within SSIS for cleansing operations Limited set of operations in SSIS

79 Demo Data Cleansing

80 Survival Tip #5: The Swiss Army Knife

81 Swiss Army Knife When unexpected situations arise, an all-purpose tool can literally be a lifesaver. Cut up small firewood Can opener Make a game trap

82 Swiss Army Knife Scripting and coding tools SSIS Expressions
Script task/script component PowerShell

83 Swiss Army Knife SSIS Expressions Built into SSIS
Can be used in most any component or task No extra moving parts required Useful for declarative statements

84 Swiss Army Knife Pros: Easy to get started – just start expressing yourself Ubiquity Relatively easy to use

85 Swiss Army Knife Cons: Syntax is <polite> unique </polite>
Complex expressions are difficult Troubleshooting

86 Swiss Army Knife SSIS Scripting .NET Framework VB.NET or C#
Can use existing external assemblies

87 Swiss Army Knife Pros: Swiss Army knife of SSIS
Works great for operations where native SSIS tasks/components can’t easily accomplish goal Does not require in-depth programming knowledge

88 Swiss Army Knife Cons: Does require some familiarity with programming or scripting Not as simple as native components Performance (sometimes)

89 Swiss Army Knife Script Task Used in the Control Flow Variety of uses:
Interact with OS Filesystem operations (archiving) Manipulate SSIS variables Call external programs

90 Swiss Army Knife Script Component Data Flow pane
Data flow/manipulation Used for: Data manipulation in the pipeline that can’t be accomplished otherwise Advanced branching logic Shred unconventional input files Create custom output files

91 Swiss Army Knife Script Component Synchronous or asynchronous Types
Source Transformation Destination

92 Swiss Army Knife Semi-structured files Nonlinear files
Multiple lines of text per output row Varying number of columns Dissimilar data types “Record Type” format
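
A "record type" file could be shredded roughly as follows. This Python sketch assumes a pipe-delimited layout in which `H` lines start an order and `D` lines add items to it; the layout and field names are invented for illustration, and in SSIS the same logic would live in a script component.

```python
def shred_record_type_file(lines):
    """Group header (H) and detail (D) lines into one output record per order."""
    orders = []
    current = None
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("|")
        record_type = fields[0]
        if record_type == "H":
            current = {"order_id": fields[1], "customer": fields[2], "items": []}
            orders.append(current)
        elif record_type == "D":
            if current is None:
                raise ValueError(f"line {line_no}: detail record before any header")
            current["items"].append({"sku": fields[1], "qty": int(fields[2])})
        else:
            raise ValueError(f"line {line_no}: unknown record type {record_type!r}")
    return orders


sample_lines = [
    "H|1001|Acme\n",
    "D|WIDGET|4\n",
    "D|GADGET|1\n",
    "H|1002|Globex\n",
    "D|WIDGET|2\n",
]
orders = shred_record_type_file(sample_lines)
```

Each record type has its own column count and data types, which is what makes this kind of file awkward for a single flat file connection.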

93 Other Scripting Uses Wait for a file or connection to be available
Set and enforce thresholds for maximum execution time Custom logging Custom notifications Cross-package variable sharing ?????
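
Waiting for a file with an upper bound on the wait time is one of the simpler cases. The sketch below shows the polling logic in Python; a script task would do the equivalent in C# or VB.NET. The function name and the timeout values are assumptions.

```python
import os
import tempfile
import time


def wait_for_file(path, timeout_seconds, poll_seconds=0.05):
    """Poll until path exists; return False if the timeout passes first."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


work_dir = tempfile.mkdtemp()
present = os.path.join(work_dir, "extract.csv")
with open(present, "w") as f:
    f.write("id,amount\n")

found = wait_for_file(present, timeout_seconds=1)
missing = wait_for_file(os.path.join(work_dir, "never.csv"), timeout_seconds=0.2)
```

Returning a flag instead of raising lets the caller decide whether a missing file should fail the run or just skip the load.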

94 Demo Expressions and scripting

95 Survival Tip #6: Know what’s coming next

96 Know what’s coming next
Survivors keep an eye on what to expect in the days/months/years ahead Weather forecasts Changing of seasons Wildlife patterns

97 Know what’s coming next
Know the technical/business landscape New versions of software Emerging design patterns

98 What’s new in SSIS for SQL Server 2012

99 Logging Changes Back in the day…
Logging configured at the package level Inconsistent Difficult to add logging afterward

100 Logging Changes … and now:
Logging is configured at the server level (SSIS catalog) Can be added, changed, or removed at runtime

101 Logging Changes … and now: Logging levels:
Basic Performance Verbose None Native row count logging (and everyone said “Amen”) Logs to table in SSISDB

102 Logging Changes … and now: Built-in reports Included with SSMS
Detail and aggregate data ETL Head-to-Head: T-SQL vs. SSIS

103 Undo/Redo When I was your age… Package changes are immediate
Undo = close without saving

104 Undo/Redo … and now Full support of Undo and Redo in the designer

105 Package Parameters Prior Versions:
Sharing of values between packages required the inheritance of parent package variables Parent packages had no knowledge of expected variables in child packages

106 Package Parameters Prior Versions:
There was no practical way to configure variables as required (other than failing the package)

107 Package Parameters SQL Server 2012: Package parameters!
Required or optional Accessible through the Execute Package Task in parent packages

108 Package Parameters

109 DQS and SSIS Then: Data quality routines were everywhere, but also completely manual No standard means of implementation

110 DQS and SSIS Now: SSIS has a transformation to leverage DQS (also new) for data cleansing operations Consumes reusable knowledge base data for reliable, consistent cleansing

111 DQS and SSIS

112 Flat File Improvements
Old school: Irregularly shaped flat files could not be natively processed in SSIS Scripting was usually required to process

113 Flat File Improvements
New school: New flat file connection allows native processing of files with missing columns
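
For readers on earlier versions, the scripted workaround usually amounts to padding short rows. A minimal Python sketch of that idea, assuming missing columns are always trailing and should become NULL (`None`); the column list and sample text are hypothetical.

```python
import csv
import io


def read_ragged(text, columns, delimiter=","):
    """Parse delimited text, padding rows that are missing trailing columns."""
    rows = []
    for line_no, fields in enumerate(csv.reader(io.StringIO(text), delimiter=delimiter), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) > len(columns):
            raise ValueError(f"line {line_no}: too many columns")
        padded = fields + [None] * (len(columns) - len(fields))
        rows.append(dict(zip(columns, padded)))
    return rows


ragged_text = "1,Ann,Dallas\n2,Bob\n3\n"
ragged_rows = read_ragged(ragged_text, ["id", "name", "city"])
```

Rows with too many columns are treated as errors here rather than truncated, since silently dropping data tends to hide alignment problems.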

114 Flat File Improvements

115 Shared Data Sources In days of yore:
“Shared” connections meant configuring a connection in each package, and using package configs for the connection string Still requires setting up and maintaining connections at the package level

116 Shared Data Sources Here and now:
Native shared connections allow SSIS projects to use connections common to the entire project Package-level connections still supported

117 Shared Data Sources

118 Script Component Debugging
Remember when: MessageBox.Show()

119 Script Component Debugging
… and now: Integrated debugging in the script component Step through code line by line to find issues and test

120 Demo Script Component Debugging

121 Name-based metadata mapping
Then: Changing upstream components often causes runtime errors in downstream components The longest 4-letter word in the English language: VS_NEEDSNEWMETADATA

122 Name-based metadata mapping
Now: Metadata mapping is based on name Easier to remap upstream components

123 Demo Name-based metadata mapping

124 CDC in SSIS The old: CDC (Change Data Capture) was present in the DB engine, but required manual T-SQL coding to implement

125 CDC in SSIS The new: SSIS now has new task and components to handle CDC processing CDC Task – metadata (start/end initial load, etc.) CDC Source – retrieve CDC data CDC Splitter – break apart results

126 Environments Environments replace configurations
Collections of related values (ex: Production connection strings, Dev connection strings, etc.) Multiple environments can be associated with each project or package Specify for automated job, or easily choose at runtime

127 Designer Improvements
Package annotations In prior versions, annotations were difficult SSIS 2012 improvements

128 Designer Improvements
Sort packages by name Sometimes it’s the little things that matter

129 Designer Improvements
Simplified data viewer

130 Designer Improvements
Universal status indicators

131 Designer Improvements
Variable management Scope default Expression management Static values vs. expression Expression indicator

132 Survival Tip #7: Have a bag of tricks

133 Have a bag of tricks Be lazy! Code once, reuse many
Create a portable system for reusing familiar patterns Database? Documentation?

134 Have a bag of tricks Be lazy! ETL Framework
Managed execution for multipackage ETL processes Restartability, consolidated error handling and logging

135 Have a bag of tricks Be lazy! Custom SSIS components
Create custom components for commonly used design patterns Parameterized script packages may substitute in SQL 2012

136 Have a bag of tricks Be lazy! Third party tools BIDS Helper
SSIS Reporting Pack SQL Sentry Plan Explorer Brent Ozar’s SQLBlitz

137 Have a bag of tricks Be lazy! Biml
Business Intelligence Markup Language Package generation tool Included with BIDS Helper (free)

138 Demo Biml package generation

139 Questions? Comments? Standing ovation?

140 Thanks! TimMitchell.net @Tim_Mitchell tdmitch@gmail.com

