SSIS Field Notes – Darren Green, Konesans Ltd.
SSIS Field Notes After years of careful observation and recording of the Species SSIS, Genus ETL, in both natural and artificial environments, I’ve gathered a large collection of field notes. Find out which little decisions can have big impacts on your project later on. Which option should you choose for better performance, or better management, and what are the trade-offs? This session won’t just focus on the holy grail of performance; it will review the way you build packages and use SSIS, taking into account the ongoing maintenance and management aspects as well.
Common Principles Common design decisions or patterns Logging Frameworks Custom vs Stock
Basic Standards Solution and Project structure ETL vs ELT Staging Custom components
Source Control Products – Team Foundation Server – Subversion (Visual SVN) – Others… Issues – Cannot merge or use standard conflict resolution BIDS Helper - Smart Diff – TFS, SourceSafe, File BI Smart Diff – Subversion
Naming Conventions Prefix notation, e.g. DFT for Data Flow Task – http://consultingblogs.emc.com/jamiethomson/ Expand Name property – SQL Create Year Staging Table Expand Description property – Create the year named staging table. Any existing table will be dropped first. Documentation tools are not clever
Logging Performance monitoring – Real-time monitoring – Trending to justify upgrades Re-write problem packages Upgrade hardware / network / environment Problem solving – Why did the job fail last night?
Logging Options Built in SSIS logging – Good for standard stuff, including errors Maintenance routine to prune records Delete Info daily, Warnings weekly, Errors monthly Custom SSIS logging – Log process specific metrics - row counts – Event Handler or Control Flow Windows Logging – Event Log – Performance Monitor
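Prune – Delete in Chunks
DECLARE @Count int, @MinStartTime datetime
-- Example cutoff (an assumption, not from the slide): keep the last 30 days
SET @MinStartTime = DATEADD(day, -30, GETDATE())
SET @Count = 1000
WHILE @Count IS NOT NULL AND @Count > 0
BEGIN
    -- Delete one chunk at a time to keep each transaction and its locks small
    DELETE TOP (@Count) FROM dbo.sysdtslog90 WHERE StartTime < @MinStartTime
    SET @Count = @@ROWCOUNT
    -- Pause for 0.2 seconds
    WAITFOR DELAY '00:00:00.200'
END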
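The pruning schedule from the logging options (Info daily, Warnings weekly, Errors monthly) can be folded into the same chunked delete. The following is a minimal sketch, not the presenter's routine. It assumes the schedule means "keep information rows for 1 day, warnings for 7 days and errors for 30 days", and that the log lives in the standard dbo.sysdtslog90 table, whose event column holds values such as OnInformation, OnWarning and OnError. The @Retention table variable and its column names are made up for illustration.

```sql
-- Assumed retention per event type (the day counts are an interpretation of the slide).
DECLARE @Retention TABLE (EventName sysname NOT NULL PRIMARY KEY, KeepDays int NOT NULL);
INSERT INTO @Retention (EventName, KeepDays) VALUES (N'OnInformation', 1);
INSERT INTO @Retention (EventName, KeepDays) VALUES (N'OnWarning', 7);
INSERT INTO @Retention (EventName, KeepDays) VALUES (N'OnError', 30);

DECLARE @Count int;
SET @Count = 1;

WHILE @Count > 0
BEGIN
    -- Remove at most 1000 expired rows per pass, whatever their event type.
    DELETE TOP (1000) L
    FROM dbo.sysdtslog90 AS L
    INNER JOIN @Retention AS R
        ON R.EventName = L.[event]
    WHERE L.starttime < DATEADD(day, -R.KeepDays, GETDATE());

    SET @Count = @@ROWCOUNT;

    -- Short pause so other sessions can get at the log table between chunks.
    WAITFOR DELAY '00:00:00.200';
END
```

Event types that are not listed in @Retention (PackageStart, PackageEnd and so on) are never deleted by this sketch; add a row for each one that should expire.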
Frameworks Consistent logging approach Process or package state – Passing state between package processes – Saving state for the next run Last extract date Complete run only once per day Dynamic execution workflow – Managed in tables not Control Flow Easier to manage logical units and reuse
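Saving state for the next run can be as simple as one row per package. The sketch below shows one possible shape for it, covering the two examples on the slide: a last extract date and a "complete run only once per day" guard. The table dbo.PackageState, its columns and the package name LoadSales are all hypothetical; in a real package the statements would typically sit in Execute SQL Tasks with the values mapped to package variables.

```sql
-- Hypothetical state table: one row per package.
IF OBJECT_ID(N'dbo.PackageState', N'U') IS NULL
    CREATE TABLE dbo.PackageState (
        PackageName       nvarchar(128) NOT NULL PRIMARY KEY,
        LastExtractDate   datetime      NULL,  -- start time of the last successful extract
        LastCompletedDate datetime      NULL   -- when the last complete run finished
    );

DECLARE @PackageName nvarchar(128), @LastExtractDate datetime,
        @LastCompletedDate datetime, @RunStart datetime, @Today datetime;

SET @PackageName = N'LoadSales';                            -- hypothetical package name
SET @RunStart    = GETDATE();
SET @Today       = DATEADD(day, DATEDIFF(day, 0, @RunStart), 0);  -- midnight today

-- Read the state saved by the previous run (both variables stay NULL on the first run).
SELECT @LastExtractDate   = LastExtractDate,
       @LastCompletedDate = LastCompletedDate
FROM dbo.PackageState
WHERE PackageName = @PackageName;

IF @LastCompletedDate >= @Today
    PRINT 'Already completed today, nothing to do.';
ELSE
BEGIN
    -- The extract itself would run here, selecting source rows changed
    -- after @LastExtractDate (NULL means extract everything).

    -- Save state for the next run: update the row, or insert it on the first run.
    UPDATE dbo.PackageState
    SET LastExtractDate = @RunStart,
        LastCompletedDate = GETDATE()
    WHERE PackageName = @PackageName;

    IF @@ROWCOUNT = 0
        INSERT INTO dbo.PackageState (PackageName, LastExtractDate, LastCompletedDate)
        VALUES (@PackageName, @RunStart, GETDATE());
END
```

Recording the run's start time rather than its end time as the new extract date means rows changed while the extract was running are picked up again next time, at the cost of occasionally re-reading a few rows.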
Frameworks Standard approach reduces costs – Lower support costs – Higher quality through reuse Cost of applying and maintaining framework – Maintainable frameworks are key! – Custom components encapsulate code – Use the API to bulk apply changes
Custom Components Task Pipeline Component – Source – Destination – Transformation Log Provider Connection Manager For Each Enumerators
Using Custom Components? Easy to manage and update – Good for re-use – Can add/edit functionality easily – One file per machine for all packages – Good for frameworks or complex operations – Good debugging and testing support Require .NET development skills External dependency – Additional step during the initial deployment, but also single update step thereafter
Stock Components & Scripting Faster to develop Familiar and easy to understand – Don’t write your own data flow engine in a script component! – Acknowledge the need for reuse when it exists and create a shared external assembly Self-contained - No external dependency Copy and paste package maintenance – Not good for frameworks or common patterns
Recovery & Restarts Checkpoints – Native CheckpointFileName, SaveCheckpoints, FailPackageOnFailure – Task level restart only – Partition your Data Flow Raw files – Variable values are persisted – Configurations not refreshed – Event handlers within checkpoint scope
Recovery & Restarts Auto-Recovery – Roll your own – Check with IF EXISTS… – Framework workflow Table of packages with status Package or task level restart – Variables and precedence constraint expressions – Delete and re-load No change tracking or updates
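A roll-your-own restart along these lines might look like the sketch below. It is an illustration only: the tables dbo.PackageWorkflow and dbo.StagingSales, their columns, and the status values are all hypothetical. The first part is the "table of packages with status" that a parent package could query to resume at the first package that has not yet succeeded. The second part is the IF EXISTS check with delete and re-load, which makes a load step safe to re-run after a failure.

```sql
-- Hypothetical workflow table: one row per package in the run.
IF OBJECT_ID(N'dbo.PackageWorkflow', N'U') IS NULL
    CREATE TABLE dbo.PackageWorkflow (
        PackageName nvarchar(128) NOT NULL PRIMARY KEY,
        RunOrder    int           NOT NULL,
        RunStatus   varchar(10)   NOT NULL DEFAULT ('Pending')  -- Pending, Running, Succeeded, Failed
    );

-- Package level restart: the next package to run is the first one that has
-- not succeeded, so a re-run skips everything that already completed.
SELECT TOP (1) PackageName
FROM dbo.PackageWorkflow
WHERE RunStatus <> 'Succeeded'
ORDER BY RunOrder;

-- Hypothetical staging table used by the re-runnable step below.
IF OBJECT_ID(N'dbo.StagingSales', N'U') IS NULL
    CREATE TABLE dbo.StagingSales (
        LoadDate datetime NOT NULL,
        Amount   money    NOT NULL
    );

DECLARE @LoadDate datetime;
SET @LoadDate = DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0);  -- today's load, as an example

-- Delete and re-load: if an earlier attempt left rows behind for this load,
-- clear them so the load can simply run again from scratch.
IF EXISTS (SELECT * FROM dbo.StagingSales WHERE LoadDate = @LoadDate)
    DELETE FROM dbo.StagingSales WHERE LoadDate = @LoadDate;

-- The load for @LoadDate would follow here.
```

As the slide notes, delete and re-load only works when the target needs no change tracking or updates; the step replaces the whole slice rather than merging into it.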