1
SSIS – A Beginning Framework
SQL Saturday #32, 23 January 2010, Eric Wisdahl
2
About Me
BI professional working in the insurance industry.
Insurance Industry: 6+ years
BI / Data Warehousing: 4+ years
MCTS – Business Intelligence Development and Maintenance
MCTS – Database Developer
MSDN SQL Forums Moderator
Blog “Destination:Change”:
LinkedIn:
He spends what little free time he has reading technical books, playing games, perusing the MSDN SQL Forums, or spending time with his wife, cat and dogs.
3
Overview Most developers would agree that every SSIS solution will have the same fundamental outline. A basic framework will expedite the process by handling the common tasks between the systems while allowing the developer to concentrate on the task at hand. This framework will consist of many items, including but not limited to setting up package configurations, logging, audit trails, error handling, naming standards, etc. This document will present an example framework which can be used as the basis for future SSIS Package development.
4
Demo – Starting With Demo
Run through a demo of the overall process to show what the framework looks like at a glance. Run one of the controllers from BIDS and then from DTEXECUI.
5
Meta Sub Systems The Meta Subsystem contains information relating to the auditing, data quality, data dictionaries, processing control tables, and configurations. Please see documentation for the Meta-Data subsystem for more information on the audit, data quality and dictionary tables at
6
Audit Tables The Packages table stores information relating to the package name and versions which are executing. The PackageExecutions table stores information relating to the package that is being run, the start and end dates, and whether or not the execution was successful. The TableProcessing table stores the statistics of the package execution: how many records were initially in the table, how many were inserted, how many were updated, how many errors there were, and how many records were in the table after execution. The DimAudit table stores information to tie the PackageExecutions and TableProcessing tables together for those packages which might have more than one entry in the TableProcessing table.
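The life cycle of an audit record can be sketched as follows. This is a minimal illustration, not the framework's actual schema: it uses an in-memory SQLite database in place of the META database, the column names are assumptions, and it links TableProcessing directly to PackageExecutions rather than going through DimAudit.

```python
import sqlite3

def open_audit_db():
    """Create simplified stand-ins for two of the audit tables (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PackageExecutions (
            ExecutionKey INTEGER PRIMARY KEY,
            PackageName  TEXT NOT NULL,
            StartDate    TEXT NOT NULL,
            EndDate      TEXT,
            Successful   INTEGER
        );
        CREATE TABLE TableProcessing (
            ExecutionKey INTEGER REFERENCES PackageExecutions(ExecutionKey),
            TableName    TEXT NOT NULL,
            InitialRows  INTEGER,
            InsertedRows INTEGER,
            UpdatedRows  INTEGER,
            ErrorRows    INTEGER,
            FinalRows    INTEGER
        );
    """)
    return conn

def start_execution(conn, package_name, started_at):
    """Pre-processing: open the audit trail and return its surrogate key."""
    cur = conn.execute(
        "INSERT INTO PackageExecutions (PackageName, StartDate) VALUES (?, ?)",
        (package_name, started_at),
    )
    return cur.lastrowid

def finish_execution(conn, key, ended_at, successful, table_name, counts):
    """Post-processing: close the audit trail and store the row statistics."""
    conn.execute(
        "UPDATE PackageExecutions SET EndDate = ?, Successful = ? "
        "WHERE ExecutionKey = ?",
        (ended_at, int(successful), key),
    )
    conn.execute(
        "INSERT INTO TableProcessing VALUES (?, ?, ?, ?, ?, ?, ?)",
        (key, table_name, counts["initial"], counts["inserted"],
         counts["updated"], counts["errors"], counts["final"]),
    )
```

A row whose EndDate is still empty then identifies a package that started but never reached post-processing.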
7
Data Quality Tables The DataQualityScores table will hold every combination of the three screen categories and scores (Column, Structure and Business Rules). The Screens table holds the SQL for the screen as well as what type of screen it is, what table the screen relates to, how severe a failure is, what type of action should be taken if there is a failure, etc. The ErrorEvent fact table will hold the date and time that a record failed a screen, as well as the package execution, table dictionary key and record identifier.
8
Errors Tables The SSISErrorLogs table will hold the error events generated during the execution of the packages. This contains the package, audit trail, error code, description and datetime that the error occurred.
9
Data Dictionary Tables
The TableDictionary table stores information relating to the database, schema, and table from the database system tables, as well as user input information such as the description, grain, display name, business name, etc. The ColumnDictionary table stores information relating to the column name, data type, size, precision, scale, nullability, and default value from the database system tables, as well as user input information such as description, business name, display name, type of SCD dimension, example values, unknown member values, etc.
10
Data Dictionary Tables (Continued)
The LogicalDataMap table stores information relating to how the data was loaded into the system. This includes the source system database, schema, table, field and data type as well as the ETL rules and any relevant comments.
11
Control Tables The FrequencyTypes table holds information relating to types of date ranges. It is used by the ProcessingDates table below as an enumeration. The ProcessingDates Table is a control table which holds a pointer to a filter as well as a start and end date range for the particular job to process. It also holds a pointer to the Frequency type. The date range values in the processing dates table are updated via a stored procedure based on the frequency type. The DictionaryDatabaseList table is used to store a list of attributes relating to the databases which will be looped through when processing the data dictionary tables.
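As a rough illustration of how a date range might be rolled forward by frequency type, here is a small sketch. The frequency names ("Daily", "Weekly", "Monthly") and the function itself are assumptions made for the example; in the framework this work is done by a stored procedure whose logic is not shown here. The monthly branch assumes the range covers whole calendar months.

```python
from datetime import date, timedelta

def next_range(frequency, start, end):
    """Return the (start, end) range that follows the given inclusive range."""
    if frequency == "Daily":
        step = timedelta(days=1)
        return start + step, end + step
    if frequency == "Weekly":
        step = timedelta(days=7)
        return start + step, end + step
    if frequency == "Monthly":
        new_start = end + timedelta(days=1)
        # Jump safely into the following month, snap to its first day,
        # then step back one day to get the last day of new_start's month.
        first_of_following = (new_start.replace(day=1) + timedelta(days=32)).replace(day=1)
        return new_start, first_of_following - timedelta(days=1)
    raise ValueError("unknown frequency type: %r" % frequency)
```

For example, a monthly range of 1 to 31 January 2010 would roll forward to 1 to 28 February 2010.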
12
Demo – META Database Take a very quick look at some of the META database and how it tracks package execution history.
13
Configurations Table The META environment also houses the SSIS Configuration Table. It is used to house all of the SQL Server configurations that are used in the various SSIS packages. Please see SSIS Configurations, Expressions and Constraints on or BOL for an overview of SQL Server configurations.
14
Configurations In this version of an SSIS framework, we use an environment variable to hold the connection string for the META database. In this fashion we form an indirect configuration to the rest of the configurations to be performed. Once we have the connection to META we use the SQL Server Configuration table to populate the rest of the framework configurations as well as the remainder of the connection strings. When using configurations, always put the description for the variable or property with the configuration if possible, as this allows the next user to identify how the record(s) in the configuration table are being used.
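The indirect configuration pattern described above can be sketched outside of SSIS like this. It is only an illustration: SQLite stands in for SQL Server, the environment variable holds a database file path rather than a real connection string, and the table name SSISConfigurations is an assumption. The column names follow the layout SSIS generates for a SQL Server configuration table.

```python
import os
import sqlite3

def load_configurations(env_var, filters):
    """Resolve the META connection from an environment variable, then read
    every configured value for the requested configuration filters."""
    # Only this one setting lives outside the META database.
    meta_path = os.environ[env_var]
    conn = sqlite3.connect(meta_path)
    try:
        placeholders = ",".join("?" for _ in filters)
        rows = conn.execute(
            "SELECT ConfigurationFilter, PackagePath, ConfiguredValue "
            "FROM SSISConfigurations "
            "WHERE ConfigurationFilter IN (%s)" % placeholders,
            list(filters),
        ).fetchall()
    finally:
        conn.close()

    # Group the values by filter, keyed by the property path they configure.
    settings = {}
    for config_filter, package_path, value in rows:
        settings.setdefault(config_filter, {})[package_path] = value
    return settings
```

Moving a package between environments then only requires changing the environment variable on the target server; everything else is read from that server's configuration table.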
16
Configurations – Framework-AuditParameters-ServerName
The ServerName configuration is used to allow the easy identification of what server the configurations are coming from as well as (presumably) what server the SSIS job was running from. It is further used in communicating back with the operator during error or completion emails.
17
Configurations- Framework-AuditQueryExpressions
The AuditQueryExpressions configurations are used to set the variable values which contain the SQL command strings (via expressions) for the Execute SQL tasks within the pre-processing and post-processing sequence containers.
18
Configurations – Framework-EmailSettings
The EmailSettings configurations will hold the values for the from and to addresses. They will also hold the expressions for the subject and body of the email when a package generates an error as well as for when a package executes successfully. Note – There is an alternative configuration, Controller- Settings, which houses the same information but with different values, that will be used in the control (master) packages.
19
Configurations – Framework-IndexScriptGeneration
The IndexScriptGeneration configuration is used to house the expressions for the Create and Delete Index Script queries.
20
Configurations – Framework-RootFolder
The RootFolder configuration is used to house the UNC path to the folder which will contain sub folders for your log files, raw files, packages, access databases, etc. NOTE – In the examples I am presenting I use the “C:\” named drive. This is bad practice. All paths within SSIS should be full UNC paths (\\servername.domainname\folder\subfolder\). However, I do not have shares set up on my personal laptop… This is an example of “Do as I say, not as I do!”
21
Configurations – SMTPConnectionManager-ConnectionString
The SMTPConnectionManager-ConnectionString is used to house the connection string to the local Exchange server (or other mail service). Note – As I do not have access to an Exchange server outside of work, my examples either have non-working components, or script tasks pointing to Gmail’s outward facing SMTP server. This script task, or something similar, will need to be used if you have any situations where you need to pass along security credentials to an email task, as the Send Mail task does not allow any security outside of Windows security.
22
Configurations – Other
If you have connection strings to a set of databases outside of the meta database, it is often useful to include all of these connections within the framework as well, so that you do not have to continually recreate the connection managers or reset the configurations to these connection managers. Once the framework configurations are set up, it is important to realize that other configurations can and should be set for the individual packages as applicable. In the screen shot showing the package configuration organizer you can see an extra configuration – Dictionary-DynamicDatabaseConnectionString that is relevant only to a particular package or set of packages, but not to the framework as a whole. This is normal behavior.
23
Demo - Configurations Take a very brief look at the configurations table and environment variables. Change at least one variable, run package, close and reload package.
24
Logging SSIS contains an internal logging mechanism to expose run time events. This information can be sent to text files, a SQL Server Profiler file, the sysssislog table on an instance of SQL Server, the Windows event log or an XML file. For our purposes, we use the text file logging mechanism. This creates a CSV file for each package, which is dynamically named with the package name and date. This file can be used to track down warnings and errors from the execution of the package, as well as to determine the last activity from the package if the package has hung. We have chosen the text file as it is a basic method of tracking any errors which is not reliant on any other system being up in order to function. In this framework I have included all logging events except for the OnPipeline events and the diagnostic events, as these add a lot of records to the log without providing details that I feel are really needed.
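The dynamic log file name would normally be built with an SSIS expression on the log connection manager. The same idea, sketched in Python, looks like this; the "Logs" sub folder, the underscore separator and the yyyymmdd date format are assumptions made for the example.

```python
from datetime import date
from pathlib import PureWindowsPath

def log_file_path(root_folder, package_name, run_date):
    """Build <root>\\Logs\\<package>_<yyyymmdd>.csv for one package run."""
    file_name = "%s_%s.csv" % (package_name, run_date.strftime("%Y%m%d"))
    return str(PureWindowsPath(root_folder) / "Logs" / file_name)
```

Because the date is part of the name, each day's run writes to its own file and older logs can be archived or deleted by name.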
25
Logging Menu Item
26
Logging Wizard
27
Logging Wizard 2
28
DEMO - Logging Files Very brief look at the logging file that can be used to track information from the package executions.
29
Framework Variables Variables are used for a host of activities throughout the framework. There are variables which are affected by both package configurations and expressions. There has been some effort to keep the variables in a semblance of organization by using the namespace property. To see the namespace property, open the variables window and select the “choose variable columns” button.
30
Framework Variables This will open up the choose variable columns window. Here you have the option to select from the scope, data type, value, namespace and raise event when variable value changes columns. Check the namespace column.
31
Framework Variables In the framework, we have created a collection of namespaces to hold related variables. The AuditParameter namespace currently houses information about the destination and source tables. It is necessary to fill out the variables in this namespace for every package in order to leave the proper audit trail. The AuditQuery namespace currently houses variables which use expressions to generate the sql query or command used in the pre-processing and post-processing sequence containers (as well as the stop process task).
32
Framework Variables The AuditVariable namespace is used to house the return values from the sql queries, insert / update / error / etc counts from the data flow, etc. Essentially any item used to track an audit item for the package will be stored in this namespace. The DateParameter namespace is used to house information relating to the processing dates record. The namespace contains the frequency type variable which will need to be filled in for any package which wishes to make use of the processing dates table. This variable can be passed down from the parent package to ensure that the package is executing for the proper period processing. The DateParameter namespace further contains the processing date key, start and end date ranges for this package (if a record is present in the processing dates table for the package).
33
Framework Variables The Files namespace contains variables used to house network paths and file names. It includes variables that are either set via package configurations or expressions. The Index namespace is used to house the queries that will generate the create and delete index scripts. The Key namespace will be used to house any returned surrogate key values. As of this writing this is only used for the audit trail, although it is certainly possible to house any returned key within the namespace.
34
Framework Variables The Query namespace is used to house any queries that are process related as opposed to relating to the audit or control procedures. An example is a query used to update the type 2 slowly changing dimension columns in a batch update (as opposed to a row by row approach within the data flow). The SSIS namespace is used to hold variables related to emailing the operators and constructing the subject and body of emails to be sent out. The User namespace is the default namespace for SSIS. It will contain any variables which are added to the package using the framework (unless you specify another namespace).
35
DEMO - Look at the variables
Take a look at some of the variables, their expressions, descriptions, etc.
36
SSIS and Indexes Indexes can significantly slow down a large number of inserts or updates, because every index has to be maintained for each changed row. As such, it is advisable to drop and recreate the indexes associated with any table that an SSIS package is processing. We handle the creation and deletion of the indexes through a pair of expressions, stored as package configurations. The query that generates the create index scripts is used in a data flow task which writes the scripts out to a flat file. The delete index scripts are run through a single Execute SQL task. The create index script file is only deleted if the index rebuild is successful, allowing the operator to rebuild the indexes outside of the process if the package fails. NOTE: I am continually trying to find a better way of handling the indexes. Sometimes I will only drop and rebuild if the number of changed rows will be above a threshold (usually ~5%); other times I drop and rebuild on every run. In other versions of the framework I have used recordsets instead of a flat file, as well as storing the definitions in a permanent table and iterating over the records.
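The threshold rule mentioned in the note can be written down as a small decision function. This is a sketch only: the function name is made up, the 5% default comes from the note above, and treating an empty table as "always drop" is an assumption.

```python
def should_drop_indexes(changed_rows, table_rows, threshold=0.05):
    """Decide whether to drop and rebuild indexes for this run."""
    if table_rows <= 0:
        # Assumption: with no existing rows there is nothing to compare
        # against, so load first and build the indexes afterwards.
        return True
    return changed_rows / table_rows > threshold
```

With these defaults, 40 changed rows against a 1,000 row table would leave the indexes in place, while 60 changed rows would trigger a drop and rebuild.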
37
Stop Process The Stop Process task in the framework is used to determine whether or not this process has been run for the parent package before. This task uses the AuditQuery::StopProcessQuery variable as the source of the query and the AuditVariable::StopProcess variable to store the Boolean value returned in the query. Finally, the precedence constraint going in to the pre-processing container is as follows: @[AuditVariable::StopProcess] == false == -1
38
Pre-Processing Container
The pre-processing sequence container houses the tasks used in determining the initial row counts and surrogate key for the destination table, creating the audit trail, generating the necessary control information for the package and those tasks used to handle the indexes on the destination table.
40
Post Processing Container
The post-processing sequence container houses the tasks used in determining the final row counts and surrogate key for the destination table, updating the audit trail, recreating the indexes on the destination table, sending out completion emails (where appropriate) and deleting any files which are no longer necessary.
42
Processing Container The processing container is used to house the tasks specific to the package being developed. It can be further broken down into sub containers if desired.
43
Data Flow Tasks Most of the activity in the processing sequence container will take place in a data flow task. Inside of the data flow task, we like to keep certain items standardized across packages.
44
Counts
Extract – The number of rows pulled from the source system.
Error Type 1 Update – The number of data errors encountered during the type 1 update branch.
Error Type 2 Update – The number of data errors encountered during the type 2 update branch.
Error Insert – The number of data errors encountered during the insertion of the records into the destination table.
Failed Lookup – The number of rows that failed to find a match in a lookup transformation. Often used when building dimensions.
Insert Standard – The number of rows inserted during standard processing.
Insert Non-Standard – The number of rows inserted during non-standard processing (ex. late arriving).
No Change – The number of rows which did not change between what was input from the source system and what is currently stored in the destination.
Update Type 1 – The number of rows updated during the processing of the SCD Type 1 branch.
Update Type 2 – The number of rows updated during the processing of the SCD Type 2 branch.
46
Error Files Data errors are written to a raw file destination. All errors within the data flow should be brought together via a union all operation with enough information to describe where the error occurred as well as what the error was. NOTE: If you are using SQL 2005, the Raw File Reader is an excellent tool! Unfortunately, there has not been an update for SQL 2008.
48
OnError Event Handler The OnError Event Handler is a set of code that is executed any time that an error has occurred while executing a package. These are errors that occur with the process, and are different from a data error, if the data error is handled within the data flow task. Within the OnError Event Handler we determine whether or not we have already sent an error email for this package. If we have not previously sent an error email, we do so now to a list of recipients determined via package configuration. Afterwards we increment the counter so that we do not send a second error email.
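The "send only the first error email" logic can be sketched as a small class. The class and its send callback are hypothetical stand-ins for the event handler, its counter variable and the mail task; the subject line follows the pattern of the sample email later in this deck.

```python
class ErrorNotifier:
    """Sends one email for the first error of a package run and suppresses the rest."""

    def __init__(self, send, recipients):
        self._send = send                   # callable(recipients, subject, body)
        self._recipients = list(recipients)
        self.errors_sent = 0                # the counter checked on each error

    def on_error(self, package_name, message):
        """Return True if an email was sent, False if it was suppressed."""
        if self.errors_sent > 0:
            return False
        subject = "Error during execution of the %s package." % package_name
        self._send(self._recipients, subject, message)
        self.errors_sent += 1
        return True
```

A single failure often raises a cascade of OnError events as it bubbles up through the containers, so without the counter the operators would receive one email per event.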
49
OnError Event Handler Cont’d
I have also recently added an execute sql task to record the full error stack to the META.Errors.SSISErrorLogs table. This records the error code, message, the package with the error and the time of the error. I have found that this is useful for tracking the frequency of the errors and the dependability of the packages. It also makes it easy to quickly look through the error stack. I absolutely still use the Flat File logs, as they record the full picture of events leading up to the error. Furthermore, the error being recorded might be a lack of access to the META database.
50
On Error event handler
51
Sample email
Sample Error Email:
From: xxx
Sent: Tuesday, March 10, :05 AM
To: Report Team
Subject: Error during execution of the load_RPT_AGENTADMIN_JE_CODES package.
Importance: High

There was an error in the execution of the load_RPT_AGENTADMIN_JE_CODES package which started at 3/10/2009 2:04:55 AM. The following is the first error reported: SSIS Error Code DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER. The AcquireConnection method call to the connection manager "EPASRPT" failed with error code 0xC There may be error messages posted before this with more information on why the AcquireConnection method call failed.
52
Connection Managers Connection Managers should be created for every database which is used. The name should be the name of the database or file with no reference to the machine or account to be used (as these will change between environments). The connection managers that are common to the development efforts should be placed in the common template for a project and should have the connection string and descriptions set via package configuration. It is worth noting that having extra connection managers within a package that are not used carries a minimal cost when validating the package. If there are two separate connection managers to the same database, but with different connection manager types, assume that the OLE DB connection manager is the default and name any other connection managers with their type (example: META and META.NET).
53
Hash Values (Check Sums)
Hash values are used to generate quick comparisons to determine whether or not a record, or a subset of a record’s columns, has changed. In order to facilitate the quick computation of hash values within a data flow we have employed the Checksum Transformation available from Konesans. With this transformation you simply select which columns you would like to be included in the hash and specify an output column name.
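To illustrate the idea of change detection by hash, here is a minimal sketch. It does not reproduce the algorithm used by the Konesans Checksum Transformation; it uses SHA-256 from the Python standard library, and the function name, separator and NULL marker are assumptions chosen for the example.

```python
import hashlib

def row_hash(row, columns):
    """Hash the selected columns of a row (a dict) into one comparable value."""
    parts = []
    for name in columns:
        value = row[name]
        # Use a marker for NULL so that None and the empty string differ.
        parts.append("\x00" if value is None else str(value))
    # A separator keeps ("ab", "c") from hashing the same as ("a", "bc").
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

An incoming row is then treated as changed when its hash differs from the hash stored with the matching destination row, which replaces a column by column comparison with a single equality test.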
55
BIDS Helper BIDS Helper is a Visual Studio add-in that expands the functionality of the Business Intelligence Development Studio (BIDS). Its extensions include a graphical representation of expressions and configurations, pipeline component performance breakdowns, an extended variables window, sorting of the project files, fixing of relative paths, a list of all expressions and non-standard property values used within the packages, etc. It is HIGHLY recommended that anyone using BIDS to develop SSIS packages install this product. BIDS Helper is available at For more information on this product please see the bidshelper web site listed above.
57
Object GUIDs Objects within SSIS have globally unique identifiers (GUIDs) which are used to reference the individual object within the SSIS engine. The package GUID is further recorded in the audit trail. As such, it is important to ensure that these values are unique across packages. As most packages are created as a copy of some previous file, the GUIDs have to be reset. You can reset the package GUID manually in the package properties window by selecting the ID drop down and selecting generate new id.
59
BIDS Helper Object GUIDs However, if you have installed BIDS Helper you can generate new GUIDs for all objects within the package by right clicking on the package name within the solution explorer and choosing Reset GUIDs (this method is preferred as it will reset all of the IDs within the package).
60
Package Versions There is a version number associated with each of the SSIS Packages. The version has three portions: the Major Version, Minor Version and Build. The build is an auto increment number that grows each time that the file is saved. It can, however, be reset manually. The Major and Minor Version numbers are always set manually. We try to increment the minor version every time that there is a bug fix or small enhancement. If there is a major enhancement we increment the major version number. These version numbers are important to keep up with, as the life of the package can be tracked via these milestones to determine whether or not the package is continuing to perform well as time goes on, or if there was an alteration to the package that might have increased or decreased performance.
61
Package Versions There is further a property for Version Comments that should be filled in to explain the changes that have been implemented.
62
Conclusion I hope that this has been helpful. The framework I have presented is a draft item. I am continually updating it, and, if you should happen to use it as your base framework going forward, I would expect you to do the same. Please, feel free to contact me: –