SSIS Custom Components


SSIS Custom Components I made a number of false starts with custom components, but after a bug fix to a script that was used in multiple packages (and multiple times in each), enough was enough. We are not going to dive too deep into the process, but I want you to go away with enough confidence, and a bit of code, to try it for yourselves. If you are uncomfortable with .NET development, that’s fine; being a data professional, I’m not comfortable with it either. I am going to concentrate on the flow of a component and the SSIS interfaces, not .NET development per se, so at work you may need to buddy up with a developer. Dave Ballantyne dave.ballantyne@live.co.uk @davebally

Why? Provide new functionality not provided as standard. Why script in the first place? Although SSIS is very rich in functionality, it would be impossible for Microsoft to deliver 100% of the functionality that we all need as standard. I will need different tasks to you. A simple example of something that is not provided as standard, and would be a good candidate for scripting, involves the Aggregate transformation.

Why? Reusability: the component is a DLL; single code base; can be used multiple times in a single project; can be shared across multiple projects; easy to test; component version. Performance: faster than scripting. Well documented: though not a how-to guide. Why would you choose a component over scripting? Reusability is the BIG win. Performance: generally faster; I haven't made a component run slower than the equivalent script. Documentation on MSDN is good, though it is not a how-to guide: it tells you in isolation what each method or member does and its expected inputs, but it does not walk through the whole process.

Types Of Component: Data Connections, Log Providers, For Each Loops, Control Flow Tasks, Data Flow Pipeline Component, Custom User Interface.

Pipeline Component Types: Sources, Transforms, Destinations.

Design/Run Time Design Time: work done in BIDS; attachments / detachments; validation; column usage. Run Time: metadata interrogation; DTEXEC; flow of data.

Demo 1 Reuse and Performance 1-Scripting. String min/max: the standard Aggregate transformation only supports numerics and date/times. 1-MinmaxScr: show the input data in a grid and the output in a grid, and step through the code.
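The logic that the demo script performs, a minimum and maximum over string values per key, can be sketched independently of SSIS. The following is a minimal Python model, not the demo's .NET code: the function name and the sample rows are hypothetical, and it assumes plain code-point string comparison, whereas a .NET implementation may compare strings by culture.

```python
from typing import Dict, Iterable, Tuple


def string_min_max(rows: Iterable[Tuple[int, str]]) -> Dict[int, Tuple[str, str]]:
    """Return {key: (min_value, max_value)} using ordinary string comparison."""
    result: Dict[int, Tuple[str, str]] = {}
    for key, value in rows:
        if key not in result:
            # First value seen for this key is both the min and the max.
            result[key] = (value, value)
        else:
            lo, hi = result[key]
            result[key] = (min(lo, value), max(hi, value))
    return result


# Hypothetical sample data: (key, string value) pairs.
rows = [(1, "pear"), (1, "apple"), (2, "kiwi"), (1, "zebra"), (2, "fig")]
summary = string_min_max(rows)
print(summary)
```

Only one pass over the rows is needed, and the state kept per key is just two strings, which is why this kind of aggregate suits a streaming data flow.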

Performance Comparison (run through DTExec; average ms of 20 runs each)

Rows          Custom (ms)   Script (ms)
100,000               261           684
500,000               900         2,069
1,000,000           1,667         3,949
5,000,000           7,867        19,214
10,000,000         16,375        38,690
20,000,000         32,852        76,755
31,999,680         51,426       123,243

At 1 million, 5 million, 10 million, 20 million and roughly 32 million rows there is not much difference in the relative speed of execution. Startup costs? Custom is very slightly quicker (75 vs 89 ms for a 10-row input). TBH, you are probably doing something wrong if you care! I'm not making the outright statement that a custom component is always faster: in this case I have seen it a lot faster, in others slightly faster or the same cost. I very much doubt you could create a component that runs slower though, obviously assuming that the guts of it is performant.
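The remark about relative speed can be checked by dividing each script timing by the matching custom-component timing. This short Python sketch uses only the millisecond figures from the slide; the variable names are illustrative.

```python
# Timings in milliseconds copied from the slide: rows -> (custom, script).
timings = {
    100_000: (261, 684),
    500_000: (900, 2_069),
    1_000_000: (1_667, 3_949),
    5_000_000: (7_867, 19_214),
    10_000_000: (16_375, 38_690),
    20_000_000: (32_852, 76_755),
    31_999_680: (51_426, 123_243),
}

# Ratio of script time to custom-component time for each input size.
ratios = {rows: script / custom for rows, (custom, script) in timings.items()}

for rows, ratio in ratios.items():
    print(f"{rows:>12,} rows: script takes {ratio:.2f}x the custom component's time")
```

On these figures the script takes between roughly 2.3 and 2.6 times as long as the custom component at every size, which is the sense in which the relative speed barely changes.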

Requirements Visual Studio (BIDS is not enough) or Visual Basic / C# Express. Client Tools SDK. Express is fine but does not support post-build events, which can be a pain for us. Debugging will also be a pain.

Starting Out Target Framework 3.5 (Advanced compile options). Sign the assembly. Add references (Program Files (x86)/<SqlServer>/100/SDK/Assemblies): Microsoft.SqlServer.DTSPipelineWrap, Microsoft.SqlServer.DTSRuntimeWrap, Microsoft.SqlServer.ManagedDTS, Microsoft.SqlServer.PipelineHost.

Class Creation Inherits PipelineComponent. Uses the DtsPipelineComponent attribute.

Post Build Copy the DLL to “C:\Program Files (x86)\Microsoft SQL Server\100\DTS\PipelineComponents”. Register it in the Global Assembly Cache using GACUTIL. BIDS must be restarted. For first use: “Choose Items”, “SSIS Data Flow Components”, tick the component. Use a post-build event to call a batch file that copies the DLL file and then installs it in the GAC. This means that Visual Studio must run in administrative mode.

MetaData IDTSComponentMetaData100, exposed as PipelineComponent.ComponentMetaData. Describes the component to the engine: inputs and outputs. Custom data is held within IDTSCustomProperty100, at most levels of object. 100 = 10.0 = 2008; 90 = 9.0 = 2005. BOL still lists Denali as 100; changes? Metadata = data about data, in this case the inputs and outputs: the number of them and the definitions of their columns.

MetaData Inputs: IDTSInput100. Exposed via the InputCollection member in the metadata. One instance for each attached input. Contains the virtual column collection, accessed with the GetVirtualInput() member: a view of the IDTSOutput100 of the upstream component, made up of IDTSVirtualInputColumn100. Input column collection, accessed with InputColumnCollection: those columns that are used in the component, of type IDTSInputColumn100. SetUsageType is used to add the virtual column to the input column collection.

MetaData Outputs: IDTSOutput100. Exposed via the OutputCollection member in the metadata. One instance for each output. Output column collection, accessed with OutputColumnCollection, of type IDTSOutputColumn100. Dispositions (errors): set IsErrorOut on IDTSOutput100.

Icons Size: 16*16 for the ToolBox, 32*32 for the design surface. The order of “IconResources” is important. Build action must be “Embedded Resource”. 256 colours works. There are default icons, so this is not essential, but a meaningful icon does make the data flow easier to understand.

Errors and warnings FireError At design or run time

Errors and warnings FireWarning

Design Time Methods ProvideComponentProperties: define the initial metadata of the component. Validate: tests that the metadata is correct. ReinitializeMetaData: fix the metadata. With design-time methods we are generally responding to user actions within the BIDS environment. ProvideComponentProperties sets the initial metadata of the component. Validate checks that the metadata is clean and that the code that executes at run time will operate as expected. VS_ISVALID: all is OK; the component can and will run with this metadata. VS_ISBROKEN: the metadata is ‘wrong’; the user needs to do some work in BIDS to resolve it. There are 2 severe errors. VS_NEEDSNEWMETADATA: contains errors that can be fixed in ReinitializeMetaData. VS_ISCORRUPT: calls ProvideComponentProperties; start over.
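The design-time round trip between Validate and ReinitializeMetaData can be modelled in a few lines. This is a toy Python sketch, not SSIS code: the metadata is a plain dictionary, the rules for each status are illustrative assumptions, and only the four status names (which SSIS spells VS_ISVALID, VS_ISBROKEN, VS_NEEDSNEWMETADATA and VS_ISCORRUPT) come from the product.

```python
from enum import Enum, auto


class Status(Enum):
    # Names follow the SSIS validation statuses; the values are arbitrary here.
    VS_ISVALID = auto()
    VS_ISBROKEN = auto()
    VS_NEEDSNEWMETADATA = auto()
    VS_ISCORRUPT = auto()


def validate(meta: dict) -> Status:
    """Toy rules (assumptions): classify the state of the metadata."""
    if "upstream_columns" not in meta or "input_columns" not in meta:
        return Status.VS_ISCORRUPT  # structure is gone: start over
    stale = [c for c in meta["input_columns"] if c not in meta["upstream_columns"]]
    if stale:
        return Status.VS_NEEDSNEWMETADATA  # fixable without the user
    if not meta["input_columns"]:
        return Status.VS_ISBROKEN  # the user must pick columns in the designer
    return Status.VS_ISVALID


def reinitialize_metadata(meta: dict) -> None:
    """Drop selected columns that the upstream component no longer provides."""
    meta["input_columns"] = [
        c for c in meta["input_columns"] if c in meta["upstream_columns"]
    ]


def designer_validate(meta: dict) -> Status:
    """Model of the designer: repair once if asked to, then validate again."""
    status = validate(meta)
    if status is Status.VS_NEEDSNEWMETADATA:
        reinitialize_metadata(meta)
        status = validate(meta)
    return status


# Hypothetical metadata: the upstream component has dropped "OLD_COL".
meta = {"upstream_columns": ["NUM", "RNUM"], "input_columns": ["NUM", "OLD_COL"]}
first = validate(meta)
final = designer_validate(meta)
print(first.name, "->", final.name)
```

The point of the split is that Validate only reports; any repair happens in a separate step that the designer triggers, after which validation runs again.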

Debug You can use MessageBoxes, which is simple though a pain. To debug design-time functionality, run BIDS as the “Start external program”. For run time, run dtexec with a package that is set up to execute your component; don’t run BIDS, as it doesn’t capture the run-time events properly. Bug in VS2010: step over is treated as run to completion with 3.5 framework components.

Demo 2 Build a simple component

Run-Time Processing PreExecute, PrimeOutput, ProcessInput, PostExecute: all within your class that inherits from PipelineComponent. PreExecute: interrogate the metadata. PrimeOutput: passes in references to the output buffers, in this case only one, which is copied to outputBuffer. ProcessInput: loop while NextRow() returns true; if the last buffer has been sent then buffer.EndOfRowset is set to true. You must call SetEndOfRowset on each output buffer, otherwise the component won’t complete and will stay ‘yellow’ forever. PostExecute: housekeeping, tidying up, closing connections, etc.

PreExecute / PrimeOutput PreExecute: set up the run-time objects; interrogate the metadata and the buffer manager; find the column index(es) in the buffers based on the metadata, with BufferManager.FindColumnByLineageID(Input.Buffer, InputCol.LineageID). PrimeOutput: capture a reference to the output buffer(s), of type PipelineBuffer, for use later. In PreExecute, interrogate the metadata to build any required run-time objects, and find the buffer column indexes from the LineageIDs of the input column metadata using FindColumnByLineageID on the BufferManager.

Process Input Loop on buffer.NextRow. If buffer.EndOfRowset is true, call outputBuffer.SetEndOfRowset(). Avoid referencing metadata within the ProcessInput function: the metadata functions are not optimized for performance. Do all the interrogation within the PreExecute function. It is very important to call SetEndOfRowset when done, otherwise the task will not finish: downstream will still be expecting rows.
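The run-time sequence described on these slides can be modelled without SSIS. The sketch below is a toy Python simulation, not the SSIS API: the InputBuffer, OutputBuffer and MinMaxComponent classes are hypothetical stand-ins that borrow the method names from the slides, with the NUM and RNUM column names assumed to match the demo. It shows column positions being resolved once in pre_execute, the next_row loop in process_input, and the end-of-rowset signal being passed downstream.

```python
from typing import Dict, List, Tuple


class InputBuffer:
    """Toy stand-in for an input buffer: a batch of rows plus an end flag."""

    def __init__(self, rows: List[tuple], end_of_rowset: bool) -> None:
        self._rows = rows
        self._pos = -1
        self.end_of_rowset = end_of_rowset

    def next_row(self) -> bool:
        self._pos += 1
        return self._pos < len(self._rows)

    def get(self, index: int):
        return self._rows[self._pos][index]


class OutputBuffer:
    """Toy stand-in for an output buffer."""

    def __init__(self) -> None:
        self.rows: List[tuple] = []
        self.finished = False

    def add_row(self, row: tuple) -> None:
        self.rows.append(row)

    def set_end_of_rowset(self) -> None:
        self.finished = True


class MinMaxComponent:
    def __init__(self, column_names: List[str]) -> None:
        self.column_names = column_names  # plays the role of the metadata
        self.state: Dict[int, Tuple[str, str]] = {}

    def pre_execute(self) -> None:
        # Resolve column positions once, so the row loop never touches metadata.
        self.key_idx = self.column_names.index("NUM")
        self.val_idx = self.column_names.index("RNUM")

    def prime_output(self, output: OutputBuffer) -> None:
        self.output = output  # keep the reference for later

    def process_input(self, buffer: InputBuffer) -> None:
        while buffer.next_row():
            key, value = buffer.get(self.key_idx), buffer.get(self.val_idx)
            lo, hi = self.state.get(key, (value, value))
            self.state[key] = (min(lo, value), max(hi, value))
        if buffer.end_of_rowset:
            for key, (lo, hi) in sorted(self.state.items()):
                self.output.add_row((key, lo, hi))
            self.output.set_end_of_rowset()  # without this, downstream waits forever


component = MinMaxComponent(["NUM", "RNUM"])
out = OutputBuffer()
component.pre_execute()
component.prime_output(out)
component.process_input(InputBuffer([(1, "pear"), (2, "kiwi")], end_of_rowset=False))
component.process_input(InputBuffer([(1, "apple"), (2, "plum")], end_of_rowset=True))
print(out.rows, out.finished)
```

Because the aggregate cannot emit anything until every input row has been seen, the model writes its output only when the final input buffer arrives.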

PipelineBuffer Used for both input and output buffers. Get<DataType> and Set<DataType>: SetString / GetString, SetInt32 / GetInt32. AddRow: insert and move to a new row. SetEndOfRowset: call after the final row has been populated. These are the PipelineBuffer class methods.

Sync Or Async? Sync: add columns to the existing data flow; SynchronousInputID of the output = ID of the input. Async: create a new data flow buffer; SynchronousInputID = 0. Taking a really simplistic view: async ‘creates’ a new dataset, sync adds columns to an existing one. SynchronousInputID ‘ties’ the input and output together if synchronous. We still write to the output buffer as with an async component; SSIS handles tying the output and input together to allow for the one-to-one nature.
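The simplistic view of sync versus async can be illustrated with plain Python lists standing in for buffers. This is a conceptual sketch only; the function names, the RNUM_LEN and ROWS columns, and the sample rows are hypothetical.

```python
from typing import Dict, List


def synchronous_transform(buffer: List[dict]) -> List[dict]:
    """Sync: same rows, same buffer; a column is added to each existing row."""
    for row in buffer:
        row["RNUM_LEN"] = len(row["RNUM"])
    return buffer  # the very same object flows on downstream


def asynchronous_transform(buffer: List[dict]) -> List[dict]:
    """Async: a new set of rows is created, here one row per key."""
    counts: Dict[int, int] = {}
    for row in buffer:
        counts[row["NUM"]] = counts.get(row["NUM"], 0) + 1
    return [{"NUM": key, "ROWS": count} for key, count in counts.items()]


rows = [
    {"NUM": 1, "RNUM": "pear"},
    {"NUM": 1, "RNUM": "apple"},
    {"NUM": 2, "RNUM": "kiwi"},
]
sync_out = synchronous_transform(rows)
async_out = asynchronous_transform(rows)
print(sync_out is rows, len(sync_out), len(async_out))
```

The synchronous version keeps a one-to-one relationship between input and output rows, while the asynchronous version is free to change the row count, which is why an aggregate has to be asynchronous.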

Demo 3 RunTime execution

User Interface Why do you need a UI? You don’t. Consider the code we have been talking about: we have hardcoded the columns we need. Our key column is called NUM, and the column that we want the min/max of is called RNUM. It works, but it is very limiting.

User Interface Going back to the aggregation component, we pick our columns and the operation on each of them.

User Interface A class that implements IDtsComponentUI, registered to the component class with UITypeName. The PublicKeyToken is found with GACUTIL. We need a class that implements the IDtsComponentUI interface. This class is then registered with the main component by using the UITypeName property. Use GACUTIL -l to find the public key token, or use Explorer in \windows\assembly.

User Interface If the UI is defined in a separate DLL from the component class, then it needs signing and installing in the GAC, just like the main class. Within the class that implements IDtsComponentUI, the 2 most important functions are Initialize and Edit. Initialize stores the metadata in a local variable and passes it to the dialog to update on OK. Edit actually does the editing, and must return true if the metadata has changed, else false. Typically, and simply, Edit fires a standard Windows form that accepts the input and, upon OK, updates the metadata.

User Interface Demo 4 User interface UI Code Step Through

Conclusion Like SSIS, large learning curve. Reusability. Potentially faster? .NET skills are required. Like SSIS, we have a steep learning curve. It's not a simple ABC process: a whole solution is required to even start to run, test and play with the functionality. Updating the metadata is all very well, but if you aren't processing it at run time, can you be sure you are doing it right? Reusability: unlike scripting, which has to be debugged, fixed, updated, cut, pasted and copied many times, we have a single point of code. One update will fix all instances on that machine. Potentially faster: as demonstrated, in this case it is faster by a significant margin, but that can't be guaranteed. .NET skills: data professionals may find it tricky in the early stages, but once we understand the concepts and the flow of the engine it will become easier. Hopefully you can find a friendly developer to help you out.

SSIS Custom Components Any questions ? Dave Ballantyne dave.ballantyne@live.co.uk @davebally