Advanced ETL: Embedding Integration Services Ashvini Sharma Development Lead DAT411 Microsoft Corporation Sergei Ivanov Technical Lead DAT411 Microsoft.


1 Advanced ETL: Embedding Integration Services (DAT411)
Ashvini Sharma, Development Lead, Microsoft Corporation
Sergei Ivanov, Technical Lead, Microsoft Corporation

2 Prerequisites
- Knowledge of Integration Services
- Knowledge of Data Flow functionality
Level 400. Really.

3 Objectives
- Introduction to the SSIS programming model
- Learn how to integrate with dynamic metadata
- Learn how to use data cleansing functionality in your apps

4 Integration Services

5 SSIS Terminology
- Package
- Tasks
- Precedence Constraints
- Connection Managers
- Containers
- Data Flow Task
  - Components: Source Adapters, Transformations, Destination Adapters
  - Paths

6 Application Overview
- Get data from an Excel file
- Provide fuzzy cleansing for certain text fields
  - FirstName, LastName
- Save cleaned data in another Excel file
- Look at the finished application first, then go through several iterations to build it
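
The cleansing step described on this slide can be pictured with a small Python sketch. It is only an illustration of the idea: it uses the standard library's `difflib` as a stand-in for the SSIS fuzzy transforms, whose matching algorithm is not described in the deck. The function names, reference lists, and sample row are assumptions made for the example.

```python
import difflib

def cleanse(value, reference, cutoff=0.6):
    """Return the closest reference value, or the input unchanged if nothing is close enough."""
    matches = difflib.get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else value

def cleanse_rows(rows, references):
    """Cleanse only the columns that have a reference list; copy the rest through."""
    cleaned = []
    for row in rows:
        fixed = dict(row)
        for column, reference in references.items():
            fixed[column] = cleanse(row[column], reference)
        cleaned.append(fixed)
    return cleaned

# Hypothetical sample data standing in for the Excel input.
rows = [{"FirstName": "Jonh", "LastName": "Smiht", "City": "Redmond"}]
references = {"FirstName": ["John", "Jane"], "LastName": ["Smith", "Jones"]}
cleaned = cleanse_rows(rows, references)
```

Only FirstName and LastName are touched here, mirroring the slide; a value with no sufficiently close reference entry is passed through unchanged rather than guessed.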

7 Application

8 SSIS is embeddable
- SQL Server uses SSIS
  - SMO
  - Maintenance Plans
- Other (non-SQL) products in development are using SSIS
- Writing your own UI is possible
  - SSIS designer, Management Studio, Import/Export Wizard, Migration Wizard
  - Uses no secret APIs
  - Enumerating/adding/removing/changing/listening/scheduling/…
  - Considering releasing Migration Wizard in Shared Source
- Digital signing enables tamper resistance
- Several customers doing metadata-driven package development

9 Pipeline Metadata
- Pipeline engine requires static metadata
  - Early design decision
  - Buffers laid out during pre-execute
  - Strict data types
  - Cannot map columns during execution
- Designer debugging expects design-time metadata at execution time
- Configured (dynamic) queries must resolve to design-time metadata at runtime
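
To make the consequence of static metadata concrete, here is a conceptual Python model of a buffer whose row layout is fixed before any data flows. It is not the SSIS buffer implementation; the class name `BufferLayout`, the column names, and the use of the `struct` module are assumptions chosen for illustration.

```python
import struct

class BufferLayout:
    """Fixed row layout computed once, before any rows flow."""

    def __init__(self, columns):
        # columns: list of (name, struct format code), e.g. ("Id", "i") or ("FirstName", "20s")
        self.names = [name for name, _ in columns]
        self._struct = struct.Struct("<" + "".join(code for _, code in columns))
        self.row_size = self._struct.size

    def pack(self, row):
        # A row whose columns differ from the layout is rejected: no remapping during execution.
        if set(row) != set(self.names):
            raise ValueError("row columns do not match the design-time layout")
        # struct enforces the declared types and raises struct.error on a mismatch.
        return self._struct.pack(*(row[name] for name in self.names))

    def unpack(self, data):
        return dict(zip(self.names, self._struct.unpack(data)))

layout = BufferLayout([("Id", "i"), ("FirstName", "20s")])
packed = layout.pack({"Id": 7, "FirstName": b"Ann"})
```

Because the row size is known up front, every row occupies the same number of bytes, which is what makes it possible to lay buffers out ahead of execution. The price is that a row with an unexpected column or a wrongly typed value fails instead of being adapted.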

10 Dynamic Metadata
- Scenarios
  - Source schema changes or is not known until execution
  - Metadata-driven ETL processes
- Handling dynamic metadata
  - Generate data flows dynamically
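
One way to picture "generate data flows dynamically" is to discover the schema at execution time and then build the flow description from it, so the flow that actually runs still has fixed metadata. The Python sketch below does this with a CSV header as a stand-in for the Excel source. The function names and the dictionary shape of the flow are hypothetical, not the SSIS object model.

```python
import csv
import io

def discover_columns(csv_text):
    """Read only the header row, so the schema is discovered at execution time."""
    return next(csv.reader(io.StringIO(csv_text)))

def build_flow(columns, fuzzy_columns):
    """Return a plain-dict description of a source -> cleanse -> destination flow."""
    to_cleanse = [c for c in columns if c in fuzzy_columns]
    return {
        "source": {"outputs": list(columns)},
        "cleanse": {"inputs": to_cleanse},
        "destination": {"mappings": {c: c for c in columns}},
    }

# Hypothetical input standing in for a source whose schema is not known in advance.
sample = "FirstName,LastName,City\nJonh,Smiht,Redmond\n"
flow = build_flow(discover_columns(sample), fuzzy_columns={"FirstName", "LastName"})
```

The generated description is complete before any row is processed, which is how a dynamically built flow can still satisfy an engine that needs static metadata.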

11 Creating Packages

12 Creating Packages
- From scratch through the object model
  - Create all package elements from scratch
  - Fast, small, efficient
  - Harder to evolve the application
- From a template package
  - Adjust only what needs adjusting after loading the template package
  - Need to embed a potentially large template file
  - Easier to evolve the application
  - Digital signing detects user changes
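
The template approach can be sketched in Python as: verify the embedded template, load it, and adjust only the parts that differ per run. Everything here is an assumption made for illustration. The XML is a made-up miniature, not the real package format, and an HMAC stands in for whatever digital signing mechanism SSIS uses.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Hypothetical template; real SSIS packages use a different XML schema.
TEMPLATE = b"""<package name="CleanseTemplate">
  <connection id="source" path=""/>
  <connection id="destination" path=""/>
</package>"""

def sign(data, key):
    """Compute a keyed digest of the template bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def load_template(data, signature, key):
    """Parse the template only if it still matches its signature."""
    if not hmac.compare_digest(sign(data, key), signature):
        raise ValueError("template was modified after signing")
    return ET.fromstring(data)

def configure(package, paths):
    """Adjust only the connection paths; everything else stays as in the template."""
    for connection in package.iter("connection"):
        connection_id = connection.get("id")
        if connection_id in paths:
            connection.set("path", paths[connection_id])
    return package

KEY = b"demo-key"  # illustration only, not a real secret
SIGNATURE = sign(TEMPLATE, KEY)
package = configure(load_template(TEMPLATE, SIGNATURE, KEY),
                    {"source": "in.xls", "destination": "out.xls"})
```

The trade-off from the slide shows up directly: the application carries the whole template with it, but evolving the package means editing the template rather than the code that builds it.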

13 Components Terminology
- Component
- Input
  - Input Columns (only data referenced by the component)
  - Virtual Input Columns (all available data produced by upstream components; used at design time for selecting input columns)
  - External Metadata Columns (schema snapshot)
- Output
  - Output Columns (produced data)
  - External Metadata Columns (schema snapshot)
- LineageID uniquely identifies a column
  - Every output column gets a new LineageID
- Column Mapping
  - Sources: ExternalColumn -> OutputColumn
  - Transforms: InputColumn -> OutputColumn
  - Destinations: InputColumn -> ExternalColumn
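
The lineage rules on this slide can be modelled in a few lines of Python. This is a conceptual sketch, not the SSIS object model: the class and function names are invented, and the only behaviour it encodes is what the slide states, namely that each output column receives a fresh lineage ID while an input column refers to an upstream column by that ID.

```python
import itertools
from dataclasses import dataclass, field

_next_lineage_id = itertools.count(1)

@dataclass
class OutputColumn:
    name: str
    # Every output column gets a new lineage ID when it is created.
    lineage_id: int = field(default_factory=lambda: next(_next_lineage_id))

@dataclass
class InputColumn:
    # An input column gets no new ID; it points at the upstream output column.
    name: str
    lineage_id: int

def select_inputs(virtual_columns, wanted):
    """Pick input columns out of everything upstream makes available."""
    return [InputColumn(c.name, c.lineage_id) for c in virtual_columns if c.name in wanted]

# Source: ExternalColumn -> OutputColumn
source_outputs = [OutputColumn("FirstName"), OutputColumn("City")]
# Transform: InputColumn -> OutputColumn (the cleaned value is a new column)
transform_inputs = select_inputs(source_outputs, {"FirstName"})
transform_outputs = [OutputColumn("FirstName_clean")]
```

Here `source_outputs` plays the role of the virtual input columns offered to the transform, and `transform_inputs` is the subset the transform actually references.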

14 Pipeline Programming Model
- ComponentMetaData
  - Provided for all components by the engine automatically
  - Manages metadata and persistence for the component
  - Contact information for unregistered components
  - Helps delay creation of components until necessary
- Runtime Connection Collection (RCC)
  - Connection managers used by the component
Diagram labels: ComponentMetaData, Inputs, Outputs, Component, RCC
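
The point about delaying component creation can be illustrated with a small Python model in which the metadata object exists from the start and the component is only built on first use. The names below are hypothetical and the real engine's interfaces differ; the sketch only shows the lazy-creation pattern the slide alludes to.

```python
class ComponentMetaData:
    """Holds a component's metadata and creates the component only on first use."""

    def __init__(self, name, factory):
        self.name = name
        self.inputs = []
        self.outputs = []
        self.runtime_connections = {}  # the "RCC": connection managers by name
        self._factory = factory
        self._component = None

    @property
    def is_instantiated(self):
        return self._component is not None

    @property
    def component(self):
        # Build the component lazily, handing it its own metadata.
        if self._component is None:
            self._component = self._factory(self)
        return self._component

class UppercaseTransform:
    """Hypothetical component that upper-cases the columns listed in its metadata inputs."""

    def __init__(self, metadata):
        self.metadata = metadata

    def process(self, row):
        return {k: v.upper() if k in self.metadata.inputs else v for k, v in row.items()}

metadata = ComponentMetaData("Uppercase", UppercaseTransform)
metadata.inputs.append("FirstName")          # metadata can be edited without the component
created_early = metadata.is_instantiated     # still False at this point
result = metadata.component.process({"FirstName": "john", "City": "Redmond"})
```

Inputs, outputs, and connections can all be configured and persisted through the metadata object, so the component itself only needs to exist once something asks it to do work.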

15 Configuring Data Flows

16 Using Fuzzy transforms

17 SSIS As A Source
- ETL processes typically encode complex business rules
- Reuse is important
  - One version of the truth
  - Updates in one place
- Leverage advantages of SSIS: scalability, manageability, visual building of complex processes, etc.

18 SSIS Source Implementation
- Implements IDbConnection
  - ConnectionString is the command-line arguments to dtexec.exe
- Command
  - CommandText is the name of the DataReaderDest component in the package
  - ExecuteReader runs the package when asked for data and returns an IDataReader
  - Also supports SchemaOnly
- DataReaderDest implements IDataReader
  - Gets the first buffer and waits for a data request
Diagram labels: Microsoft.SqlServer.Dts.DtsClient, Data Reader Destination Component
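
The contract described on this slide (connection string as package arguments, command text as the destination name, execution deferred until data is requested) can be mimicked in Python with a generator. This is a conceptual model only. The real provider is a .NET library and none of the names below come from it: `PackageCommand`, `demo_package`, `DESTINATIONS`, and the package file name are all made up for the example.

```python
executed = []  # records each time the hypothetical package actually starts running

def demo_package(args):
    """Hypothetical package: a generator, so nothing runs until the first row is requested."""
    executed.append(args)
    yield {"FirstName": "John", "LastName": "Smith"}
    yield {"FirstName": "Jane", "LastName": "Jones"}

# Maps a destination component name to (column names, package to run).
DESTINATIONS = {"DataReaderDest": (["FirstName", "LastName"], demo_package)}

class PackageCommand:
    def __init__(self, connection_string, command_text):
        self.connection_string = connection_string  # in the slide: dtexec.exe arguments
        self.command_text = command_text            # in the slide: destination component name

    def execute_reader(self, schema_only=False):
        columns, package = DESTINATIONS[self.command_text]
        if schema_only:
            # Schema requests return column names without running the package.
            return columns, iter(())
        return columns, package(self.connection_string)

# The argument string is an illustrative value, not taken from the deck.
command = PackageCommand('/FILE "Cleanse.dtsx"', "DataReaderDest")
columns, reader = command.execute_reader()
```

Calling `execute_reader` returns immediately; the package body starts only when the caller pulls the first row from `reader`, which corresponds to the reader waiting for a data request.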

19 Putting it together

20 Summary
- Programming SSIS is straightforward
- Several embedding options exist
- SSIS can handle flexible metadata
- SSIS provides rich functionality and high performance

21 Resources
- Embedding Reporting and Analysis in your Smart Client App: DAT313, 502AB, 5:00PM
- Samples installed by setup
- Community site, run by MVPs: http://www.sqlis.com
- Interact with the product team on MSDN Forums: http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=80
- Webcasts, training, blog links, books, …: http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx

22 © 2005 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

