SQL Server 2008 ETL drilldown (BIN 309)
Shane Bartle, Principal Consultant
Pat Martin, ANZ SQL Premier Field Engineer
Microsoft New Zealand

3 What We Will Cover
  - Background to SSIS
  - Source Data Extraction – New Approaches
  - Monitoring Enhancements
  - Developer Additions
  - Data Profiling

4 Integrated End-To-End BI Offering
  - End user tools & performance management apps: Excel, PerformancePoint Server
  - BI platform: SQL Server Reporting Services, SQL Server Analysis Services, SQL Server DBMS, SQL Server Integration Services
  - Delivery: SharePoint Server (Reports, Dashboards, Excel Workbooks, Analytic Views, Scorecards, Plans)

5 SQL Server 2008 SSIS Background to SSIS
  - Integration today
    - Increasing data volumes
    - Increasingly diverse sources
  - Requirements reached the tipping point
    - Low-impact source extraction
    - Efficient transformation
    - Bulk loading techniques

6 Integration Services In Action
  - Integration is a seamless, manageable operation
  - Source, prepare, & load data in single, auditable process
  - Scale to handle heavy and complex data requirements
  - Diagram, sources: GeoSpatial data, semi-structured data, legacy data (binary files), application database, mobile data
  - Diagram, SQL Server Integration Services: GeoSpatial components, custom source, standard sources, data-cleansing components, merges, data mining components
  - Diagram, destinations: warehouse, cube, reports

7 Performance Improvements
  - Current SSIS Thread Scheduler
    - Threads affinitised to dataflow subtrees
    - Thread starvation on highly-parallel designs
    - Single thread for each synchronous path
    - Non-linear scale-up (plateau)
  - SSIS Pipeline Parallelism
    - Rewrote the thread scheduler
    - Improved performance and scale
    - Thread pool shared across multiple components
  - Benefits
    - Better performance (50%) in highly-parallel designs
    - Less manual tuning during development (lower TCO)
    - Better hardware utilisation (higher ROI)
    - It just works!

8 SQL Server 2008 SSIS Source Data Extraction – New Approaches
  - Extracting data from the source is expensive
    - Efficient extraction is key to improving ETL performance
    - Involves bulk loading data into staging areas or warehouse
    - Time consuming and resource intensive
  - Triggers (synchronous IO penalty)
  - Timestamp columns (schema changes)
  - Complex queries (delayed IO penalty)
  - Custom (ISV, mirror, snapshot, …)
  - Incremental data load is key to efficient extraction
    - Need to know what changed at source since a point in time
    - Expensive lookups to determine changed columns
    - Providing information up front about which columns changed

9 Change Data Capture
  - Information about what changed at the source
    - Operation (Insert, Update, Delete)
    - Update mask (which columns changed)
  - Changes captured from the log asynchronously
    - Minimal impact on source system
    - Log reader can be scheduled to run during idle time
  - Enabled per table
  - Hidden change tables store captured changes
    - One change table per source table that is tracked
    - Retention-based cleanup jobs
  - CDC APIs provide access to change data
    - Table-valued functions and scalar functions provide access to change data and CDC metadata
    - TVF allows the changes to be gathered for specific intervals, enabling incremental population of the DW
  - Diagram labels: OLTP, Change Tables, Data Warehouse
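
The interval-based consumption described on this slide can be sketched roughly as follows. This is a plain Python model, not the SQL Server API: `ChangeRow`, `changes_between`, `changed_columns`, the operation labels, and the bit layout of the mask are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical operation labels for this sketch (not the codes SQL Server stores).
INSERT, UPDATE, DELETE = "insert", "update", "delete"

@dataclass(frozen=True)
class ChangeRow:
    lsn: int          # position in the log; assumed to increase over time
    operation: str    # one of INSERT, UPDATE, DELETE
    update_mask: int  # bit i set => column i changed (layout assumed here)
    values: dict      # column values captured with the change

def changes_between(change_table, from_lsn, to_lsn):
    """Return captured changes with from_lsn <= lsn <= to_lsn, in log order."""
    selected = [row for row in change_table if from_lsn <= row.lsn <= to_lsn]
    return sorted(selected, key=lambda row: row.lsn)

def changed_columns(update_mask, columns):
    """Decode the update mask into the names of the columns that changed."""
    return [name for i, name in enumerate(columns) if update_mask & (1 << i)]

COLUMNS = ["DeptID", "DeptName", "Manager"]

change_table = [
    ChangeRow(101, INSERT, 0b111, {"DeptID": 7, "DeptName": "Ops", "Manager": "Ann"}),
    ChangeRow(105, UPDATE, 0b100, {"DeptID": 7, "DeptName": "Ops", "Manager": "Bob"}),
    ChangeRow(109, DELETE, 0b111, {"DeptID": 3, "DeptName": "HR", "Manager": "Cy"}),
]

# Incremental load: pick up only what was captured after the last processed LSN.
last_loaded_lsn = 101
batch = changes_between(change_table, last_loaded_lsn + 1, 109)
summary = [(row.operation, changed_columns(row.update_mask, COLUMNS)) for row in batch]
```

Remembering the last processed LSN between runs is what makes the load incremental: each run asks only for the interval it has not yet seen, and the mask avoids a lookup to find out which columns changed.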

10 Change Data Capture: Pat Martin, ANZ SQL Premier Field Engineer, Microsoft N.Z.

11 Merge Statement
  - Single statement can deal with Inserts, Updates, and Deletes
  - Microsoft extension to ANSI definition for DELETE semantics
  - Performance goals
    - 20% faster
    - Minimal logging on inserts (2×)
  - Typical solution
    - Clean the source data, load it into Tbl_Staging
    - Index Tbl_Staging
    - UPDATE Warehouse INNER JOIN Tbl_Staging ON…
    - INSERT Warehouse LEFT JOIN Tbl_Staging ON…
  - With MERGE
    - MERGE Warehouse USING Tbl_Staging ON…

12 Merge Example
    MERGE dbo.Departments AS d
    USING dbo.Departments_delta AS dd
        ON (d.DeptID = dd.DeptID)
    WHEN MATCHED AND (d.Manager <> dd.Manager OR d.DeptName <> dd.DeptName) THEN
        UPDATE SET d.Manager = dd.Manager, d.DeptName = dd.DeptName
    WHEN NOT MATCHED THEN
        INSERT (DeptID, DeptName, Manager)
        VALUES (dd.DeptID, dd.DeptName, dd.Manager)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;
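
To make the three branches of the statement above concrete, here is a minimal Python sketch of the same insert/update/delete logic over in-memory dictionaries. It models the semantics only; the `merge` function and the sample rows are hypothetical and say nothing about how SQL Server executes MERGE.

```python
def merge(target, source):
    """Apply MERGE-like semantics keyed on DeptID. Mutates and returns target."""
    for dept_id, src in source.items():
        tgt = target.get(dept_id)
        if tgt is None:
            # WHEN NOT MATCHED (by target): insert the source row.
            target[dept_id] = dict(src)
        elif tgt["Manager"] != src["Manager"] or tgt["DeptName"] != src["DeptName"]:
            # WHEN MATCHED AND a tracked column differs: update in place.
            tgt.update(src)
    # WHEN NOT MATCHED BY SOURCE: delete target rows missing from the source.
    for dept_id in [key for key in target if key not in source]:
        del target[dept_id]
    return target

departments = {
    1: {"DeptName": "Sales", "Manager": "Ann"},
    2: {"DeptName": "IT", "Manager": "Bob"},
    3: {"DeptName": "HR", "Manager": "Cy"},
}
departments_delta = {
    1: {"DeptName": "Sales", "Manager": "Ann"},  # unchanged
    2: {"DeptName": "IT", "Manager": "Dee"},     # manager changed
    4: {"DeptName": "Ops", "Manager": "Eve"},    # new department
}

merge(departments, departments_delta)
```

One difference worth keeping in mind: in T-SQL, `<>` evaluates to unknown when either side is NULL, so a change to or from NULL would not satisfy the WHEN MATCHED condition as written, whereas Python's `!=` treats `None` as an ordinary value.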

13 Merge Statement: Pat Martin, ANZ SQL Premier Field Engineer, Microsoft N.Z.

14 Lookup Transform Enhancements
  - Scalable Cache Implementation
    - Cache-load is a separate operation to Lookup
    - Can be hydrated and dehydrated securely to the file system
    - Caches can be explicitly shared
  - Adaptable Caches
    - Can be loaded from any source (SQL, Text, Mainframe…)
    - Track cache hits and misses
  - Multiple Modes
    - No Cache (pass-through to DB)
    - JIT – Just In Time (on miss, query database, and store result)
    - Full-Cache (pre-load all rows)
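
The three cache modes listed above behave roughly as in the following Python sketch. `CachedLookup`, its counters, and the `fetch_one` callback (standing in for a single-row database query) are hypothetical names for illustration and do not mirror the SSIS component's interface.

```python
NO_CACHE, JIT, FULL = "no-cache", "jit", "full"

class CachedLookup:
    """Lookup over a reference source with a selectable cache mode."""

    def __init__(self, fetch_one, mode, preload=None):
        # fetch_one(key) stands in for a single-row query against the database.
        self.fetch_one = fetch_one
        self.mode = mode
        # Full-cache mode loads every reference row before the first lookup.
        self.cache = dict(preload) if mode == FULL and preload is not None else {}
        self.hits = 0
        self.misses = 0
        self.db_queries = 0

    def get(self, key):
        if self.mode != NO_CACHE and key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        if self.mode == FULL:
            return None  # everything was pre-loaded, so a miss means no match
        self.db_queries += 1
        value = self.fetch_one(key)
        if self.mode == JIT and value is not None:
            self.cache[key] = value  # store the result for later rows
        return value

reference = {"NZ": "New Zealand", "AU": "Australia"}
keys = ["NZ", "NZ", "AU", "NZ", "XX"]

def run(mode):
    lookup = CachedLookup(reference.get, mode, preload=reference)
    results = [lookup.get(key) for key in keys]
    return results, lookup.hits, lookup.misses, lookup.db_queries
```

Calling `run` with each mode on the same keys shows the trade-off: no-cache queries the database for every row, JIT queries once per distinct key it has not yet stored, and full-cache pays the load cost up front and never queries during the data flow.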

15 Lookup Transform 2008: Shane Bartle, Principal Consultant, Microsoft N.Z.

16 SQL Server 2008 SSIS Monitoring Enhancements
  - Logging events to watch pipeline internals
    - PipelineExecutionPlan, PipelineExecutionTrees, BufferSizeTuning
  - System Monitor to track I/O issues
    - Buffers In Use tracks how many buffers are presently being used
    - Buffers Spooled tracks how many 10 MB buffers have been spooled to disk
  - Superdump to resolve error scenarios
    - New support capability for reactive fix-up

17 Superdump
  - Provides visibility into the activity of a running package
  - Can be triggered without stopping a package
  - Can be scheduled (via registry key) to run on a
    - Crash
    - Specific Error Condition

18 Superdump: Pat Martin, Premier Field Engineer, Microsoft N.Z.

19 SQL Server 2008 SSIS Developer Additions
  - C# Support
  - ADO.NET Support
  - Improved Import Export Wizard

20 C# Support
  - SSIS 2005 used VSA for Script Design and Execution
    - Legacy component with Visual Basic only
    - Limited set of “reference-able” assemblies
  - In 2008 SSIS uses Visual Studio Tools for Applications
    - Visual Studio Designer Shell
    - C# (or VB.NET) as a language
    - Can reference all .NET assemblies
    - Can reference Web Services

21 ADO.NET Support
  - SSIS 2005 had an ADO.NET ‘DataReader’
    - Limited to supplying a SqlCommand
  - SSIS 2008 has a full ADO.NET Data Source
    - Much Enhanced User Interface
    - ODBC Support

22 Import Export Wizard Additions
  - ADO.NET Support
  - Data Type Conversions
    - New page shows mappings and possible issues
    - May insert data convert transforms into dataflow
    - Default mappings are customisable (via Notepad)
  - New System for Scaling Up Number of Tables
    - Makes a sequence of dataflow tasks
    - Each with 5 source/transform/destination chains

23 Import/Export Wizard: Shane Bartle, Principal Consultant, Microsoft N.Z.

24 SQL Server 2008 SSIS Data Profiling
  - Creates a Profile of Your SQL Tables
    - Explore or maintain data quality
    - Run as a task in SSIS
    - Produces XML file output
    - Has a nice visual tool for working with profiles
  - Analyze a Set of Columns / Tables Looking For
    - Candidate keys
    - Column length distribution
    - Null Ratio
    - Pattern detection
    - Value distributions and stats
    - Functional dependencies
    - Value inclusion
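
A few of the profile types listed above are simple enough to sketch in Python over a list of row dictionaries. The function names and sample rows are hypothetical, and the checks are strict yes/no tests; this is only meant to show what each profile measures, not how the SSIS task computes or reports it.

```python
from collections import Counter

def null_ratio(rows, column):
    """Fraction of rows where the column is NULL (None here)."""
    values = [row[column] for row in rows]
    return sum(value is None for value in values) / len(values) if values else 0.0

def length_distribution(rows, column):
    """Count of non-NULL values per string length."""
    return Counter(len(str(row[column])) for row in rows if row[column] is not None)

def is_candidate_key(rows, columns):
    """True when the column combination is non-NULL and unique across rows."""
    keys = [tuple(row[c] for c in columns) for row in rows]
    has_null = any(value is None for key in keys for value in key)
    return not has_null and len(set(keys)) == len(keys)

def functional_dependency_holds(rows, determinant, dependent):
    """True when each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[determinant], row[dependent]) != row[dependent]:
            return False
    return True

rows = [
    {"DeptID": 1, "DeptName": "Sales", "Manager": "Ann"},
    {"DeptID": 2, "DeptName": "IT", "Manager": None},
    {"DeptID": 3, "DeptName": "HR", "Manager": "Ann"},
    {"DeptID": 4, "DeptName": "Sales", "Manager": "Bob"},
]

profile = {
    "manager_null_ratio": null_ratio(rows, "Manager"),
    "deptname_lengths": length_distribution(rows, "DeptName"),
    "deptid_is_key": is_candidate_key(rows, ["DeptID"]),
    "deptname_is_key": is_candidate_key(rows, ["DeptName"]),
    "deptname_determines_manager": functional_dependency_holds(rows, "DeptName", "Manager"),
}
```

On real data a strict test is often too harsh, since a handful of dirty rows will fail an otherwise valid key or dependency; reporting the proportion of rows that conform is usually more useful than a plain true or false.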

25 Data Profiling: Pat Martin, ANZ SQL Premier Field Engineer, Microsoft N.Z.

26 Session Summary
  - Performance Improvements for throughput
  - New delta extraction alternative approaches
  - Monitoring and support features
  - C# and ADO.NET extend coverage
  - Data Profiling for data quality considerations

27 Resources
  - Technical Communities, Webcasts, Blogs, Chats, and User Groups: http://www.microsoft.com/communities/default.mspx
  - Microsoft Learning and Certification: http://www.microsoft.com/learning/default.mspx
  - Microsoft Developer Network (MSDN) and TechNet: http://microsoft.com/msdn http://microsoft.com/technet
  - Trial Software and Virtual Labs: http://www.microsoft.com/technet/downloads/trials/default.mspx

29 Resources
  - www.microsoft.com/teched: Tech·Talks, Tech·Ed Bloggers, Live Simulcasts, Virtual Labs
  - http://microsoft.com/technet: Evaluation licenses, pre-released products, and MORE!
  - http://microsoft.com/msdn: Developer’s Kit, Licenses, and MORE!

30 Related Content
  - DAT361 SQL Server 2008 Security Deep Dive
  - BIN309 SQL Server 2008 ETL drill down
  - BIN310 SQL Server 2008 Analysis Server (SSAS) enhancements
  - DAT362 SQL Server Spatial in the Spotlight
  - BIN352 Microsoft SQL Server 2008 Reporting Services: Architecture Overview
  - BIN311 Advanced Dashboard Creation with MOSS 2007
  - DAT364 End-to-End Troubleshooting for Microsoft SQL Server 2005/2008
  - BIN401 Optimising Query Performance in SQL Server 2008 Analysis Services
  - DAT355 Upgrading to Microsoft SQL Server 2008: Notes from Early Adopters
  - BIN402 Building and Deploying Advanced MOSS 2007 Planning Applications

33 © 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

