BRK2279 Real-World Data Movement and Orchestration Patterns using Azure Data Factory
Jason Horner, Attunix
Cathrine Wilhelmsen, Inmeta

2 BRK2279 Real-World Data Movement and Orchestration Patterns using Azure Data Factory
Jason Horner, Attunix
Cathrine Wilhelmsen, Inmeta

3 Agenda
- Overview
- Design Patterns
- Preview of…?

4 Overview of Azure Data Factory

5 Azure Data Factory
Diagram labels: Sources, Data Warehouse, Analysis, Reporting, ETL / ELT

6 Azure Data Factory (ETL / ELT)
- Visual UI: Drag and Drop
- Code Support: Python, .NET, ARM
- Control Flow: Loop, Branch, If
- SSIS Execution: Lift and Shift

7 ETL / ELT

8 ETL - Extract Transform Load

14 ELT - Extract Load Transform

20 ETL ELT

21 Azure Data Factory Concepts

22 Azure Data Factory Concepts
- Pipelines
- Activities
- Triggers
- Linked Services
- Datasets
- Integration Runtime

23 Azure Data Factory Design Patterns

24 What are Design Patterns?
Reusable solutions for common problems:
- Description or template
- Formalized best practices
- Not finished designs that can be transformed directly into source or machine code

25 Why use Design Patterns?
Use tested, proven and documented solutions to:
- Speed up development
- Prevent issues that can cause problems later
- Improve code readability

26 Design Patterns
- Truncate and Load
- Merge Load
- Incremental Load
- Bulk Table Transfer

27 Full Extract: Truncate and Load
Specific use cases:
- All data needed, but replication is not available
- Small data sets that change often
- No historical requirements
Very simple, but can be considered an antipattern

28 Full Extract: Truncate and Load
Diagram labels: Source, Sink, Source Table, Sink Table
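
The truncate-and-load flow in the diagram can be sketched as follows. This is a minimal illustration, not the speakers' implementation: it uses two in-memory SQLite databases in place of the real source and sink, and the `customer` table, its columns, and the `truncate_and_load` helper are hypothetical names chosen for the example.

```python
import sqlite3

def truncate_and_load(source, sink, table):
    """Replace every row in the sink table with a full extract of the source table."""
    rows = source.execute(f"SELECT id, name FROM {table}").fetchall()
    with sink:  # one transaction: the delete and the reload commit together
        sink.execute(f"DELETE FROM {table}")  # SQLite has no TRUNCATE TABLE
        sink.executemany(f"INSERT INTO {table} (id, name) VALUES (?, ?)", rows)
    return len(rows)

# Hypothetical source and sink, both holding a "customer" table.
source = sqlite3.connect(":memory:")
sink = sqlite3.connect(":memory:")
for db in (source, sink):
    db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
source.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
sink.execute("INSERT INTO customer VALUES (99, 'stale row')")
source.commit()
sink.commit()

loaded = truncate_and_load(source, sink, "customer")
```

Every run moves the whole table and discards whatever the sink held before, which is why the pattern only suits small data sets with no history requirement.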

29 Full Extract: Merge Load
Specific use cases:
- All data needed, but replication is not available
- Medium data sets that have few changes
- Need to minimize churn on the staging tables
Adds complexity, doesn’t solve the incremental extract from source

30 Full Extract: Merge Load
Diagram labels: Source, Sink, Source Table, Table Type, Stored Procedure, Sink Table
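
A rough sketch of the merge step, under stated assumptions: SQLite stands in for the sink database, a temporary `stage` table plays the role of the Table Type, and the `merge_load` function plays the role of the Stored Procedure. The table and column names are hypothetical, and deleting sink rows that no longer exist in the source is an assumption of this example rather than something the slide states.

```python
import sqlite3

def merge_load(sink, rows):
    """Apply a full extract to the sink, touching only rows that differ."""
    with sink:
        sink.execute("CREATE TEMP TABLE IF NOT EXISTS stage (id INTEGER PRIMARY KEY, name TEXT)")
        sink.execute("DELETE FROM stage")
        sink.executemany("INSERT INTO stage VALUES (?, ?)", rows)
        # Update only rows whose values actually changed (IS NOT is NULL-safe).
        updated = sink.execute(
            "UPDATE customer SET name = (SELECT s.name FROM stage s WHERE s.id = customer.id) "
            "WHERE EXISTS (SELECT 1 FROM stage s "
            "WHERE s.id = customer.id AND s.name IS NOT customer.name)"
        ).rowcount
        # Insert rows that are new in the source.
        inserted = sink.execute(
            "INSERT INTO customer (id, name) SELECT s.id, s.name FROM stage s "
            "WHERE NOT EXISTS (SELECT 1 FROM customer c WHERE c.id = s.id)"
        ).rowcount
        # Assumption: rows missing from the full extract are removed from the sink.
        deleted = sink.execute(
            "DELETE FROM customer WHERE id NOT IN (SELECT id FROM stage)"
        ).rowcount
    return {"inserted": inserted, "updated": updated, "deleted": deleted}

sink = sqlite3.connect(":memory:")
sink.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
sink.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace"), (3, "Linus")])
sink.commit()

full_extract = [(1, "Ada"), (2, "Grace Hopper"), (4, "Edsger")]
counts = merge_load(sink, full_extract)
```

The unchanged row is never rewritten, which is the reduced churn the pattern is after, but the whole source table is still extracted on every run.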

31 Incremental Load
Specific use cases:
- All data needed, including a robust history
- Large data sets that have many changes
- Need to minimize churn on the staging tables and load on source systems
Often requires changes to the source system (triggers, added columns, or engine features)

32 Incremental Load
Diagram labels: Source, Sink, Source Table, Table Type, Stored Procedure, Change Table, Change Tracking Current Version, Control Table (High Watermark), Sink Table

33 Delta Detection
- Hash Comparison (Full Extract)
- High Watermark (Incremental Load)
- Change Tracking (Incremental Load)
- Other: column-by-column comparison, triggers, row versioning, modified dates, temporal tables
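
As an illustration of the hash comparison option, the sketch below hashes the non-key columns of each row in a full extract and compares the result with hashes stored from the previous load. The row layout, the `row_hash` and `detect_deltas` helpers, and the choice of SHA-256 are assumptions made for the example; the slides do not specify them.

```python
import hashlib

def row_hash(row, columns):
    """Hash the listed columns of a row in a fixed order."""
    # repr() keeps None and "" distinct; the unit separator keeps columns apart.
    payload = "\x1f".join(repr(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def detect_deltas(source_rows, sink_hashes, key, columns):
    """Classify keys as inserted, updated, or deleted relative to the stored hashes."""
    inserts, updates, seen = [], [], set()
    for row in source_rows:
        k = row[key]
        seen.add(k)
        if k not in sink_hashes:
            inserts.append(k)
        elif sink_hashes[k] != row_hash(row, columns):
            updates.append(k)
    deletes = sorted(set(sink_hashes) - seen)
    return inserts, updates, deletes

previous = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Grace", "city": "New York"},
    {"id": 3, "name": "Linus", "city": "Helsinki"},
]
sink_hashes = {r["id"]: row_hash(r, ["name", "city"]) for r in previous}

current = [
    {"id": 1, "name": "Ada", "city": "London"},        # unchanged
    {"id": 2, "name": "Grace", "city": "Arlington"},   # changed
    {"id": 4, "name": "Edsger", "city": "Rotterdam"},  # new
]
inserts, updates, deletes = detect_deltas(current, sink_hashes, "id", ["name", "city"])
```

Storing one hash per row avoids a column-by-column comparison, at the cost of still reading every source row.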

34 Delta Detection: High Watermark
BE WARY of these approaches!
- Based on ascending integer or datetime
- Store the highest value in a control table, or calculate it with SELECT MAX(<Column>) FROM Table
- Based on ascending date (Update or Create)
- Assumes data is not updated and that the dates are maintained automatically
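
The high-watermark approach might look like the sketch below. It is an illustration only: SQLite stands in for the source system, and the `orders` table, the `watermark` control table, the `modified_at` column, and the `extract_increment` helper are hypothetical names. Timestamps are stored as ISO 8601 text so that string comparison matches chronological order.

```python
import sqlite3

def extract_increment(db, table="orders"):
    """Return rows changed since the stored watermark, then advance the watermark."""
    (old,) = db.execute(
        "SELECT last_value FROM watermark WHERE table_name = ?", (table,)
    ).fetchone()
    (new,) = db.execute(f"SELECT MAX(modified_at) FROM {table}").fetchone()
    if new is None or new <= old:
        return []  # nothing newer than the last load
    rows = db.execute(
        f"SELECT id, amount, modified_at FROM {table} "
        "WHERE modified_at > ? AND modified_at <= ? ORDER BY id",
        (old, new),
    ).fetchall()
    with db:
        db.execute("UPDATE watermark SET last_value = ? WHERE table_name = ?", (new, table))
    return rows

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, modified_at TEXT)")
db.execute("CREATE TABLE watermark (table_name TEXT PRIMARY KEY, last_value TEXT)")
db.execute("INSERT INTO watermark VALUES ('orders', '1900-01-01T00:00:00')")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10.0, "2018-11-01T08:00:00"),
    (2, 25.5, "2018-11-02T09:30:00"),
])
db.commit()

first = extract_increment(db)   # both existing rows
second = extract_increment(db)  # nothing new
db.execute("INSERT INTO orders VALUES (3, 7.25, '2018-11-03T10:00:00')")
db.execute("UPDATE orders SET amount = 99.0 WHERE id = 1")  # forgets to touch modified_at
db.commit()
third = extract_increment(db)   # only row 3; the change to row 1 is missed
```

The third run shows the reason for the warning: a row that is updated without its date being maintained never crosses the watermark and is silently skipped.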

35 Delta Detection: Change Tracking
Lightweight solution for tracking data changes:
- Has a row changed?
- Which rows have been changed?
- What kind of change was it?
- Which columns were changed?
Only tracks the latest change to a row

36 Bulk Table Transfer
Specific use cases:
- Hundreds to thousands of tables to copy
- Similar loading patterns for all tables
- Need to minimize amount of code in solution
Adds complexity, requires database tables to manage state

37 Bulk Table Transfer
Diagram labels: Source, Sink, Source Table, Table Type, Stored Procedure, Control Table List, Sink Table, Log Table
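
One way to picture the control-table-driven loop is the sketch below. It assumes three in-memory SQLite databases for source, sink, and control, and the `table_list` and `load_log` tables, their columns, and the `bulk_transfer` helper are hypothetical stand-ins for the Control Table List and Log Table in the diagram. Each table is copied with a simple truncate and load; the real solution may load differently.

```python
import sqlite3

def bulk_transfer(source, sink, control):
    """Copy every enabled table named in the control list and log one row per table."""
    tables = control.execute(
        "SELECT table_name FROM table_list WHERE enabled = 1 ORDER BY table_name"
    ).fetchall()
    for (table,) in tables:
        try:
            rows = source.execute(f'SELECT * FROM "{table}"').fetchall()
            with sink:  # roll back this table's load if anything fails
                sink.execute(f'DELETE FROM "{table}"')
                if rows:
                    marks = ", ".join("?" * len(rows[0]))
                    sink.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
            status, count = "Success", len(rows)
        except sqlite3.Error:
            status, count = "Fail", 0
        with control:
            control.execute(
                "INSERT INTO load_log (table_name, row_count, status) VALUES (?, ?, ?)",
                (table, count, status),
            )

source, sink, control = (sqlite3.connect(":memory:") for _ in range(3))
for db in (source, sink):
    db.execute("CREATE TABLE product (id INTEGER, name TEXT)")
    db.execute("CREATE TABLE region (id INTEGER, name TEXT)")
source.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Bike"), (2, "Helmet")])
source.execute("INSERT INTO region VALUES (1, 'Nordics')")
source.commit()

control.execute("CREATE TABLE table_list (table_name TEXT, enabled INTEGER)")
control.execute("CREATE TABLE load_log (table_name TEXT, row_count INTEGER, status TEXT)")
control.executemany(
    "INSERT INTO table_list VALUES (?, ?)",
    [("product", 1), ("region", 1), ("missing_table", 1), ("archived", 0)],
)
control.commit()

bulk_transfer(source, sink, control)
log = control.execute(
    "SELECT table_name, row_count, status FROM load_log ORDER BY table_name"
).fetchall()
```

Adding a table to the load then means adding a row to the control list rather than writing new code, and the log table records which tables failed without stopping the rest.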

38 Auditing: Batches
- Every ETL process should start by creating a batch
- Batches are logical concepts used to tie multi-pipeline load processes together for auditing and logging
- A batch is closed when a nightly process is completed (Fail or Success)

39 Auditing: Common Columns
- CreatedDate - Date row was inserted
- CreatedBatchId - Batch that inserted row
- ModifiedDate - Date row was updated
- ModifiedBatchId - Batch that updated row
- IsDeleted - Indicates if record has been removed

40 Logging: Common Columns
- Row Counts - Selected, Inserted, Modified, Ignored
- ExecutionTime - Begin, End, Duration
- LoadStatus - Fail, Success
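
The batch, audit-column, and logging ideas above can be tied together in a small in-memory sketch. The `Batch` and `LoadLog` classes and the `stamp_insert` helper are hypothetical; only the column names come from the slides. Treating a batch as failed when any of its pipelines failed is an assumption of this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional

@dataclass
class LoadLog:
    pipeline: str
    selected: int
    inserted: int
    modified: int
    ignored: int
    begin: datetime
    end: datetime
    load_status: str  # "Success" or "Fail"

    @property
    def duration_seconds(self) -> float:
        return (self.end - self.begin).total_seconds()

@dataclass
class Batch:
    batch_id: int
    started: datetime
    closed: Optional[datetime] = None
    status: Optional[str] = None
    logs: list = field(default_factory=list)

    def log(self, entry: LoadLog) -> None:
        if self.closed is not None:
            raise RuntimeError("batch is already closed")
        self.logs.append(entry)

    def close(self, now: datetime) -> str:
        # Assumption: one failed pipeline fails the whole batch.
        ok = all(entry.load_status == "Success" for entry in self.logs)
        self.status = "Success" if ok else "Fail"
        self.closed = now
        return self.status

def stamp_insert(row: dict, batch: Batch, now: datetime) -> dict:
    """Add the common audit columns to a newly inserted row."""
    return {**row, "CreatedDate": now, "CreatedBatchId": batch.batch_id,
            "ModifiedDate": None, "ModifiedBatchId": None, "IsDeleted": False}

start = datetime(2018, 11, 22, 1, 0, tzinfo=timezone.utc)
batch = Batch(batch_id=1, started=start)
batch.log(LoadLog("LoadCustomer", 100, 10, 5, 85, start, start + timedelta(seconds=30), "Success"))
batch.log(LoadLog("LoadOrders", 0, 0, 0, 0, start, start + timedelta(seconds=2), "Fail"))
row = stamp_insert({"id": 1, "name": "Ada"}, batch, start)
status = batch.close(start + timedelta(minutes=5))
```

Because every row carries the id of the batch that created it, a failed nightly run can be traced back to the rows it touched.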

41 Demo: Solution Overview
Jason Horner

42 Design Patterns: Key Takeaways
- Model your metadata correctly
- Make composable, single-purpose pipelines
- Leverage Parameters and User Properties
- Lookup, ForEach, and Get Metadata activities are powerful
- Edit the JSON files directly when you hit a wall
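
To make the Lookup and ForEach takeaway concrete, the sketch below builds a pipeline definition as a Python dictionary and serializes it to JSON. It is not taken from the session's demo: the pipeline, activity, and dataset names, the `etl.TableList` query, and the `TableName` dataset parameter are all hypothetical, and the property layout follows the Data Factory pipeline JSON format as generally documented, so it should be checked against the current schema before use.

```python
import json

def build_bulk_copy_pipeline(control_dataset, source_dataset, sink_dataset):
    """Return a pipeline definition: a Lookup feeds a ForEach that copies one table per item."""
    table_param = {"TableName": {"value": "@item().TableName", "type": "Expression"}}
    lookup = {
        "name": "LookupTableList",
        "type": "Lookup",
        "typeProperties": {
            "source": {
                "type": "AzureSqlSource",
                # Hypothetical control table holding one row per table to copy.
                "sqlReaderQuery": "SELECT TableName FROM etl.TableList",
            },
            "dataset": {"referenceName": control_dataset, "type": "DatasetReference"},
            "firstRowOnly": False,  # return every row, not just the first
        },
    }
    copy = {
        "name": "CopyOneTable",
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference",
                    "parameters": table_param}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference",
                     "parameters": table_param}],
        "typeProperties": {"source": {"type": "AzureSqlSource"},
                           "sink": {"type": "AzureSqlSink"}},
    }
    foreach = {
        "name": "ForEachTable",
        "type": "ForEach",
        "dependsOn": [{"activity": "LookupTableList", "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            "items": {"value": "@activity('LookupTableList').output.value",
                      "type": "Expression"},
            "activities": [copy],
        },
    }
    return {"name": "BulkTableCopy", "properties": {"activities": [lookup, foreach]}}

pipeline = build_bulk_copy_pipeline("ControlDb", "SourceTable", "SinkTable")
pipeline_json = json.dumps(pipeline, indent=2)
```

The single parameterized Copy activity is reused for every table the Lookup returns, which is what keeps the pipeline composable and single purpose.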

43 Preview of…?

44 Azure Data Factory Data Flows

45 Azure Data Factory Data Flows (ETL / ELT)
- Visual Authoring: Drag and Drop
- Azure Databricks: No Code
- Transform At Scale: Join, Split, Aggregate, Lookup, Filter, Sort, Derived Column

46 Azure Data Factory Data Flows
ETL / ELT

47 Demo: Azure Data Factory Data Flows
Cathrine Wilhelmsen

48 Thank you!
Jason Horner, Attunix (@jasonhorner)
Cathrine Wilhelmsen, Inmeta (@cathrinew)


