… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system.


1

2

3

4 “… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing.” – Gartner, “The State of Data Warehousing in 2012”
Data sources

5
1 Increasing data volumes
2 Real-time data
3 Non-relational data; new data sources & types
4 Cloud-born data

6 Diagram: Extract Original Data → ETL Tool (SSIS, etc.): Transform → Load Transformed Data → EDW (SQL Server, Teradata, etc.). Consumers: BI Tools, Data Marts, Data Lake(s), Dashboards, Apps.

7 Same diagram, adding: Ingest (EL) Original Data.

8 Same diagram, adding: Scale-out Storage & Compute (HDFS, Blob Storage, etc.); Transform & Load; Streaming data.

9 Same diagram as slide 8.

10 Diagram: Information Production: Connect & Collect, Transform & Enrich, Publish. Labels: Data Sources (Import From); Ingest; Data Hub (Storage & Compute); Move data among Hubs; Move to data mart, etc. Consumers: BI Tools, Data Marts, Data Lake(s), Dashboards, Apps.

11 Diagram: Information Production: Connect & Collect, Transform & Enrich, Publish. Labels: Data Sources (Import From); Data Connector: Import from source to Hub; Data Hub (Storage & Compute); Data Connector: Import/Export among Hubs; Data Connector: Export from Hub to data store; Coordination & Scheduling; Monitoring & Mgmt; Data Lineage. Consumers: BI Tools, Data Marts, Data Lake(s), Dashboards, Apps.

12

13

14 Example Scenario: Customer Profiling (game usage analytics)

15 Log Files Snippet (10s of TBs per day in cloud storage):
2277,2013-06-01 02:26:54.3943450,111,164.234.187.32,24.84.225.233,true,8,1,2058
2277,2013-06-01 03:26:23.2240000,111,164.234.187.32,24.84.225.233,true,8,1,2058-2123-2009-2068-2166
2277,2013-06-01 04:22:39.4940000,111,164.234.187.32,24.84.225.233,true,8,1,
2277,2013-06-01 05:43:54.1240000,111,164.234.187.32,24.84.225.233,true,8,1,2058-225545-2309-2068-2166
2277,2013-06-01 06:11:23.9274300,111,164.234.187.32,24.84.225.233,true,8,1,223-2123-2009-4229-9936623
2277,2013-06-01 07:37:01.3962500,111,164.234.187.32,24.84.225.233,true,8,1,
2277,2013-06-01 08:12:03.1109790,111,164.234.187.32,24.84.225.233,true,8,1,234322-2123-2234234-12432-344323
…
User Table:
UserID  FirstName  LastName  State  …
2277  Pratik  Patel  Oregon
664432  Dave  Nettleton  Washington
8853  Mike  Flasko  California
New User Activity Per Week By Region:
profileid  day  state  duration rank weaponsused interactedwith
1148  6/2/2013  Oregon  2163315
1004  6/2/2013  Missouri  224062
292  6/1/2013  Georgia  20113715
1059  6/2/2013  Oregon  2710452
675  6/2/2013  California  6516432
1348  6/3/2013  Nebraska  219552
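As a rough illustration of how one line of the log snippet on slide 15 could be read, here is a minimal Python sketch. The slides do not name the log columns, so everything about their meaning is an assumption: the first field is treated as a profile/user ID (it matches UserID 2277 in the user table), the second as a timestamp, and the last as a dash-separated list of IDs that may be empty. `LogRecord` and `parse_log_line` are hypothetical names, not part of any Azure tooling.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class LogRecord:
    profile_id: int        # assumption: first field is the user/profile ID
    timestamp: datetime    # second field
    fields: List[str]      # middle columns kept raw; the slides do not name them
    item_ids: List[int]    # last column: dash-separated IDs, possibly empty


def parse_log_line(line: str) -> LogRecord:
    parts = line.strip().split(",")
    # The logs carry 7 fractional-second digits; datetime holds at most 6,
    # so the fraction is truncated to microseconds.
    date_part, _, frac = parts[1].partition(".")
    timestamp = datetime.strptime(date_part, "%Y-%m-%d %H:%M:%S")
    if frac:
        timestamp = timestamp.replace(microsecond=int(frac[:6].ljust(6, "0")))
    last = parts[-1].strip()
    item_ids = [int(x) for x in last.split("-")] if last else []
    return LogRecord(int(parts[0]), timestamp, parts[2:-1], item_ids)
```

Lines that end with a trailing comma, as two of the sample lines do, yield an empty `item_ids` list rather than an error.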

16 Data Factory Walkthrough

17 New-AzureDataFactory -Name "HaloTelemetry" -Location "West-US"
New-AzureDataFactory -Name "GameTelemetry" -Location "West-US"

18 New-AzureDataFactoryLinkedService -Name "MyHDInsightCluster" -DataFactory "GameTelemetry" -File HDIResource.json
New-AzureDataFactoryLinkedService -Name "MyStorageAccount" -DataFactory "GameTelemetry" -File BlobResource.json

19 Diagram (Azure Data Factory). Labels: On Premises SQL Server; New User View; Azure Blob Storage; 1000’s Log Files.

20 Same diagram, adding: View Of Game Usage; View Of New Users; New User Activity.

21 Same diagram, adding: Pipeline; Copy “NewUsers” to Blob Storage; Cloud New Users.

22 Same diagram, adding: HDInsight; Mask & Geo-Code; Geo Dictionary; Geo Coded Game Usage.

23 Same diagram, adding: Join & Aggregate; Runs On; View Of.

24 Same diagram as slide 23.
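The "Join & Aggregate" step in the pipeline diagram can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the Hive job the talk ran: it joins usage events to the user table from slide 15 on user ID and counts events per day and state, whereas the real output table also carries duration, rank and other columns. The function name `activity_per_state` and the `(user_id, day)` event shape are hypothetical.

```python
from collections import Counter

# User table rows from slide 15: UserID -> State
USER_STATE = {2277: "Oregon", 664432: "Washington", 8853: "California"}


def activity_per_state(events, user_state):
    """Count events per (day, state).

    events: iterable of (user_id, day) pairs (hypothetical shape).
    user_state: mapping of user ID to state.
    """
    counts = Counter()
    for user_id, day in events:
        state = user_state.get(user_id)
        if state is None:
            continue  # inner join: drop events from users not in the table
        counts[(day, state)] += 1
    return counts
```

Calling `activity_per_state([(2277, "2013-06-01")], USER_STATE)` would attribute that event to Oregon; events from IDs missing in the user table are ignored.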

25 “GeoCoded Game Usage” Table:

26 Pipeline Definition:

27 # Deploy Table
New-AzureDataFactoryTable -DataFactory "GameTelemetry" -File NewUserActivityPerRegion.json
# Deploy Pipeline
New-AzureDataFactoryPipeline -DataFactory "GameTelemetry" -File NewUserTelemetryPipeline.json
# Start Pipeline
Set-AzureDataFactoryPipelineActivePeriod -Name "NewUserTelemetryPipeline" -DataFactory "GameTelemetry" -StartTime "10/29/2014 12:00:00"

28 "availability": { "frequency": "Day", "interval": 1 }
Diagram: GameUsage, Hourly slices 12-1, 1-2, 2-3; Activity (e.g. Hive)
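To make the `availability` setting and the "12-1, 1-2, 2-3" labels concrete, the following Python sketch cuts an active period into consecutive time slices. It is a simplified model, not Data Factory's scheduler: the function `slices` is hypothetical, and `"Hour"` as a frequency value is an assumption (the slide only shows `"Day"`).

```python
from datetime import datetime, timedelta

# Supported frequency names; "Hour" is assumed, "Day" appears on the slide.
_STEP = {"Hour": timedelta(hours=1), "Day": timedelta(days=1)}


def slices(start, end, frequency, interval=1):
    """Return (slice_start, slice_end) windows that fit fully inside [start, end)."""
    step = _STEP[frequency] * interval
    out = []
    current = start
    while current + step <= end:
        out.append((current, current + step))
        current += step
    return out
```

With `frequency="Hour"` and a period from 12:00 to 15:00, this yields three windows, 12:00-13:00, 13:00-14:00 and 14:00-15:00.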

29 Diagram: Hive Activity. Datasets: GameUsage (Hourly: 12-1, 1-2, 2-3); GeoCodeDictionary (Daily: Monday, Tuesday, Wednesday); Geo-Coded GameUsage (Daily: Monday, Tuesday, Wednesday). Other labels: Dataset2, Dataset3.
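One way to think about an activity whose datasets have different frequencies (hourly GameUsage, daily GeoCodeDictionary) is that a daily output slice can only run once every input slice inside its window exists. The Python sketch below models that idea under assumptions of my own; the slides do not describe the scheduler's actual logic, and `covering_slices` and `is_ready` are hypothetical helpers.

```python
from datetime import datetime, timedelta


def covering_slices(window_start, window_end, step):
    """Return the input slices of length `step` that make up the output window."""
    out = []
    t = window_start
    while t < window_end:
        out.append((t, min(t + step, window_end)))
        t += step
    return out


def is_ready(output_window, inputs, finished):
    """True if every input slice inside the output window has been produced.

    inputs: dict of dataset name -> slice length (timedelta).
    finished: set of (dataset name, slice_start) pairs already produced.
    """
    start, end = output_window
    return all(
        (name, slice_start) in finished
        for name, step in inputs.items()
        for slice_start, _ in covering_slices(start, end, step)
    )
```

Under this model, one daily output slice depends on 24 hourly GameUsage slices plus one daily GeoCodeDictionary slice, and a single missing hour keeps the output from being ready.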

30 Is my data successfully getting produced? Is it produced on time? Am I alerted quickly of failures? What about troubleshooting information? Are there any policy warnings or errors?

31

32

33

34 Easily move data to my existing data marts for consumption by my existing BI tools (Azure DB, SQL Server on premises)

35 ADF Pricing Per Month
Automation & Management: Automation/Coordination Layer (Coordination, Scheduling, Management)
                 Cloud                               On Premises
                 0-100 activities   100+ activities  0-100 activities   100+ activities
Low Frequency    $0.60              $0.48            $1.50              $1.20
High Frequency   $1.00              $0.80            $2.50              $2.00
Data Transformation & Movement: Execution Layer (Data Storage & Processing)
Resources Used to Execute Activities in a Pipeline: HDInsight (hrs), Compute/VM (hrs), Data Transfer (GB)
Note: public preview = 50% discount on the rates shown above
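The coordination-layer rates on this slide can be turned into a rough monthly estimate. The Python sketch below rests on assumptions the slide does not confirm: that rates are per activity per month, that the tiers are graduated (the first 100 activities at the first rate, any beyond that at the second), and that the preview discount halves the total. `RATES` and `monthly_activity_cost` are hypothetical names, and execution-layer charges (HDInsight, compute, data transfer) are left out.

```python
from decimal import Decimal

# (frequency, location) -> (rate for first 100 activities, rate beyond 100),
# taken from the pricing slide.
RATES = {
    ("low", "cloud"): (Decimal("0.60"), Decimal("0.48")),
    ("low", "on_premises"): (Decimal("1.50"), Decimal("1.20")),
    ("high", "cloud"): (Decimal("1.00"), Decimal("0.80")),
    ("high", "on_premises"): (Decimal("2.50"), Decimal("2.00")),
}


def monthly_activity_cost(activities, frequency, location, preview=True):
    """Estimate the monthly coordination-layer charge in dollars."""
    first_rate, extra_rate = RATES[(frequency, location)]
    first = min(activities, 100)       # assumption: graduated tiers
    extra = max(activities - 100, 0)
    cost = first * first_rate + extra * extra_rate
    return cost / 2 if preview else cost  # preview = 50% discount
```

Under these assumptions, 150 low-frequency cloud activities would cost 100 × $0.60 + 50 × $0.48 = $84.00 at the listed rates, or $42.00 during the public preview.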

36 Coordination: Rich scheduling; Complex dependencies; Incremental rerun
Authoring: JSON & PowerShell/C#
Management: Lineage; Data production policies (late data, rerun, latency, etc.)
Hub: Azure Hub (HDInsight + Blob storage)
Activities: Hive, Pig, C#
Data Connectors: Blobs, Tables, Azure DB, On Prem SQL Server, MDS [internal]

37

38 Contact me: mike.flasko@microsoft.com

39

40 www.microsoft.com/learning
http://microsoft.com/technet
http://channel9.msdn.com/Events/TechEd
http://developer.microsoft.com

41

42

