… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system.

Presentation transcript:

“… data warehousing has reached the most significant tipping point since its inception. The biggest, possibly most elaborate data management system in IT is changing.”
– Gartner, “The State of Data Warehousing in 2012”

Data sources

Increasing data volumes
Real-time data
Non-relational data: new data sources & types
Cloud-born data

[Diagram, built up over four slides]
Traditional path: Extract Original Data -> ETL Tool (SSIS, etc.) -> Transform -> Load Transformed Data -> EDW (SQL Server, Teradata, etc.)
Added path: Ingest (EL) Original Data -> Scale-out Storage & Compute (HDFS, Blob Storage, etc.) -> Transform & Load
Additional input: Streaming data
Downstream: BI Tools, Data Marts, Data Lake(s), Dashboards, Apps

[Diagram: information production with data hubs]
Stages: Connect & Collect, Transform & Enrich, Publish
Data Sources (Import From) -> Ingest -> Data Hub (Storage & Compute)
Move data among Hubs
Data Hub (Storage & Compute) -> Move to data mart, etc.
Downstream: BI Tools, Data Marts, Data Lake(s), Dashboards, Apps

[Diagram: data hubs and data connectors]
Stages: Connect & Collect, Transform & Enrich, Publish
Data Connector: Import from source to Hub
Data Connector: Import/Export among Hubs
Data Connector: Export from Hub to data store
Information Production: Coordination & Scheduling, Monitoring & Mgmt, Data Lineage
Downstream: BI Tools, Data Marts, Data Lake(s), Dashboards, Apps

Example Scenario: Customer Profiling (game usage analytics)

Log Files Snippet (10s of TBs per day in cloud storage):
2277, :26: ,111, , ,true,8,1,
, :26: ,111, , ,true,8,1,
, :22: ,111, , ,true,8,1,
2277, :43: ,111, , ,true,8,1,
, :11: ,111, , ,true,8,1,
, :37: ,111, , ,true,8,1,
2277, :12: ,111, , ,true,8,1,
…

User Table:
UserID     FirstName  LastName   State       …
2277       Pratik     Patel      Oregon
(missing)  Dave       Nettleton  Washington
8853       Mike       Flasko     California

New User Activity Per Week By Region:
profileid  day  state  duration  rank  weaponsused  interactedwith
11486/2/2013  Oregon
/2/2013  Missouri
/1/2013  Georgia
/2/2013  Oregon
/2/2013  California
/3/2013  Nebraska  219552
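The relationship between the three tables above can be sketched in a few lines of Python: join the log records to the user table on the user ID, then total the activity per week and state. This is only an illustration of the data logic, not the Hive script from the talk. The log field layout, the durations, and the function name `activity_per_week_by_region` are assumptions; only the user IDs, names and states come from the slide.

```python
from collections import defaultdict
from datetime import date

# User table rows taken from the slide (UserID -> attributes).
users = {
    2277: {"first_name": "Pratik", "state": "Oregon"},
    8853: {"first_name": "Mike", "state": "California"},
}

# Hypothetical log records: (user_id, day, duration). Values are invented.
log_rows = [
    (2277, date(2013, 6, 3), 30),
    (2277, date(2013, 6, 4), 45),
    (8853, date(2013, 6, 4), 20),
    (9999, date(2013, 6, 4), 60),  # unknown user, dropped by the join
]


def activity_per_week_by_region(users, log_rows):
    """Inner-join logs to users, then sum duration per (year, week, state)."""
    totals = defaultdict(int)
    for user_id, day, duration in log_rows:
        user = users.get(user_id)
        if user is None:
            continue  # no matching user row: excluded, as in an inner join
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week, user["state"])] += duration
    return dict(totals)


result = activity_per_week_by_region(users, log_rows)
for (year, week, state), duration in sorted(result.items()):
    print(year, week, state, duration)
```

At the scale quoted on the slide (10s of TBs of logs per day) the same join and aggregation would run on a cluster rather than in a single process; the sketch only shows the shape of the computation.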

Data Factory Walkthrough

New-AzureDataFactory -Name "HaloTelemetry" -Location "West-US"
New-AzureDataFactory -Name "GameTelemetry" -Location "West-US"

New-AzureDataFactoryLinkedService -Name "MyHDInsightCluster" -DataFactory "GameTelemetry" -File HDIResource.json
New-AzureDataFactoryLinkedService -Name "MyStorageAccount" -DataFactory "GameTelemetry" -File BlobResource.json

[Diagram, built up over six slides: the New User Activity pipeline in Azure Data Factory]
Sources: On Premises SQL Server (New User View); Azure Blob Storage (1000's of Log Files, exposed as a view of Game Usage)
Copy "NewUsers" to Blob Storage: New User View -> Cloud New Users
Mask & Geo-Code: Game Usage + Geo Dictionary -> Geo Coded Game Usage
Join & Aggregate: New Users + Geo Coded Game Usage -> New User Activity
Runs On: HDInsight
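One way to read the pipeline diagram is as a dependency graph: each activity consumes datasets and produces one, and an activity can only run once its inputs exist. The sketch below derives a valid run order and the upstream lineage of the final dataset. The activity and dataset names are read off the slide labels, but the exact wiring (which datasets feed which activity) is an interpretation of the diagram, and the helper names `execution_order` and `lineage` are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Activity -> (input datasets, output dataset); wiring assumed from the diagram.
activities = {
    "Copy NewUsers to Blob Storage": (["New User View"], "Cloud New Users"),
    "Mask & Geo-Code": (["Game Usage", "Geo Dictionary"], "Geo Coded Game Usage"),
    "Join & Aggregate": (
        ["Cloud New Users", "Geo Coded Game Usage"],
        "New User Activity",
    ),
}


def execution_order(activities):
    """Return the activities in an order that respects dataset dependencies."""
    producer = {out: name for name, (_, out) in activities.items()}
    graph = {
        name: {producer[d] for d in inputs if d in producer}
        for name, (inputs, _) in activities.items()
    }
    return list(TopologicalSorter(graph).static_order())


def lineage(dataset, activities):
    """Return every upstream dataset that contributes to `dataset`."""
    inputs_of = {out: inputs for _, (inputs, out) in activities.items()}
    upstream, stack = set(), [dataset]
    while stack:
        current = stack.pop()
        for source in inputs_of.get(current, []):
            if source not in upstream:
                upstream.add(source)
                stack.append(source)
    return upstream


order = execution_order(activities)
print(order)
print(sorted(lineage("New User Activity", activities)))
```

The copy and geo-coding steps have no dependency on each other, so either may come first; only "Join & Aggregate" is forced to run last.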

“GeoCoded Game Usage” Table:

Pipeline Definition:

# Deploy Table
New-AzureDataFactoryTable -DataFactory "GameTelemetry" -File NewUserActivityPerRegion.json
# Deploy Pipeline
New-AzureDataFactoryPipeline -DataFactory "GameTelemetry" -File NewUserTelemetryPipeline.json
# Start Pipeline
Set-AzureDataFactoryPipelineActivePeriod -Name "NewUserTelemetryPipeline" -DataFactory "GameTelemetry" -StartTime 10/29/ :00:00

"availability": { "frequency": "Day", "interval": 1 }
Hourly GameUsage
Activity (e.g. Hive):

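[Diagram: a Hive Activity reads the GameUsage and GeoCodeDictionary datasets and produces Geo-Coded GameUsage. Datasets (Dataset2, Dataset3) are divided into slices by their availability (Hourly, Daily); daily slices are shown for Monday, Tuesday, Wednesday.]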
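The availability setting and the slice diagram above suggest that each dataset is produced in time windows, and that a daily output slice depends on the hourly input slices falling inside it. A minimal sketch of that idea follows, assuming `"frequency": "Day", "interval": 1` means one window per calendar day. The function names and the dates are hypothetical and do not come from the slides.

```python
from datetime import datetime, timedelta


def daily_slices(start, end):
    """Yield (slice_start, slice_end) windows for frequency=Day, interval=1."""
    current = start
    while current < end:
        yield current, current + timedelta(days=1)
        current += timedelta(days=1)


def hourly_inputs_for(slice_start, slice_end):
    """List the start times of the hourly input slices inside one window."""
    hours = []
    current = slice_start
    while current < slice_end:
        hours.append(current)
        current += timedelta(hours=1)
    return hours


# Hypothetical active period of three days.
slices = list(daily_slices(datetime(2020, 1, 6), datetime(2020, 1, 9)))
for slice_start, slice_end in slices:
    inputs = hourly_inputs_for(slice_start, slice_end)
    print(slice_start.date(), "needs", len(inputs), "hourly input slices")
```

Treating each window as an independent unit is what makes an incremental rerun possible: a failed day can be recomputed from its own hourly inputs without touching the other days.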

Is my data successfully getting produced?
Is it produced on time?
Am I alerted quickly of failures?
What about troubleshooting information?
Are there any policy warnings or errors?
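The first three questions above can be made concrete with a small status check over slice runs. The sketch below is not Data Factory's monitoring API; `SliceRun`, its fields, and `summarize` are hypothetical names used only to show how "produced", "on time" and "failed" might be told apart.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SliceRun:
    dataset: str
    deadline: datetime
    finished_at: Optional[datetime]  # None means not produced yet
    failed: bool = False


def summarize(runs, now):
    """Group dataset names by status: ok, late, failed or pending."""
    report = {"ok": [], "late": [], "failed": [], "pending": []}
    for run in runs:
        if run.failed:
            report["failed"].append(run.dataset)
        elif run.finished_at is None:
            # Not produced: late once the deadline has passed, else pending.
            status = "late" if now > run.deadline else "pending"
            report[status].append(run.dataset)
        elif run.finished_at > run.deadline:
            report["late"].append(run.dataset)
        else:
            report["ok"].append(run.dataset)
    return report


deadline = datetime(2020, 1, 7, 6, 0)
runs = [
    SliceRun("Cloud New Users", deadline, datetime(2020, 1, 7, 5, 0)),
    SliceRun("Geo Coded Game Usage", deadline, datetime(2020, 1, 7, 8, 0)),
    SliceRun("New User Activity", deadline, None, failed=True),
]
report = summarize(runs, now=datetime(2020, 1, 7, 9, 0))
print(report)
```

A non-empty "failed" or "late" list is the natural trigger for an alert; troubleshooting details and policy warnings would need additional fields that this sketch leaves out.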

Easily move data to my existing data marts for consumption by my existing BI tools:
Azure DB
SQL Server on premises

ADF Pricing Per Month

Automation & Management: Automation/Coordination Layer (Coordination, Scheduling, Management)

                 Cloud                                On Premises
                 0-100 activities   100+ activities   0-100 activities   100+ activities
Low Frequency    $0.60              $0.48             $1.50              $1.20
High Frequency   $1.00              $0.80             $2.50              $…

Data Transformation & Movement: Execution Layer (Data Storage & Processing)
Resources Used to Execute Activities in a Pipeline: HDInsight (hrs), Compute/VM (hrs), Data Transfer (GB)

Note: public preview = 50% discount on the rates shown above
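As a worked example of the coordination-layer rates above, the sketch below estimates a monthly charge. It rests on several assumptions that the slide does not confirm: that each rate is per activity per month, that the first two rates in each row are Cloud and the last two On Premises, and that the 100+ rate applies only to activities beyond the first 100. The function name `monthly_cost` is hypothetical. The On Premises high-frequency 100+ rate is missing from the slide, so it is left as `None` rather than guessed.

```python
from decimal import Decimal

# (location, frequency) -> (rate for first 100 activities, rate beyond 100).
RATES = {
    ("cloud", "low"): (Decimal("0.60"), Decimal("0.48")),
    ("cloud", "high"): (Decimal("1.00"), Decimal("0.80")),
    ("on_premises", "low"): (Decimal("1.50"), Decimal("1.20")),
    ("on_premises", "high"): (Decimal("2.50"), None),  # rate not recoverable
}


def monthly_cost(activities, location, frequency, preview_discount=True):
    """Estimate the monthly coordination charge under the stated assumptions."""
    first_rate, over_rate = RATES[(location, frequency)]
    first = min(activities, 100)
    over = max(activities - 100, 0)
    if over and over_rate is None:
        raise ValueError("100+ rate is not available for this combination")
    cost = first * first_rate + (over * over_rate if over else 0)
    if preview_discount:
        cost *= Decimal("0.5")  # public preview: 50% off the listed rates
    return cost


# 150 low-frequency cloud activities: 100 * 0.60 + 50 * 0.48 = 84.00
print(monthly_cost(150, "cloud", "low", preview_discount=False))
print(monthly_cost(150, "cloud", "low"))
```

Charges for the execution layer (HDInsight hours, compute/VM hours, data transfer) are billed separately and are not modeled here.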

Coordination: Rich scheduling; Complex dependencies; Incremental rerun
Authoring: JSON & PowerShell/C#
Management: Lineage; Data production policies (late data, rerun, latency, etc.)
Hub: Azure Hub (HDInsight + Blob storage)
Activities: Hive, Pig, C#
Data Connectors: Blobs, Tables, Azure DB, On Prem SQL Server, MDS [internal]
