
1 INTELLIGENT DATA SOLUTIONS WWW.PRAGMATICWORKS.COM Intro to Data Factory PASS Cloud Virtual Chapter March 23, 2015 Steve Hughes, Architect

2 About the Presenter
Steve Hughes – Architect for Pragmatic Works
Blog: www.dataonwheels.com
Twitter: @dataonwheels
LinkedIn: linkedin.com/in/dataonwheels
Email: shughes@pragmaticworks.com

3 What is Data Factory?
Cloud-based, highly scalable data movement and transformation tool
Built on Azure for integrating all kinds of data
Still in preview, so it is likely not yet feature complete (e.g., the Machine Learning activity was added in December 2014)

4 Data Factory Components
Linked Services
- SQL Server Database – PaaS, IaaS, on-premises
- Azure Storage – Blob, Table
Datasets
- Input/Output, defined in JSON and deployed with PowerShell
Pipelines
- Activities, defined in JSON and deployed with PowerShell
- Copy, HDInsight, Azure Machine Learning

5 Current Activities Supported
CopyActivity – copies data from a source to a sink (destination)
HDInsightActivity – Pig, Hive, and MapReduce transformations
MLBatchScoringActivity – can be used to score data with the ML Batch Scoring API
StoredProcedureActivity – executes stored procedures in an Azure SQL Database
Custom Activity – written in C# or .NET

6 Data for the Demo
Movies.txt in Azure Blob Storage
Movies table in Azure SQL Database

7 Building a Data Factory Pipeline
1. Create Data Factory
2. Create Linked Services
3. Create Input and Output Tables or Datasets
4. Create Pipeline
5. Set the Active Period for the Pipeline

8 Step 1: Create a Data Factory in Windows Azure
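The slide shows this step in the portal, but it can also be scripted with the Azure PowerShell cmdlets of the ADF v1 preview era. A sketch only; the resource group name, factory name, and location are placeholder assumptions, and these early cmdlets were later superseded by the AzureRm/Az modules:

```powershell
# The Data Factory cmdlets in the 2015-era Azure module require Resource Manager mode
Switch-AzureMode AzureResourceManager

# Create a resource group, then the data factory inside it (names are examples)
New-AzureResourceGroup -Name "shughes-datafactory" -Location "West US"
New-AzureDataFactory -ResourceGroupName "shughes-datafactory" -Name "shughes-datafactory" -Location "West US"
```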

9 Step 2 – Create Linked Services
1. Click the Linked Services tile
2. Add Data Stores
   - Add Blob Storage
   - Add SQL Database
Three data store types supported:
- Azure Storage Account
- Azure SQL Database
- SQL Server
Data Gateways can also be used for on-premises SQL Server sources
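Although the demo adds the data stores through portal tiles, a linked service can equivalently be defined in JSON and deployed with PowerShell, like the datasets and pipeline later in this deck. A sketch assuming the ADF v1 JSON schema of the time; the name, file path, and connection string values are placeholders:

```json
{
    "name": "ShughesBlobStorage",
    "properties": {
        "type": "AzureStorageLinkedService",
        "connectionString": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    }
}
```

Deployed with, e.g.: New-AzureDataFactoryLinkedService -ResourceGroupName shughes-datafactory -DataFactoryName shughes-datafactory -File c:\data\JSON\ShughesBlobStorage.json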

10 Step 3 – Create Datasets/Tables
JSON file for datasets:
Structure – Name, Type (String, Int, Decimal, Guid, Boolean, Date), e.g. { "name": "ThisName", "type": "String" }
Location – Azure Table, Azure Blob, SQL Database
Availability – "cadence in which a slice of the table is produced"

11 Step 3 – Input JSON
{
    "name": "MoviesFromBlob",
    "properties": {
        "structure": [
            { "name": "MovieTitle", "type": "String" },
            { "name": "Studio", "type": "String" },
            { "name": "YearReleased", "type": "Int" }
        ],
        "location": {
            "type": "AzureBlobLocation",
            "folderPath": "data-factory-files/Movies.csv",
            "format": { "type": "TextFormat", "columnDelimiter": "," },
            "linkedServiceName": "Shughes Blob Storage"
        },
        "availability": { "frequency": "hour", "interval": 4 }
    }
}
Notes: "name" is the dataset name; "structure" defines the structure of the data in the file; "location" defines the location and file format information; "availability" sets the cadence to once every 4 hours.

12 Step 3 – Output JSON
{
    "name": "MoviesToSqlDb",
    "properties": {
        "structure": [
            { "name": "MovieName", "type": "String" },
            { "name": "Studio", "type": "String" },
            { "name": "YearReleased", "type": "Int" }
        ],
        "location": {
            "type": "AzureSQLTableLocation",
            "tableName": "Movies",
            "linkedServiceName": "Media Library DB"
        },
        "availability": { "frequency": "hour", "interval": 4 }
    }
}
Notes: "name" is the dataset name; "structure" defines the table structure, and only the fields listed are mapped; "location" defines the location and the table name; "availability" sets the cadence to once every 4 hours.

13 Step 3 – Deploy Datasets
Deployment is done via PowerShell:
PS C:\> New-AzureDataFactoryTable -ResourceGroupName shughes-datafactory -DataFactoryName shughes-datafactory -File c:\data\JSON\MoviesFromBlob.json
PS C:\> New-AzureDataFactoryTable -ResourceGroupName shughes-datafactory -DataFactoryName shughes-datafactory -File c:\data\JSON\MoviesToSqlDb.json

14 Step 4 – Pipeline JSON
{
    "name": "MoviesPipeline",
    "properties": {
        "description": "Copy data from csv file in Azure storage to Azure SQL database table",
        "activities": [
            {
                "name": "CopyMoviesFromBlobToSqlDb",
                "description": "Add new movies to the Media Library",
                "type": "CopyActivity",
                "inputs": [ { "name": "MoviesFromBlob" } ],
                "outputs": [ { "name": "MoviesToSqlDb" } ],
                "transformation": {
                    "source": { "type": "BlobSource" },
                    "sink": { "type": "SqlSink" }
                },
                "policy": {
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "style": "StartOfInterval",
                    "retry": 0,
                    "timeout": "01:00:00"
                }
            }
        ]
    }
}
Notes: "name" is the pipeline name; the activity definition gives the activity name, type (CopyActivity), inputs, and outputs; the CopyActivity transformation specifies the source and sink; the policy is required for SqlSink – concurrency must be set or deployment fails.

15 Step 4 – Deploy Pipeline
New-AzureDataFactoryPipeline -ResourceGroupName shughes-datafactory -DataFactoryName shughes-datafactory -File c:\data\JSON\MoviesPipeline.json

16 Step 4 – Deployed Pipeline

17 Step 4 – Pipeline Diagram

18 Step 5 – Set Active Period
Set-AzureDataFactoryPipelineActivePeriod -ResourceGroupName shughes-datafactory -DataFactoryName shughes-datafactory -StartDateTime 2015-01-12 -EndDateTime 2015-01-14 -Name MoviesPipeline
This sets the duration during which data slices will be available to be processed. The frequency is set in the dataset parameters.

19 Exploring Blades in Azure Portal
Start with the Diagram
Drill to various details in the pipeline
Latest update: full online design capability

20 Looking at Monitoring
Review monitoring information in the Azure portal

21 Common Use Cases
Log Import for Analysis

22 Resources
Azure Storage Explorer – codeplex.com
Azure.Microsoft.com – Data Factory
Azure.Microsoft.com – Azure PowerShell

23
Products – Improve the quality, productivity, and performance of your SQL Server and BI solutions.
Services – Speed development through training and rapid development services from Pragmatic Works.
Foundation – Helping those who don't have the means to get into information technology and to achieve their dreams.
Questions? Contact me at steve@dataonwheels.com or shughes@pragmaticworks.com
Blog: www.dataonwheels.com
Pragmatic Works: www.pragmaticworks.com

