Near Real Time ETLs with Azure Serverless Architecture

1 Near Real Time ETLs with Azure Serverless Architecture
Samara Soucy Innovative Architects 9/16/18

2 Samara Soucy
Microsoft Certified Specialist – Programming in C#
Software Development Consultant – Innovative Architects
Oneangrypenguin.com

3 What does it mean when a service is serverless?
There are still servers (duh)
Next iteration after PaaS
Consumers don’t have to worry about resource management.
Dynamic scaling
Pricing is usage based
(Usually) easier to develop against

6 ETL via Microservices: Pros and Cons
Pros: Divide and conquer. Easier to scale just the parts that need it. Removes the possibility of dependency hell. Individual services are easy to understand and maintain.
Cons: Lose performance because components must communicate rather than run as a single process. The system as a whole is complicated to maintain.

7 Service Roles Source & Destination
Event – “We’ve got data to work on.”
Manager – Decides when an event will happen or controls process flow.
Router – Moves events between services. May transmit data as well.
Worker – Moves data between source and destination, may perform transforms.
Monitor – Makes sure that the processes are all running and may check data quality.
Utility – Performs background maintenance tasks (e.g. moving old data to an archive).
** A service will often fit into multiple roles.
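To make the roles concrete, here is a minimal in-memory sketch in Python. It is an illustration only: the class names, the `order.created` event kind, and the lower-casing transform are all assumptions made for the example, and nothing here calls an Azure service.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    """Event role: announces that there is data to work on."""
    kind: str
    payload: dict


class Router:
    """Router role: moves events to whichever services subscribed to them."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Callable[[Event], None]) -> None:
        self._subscribers[kind].append(handler)

    def publish(self, event: Event) -> int:
        """Deliver the event to every subscriber and return how many received it."""
        handlers = self._subscribers.get(event.kind, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


@dataclass
class Worker:
    """Worker role: moves data to a destination, applying a transform on the way."""
    destination: List[dict] = field(default_factory=list)

    def handle(self, event: Event) -> None:
        # Hypothetical transform: normalize column names to lower case.
        row = {key.lower(): value for key, value in event.payload.items()}
        self.destination.append(row)


@dataclass
class Monitor:
    """Monitor role: counts the events it sees so a stalled pipeline can be noticed."""
    seen: int = 0

    def handle(self, event: Event) -> None:
        self.seen += 1


router = Router()
worker = Worker()
monitor = Monitor()
router.subscribe("order.created", worker.handle)
router.subscribe("order.created", monitor.handle)

delivered = router.publish(Event("order.created", {"OrderId": 7, "Total": 19.5}))
```

Because both the worker and the monitor subscribe to the same event kind, a single published event fans out to two services without either one knowing about the other.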

8 Basic Architecture

9 Separate Manager and/or Router

10 ETL via Microservices pt. 2
Entrance to the ETL should be as close to the originating event as possible.
Where possible, event data should be small enough to pass through the Azure queue systems. Event Grid has the smallest limit at 64 KB per message.
Each service should perform a single task. Usually. As long as that doesn’t create unnecessary complexity. Balance is important.
When chaining services together, consider whether or not you may want multiple things to trigger off a specific processing stage. If there is a possibility that you need to fan out or fan in at a specific point, make sure the data stream makes a stop at a router.
Take advantage of the Visual Studio projects for Azure serverless. Keeping the code for all the services in a single solution makes it easier to pull up all the code for your ETL system than clicking around in the portal.
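One way to act on the size guidance is to measure a payload before handing it to a router. The sketch below assumes the payload is sent as UTF-8 JSON and uses the 64 KB figure quoted on the slide; the helper names are hypothetical, and a real message also carries envelope fields that count toward the service limit.

```python
import json

# 64 KB per message, as quoted on the slide for Event Grid.
EVENT_GRID_LIMIT_BYTES = 64 * 1024


def encoded_size(payload: dict) -> int:
    """Size in bytes of the payload once serialized as compact UTF-8 JSON."""
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def fits_in_event(payload: dict, limit: int = EVENT_GRID_LIMIT_BYTES) -> bool:
    """True when the serialized payload is within the assumed message limit."""
    return encoded_size(payload) <= limit


small = {"order_id": 7, "status": "created"}
large = {"rows": ["x" * 100] * 1000}  # roughly 100 KB once serialized

print(fits_in_event(small), fits_in_event(large))
```

A payload that fails the check is a candidate for being written to storage first, with only a reference travelling through the router.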

11 Azure Serverless Services
Meet the toys!

12 The Workers pt. 1: Logic Apps and Functions
Logic Apps: Similar experience to SSIS (but better). Limited transform capabilities, but it can call Functions to perform that task. Strongest at managing process flow. Integrates with Azure services, HTTP, timers and many 3rd party APIs out of the box. No code required (minus any Functions integrations).
Functions: Extends the capabilities of Web Jobs. Small processes built in C# or JS. Experimental support for other languages like Python, PHP, PowerShell and others. Integrates natively with many other Azure services, HTTP, and timer triggers. Strongest tool for transforms, most flexible.
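As an illustration of the kind of small, single-purpose transform a Function might host, here is a sketch written as a plain Python function with no Azure bindings. The record shape (`CustomerId`, `Amount`, `CreatedAt`) and the output field names are assumptions invented for the example.

```python
from datetime import datetime, timezone
from typing import Optional


def transform(record: dict) -> Optional[dict]:
    """Clean one source record; return None when it should be skipped."""
    raw_id = str(record.get("CustomerId", "")).strip()
    if not raw_id:
        return None  # nothing to load without a customer id

    # Assumption: source timestamps are ISO 8601; naive values are treated as UTC.
    created = datetime.fromisoformat(record["CreatedAt"])
    if created.tzinfo is None:
        created = created.replace(tzinfo=timezone.utc)

    return {
        "customer_id": raw_id,
        # Store money as integer cents to avoid float drift downstream.
        "amount_cents": round(float(record["Amount"]) * 100),
        "created_at": created.astimezone(timezone.utc).isoformat(),
    }


row = transform(
    {"CustomerId": " 42 ", "Amount": "19.99", "CreatedAt": "2018-09-16T10:30:00"}
)
```

Keeping the transform as a pure function makes it easy to unit test locally before wiring it to a trigger.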

13 The Workers pt. 2: Stream Analytics and Cognitive Services
Stream Analytics: SQL-like queries against a data stream. Primarily attaches to Event Hub or IoT Hub, paired with Azure Storage for contextual data. Allows for measures to be computed in real time. Strong for finding values out of range, trends, and possible fraud.
Cognitive Services: Machine learning as an API. Lots of tools like speech to text, sentiment analysis, search, natural language processing, and many more. Integration with Stream Analytics: call many of the APIs directly from your SQL query.
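The out-of-range use case can be pictured as a grouped average over fixed time windows. The Python below is a batch stand-in for that idea, not Stream Analytics query syntax: the reading tuple layout, the function name, and the thresholds are all assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Reading = Tuple[int, str, float]  # (epoch seconds, device id, value)


def tumbling_window_alerts(
    readings: Iterable[Reading], window_seconds: int, low: float, high: float
) -> List[Tuple[int, str, float]]:
    """Average each device per fixed window and report averages outside [low, high]."""
    buckets: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for ts, device, value in readings:
        window_start = ts - (ts % window_seconds)  # align to the window boundary
        buckets[(window_start, device)].append(value)

    alerts = []
    for (window_start, device), values in sorted(buckets.items()):
        mean = sum(values) / len(values)
        if mean < low or mean > high:
            alerts.append((window_start, device, mean))
    return alerts


readings = [
    (0, "a", 20.0),
    (30, "a", 22.0),
    (60, "a", 90.0),
    (90, "a", 110.0),
    (10, "b", 21.0),
]
alerts = tumbling_window_alerts(readings, window_seconds=60, low=0.0, high=50.0)
```

A streaming engine computes the same grouping incrementally as events arrive, which is what makes the measure available in near real time.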

14 The Routers: Event Grid and Event Hub
Event Grid: HTTPS input and push to subscribers. Offers some basic filtering of which events get pushed to which subscribers.
Event Hub: HTTPS (in) or AMQP based message routing. Caches messages for a set period of time so they can be replayed. Default is 24 hours. Handles both batched messages and streams of data. If you are going to use Stream Analytics, you want Event Hub in your pipeline.
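The replay behavior can be pictured as a buffer that keeps events for a retention window and lets a consumer re-read from a chosen point. This is a conceptual sketch only; the `ReplayBuffer` class is hypothetical and is not the Event Hub client API. The 24-hour default mirrors the figure on the slide.

```python
from collections import deque
from typing import Deque, List, Tuple


class ReplayBuffer:
    """Keeps events for a retention window so consumers can re-read them."""

    def __init__(self, retention_seconds: float = 24 * 60 * 60) -> None:
        self.retention_seconds = retention_seconds
        self._events: Deque[Tuple[float, dict]] = deque()

    def append(self, timestamp: float, event: dict) -> None:
        """Store an event; timestamps are assumed to arrive in increasing order."""
        self._events.append((timestamp, event))
        self._expire(now=timestamp)

    def _expire(self, now: float) -> None:
        """Drop events older than the retention window."""
        cutoff = now - self.retention_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def replay(self, since: float, now: float) -> List[dict]:
        """Return retained events with a timestamp at or after `since`."""
        self._expire(now)
        return [event for ts, event in self._events if ts >= since]


buffer = ReplayBuffer(retention_seconds=100)
buffer.append(0, {"n": 1})
buffer.append(50, {"n": 2})
buffer.append(120, {"n": 3})  # the event at t=0 has now aged out
recent = buffer.replay(since=0, now=120)
```

A worker that failed partway through a batch can ask for everything since its last good checkpoint, provided that checkpoint is still inside the retention window.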

15 The Destinations: Cosmos DB and Azure Storage
Cosmos DB: NoSQL successor to DocumentDB. Multiple storage types, multiple API options. Scale on demand. Globally distributed.
Azure Storage: File and Blob storage. Will be used to store code and logs for most of the other serverless offerings. Used as a destination and as a way to store data that will be used in multiple services rather than passing it through the routers.
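Storing shared data once and passing only a pointer through the router is often called a claim check. A minimal sketch follows, assuming JSON payloads and a 64 KB inline limit; `BlobStore` is an in-memory stand-in invented for the example, not the Azure Storage SDK, and the message field names are hypothetical.

```python
import json
import uuid
from typing import Dict

INLINE_LIMIT_BYTES = 64 * 1024  # assumed router limit for inline payloads


class BlobStore:
    """In-memory stand-in for a blob container (hypothetical, not the Azure SDK)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        name = f"payloads/{uuid.uuid4()}.json"
        self._blobs[name] = data
        return name

    def get(self, name: str) -> bytes:
        return self._blobs[name]


def to_message(payload: dict, store: BlobStore) -> dict:
    """Send small payloads inline; park large ones in storage and send a reference."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) <= INLINE_LIMIT_BYTES:
        return {"inline": payload}
    return {"blob_ref": store.put(body)}


def from_message(message: dict, store: BlobStore) -> dict:
    """Resolve a message back to its payload, fetching from storage when needed."""
    if "inline" in message:
        return message["inline"]
    return json.loads(store.get(message["blob_ref"]))


store = BlobStore()
big_payload = {"rows": list(range(50_000))}
message = to_message(big_payload, store)
restored = from_message(message, store)
```

Every downstream service that receives the reference reads the same stored copy, so the data is written once no matter how many subscribers fan out from the router.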

16 Application Insights (The Monitor)
Comprehensive application monitoring and logging.
Azure Functions are closely integrated with App Insights; almost all metrics run through there.
When using App Insights with Azure Functions, consider turning down the sampling rate. Otherwise you will end up spending significantly more on App Insights than on your Functions.
May also act as a source, since it provides a real time feed of the status of whatever application it is monitoring. Set up continuous export to Azure Storage to allow ingestion by a real time analytics feed.
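To see why the sampling rate drives cost, it helps to picture sampling as keeping a fixed percentage of operations. The sketch below is a generic hash-based sampler written for illustration; it is an assumption about how such a sampler could work, not the algorithm Application Insights uses, and `keep_operation` is a hypothetical name.

```python
import hashlib


def keep_operation(operation_id: str, sampling_percentage: float) -> bool:
    """Decide deterministically whether to keep telemetry for an operation.

    Hashing the operation id means every telemetry item belonging to the same
    operation gets the same decision, so kept traces stay complete.
    """
    digest = hashlib.sha256(operation_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < sampling_percentage * 100


operation_ids = [f"op-{n}" for n in range(10_000)]
kept = sum(keep_operation(op_id, 20.0) for op_id in operation_ids)
print(f"kept {kept} of {len(operation_ids)} operations at 20% sampling")
```

Telemetry volume, and the ingestion bill that follows from it, scales roughly with the percentage kept, which is why a lower rate matters for high-volume Functions.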

22 Links computing/ Presentation Links:

