Microsoft Ignite 2015 4/28/2017 6:07 PM © 2015 Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
Real-Time Analytics at Scale for Internet of Things
Asad Khan, Principal Program Manager
Nishant Thacker, Product Marketing Manager
IoT Scenario - Connected Cars / Devices
[Diagram labels]
  Components: Cloud gateways, Queue Service, Event Hubs, Document Store, NoSQL Store, Relational Store, Live Dashboard
  Steps: Get Data, Get Reference Data, Business Logic, Store Raw Data, Store Reporting Data
Customer use cases

Scenario | Input | Operators (examples) | Side lookup | Output | Programming language
Connected Cars | Event Hubs | Window-based aggregation, join stream / split stream | HBase, ML | DocumentDB | C# hybrid, Java
ETL | Event Hubs | Partitioning / organize | N/A | WASB | Java
IoT | | Window-based aggregation | HBase, ML | DocumentDB, HBase |
Fraud detection | Service Bus queue | Filter | ML | HBase | C# hybrid
Social analytics | Twitter | Group by / trending topics | | Real-time dashboard (BI) | Trident
Network monitoring | Kafka | Split (on success / failure) | | SQL |
Log search | Storage Queue / Event Hub | Parsing and indexing | | Elasticsearch |
Mobile engagement | Event Hubs | Count | HBase | SignalR |
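Window-based aggregation appears in several of the scenarios above. As a rough illustration only, the sketch below averages readings per device over fixed, non-overlapping (tumbling) windows in plain Python. The function name, the event layout, and the sample values are hypothetical and are not taken from the session; a real Storm topology would do this inside a bolt rather than over an in-memory list.

```python
from collections import defaultdict


def tumbling_window_average(events, window_seconds):
    """Average `value` per (window start, device) over fixed, non-overlapping windows."""
    # (window_start, device) -> [running total, count]
    sums = defaultdict(lambda: [0.0, 0])
    for timestamp, device, value in events:
        # Align the timestamp to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        bucket = sums[(window_start, device)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical sample events: (epoch seconds, device id, speed reading)
events = [
    (0, "car-1", 50.0),
    (4, "car-1", 70.0),
    (7, "car-2", 30.0),
    (12, "car-1", 90.0),
]
averages = tumbling_window_average(events, window_seconds=10)
```

With a 10-second window, the first two car-1 readings fall into the window starting at 0 and the third into the window starting at 10, so each window is averaged independently.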
Hadoop Data Platform
Hadoop - Data processing and storage platform

  Batch: Hive, Pig, MapReduce
  NoSQL: HBase
  Stream: Storm
  Other: Mahout, Oozie, Spark
  Data Storage Layer (HDFS)

Hadoop is an open-source, scalable, fault-tolerant platform for storing and processing large amounts of unstructured data, distributed across a large number of machines.

Microsoft Big Data solutions, including HDInsight on Microsoft Azure, are based on a Hadoop distribution called the Hortonworks Data Platform (HDP). HDP uses the YARN resource manager to implement a runtime platform for a wide range of data query, transformation, and storage tools and applications. The figure shows the high-level architecture of HDP and how it supports these tools and applications.

Figure: High-level architecture of the Hortonworks Data Platform

The three most commonly used tools for processing data by executing queries and transformations, in order of popularity, are Hive, Pig, and MapReduce. HCatalog is a feature of Hive that provides, among other things, a way to remove dependencies on literal file paths, which stabilizes and unifies solutions that incorporate multiple steps. Mahout is a scalable machine learning library for clustering, classification, and collaborative filtering that you can use to examine data files and extract specific types of information. Storm is a real-time data processing application designed to handle streaming data. These applications can be used for a wide variety of tasks, and many of them can be combined into multi-step workflows by using Oozie.
What is Azure HDInsight
  Microsoft’s cloud Big Data offering
  100% open source Apache Hadoop
  Up and running in minutes with no hardware to deploy
  Harness existing .NET and Java skills
  Utilize familiar BI tools and application frameworks
What is Hive?
Create, load, and query Hive tables.
  Hive SQL includes data definition language (DDL), data import/export, and data manipulation language (DML) statements
  Create table
  Query data using SQL-like statements
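The create / load / query pattern named on this slide can be tried locally without a cluster. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Hive, so it is not HiveQL: Hive has its own type names (for example STRING and DOUBLE) and typically loads files with LOAD DATA rather than row inserts. The table name, columns, and sample rows are hypothetical.

```python
import sqlite3

# In-memory database used only as a local stand-in for a Hive table.
conn = sqlite3.connect(":memory:")

# Create table (data definition).
conn.execute("CREATE TABLE readings (device_id TEXT, speed REAL)")

# Load data (Hive would usually load from files instead of inserting rows).
rows = [("car-1", 50.0), ("car-1", 70.0), ("car-2", 30.0)]
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

# Query data with a SQL-like statement.
query = (
    "SELECT device_id, AVG(speed) FROM readings "
    "GROUP BY device_id ORDER BY device_id"
)
result = conn.execute(query).fetchall()
conn.close()
```

The same three steps (define the table, load the data, run an aggregate query) are what the slide describes for Hive; only the engine and some syntax differ.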
What is HBase?
Apache HBase is a distributed, low-latency NoSQL database designed to handle large-scale datasets.
  NoSQL on top of Hadoop
  Large scale
  Low latency
  Open source
  Columnar, schema-free data model
[Diagram labels: Events, Storm, HBase, Hadoop APIs, Batch Analytics, Web Apps, Mobile, Web]
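To make "columnar, schema-free data model" more concrete, here is a toy in-memory sketch in Python. It only mimics the general shape of the model: rows addressed by a row key, columns named "family:qualifier", column families fixed up front, qualifiers free to vary per row, and range scans in sorted key order. The class, its methods, and the sample keys are hypothetical and are not the HBase client API.

```python
class TinyWideColumnTable:
    """Toy in-memory table: row key -> {"family:qualifier": value}."""

    def __init__(self, column_families):
        # Column families are declared when the table is created.
        self.column_families = set(column_families)
        self.rows = {}

    def put(self, row_key, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self.column_families or not qualifier:
            raise ValueError(f"unknown column family or missing qualifier: {column!r}")
        # Qualifiers are not declared in advance, so rows may differ in shape.
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        return dict(self.rows.get(row_key, {}))

    def scan(self, start_key, stop_key):
        """Yield rows with start_key <= key < stop_key in sorted key order."""
        for key in sorted(self.rows):
            if start_key <= key < stop_key:
                yield key, dict(self.rows[key])


table = TinyWideColumnTable(column_families=["telemetry", "meta"])
table.put("car-1|0001", "telemetry:speed", 50.0)
table.put("car-1|0002", "telemetry:speed", 70.0)
table.put("car-1|0002", "telemetry:fuel", 0.4)
table.put("car-2|0001", "meta:model", "sedan")

# Prefix-style scan: every row whose key starts with "car-1|".
car1_rows = list(table.scan("car-1|", "car-1|~"))
```

Putting the device id first in the row key keeps one device's readings adjacent, which is why a range scan can fetch them together.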
What is Storm?
Apache Storm is a distributed, fault-tolerant, open-source, real-time event processing solution for large, fast streams of data.
[Diagram labels]
  Data sources: Sentiment, Clickstream, Machine/Sensor, Server Logs, Geo-location
  Queuing Service / Direct API
  Storm: real-time processing system
  Batch Processing
  Data Store (HBase, SQL)
  Real-time dashboard
IoT Scenario - Connected Cars / Devices
[Diagram labels]
  Cloud gateways
  Queue Service: Event Hubs
  Document Store: DocumentDB
  NoSQL Store: HBase
  Relational Store: SQL Azure
  Live Dashboard: Power BI
  Apache Storm
  Steps: Get Data, Get Reference Data, Business Logic, Store Raw Data, Store Reporting Data
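The steps named on this slide can be walked through in a few lines of plain Python. This is only a sketch under stated assumptions: the in-memory lists and dicts stand in for the queue, the stores, and the dashboard, the speed-limit rule is an invented example of business logic, and nothing here reflects which Azure service the presenters used for which step.

```python
def process_event(event, reference_data, raw_store, reporting_store, dashboard):
    """Run one event through the stages named on the slide."""
    # Store raw data: keep the event exactly as received.
    raw_store.append(event)

    # Get reference data: look up static details for the device.
    reference = reference_data.get(event["device_id"], {})

    # Business logic (hypothetical rule): flag readings above the speed limit.
    limit = reference.get("speed_limit")
    speeding = limit is not None and event["speed"] > limit

    if speeding:
        # Store reporting data: running count of flagged readings per device.
        device = event["device_id"]
        reporting_store[device] = reporting_store.get(device, 0) + 1
        # Live dashboard: push an alert for display.
        dashboard.append(f"{device} exceeded {limit} (speed {event['speed']})")
    return speeding


# Hypothetical reference data and incoming events ("Get Data" from the queue).
reference_data = {"car-1": {"speed_limit": 60}, "car-2": {"speed_limit": 100}}
incoming = [
    {"device_id": "car-1", "speed": 50},
    {"device_id": "car-1", "speed": 75},
    {"device_id": "car-2", "speed": 90},
]

raw_store, reporting_store, dashboard = [], {}, []
for event in incoming:
    process_event(event, reference_data, raw_store, reporting_store, dashboard)
```

Every event lands in the raw store, while only the flagged reading reaches the reporting store and the dashboard, which mirrors the split between raw and reporting data on the slide.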
Demo Asad Khan
HDInsight – Call to Action

Key Sessions at Ignite
  BRK3555 - Real-Time Analytics at Scale for Internet of Things
  BRK2550 - Big Data for the SQL Ninja
  BRK2576 - Planning your Big Data Architecture on Azure
  BRK3556 - Optimizing Hadoop using Microsoft Azure HDInsight
  BRK3559 - Build Hybrid Big Data Pipelines with Azure Data Factory and Azure HDInsight

Sign up for HDInsight Free Trial: http://azure.com/hdinsight
Sign up for Azure Data Lake Preview: http://azure.com/datalake
Ignite Azure Challenge Sweepstakes
Attend Azure sessions and activities, track your progress online, win raffle tickets for great prizes!
aka.ms/MyAzureChallenge
Enter this session code online: “XXDD”
  (10) - Microsoft Surface Pro 3 Core i5 256GB
  (30) - Xbox One
  (55) - Microsoft Band
Offers throughout the week
NO PURCHASE NECESSARY. Open only to event attendees. Winners must be present to win. Game ends May 9th, 2015. For Official Rules, see The Cloud Platform Lounge or aka.ms/myazurechallenge