AZURE DISTRIBUTED DATA Storage, HDInsight Hadoop, Azure Data Lake.

1 AZURE DISTRIBUTED DATA Storage, HDInsight Hadoop, Azure Data Lake

2 GOALS AND QUESTIONS

3

4 STORE IT
  - Azure Blob Storage
  - Azure Data Lake Store

5 COMPUTE IT
  - Hadoop on Azure: HDInsight on Linux
  - Azure Data Lake Analytics with YARN

6

7 WHAT IS BIG DATA?
  It Is
  - Scale out
  - Enables elasticity
  - Encourages exploration
  - Faster data ingestion
  - Lower TCO
  - Empowers self-service BI and analytics
  - Rapid time to insight
  It Is NOT
  - A well-defined thing
  - About volume, size
  - A replacement for everything
  - The answer to every problem

8 Part 3: Single Slide A leading game development studio that creates, develops, produces, and publishes a number of popular video games needed to analyze large amounts of unstructured in-game data. They chose Azure HDInsight, Data Factory, on-premises SQL Server, Power View, and Power Query for in-game analytics: understanding what gamers do during gameplay and which campaigns they can run to influence in-game purchases. Finally, Twitter sentiment is collected to correlate with sales.

9 Game Development Company | Gaming
  A predominantly mobile-based game development company. Although they are a mid-sized organization, they have partnered with media giants on various gaming projects.
  Part 1: What They Did | In-game Analytics
  Challenge
  - As a game development studio, they wanted in-game analytics to better understand their players and what they do in the games
  Solution
  - Azure HDInsight (MapReduce and Storm), Service Bus, SQL Server for reporting
  - Collects telemetry and logging data to gain in-game analytics:
    - How many players are using the game
    - How many players invited their friends
    - How far players got into the tutorial
    - How many attempts they made on one level/stage
  Diagram labels: In-game Analytics; Media tonic
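The four metrics listed on this slide can be sketched as a single pass over telemetry records. This is only an illustration in plain Python, not the studio's HDInsight job: the record layout and the field names ("player", "event", "step", "level") are assumptions made up for the example.

```python
import json
from collections import Counter, defaultdict

# Hypothetical telemetry records. The field names are assumptions for
# illustration only; the slides do not show the real schema.
SAMPLE_LINES = [
    '{"player": "p1", "event": "session_start"}',
    '{"player": "p1", "event": "invite_sent"}',
    '{"player": "p1", "event": "tutorial_step", "step": 3}',
    '{"player": "p1", "event": "level_attempt", "level": 1}',
    '{"player": "p1", "event": "level_attempt", "level": 1}',
    '{"player": "p2", "event": "session_start"}',
    '{"player": "p2", "event": "tutorial_step", "step": 1}',
    '{"player": "p2", "event": "level_attempt", "level": 1}',
]


def summarize(lines):
    """Aggregate the four slide metrics from JSON telemetry lines."""
    players = set()                   # distinct players seen
    inviters = set()                  # players who invited a friend
    furthest_step = defaultdict(int)  # player -> furthest tutorial step
    attempts = Counter()              # (player, level) -> attempt count

    for line in lines:
        record = json.loads(line)
        player = record["player"]
        players.add(player)
        event = record["event"]
        if event == "invite_sent":
            inviters.add(player)
        elif event == "tutorial_step":
            furthest_step[player] = max(furthest_step[player], record["step"])
        elif event == "level_attempt":
            attempts[(player, record["level"])] += 1

    return {
        "players": len(players),
        "players_who_invited": len(inviters),
        "furthest_tutorial_step": dict(furthest_step),
        "attempts_per_level": dict(attempts),
    }


summary = summarize(SAMPLE_LINES)
```

On a cluster the same grouping would be expressed as map and reduce steps keyed by player; the single-process version above only shows what is being counted.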

10 Game Development Company
  Part 2: How They Did It | In-game Analytics
  How They Did It
  - Collect data from games in Azure Blobs
    - Game sends telemetry/logging data as JSON files
    - Contains every action of the user in the game
    - Data is pushed to Azure Service Bus in real time
    - Tens of gigabytes of data captured daily
  - HDInsight picks up real-time data and processes it
    - From Service Bus, HDInsight processes the data using Apache Storm and MapReduce
    - Constantly running experiments to determine insight
    - A/B testing
    - In-game metrics and analytics
    - Spin up a 32-node cluster nightly for four hours
  - Output sent to SQL Server for BI
    - Transfer data to SQL Server for BI
  Diagram labels: In-game Analytics; Service Bus; SQL Server On-premises
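The nightly 32-node, four-hour cluster can be put in node-hours with a little arithmetic. The comparison against an always-on cluster of the same size is an assumption added for illustration; the slides give no pricing or alternative sizing, so only relative node-hours are computed.

```python
def node_hours(nodes, hours_per_day, days=1):
    """Total node-hours consumed by a cluster of `nodes` machines."""
    return nodes * hours_per_day * days


# Figures from the slide: 32 nodes, four hours per night.
nightly = node_hours(32, 4)        # 128 node-hours per day

# Assumed comparison point (not from the slides): same cluster kept up 24x7.
always_on = node_hours(32, 24)     # 768 node-hours per day

ratio = nightly / always_on        # one sixth of the always-on usage
```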

11 A game development studio that wanted to do in-game analytics to understand their players more and what they do in their games. They chose Azure HDInsight including Storm in HDInsight so they can do near real-time in-game analytics of their users. Now, they can understand how many players are playing, how many are referring the game, how difficult a game level is, etc.

12 Typical Big Data Use Cases
  - Smart meter monitoring
  - Equipment monitoring
  - Advertising analysis
  - Life sciences research
  - Fraud detection
  - Healthcare outcomes
  - Weather forecasting
  - Natural resource exploration
  - Social network analysis
  - Churn analysis
  - Traffic flow optimization
  - Legal discovery
  - Telemetry
  - IT infrastructure optimization

13 HADOOP SHINES WHEN…
  - Data exploration, analytics and reporting, new data-driven actionable insights
  - Rapid iterating
  - Unknown unknowns
  - Flexible scaling
  - Data-driven actions for early competitive advantage or first to market
  - Low number of direct, concurrent users
  - Low-cost data archival

14 HADOOP ANTI-PATTERNS…
  - Replace system whose pain points don’t align with Hadoop’s strengths
  - OLTP needs adequately met by an existing system
  - Known data with a static schema
  - Many end users
  - Interactive response time requirements (becoming less true)
  - Your first Hadoop project + mission critical system

15 APPENDIX

16 CLOUD STORAGE
  - Blobs + WASB: Open source access from Hadoop to Azure Storage Blobs, flexible use
  - Azure Data Lake Store: HDFS, virtually unlimited scale, intelligent data storage, enterprise-grade security, flexible use
  - Optimized proprietary formats like SQL Server, HBase: Rich feature set around specific scenarios
  - Data Factory: Ingest, transform, move, process, analyze data (ELT, ETL, EHL)
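Hadoop jobs address both stores through URI schemes: `wasb://` or `wasbs://` for Azure Storage Blobs and `adl://` for Azure Data Lake Store. A minimal sketch of building such paths follows; the helper functions and the account, container, and path names are hypothetical and exist only for this example.

```python
def wasb_uri(container, account, path, secure=True):
    """Build a WASB path; `wasbs` is the TLS-encrypted variant of the scheme."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def adl_uri(account, path):
    """Build an Azure Data Lake Store path."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


# Hypothetical names, for illustration only.
blob_path = wasb_uri("telemetry", "examplegames", "/raw/2016/01/events.json")
lake_path = adl_uri("examplelake", "raw/2016/01/events.json")
```

A path built this way can be handed to Hive, Pig, or Spark wherever an `hdfs://` location would otherwise go, provided the cluster is configured with access to the account.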

17 COMPUTE
  - Non-Relational: Flexible format and code
    - Hadoop on Linux or Windows
    - HDInsight (100% Apache: Hive, Pig, Storm, HBase, Spark…)
    - Any Hadoop distro on IaaS VMs
    - Scale-out technologies like MongoDB, Cassandra, Qubole on IaaS
    - Polybase or Hadoop Region on APS
  - Relational: Rich feature set, optimized for specific scenarios
  - Azure Data Lake Analytics: Ad hoc analytics with virtually unlimited scale, YARN
    - U-SQL: .NET unified with SQL
  - Machine Learning: SparkML, R, Azure Machine Learning
  - Data Factory: Ingest, transform, move, process, analyze data (ELT, ETL, EHL)

18 USE CASES
  - Exploration
  - Fail fast iteration
  - Scale out
  - Unknown unknowns
  - Fast time to insight

19 IT ADDS UP TO MORE OPTIONS
  - Your choice in analytics
  - Real-time, more history, fast ingestion
  - ODBC makes Hive and Spark “just another data source”
  - Experimentation via “fail fast” iteration
  - Enables the business user
  - … And new expectations around latency

20 AZURE HAS SO MUCH MORE
  - Go straight to the business code
  - Scale storage and compute separately
  - Open Source
  - Linux
  - Managed and unmanaged services
  - Hybrid
  - On-demand and 24x7 options
  - SQL Server
  @SQLCindy

