
HDInsight on Azure and Map-Reduce. Richard Conway, Windows Azure MVP, Elastacloud Limited.

1 HDInsight on Azure and Map-Reduce. Richard Conway, Windows Azure MVP, Elastacloud Limited

3 Introduction

5 Big Data vs Big Compute

6 Compute-bound / IO-bound

7 All distributed compute works on the basis of taking a large JOB and breaking it into many smaller TASKS, which are then run in parallel.
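A minimal single-machine sketch of that idea in C#. The job itself is a hypothetical example, not one from the talk: summing the squares of 1..n. It is split into fixed-size chunks, each chunk runs as a separate Task, and the partial results are combined at the end. A cluster framework such as Hadoop additionally distributes the tasks across machines and handles scheduling and failures, which this sketch does not attempt.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical example: one large JOB (sum of squares of 1..n)
// broken into taskCount smaller TASKS that run in parallel.
public static class JobSplitter
{
    public static long RunJob(int n, int taskCount)
    {
        if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));
        if (taskCount < 1) throw new ArgumentOutOfRangeException(nameof(taskCount));

        // Round the chunk size up so every number in 1..n is covered exactly once.
        int chunkSize = (n + taskCount - 1) / taskCount;

        Task<long>[] tasks = Enumerable.Range(0, taskCount)
            .Select(i =>
            {
                int start = i * chunkSize + 1;
                int end = Math.Min(n, (i + 1) * chunkSize);
                // Each task works only on its own slice of the input;
                // a slice with start > end contributes 0.
                return Task.Run(() =>
                {
                    long partial = 0;
                    for (long x = start; x <= end; x++) partial += x * x;
                    return partial;
                });
            })
            .ToArray();

        // Combine step: Result waits for each task, then the partials are summed.
        return tasks.Sum(t => t.Result);
    }
}
```

For example, `JobSplitter.RunJob(10, 3)` splits 1..10 into the tasks 1 to 4, 5 to 8 and 9 to 10 and returns 385; the answer does not depend on how many tasks are used.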

8 Hadoop / HPC

9 Understanding Big Data

10 $100 gets you 3 million times more storage than it did 30 years ago. Compute: 10 MIPS/$ in 1980, 10M MIPS/$ in 2005. >5.5 billion (70+% of global population). >2 billion users. Web traffic: 130 exabytes (1 EB = 10^18 bytes) in 2010, 1.6 zettabytes (1 ZB = 10^21 bytes) in 2015. >10 billion.

11 ERP / CRM: Sales Pipeline, Payables, Payroll, Inventory, Contacts, Deal Tracking
WEB 2.0: Mobile, Advertising, Collaboration, eCommerce, Digital Marketing, Search Marketing, Web Logs, Recommendations
Internet of things: Audio / Video, Log Files, Text/Image, Social Sentiment, Data Market Feeds, eGov Feeds, Weather, Wikis / Blogs, Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates
Gigabytes (10^9) → Terabytes (10^12) → Petabytes (10^15) → Exabytes (10^18)
Volume; Velocity, Variety, Variability
Storage/GB: 1980 $190,000; 1990 $9,000; 2000 $15; 2010 $0.07
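As a rough check of slide 10's "3 million times" figure against the storage prices on slide 11: a fixed budget (such as $100) buys storage in inverse proportion to the price per GB, so the gain between 1980 and 2010 is simply the ratio of the two prices:

$$\frac{\$190{,}000 \ \text{per GB (1980)}}{\$0.07 \ \text{per GB (2010)}} \approx 2.7 \times 10^{6}$$

That is about 2.7 million, consistent with the rounded "3 million". Slide 10's compute figures show growth of the same order: 10M MIPS/$ ÷ 10 MIPS/$ = 10^6 between 1980 and 2005.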

12 Big Data, BIG OPPORTUNITY: 49% of CEOs and CIOs are planning big data projects. Software growth; services growth. Sources: 1. McKinsey & Company, McKinsey Global Survey Results, Minding Your Digital Business, 2012. 2. IDC Market Analysis, Worldwide Big Data Technology and Services 2012–2015 Forecast, 2012.

13 Invisible devices; trillions of networked nodes; low-bandwidth last-mile connection; mostly addressed by local schemes; machine-centric; sensing focus
Laptops / tablets / smartphones; billions of networked devices; high-bandwidth access; global addressing; user-centric; communication focus

14 Big Data Scenarios

16 Hadoop Distributed Architecture

17 [Diagram: Server, Files, Server]

18 [Diagram: Runtime, Code]

19 Traditional RDBMS vs Hadoop, compared on: data size, access, updates, structure, integrity, scaling, DBA ratio

20 Windows Azure HDInsight Service

21 Demo

22 HDInsight / Hadoop eco-system: Distributed Storage (HDFS), Distributed Processing (MapReduce), Query (Hive). Legend: red = core Hadoop; blue = data processing; purple = Microsoft integration points and value-adds; orange = data movement; green = packages.

23 Storing Data with HDInsight

24 [Diagram: HDInsight storage. Compute cluster: HDFS API front end, name node, data nodes (DFS, 1 data node per worker role). Azure Storage (ASV): front end, partition layer, stream layer … Azure Blob Storage.]

26 Map Reduce Examples in C#

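32
using Microsoft.Hadoop.MapReduce;

// Job definition: wires the mapper and reducer together and
// tells Hadoop where to read input and write output.
public class FrenchSessionsJob : HadoopJob<FrenchSessionsMapper, SessionsReducer>
{
    public override HadoopJobConfiguration Configure(ExecutorContext context)
    {
        var config = new HadoopJobConfiguration()
        {
            // Glob over all gzipped session logs in /AllSessions
            InputPath = "\"/AllSessions/*.gz\"",
            OutputFolder = "/FrenchSessions/"
        };
        return config;
    }
}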

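33
using Microsoft.Hadoop.MapReduce;

// Mapper: called once per input line; emits ("FR", "1") for every French session.
public class FrenchSessionsMapper : MapperBase
{
    public override void Map(string inputLine, MapperContext context)
    {
        if (inputLine.Contains("Country=France"))
        {
            context.IncrementCounter("FrenchSession");
            context.EmitKeyValue("FR", "1");
        }
    }
}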

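34
using System.Collections.Generic;
using System.Linq;
using Microsoft.Hadoop.MapReduce;

// Reducer: receives every value emitted for a key and outputs how many there were.
public class SessionsReducer : ReducerCombinerBase
{
    public override void Reduce(string key, IEnumerable<string> values, ReducerCombinerContext context)
    {
        // EmitKeyValue takes strings, so the count is converted.
        context.EmitKeyValue(key, values.Count().ToString());
    }
}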
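To see what the job, mapper and reducer above compute without a cluster, here is a local sketch that mimics the map, shuffle (group by key) and reduce phases with LINQ. It does not use Hadoop or the .NET SDK for Hadoop: `LocalMapReduce` and its methods are hypothetical names, and the sample input format (`id=1;Country=France`) is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, SDK-free simulation of the FrenchSessions job.
public static class LocalMapReduce
{
    // Map phase: same test as FrenchSessionsMapper, emitting ("FR", "1").
    public static IEnumerable<KeyValuePair<string, string>> Map(string inputLine)
    {
        if (inputLine.Contains("Country=France"))
        {
            yield return new KeyValuePair<string, string>("FR", "1");
        }
    }

    // Reduce phase: same logic as SessionsReducer, counting the values for a key.
    public static KeyValuePair<string, string> Reduce(string key, IEnumerable<string> values)
    {
        return new KeyValuePair<string, string>(key, values.Count().ToString());
    }

    public static Dictionary<string, string> Run(IEnumerable<string> inputLines)
    {
        return inputLines
            .SelectMany(line => Map(line))          // map every input line
            .GroupBy(kv => kv.Key, kv => kv.Value)  // shuffle: group values by key
            .Select(g => Reduce(g.Key, g))          // reduce: one call per key
            .ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

For the input lines `id=1;Country=France`, `id=2;Country=Germany` and `id=3;Country=France`, `Run` returns a single entry, FR → "2", which is what SessionsReducer would emit for the key FR.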

35 Demo

37 https://elastastorage.blob.core.windows.net/hdinsight/Map-Reduce HDInsight Lab.pdf

38 Questions?
