MSBIC Hadoop Series Implementing MapReduce Jobs Bryan Smith

1 MSBIC Hadoop Series: Implementing MapReduce Jobs. Bryan Smith, email: bryan.smith@microsoft.com, twitter: @smithbryanc

2 MSBIC Hadoop Series (http://msbic.sqlpass.org/)
Learn the basics of Hadoop through a combination of demonstration and lecture. Participants are invited to follow along using emulation environments and Azure-based clusters; setting these up is covered in the first session.
March – Getting Started
April – Understanding the File System
May – Implementing MapReduce Jobs
June – Querying the Data with Hive
July – Processing the Data with Pig
August – On Vacation
September – Hadoop & MS BI
October – To Be Announced
November – Loading Social Media Data
December – DW Integration

3 Today’s Session Objectives:
1. Understand the Basics of MapReduce
2. Implement a MapReduce Job
3. Introduce Tez

4 Sample File: How Many Evens & Odds?
Input: 1 2 3 4 5 6 7 8 9
Step 1: label each number: 1 odd, 2 even, 3 odd, 4 even, 5 odd, 6 even, 7 odd, 8 even, 9 odd
Step 2 (map( ), output as key / value[ ]): odd {1, 3, 5, 7, 9}; even {2, 4, 6, 8}
Step 3 (reduce( )): odd 5; even 4
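The three steps can be simulated in a single Python process to make the data flow concrete. This is a minimal sketch, not Hadoop or .NET SDK code: the function names map_parity, shuffle, reduce_count and run_job are hypothetical, and the in-memory grouping stands in for the shuffle/sort that the framework performs between the map and reduce phases.

```python
from collections import defaultdict


def map_parity(number):
    """map(): emit one (key, value) pair per input number."""
    return ("even" if number % 2 == 0 else "odd", number)


def shuffle(pairs):
    """Group mapped values by key, giving key -> value[] (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_count(key, values):
    """reduce(): collapse each key's list of values into a count."""
    return key, len(values)


def run_job(numbers):
    mapped = [map_parity(n) for n in numbers]  # Step 1: label each number
    grouped = shuffle(mapped)                  # Step 2: key / value[]
    # Step 3: one reduce() call per key
    return dict(reduce_count(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    digits = [int(c) for c in "123456789"]
    print(shuffle(map_parity(n) for n in digits))  # {'odd': [1, 3, 5, 7, 9], 'even': [2, 4, 6, 8]}
    print(run_job(digits))                         # {'odd': 5, 'even': 4}
```

Because map_parity looks at one number at a time and reduce_count looks at one key at a time, both can be spread across machines; only the grouping step needs to see all the data.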

5 Sample Files
[Diagram: sample files tracked by the Name Node and stored on a Data Node; an "XYZ" job runs a Map Task and a Reduce Task; partitions labeled P0 and P1]
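A job is split into tasks: each map task works on its own slice of the input, and each reduce task receives one partition of the map output. The sketch below reads the diagram's P0 and P1 as two such partitions feeding two reduce tasks; that reading, and all names in the code (map_task, partition, reduce_task, NUM_REDUCERS), are assumptions for illustration. The partition function hashes the key modulo the number of reducers, which is similar in spirit to Hadoop's default hash partitioning; crc32 is used only because Python's built-in string hash changes between runs.

```python
import zlib
from collections import defaultdict

NUM_REDUCERS = 2  # hypothetical: one reduce task per partition, P0 and P1


def map_task(split):
    """One map task processes one input split and emits (key, 1) pairs."""
    return [("even" if n % 2 == 0 else "odd", 1) for n in split]


def partition(key, num_reducers=NUM_REDUCERS):
    """Route a key to a partition; crc32 keeps the choice deterministic."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_task(pairs):
    """One reduce task sums the values for every key in its partition."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(splits):
    partitions = [[] for _ in range(NUM_REDUCERS)]  # P0, P1
    for split in splits:  # each split is handled by its own map task
        for key, value in map_task(split):
            partitions[partition(key)].append((key, value))
    # One output per reduce task, as a real job writes one output file per reducer.
    return [reduce_task(p) for p in partitions]


if __name__ == "__main__":
    outputs = run_job([[1, 2, 3, 4, 5], [6, 7, 8, 9]])  # two input splits
    merged = {k: v for out in outputs for k, v in out.items()}
    print(outputs)
    print(merged)
```

Every occurrence of a key lands in the same partition, so each key is counted completely by exactly one reduce task, no matter which map task emitted it.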

6 Implementing MapReduce using .NET
Add the following packages:
- Microsoft .NET Map Reduce API for Hadoop
- Microsoft .NET API for Hadoop WebClient
- Windows Azure Storage (if running against Azure HDInsight)
Add the following directives:
using Microsoft.Hadoop;
using Microsoft.Hadoop.MapReduce;
using Microsoft.Hadoop.WebClient.WebHCatClient;
If running against Azure HDInsight, change the project’s Target Platform to x64.

7 MapReduce Demo

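8 Goodbye, MapReduce
- Distributable for Scale
- Resistant to Failure
- “Easy” to Program
- Disk Liberal / Memory Conservative (intermediate results are written to disk rather than held in memory)
- Rigid Step Sequencing (every job is a fixed Map then Reduce)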

9 MapReduce as a Graph: Map → Reduce → Map → Reduce (each step is a vertex; each connection between steps is an edge)
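Seen as a graph, a multi-step MapReduce workload is a straight line of Map → Reduce jobs, and each edge between jobs is a set of files written out and read back. The local sketch below chains two jobs that way: job 1 counts words, job 2 groups words by count. A temporary directory stands in for HDFS, and every name in it (run_mapreduce, write_stage, pipeline and the job functions) is hypothetical.

```python
import json
import os
import tempfile
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """One MapReduce step: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


def write_stage(path, pairs):
    """Materialize a step's output to disk, as each MapReduce job does."""
    with open(path, "w", encoding="utf-8") as f:
        for key, value in pairs:
            f.write(json.dumps([key, value]) + "\n")


def read_stage(path):
    with open(path, encoding="utf-8") as f:
        return [tuple(json.loads(line)) for line in f]


# Job 1 (Map -> Reduce): count how often each word occurs.
def count_map(line):
    return [(word, 1) for word in line.split()]


def count_reduce(word, ones):
    return word, sum(ones)


# Job 2 (Map -> Reduce): group words by their count.
def invert_map(pair):
    word, count = pair
    return [(count, word)]


def invert_reduce(count, words):
    return count, sorted(words)


def pipeline(lines):
    with tempfile.TemporaryDirectory() as tmp:
        stage1 = os.path.join(tmp, "job1-output.jsonl")
        write_stage(stage1, run_mapreduce(lines, count_map, count_reduce))
        # Job 2 cannot start until job 1's output is completely on disk.
        return run_mapreduce(read_stage(stage1), invert_map, invert_reduce)


if __name__ == "__main__":
    print(pipeline(["map reduce map", "reduce map tez"]))
    # [(1, ['tez']), (2, ['reduce']), (3, ['map'])]
```

The write/read round trip between the two jobs is the "disk liberal" cost, and the strict job-after-job order is the "rigid step sequencing" from the previous slide.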

10 Tez: An Alternative Model. Processing steps (vertices) connected by edges form a Directed Acyclic Graph (DAG), so steps are not limited to a fixed Map → Reduce sequence.
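To illustrate the DAG idea, the sketch below runs a small graph in which one vertex fans out to two vertices and a final vertex joins their results, all passed in memory. It is not the Tez API (Tez itself exposes a Java API); the DAG dictionary and the vertex functions are hypothetical, and Python's standard graphlib module (3.9+) is used only to compute a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: each vertex maps to the set of vertices that feed it.
DAG = {
    "read_numbers": set(),
    "odds": {"read_numbers"},
    "evens": {"read_numbers"},
    "summary": {"odds", "evens"},
}


def read_numbers(_inputs):
    return list(range(1, 10))


def odds(inputs):
    return [n for n in inputs["read_numbers"] if n % 2]


def evens(inputs):
    return [n for n in inputs["read_numbers"] if n % 2 == 0]


def summary(inputs):
    return {"odd": len(inputs["odds"]), "even": len(inputs["evens"])}


VERTICES = {f.__name__: f for f in (read_numbers, odds, evens, summary)}


def run_dag(dag, vertices):
    results = {}
    for name in TopologicalSorter(dag).static_order():
        # Edges hand upstream results to this vertex in memory, not via files.
        inputs = {upstream: results[upstream] for upstream in dag[name]}
        results[name] = vertices[name](inputs)
    return results


if __name__ == "__main__":
    print(run_dag(DAG, VERTICES)["summary"])  # {'odd': 5, 'even': 4}
```

This sketch runs vertices one at a time; "odds" and "evens" have no edge between them, so a real engine could run them in parallel (graphlib's get_ready()/done() methods support that style of scheduling).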

11 MapReduce vs. Tez
MapReduce: Focused on Disk; Rigid, Linear Step Sequencing; Supports Hadoop Streaming
Tez: Focused on Memory; Flexible, Parallel Step Sequencing; ???
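Hadoop Streaming lets any executable act as the mapper or reducer: records arrive as lines on stdin, results go to stdout as lines, and by default the text before the first tab is the key. Reducer input arrives sorted by key, so the reducer must detect where one key's run ends. A minimal sketch of the odd/even count as a Streaming script follows; the file name parity.py and its map/reduce command-line switch are hypothetical conventions for this example.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'key<TAB>value' for every number on every input line."""
    for line in lines:
        for token in line.split():
            parity = "even" if int(token) % 2 == 0 else "odd"
            yield f"{parity}\t1"


def reducer(lines):
    """Input is sorted by key, so all lines for one key are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


# Hypothetical layout: one script, run as "parity.py map" or "parity.py reduce".
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

The script can be tried without a cluster by letting sort play the role of the shuffle: cat input.txt | python parity.py map | sort | python parity.py reduce. On a cluster it would be passed to the hadoop-streaming jar through its -mapper and -reducer options (along with -input and -output); the jar's location depends on the distribution.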

12 Guidance on MapReduce & Tez
- Most of your work will be at higher levels, e.g. Pig & Hive
- Moving from MapReduce to Tez should improve performance and be transparent to you
- Apache Tez is included in HDP 2.1; HDInsight lags by a few months
- Microsoft is a key contributor

13 Today’s Session Objectives:
1. Understand the Basics of MapReduce
2. Implement a MapReduce Job
3. Introduce Tez

14 For Next Session
Topic: Querying Data with Hive. Implement a Hive table and query it using HQL.
Requested Action(s):
- Come with a working HDInsight Emulator
- Load sample data sets into HDFS on the Emulator

