O’Reilly – Hadoop: The Definitive Guide Ch.6 How MapReduce Works 16 July 2010 Taewhi Lee

Outline
 Anatomy of a MapReduce Job Run
 Failures
 Job Scheduling
 Shuffle and Sort
 Task Execution

Anatomy of a MapReduce Job Run
 You can run a MapReduce job with a single line of code: JobClient.runJob(conf)
 But this single line conceals a great deal of processing behind the scenes
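
As a concrete illustration, here is a minimal driver in the old org.apache.hadoop.mapred API used throughout this chapter. It is a sketch, not taken from the slides: it relies on the default TextInputFormat and uses the identity mapper and reducer so it compiles without extra classes; input and output paths come from the command line.

    // Minimal driver sketch (old "mapred" API). Paths are supplied by the user.
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class IdentityJobDriver {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(IdentityJobDriver.class);
        conf.setJobName("identity");
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        // Key/value types produced by the default TextInputFormat
        // (LongWritable file offset, Text line):
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        // The single line that triggers everything described in this chapter:
        JobClient.runJob(conf);
      }
    }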

Entities Involved in a MapReduce Job Run
 Client
 – Submits the MapReduce job
 Jobtracker
 – Coordinates the job run
 – Set through the mapred.job.tracker property
 Tasktrackers
 – Run the tasks that the job has been split into
 Distributed filesystem (normally HDFS)
 – Used for sharing job files between the other entities
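
A hedged sketch of how a client finds the jobtracker: the address is read from the mapred.job.tracker configuration property (the host and port below are placeholders; in practice the value comes from mapred-site.xml rather than code).

    import org.apache.hadoop.mapred.JobConf;

    public class JobTrackerAddressSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Placeholder address; normally configured cluster-wide, not in code.
        conf.set("mapred.job.tracker", "jobtracker.example.com:8021");
        System.out.println("Jobtracker: " + conf.get("mapred.job.tracker"));
      }
    }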

How Hadoop Runs a MapReduce Job
 Steps 1-4: Job submission
 Steps 5-6: Job initialization
 Step 7: Task assignment
 Steps 8-10: Task execution

Job Submission
 JobClient.runJob()
 – Creates a new JobClient instance (step 1)
 – Calls submitJob() on it
 JobClient.submitJob()
 – Asks the jobtracker for a new job ID, by calling JobTracker.getNewJobId() (step 2)
 – Checks the output specification of the job
 – Computes the input splits for the job
 – Copies the resources needed to run the job to the jobtracker’s filesystem (step 3)
   The job JAR file, the configuration file, and the computed input splits
 – Tells the jobtracker that the job is ready for execution, by calling JobTracker.submitJob() (step 4)
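
runJob() blocks until the job completes, while submitJob() on its own returns immediately with a RunningJob handle that can be polled. A minimal sketch, assuming the paths, mapper, and reducer have been configured as in the earlier driver:

    import java.io.IOException;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;

    public class SubmitSketch {
      public static void main(String[] args) throws IOException, InterruptedException {
        JobConf conf = new JobConf(); // assume input/output/mapper/reducer are set
        JobClient client = new JobClient(conf);
        RunningJob job = client.submitJob(conf); // steps 1-4 happen here
        System.out.println("Submitted " + job.getID());
        while (!job.isComplete()) { // poll the jobtracker for status
          Thread.sleep(1000);
        }
        System.out.println(job.isSuccessful() ? "Job succeeded" : "Job failed");
      }
    }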

Job Initialization
 JobTracker.submitJob()
 – Creates a new JobInProgress instance (step 5)
   Represents the job being run
   Encapsulates its tasks and status information
 – Puts it into an internal queue
   The job scheduler will pick it up and initialize it from the queue
 Job scheduler
 – Retrieves the input splits from the shared filesystem (step 6)
 – Creates one map task for each split
 – Creates the reduce tasks to be run
   The number of reduce tasks is determined by the mapred.reduce.tasks property
 – Gives IDs to the tasks
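
The number of map tasks falls out of the input splits, but the number of reduce tasks must be chosen by the job author. A small sketch of setting it (the value 8 is arbitrary):

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceCountSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // One map task per input split; the reduce count is set explicitly:
        conf.setNumReduceTasks(8); // equivalent to mapred.reduce.tasks=8
        System.out.println(conf.getNumReduceTasks());
      }
    }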

Task Assignment
 Tasktrackers
 – Periodically send heartbeats to the jobtracker (step 7)
   The heartbeats also signal whether the tasktracker is ready to run a new task
 – Have a fixed number of slots for map and reduce tasks
 Jobtracker
 – Chooses a job to select a task from
 – Assigns map/reduce tasks to tasktrackers using the heartbeat return values
   For map tasks, it takes account of data locality
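
The slot counts are per-tasktracker daemon settings rather than per-job ones. A hedged mapred-site.xml sketch (the values are illustrative; the default is two slots of each kind):

    <!-- mapred-site.xml on each tasktracker (illustrative values) -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>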

Task Execution
 Tasktracker
 – Copies the job JAR from the shared filesystem (step 8)
 – Creates a local working directory for the task, and un-jars the contents of the JAR into this directory
 – Creates an instance of TaskRunner to run the task
 TaskRunner
 – Launches a new Java Virtual Machine (JVM) (step 9)
   So that any bugs in the user-defined map and reduce functions don’t affect the tasktracker
 – Runs each task in the JVM (step 10)
   The child process informs the parent of the task’s progress every few seconds

Task Execution – Streaming and Pipes
 Both run special map and reduce tasks that launch the user-supplied executable and communicate with it
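
For example, a Streaming job supplies ordinary executables as the mapper and reducer. A sketch of an invocation (the jar location varies by Hadoop version and installation; /bin/cat and /usr/bin/wc stand in for real user programs):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
      -input myInput \
      -output myOutput \
      -mapper /bin/cat \
      -reducer /usr/bin/wc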

Progress and Status Updates
 It’s important for the user to get feedback on how the job is progressing
 – MapReduce jobs are long-running batch jobs
 Progress – the proportion of the task completed
 – Map tasks – the proportion of the input that has been processed
 – Reduce tasks – the total progress is divided into three parts: the copy phase, the sort phase, and the reduce phase
   e.g., if a task has run the reducer on half its input, its progress is ⅚, since it has completed the copy and sort phases (⅓ each) and is halfway through the reduce phase (⅙)

Progress and Status Updates (cont’d)
 [Figure: how status updates propagate through the system; the client polls the jobtracker every second]

Job Completion
 The jobtracker changes the status of a job to “successful” when it is notified that the last task for the job is complete
 The JobClient learns this by polling for status
 – It prints a message to tell the user, and returns from runJob()
 Cleanup
 – The jobtracker cleans up its working state for the job, and instructs tasktrackers to do the same
   e.g., to delete intermediate map output

Failures
 Task failure
 – When user code in a map or reduce task throws a runtime exception
 Tasktracker failure
 – When a tasktracker fails by crashing, or is running very slowly
 Jobtracker failure
 – When the jobtracker fails by crashing

Task Failure
 Child task failing
 – The child JVM reports the error back to its parent tasktracker before it exits
 Sudden exit of the child JVM
 – The tasktracker notices that the process has exited
 Hanging tasks
 – The tasktracker notices that it hasn’t received a progress update for a while
 – The child JVM process is automatically killed after this period (the mapred.task.timeout property)
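
A long computation inside map() or reduce() can trip this timeout even though the task is healthy. The usual defence is to call Reporter.progress() periodically; a hedged mapper sketch in the old API (the per-record "expensive work" is a placeholder):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class LongRunningMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {
      public void map(LongWritable key, Text value,
                      OutputCollector<LongWritable, Text> output,
                      Reporter reporter) throws IOException {
        for (int i = 0; i < 100; i++) {
          // ... expensive per-record work (placeholder) ...
          reporter.progress(); // tells the tasktracker the task is alive
        }
        output.collect(key, value);
      }
    }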

Task Failure (cont’d)
 Tasktracker
 – Notifies the jobtracker of the failure via the heartbeat
 Jobtracker
 – Task rescheduling
   A failed task is retried as long as it has failed fewer than four times (by default)
   Configured by the mapred.map.max.attempts and mapred.reduce.max.attempts properties
 – Job failure
   If any task fails four times (or whatever the maximum number of attempts is), the whole job fails
   The percentage of task failures to tolerate without failing the job can be configured
   – mapred.max.map.failures.percent
   – mapred.max.reduce.failures.percent
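
These limits are settable per job through JobConf; a sketch with illustrative values:

    import org.apache.hadoop.mapred.JobConf;

    public class RetryConfigSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setMaxMapAttempts(8);               // mapred.map.max.attempts (default 4)
        conf.setMaxReduceAttempts(8);            // mapred.reduce.max.attempts (default 4)
        // Tolerate a fraction of failed tasks without failing the whole job:
        conf.setMaxMapTaskFailuresPercent(5);    // mapred.max.map.failures.percent
        conf.setMaxReduceTaskFailuresPercent(5); // mapred.max.reduce.failures.percent
      }
    }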

Tasktracker Failure
 Jobtracker
 – Notices a tasktracker that has stopped sending heartbeats
   Heartbeat interval to expire: mapred.tasktracker.expiry.interval (default: 10 mins)
 – Removes it from its pool of tasktrackers to schedule tasks on
 – Arranges for map tasks that ran and completed successfully on the failed tasktracker to be re-run
   Their intermediate output resides on the failed tasktracker’s local filesystem and may no longer be accessible
 Blacklisting
 – A tasktracker is blacklisted if its task failure rate is significantly higher than the average on the cluster
 – Blacklisted tasktrackers can be restarted to remove them from the blacklist

Jobtracker Failure
 Currently, Hadoop has no mechanism for dealing with failure of the jobtracker
 Jobtracker failure has a low chance of occurring
 – The chance of a particular machine failing is low
 Future work
 – Run multiple jobtrackers, only one of which is the primary jobtracker at any time
 – Choose the primary jobtracker using ZooKeeper as a coordination mechanism

Job Scheduling
 FIFO scheduler (default)
 – Queue-based
 – Job priority
   The mapred.job.priority property (VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW)
 – No preemption
 Fair scheduler
 – Pool-based
   Each user gets their own pool, in which their jobs are placed
   A user who submits more jobs does not get more cluster resources than other users
 – Preemption support
 – Configuration
   mapred.jobtracker.taskScheduler = org.apache.hadoop.mapred.FairScheduler
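
Job priority under the FIFO scheduler is a per-job setting; a sketch (the priority value is arbitrary, and note that the scheduler choice itself is a cluster-side property, as above, not a per-job one):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobPriority;

    public class PrioritySketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Affects which waiting job the FIFO scheduler picks next;
        // it never preempts a job that is already running.
        conf.setJobPriority(JobPriority.VERY_HIGH); // mapred.job.priority
        System.out.println(conf.get("mapred.job.priority"));
      }
    }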

Shuffle and Sort
 [Figure: the shuffle and sort between the map and reduce sides]

The Map Side
 Buffered writes
 – Map output is written to a circular memory buffer
   Buffer size: io.sort.mb (default: 100 MB)
 – A background thread spills the contents to disk when the buffer reaches a certain threshold
   Threshold: io.sort.spill.percent (default: 0.80 = 80%)
   A new spill file is created each time the threshold is reached
 Partitioning and sorting
 – The background thread partitions the data according to the reducer it will be sent to
 – Within each partition, the thread performs an in-memory sort by key
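
Both knobs are ordinary job configuration properties; a hedged tuning sketch (the values are illustrative, not recommendations):

    import org.apache.hadoop.mapred.JobConf;

    public class MapSideTuningSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setInt("io.sort.mb", 200);                // buffer size in MB (default 100)
        conf.setFloat("io.sort.spill.percent", 0.90f); // spill threshold (default 0.80)
      }
    }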

The Reduce Side
 Copy phase
 – Map tasks may finish at different times
 – The reduce task starts copying their outputs as soon as each map task completes
   Number of copier threads: mapred.reduce.parallel.copies (default: 5)
 – The map outputs are also buffered in memory
   Buffer size: mapred.job.shuffle.input.buffer.percent
   Threshold size: mapred.job.shuffle.merge.percent
   Threshold number of map outputs: mapred.inmem.merge.threshold
 Sort phase (merge phase)
 – Map outputs are merged in rounds
   Merge factor: io.sort.factor (default: 10)
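
The reduce-side properties can be set the same way; a hedged sketch with illustrative values:

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceSideTuningSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setInt("mapred.reduce.parallel.copies", 10);  // copier threads (default 5)
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
        conf.setFloat("mapred.job.shuffle.merge.percent", 0.66f);
        conf.setInt("mapred.inmem.merge.threshold", 1000);
        conf.setInt("io.sort.factor", 20);                 // streams merged per round (default 10)
      }
    }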

Speculative Execution
 Job execution time is sensitive to slow-running tasks
 – A single straggling task can make the whole job take significantly longer
 Speculative task
 – Another, equivalent, backup task
 – Launched only after all of a job’s tasks have been launched
 When a task completes successfully, any duplicate tasks that are still running are killed
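
Speculative execution is on by default and can be turned off per job, for map and reduce tasks independently; a sketch:

    import org.apache.hadoop.mapred.JobConf;

    public class SpeculativeSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setMapSpeculativeExecution(false);    // mapred.map.tasks.speculative.execution
        conf.setReduceSpeculativeExecution(false); // mapred.reduce.tasks.speculative.execution
      }
    }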

Task JVM Reuse
 Reduces the overhead of starting a new JVM for each task
 Cases where it is effective
 – Jobs with a large number of very short-lived tasks (usually map tasks)
 – Jobs with lengthy initialization
 If tasktrackers run more than one task at a time, the concurrent tasks always run in separate JVMs; reuse applies to tasks that run sequentially
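
JVM reuse is controlled by the mapred.job.reuse.jvm.num.tasks property; a sketch:

    import org.apache.hadoop.mapred.JobConf;

    public class JvmReuseSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // mapred.job.reuse.jvm.num.tasks: tasks per JVM (default 1);
        // -1 means no limit, i.e., reuse the JVM for as many tasks as possible.
        conf.setNumTasksToExecutePerJvm(-1);
      }
    }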

Skipping Bad Records
 Handling them in mapper or reducer code
 – Ignore bad records
 – Throw an exception
 Using Hadoop’s skipping mode
 – For when you can’t handle bad records yourself, e.g., because the bug is in a third-party library
 – The skipping process:
   1. Task fails
   2. Task fails again
   3. Skipping mode is enabled; the task fails, but the failed record is stored by the tasktracker
   4. Skipping mode is still enabled; the task succeeds by skipping the bad record that failed in the previous attempt
 – Skipping mode can detect only one bad record per task attempt
   This mechanism is therefore appropriate only for occasional bad records
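
Skipping mode is enabled through the SkipBadRecords helper class; a sketch for the map side (the values shown are illustrative):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SkipBadRecords;

    public class SkippingModeSketch {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Allow up to 1 bad record to be skipped per map task
        // (0 disables skipping; Long.MAX_VALUE skips as many as needed).
        SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
        // Number of failed attempts before skipping mode kicks in (default 2).
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
      }
    }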