Distributed and Parallel Processing Technology Chapter 6


Distributed and Parallel Processing Technology Chapter 6. How MapReduce Works NamSoo Kim

Anatomy of a MapReduce Job Run You can run a MapReduce job with a single line of code: JobClient.runJob(conf). It's very short, but it conceals a great deal of processing behind the scenes. This section uncovers the steps Hadoop takes to run a job. The whole process is illustrated in Figure 6-1. At the highest level, there are four independent entities:
The client, which submits the MapReduce job.
The jobtracker, which coordinates the job run. The jobtracker is a Java application whose main class is JobTracker.
The tasktrackers, which run the tasks that the job has been split into. Tasktrackers are Java applications whose main class is TaskTracker.
The distributed filesystem (normally HDFS, covered in Chapter 3), which is used for sharing job files between the other entities.
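As a point of reference, here is a minimal sketch of such a driver using the old org.apache.hadoop.mapred API that this chapter describes; the driver class name and job name are illustrative, not taken from the book.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class ExampleDriver {                         // hypothetical driver class
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ExampleDriver.class);
    conf.setJobName("pass-through example");
    FileInputFormat.addInputPath(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    conf.setMapperClass(IdentityMapper.class);       // built-in pass-through mapper
    conf.setReducerClass(IdentityReducer.class);     // built-in pass-through reducer
    conf.setOutputKeyClass(LongWritable.class);      // key type produced by the default TextInputFormat
    conf.setOutputValueClass(Text.class);
    JobClient.runJob(conf);                          // step 1: submits the job and polls until it completes
  }
}

Everything that follows in this section happens behind that single runJob() call.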

Anatomy of a MapReduce Job Run Job Submission The runJob() method on JobClient is a convenience method that creates a new JobClient instance and calls submitJob() on it (step 1 in Figure 6-1). The job submission process then:
Asks the jobtracker for a new job ID (by calling getNewJobId() on JobTracker) (step 2).
Copies the resources needed to run the job, including the job JAR file, the configuration file, and the computed input splits, to the jobtracker's filesystem in a directory named after the job ID. The job JAR is copied with a high replication factor so that there are lots of copies across the cluster for the tasktrackers to access when they run tasks for the job (step 3).
Tells the jobtracker that the job is ready for execution (by calling submitJob() on JobTracker) (step 4).

Anatomy of a MapReduce Job Run (Figure 6-1)

Anatomy of a MapReduce Job Run Job Initialization When the JobTracker receives a call to its submitJob() method, it puts it into an internal queue from where the job scheduler will pick it up and initialize it. Initialization involves creating an object to represent the job being run, which encapsulates its tasks, and bookkeeping information to keep track of the tasks' status and progress (step 5). To create the list of tasks to run, the job scheduler first retrieves the input splits computed by the JobClient from the shared filesystem (step 6). Task Assignment Tasktrackers run a simple loop that periodically sends heartbeat method calls to the jobtracker. Heartbeats tell the jobtracker that a tasktracker is alive, but they also double as a channel for messages. As part of the heartbeat, a tasktracker will indicate whether it is ready to run a new task, and if it is, the jobtracker will allocate it a task, which it communicates to the tasktracker using the heartbeat return value (step 7).

Anatomy of a MapReduce Job Run Task Execution First, the tasktracker localizes the job JAR by copying it from the shared filesystem to the tasktracker's filesystem. It also copies any files needed from the distributed cache by the application to the local disk; see "Distributed Cache" on page 239 (step 8). Second, it creates a local working directory for the task, and un-jars the contents of the JAR into this directory. Third, it creates an instance of TaskRunner to run the task. TaskRunner launches a new Java Virtual Machine (step 9) to run each task in (step 10). The child process communicates with its parent through the umbilical interface. This way it informs the parent of the task's progress every few seconds until the task is complete.

Anatomy of a MapReduce Job Run Streaming and Pipes Both Streaming and Pipes run special map and reduce tasks for the purpose of launching the user-supplied executable and communicating with it (Figure 6-2).

Anatomy of a MapReduce Job Run Progress and Status Updates When a task is running, it keeps track of its progress, that is, the proportion of the task completed. For map tasks, this is the proportion of the input that has been processed. A job and each of its tasks have a status, which includes such things as the state of the job or task (e.g., running, successfully completed, failed), the progress of maps and reduces, the values of the job's counters, and a status message or description. What Constitutes Progress in MapReduce?
Reading an input record (in a mapper or reducer)
Writing an output record (in a mapper or reducer)
Setting the status description on a reporter (using Reporter's setStatus() method)
Incrementing a counter (using Reporter's incrCounter() method)
Calling Reporter's progress() method
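To make these concrete, here is a hedged sketch of an old-API mapper that triggers several of these progress signals; the counter enum, status text, and output record are illustrative, not from the book.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ReportingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  enum Quality { EMPTY_LINES }                            // hypothetical counter

  public void map(LongWritable key, Text value,
      OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    if (value.getLength() == 0) {
      reporter.incrCounter(Quality.EMPTY_LINES, 1);       // incrementing a counter counts as progress
      return;
    }
    reporter.setStatus("processing offset " + key.get()); // setting the status counts as progress
    output.collect(new Text("line"), new LongWritable(1)); // writing an output record counts as progress
    reporter.progress();                                  // explicit progress call for long-running work
  }
}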

Anatomy of a MapReduce Job Run As a task runs, it reports status changes to its tasktracker. The tasktracker includes the status of each task it is running in its heartbeat to the jobtracker. The jobtracker combines these into a global view of the job's status, which the JobClient receives when it polls for status.

Anatomy of a MapReduce Job Run Job Completion When the jobtracker receives a notification that the last task for a job is complete, it changes the status for the job to "successful." Then, when the JobClient polls for status, it learns that the job has completed successfully, so it prints a message to tell the user, and then returns from the runJob() method.

Failures There are three failure modes to consider: task failure, tasktracker failure, and jobtracker failure.
Task Failure The most common case is a child task failing, typically because user code in the map or reduce task throws a runtime exception. Streaming tasks fail if the Streaming process exits with a nonzero exit code; this behavior is governed by the stream.non.zero.exit.is.failure property. Hanging tasks fail when the tasktracker notices that it hasn't received a progress update for a while and proceeds to mark the task as failed; the child JVM process is automatically killed after this period.
Tasktracker Failure Failure of a tasktracker is another failure mode. If a tasktracker fails by crashing, or is running very slowly, it will stop sending heartbeats to the jobtracker.
Jobtracker Failure Failure of the jobtracker is the most serious failure mode. Currently, Hadoop has no mechanism for dealing with failure of the jobtracker.
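Some of the knobs mentioned above can be set on the job configuration. A hedged sketch, assuming the old property names from Hadoop 0.20/1.x; the values shown are the usual defaults, so check your cluster's settings before relying on them.

JobConf conf = new JobConf();
conf.setBoolean("stream.non.zero.exit.is.failure", true);  // treat a nonzero exit code from a Streaming process as task failure
conf.setLong("mapred.task.timeout", 600000L);              // milliseconds without progress before a hanging task is marked failed
conf.setInt("mapred.map.max.attempts", 4);                 // attempts per map task before the job is declared failed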

Job Scheduling MapReduce in Hadoop now comes with a choice of schedulers. The default is the original FIFO queue-based scheduler, and there is also a multi-user scheduler called the Fair Scheduler. The Fair Scheduler The Fair Scheduler aims to give every user a fair share of the cluster capacity over time. If a single job is running, it gets all of the cluster. The Fair Scheduler supports preemption, so if a pool has not received its fair share for a certain period of time, then the scheduler will kill tasks in pools running over capacity in order to give the slots to the pool running under capacity.

Shuffle and Sort MapReduce makes the guarantee that the input to every reducer is sorted by key. The process by which the system performs the sort, and transfers the map outputs to the reducers as inputs, is known as the shuffle. The Map Side Writing map output is more involved than simply appending to a file: it takes advantage of buffering writes in memory and doing some presorting for efficiency reasons. Figure 6-4 shows what happens. Each map task has a circular memory buffer that it writes its output to. When the buffer reaches a spill threshold (80% full by default), a background thread partitions the buffered records, sorts them within each partition, and spills them to a file on disk. Before the task finishes, the spill files are merged into a single partitioned and sorted output file. Map output compression can be enabled to reduce the amount of data written to disk and transferred to the reducers.
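Map output compression is enabled per job. A hedged sketch using the old JobConf API; the codec choice is illustrative.

JobConf conf = new JobConf();
conf.setCompressMapOutput(true);                            // compress intermediate map output before it is spilled and shuffled
conf.setMapOutputCompressorClass(org.apache.hadoop.io.compress.GzipCodec.class);  // illustrative codec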

Shuffle and Sort (Figure 6-4)

Shuffle and Sort The Reduce Side The map tasks may finish at different times, so the reduce task starts copying their outputs as soon as each completes. How do reducers know which tasktrackers to fetch map output from? As map tasks complete successfully, they notify their parent tasktracker of the status update, which in turn notifies the jobtracker. A thread in the reducer periodically asks the jobtracker for map output locations. Tasktrackers do not delete map outputs from disk as soon as the first reducer has retrieved them, as the reducer may fail. When all the map outputs have been copied, the reduce task moves into the sort phase, which merges the map outputs, maintaining their sort ordering. During the reduce phase the reduce function is invoked for each key in the sorted output. The output of this phase is written directly to the output filesystem, typically HDFS. In the case of HDFS, since the tasktracker node is also running a datanode, the first block replica will be written to the local disk.

Shuffle and Sort Configuration Tuning The general principle is to give the shuffle as much memory as possible. However, there is a trade-off, in that you need to make sure that your map and reduce functions get enough memory to operate. This is why it is best to write your map and reduce functions to use as little memory as possible. On the map side, the best performance can be obtained by avoiding multiple spills to disk; one is optimal. On the reduce side, the best performance is obtained when the intermediate data can reside entirely in memory.
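A hedged sketch of the kind of properties this tuning involves, using the old 0.20/1.x property names; the values are illustrative, not recommendations.

JobConf conf = new JobConf();
conf.setInt("io.sort.mb", 200);                             // map-side sort buffer size in MB (default 100)
conf.set("io.sort.spill.percent", "0.80");                  // spill to disk when the buffer is this full
conf.setInt("io.sort.factor", 10);                          // number of streams merged at once when merging spill files
conf.set("mapred.job.reduce.input.buffer.percent", "0.70"); // reduce side: fraction of heap allowed to hold map outputs during the reduce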

Shuffle and Sort

Shuffle and Sort

Task Execution Speculative Execution The MapReduce model breaks a job into tasks that run in parallel, which makes job execution time sensitive to slow-running tasks: it takes only one slow task to make the whole job take significantly longer than it would have done otherwise. Hadoop tries to detect when a task is running slower than expected and launches another, equivalent, task as a backup. This is termed speculative execution of tasks. It's important to understand that speculative execution does not work by launching two duplicate tasks at about the same time so they can race each other. If the original task completes before the speculative task, then the speculative task is killed; on the other hand, if the speculative task finishes first, then the original is killed. Speculative execution is an optimization, not a feature to make jobs run more reliably.
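Speculative execution is turned on by default and can be switched off per job. A hedged sketch using the old JobConf API.

JobConf conf = new JobConf();
conf.setMapSpeculativeExecution(false);      // mapred.map.tasks.speculative.execution
conf.setReduceSpeculativeExecution(false);   // mapred.reduce.tasks.speculative.execution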

Task Execution Task JVM Reuse Hadoop runs tasks in their own Java Virtual Machine to isolate them from other running tasks. The overhead of starting a new JVM for each task can take around a second, which for jobs that run for a minute or so is insignificant. However, jobs that have a large number of very short-lived tasks (these are usually map tasks) or that have lengthy initialization can see performance gains when the JVM is reused for subsequent tasks. The property for controlling task JVM reuse is mapred.job.reuse.jvm.num.tasks: it specifies the maximum number of tasks to run for a given job for each JVM launched; the default is 1.
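A hedged sketch of enabling JVM reuse for a job through the old JobConf API; -1 is the conventional value meaning there is no limit on the number of tasks a JVM may run for the job.

JobConf conf = new JobConf();
conf.setNumTasksToExecutePerJvm(-1);         // sets mapred.job.reuse.jvm.num.tasks; -1 means a JVM may run any number of this job's tasks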

Task Execution Skipping Bad Records The best way to handle corrupt records is in your mapper or reducer code. You can detect the bad record and ignore it, or you can abort the job by throwing an exception. You can also count the total number of bad records in the job using counters to see how widespread the problem is. Skipping mode is off by default; you enable it independently for map and reduce tasks using the SkipBadRecords class. The Task Execution Environment Hadoop provides information to a map or reduce task about the environment in which it is running. (The properties in Table 6-5 can be accessed from the job's configuration.)
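For the skipping mode described above, a hedged sketch of how it is typically configured through the SkipBadRecords class; the skip limits are illustrative.

JobConf conf = new JobConf();
SkipBadRecords.setMapperMaxSkipRecords(conf, 1);   // tolerate skipping up to 1 record around each bad record in map tasks
SkipBadRecords.setReducerMaxSkipGroups(conf, 1);   // tolerate skipping up to 1 group of values in reduce tasks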

Task Execution Hadoop provides a mechanism for application writers to use this feature: a task may find its working directory by retrieving the value of the mapred.work.output.dir property from its configuration file, or a MapReduce program using the Java API may call the getWorkOutputPath() static method on FileOutputFormat to get the Path object representing the working directory.
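A hedged sketch of a task using this to write a side file into its working output directory; the class name, file name, and payload are illustrative. Files created here are promoted to the job's output directory only when the task attempt commits successfully.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class SideFileExample {                                 // hypothetical helper
  public static void writeSideFile(JobConf conf) throws IOException {
    Path workDir = FileOutputFormat.getWorkOutputPath(conf);   // this task attempt's working directory
    Path sideFile = new Path(workDir, "side-data.txt");        // illustrative file name
    FileSystem fs = sideFile.getFileSystem(conf);
    FSDataOutputStream out = fs.create(sideFile);
    out.writeUTF("illustrative side output");                  // illustrative payload
    out.close();                                               // committed along with the task's regular output
  }
}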