MapReduce. Dean and Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, Vol. 51, No. 1, January 2008. Presented by Shahram Ghandeharizadeh.

MapReduce Dean and Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, Vol. 51, No. 1, January 2008. Shahram Ghandeharizadeh, Computer Science Department, University of Southern California

A Shared-Nothing Framework
Shared-nothing architecture consisting of thousands of nodes, where a node is an off-the-shelf, commodity PC:
- Google File System
- Google's Bigtable Data Model
- Google's Map/Reduce Framework
- Yahoo's Pig Latin
- …

Overview: Map/Reduce (Hadoop)
A programming model to make parallelism transparent to a programmer. The programmer specifies:
- a map function that processes a key/value pair to generate a set of intermediate key/value pairs. This divides the problem into smaller "intermediate key/value" sub-problems.
- a reduce function to merge all intermediate values associated with the same intermediate key. This solves each sub-problem. Final results might be stored across R files.
The run-time system takes care of:
- partitioning the input data across nodes,
- scheduling the program's execution,
- handling node failures,
- coordinating among multiple nodes.

Example
Counting word occurrences. The input document is NameList and its content is: "Jim Shahram Betty Jim Shahram Jim Shahram". Desired output: Jim: 3, Shahram: 3, Betty: 1. How?

Map(String doc_name, String doc_content)
  // doc_name is the document name, NameList
  // doc_content is the document content, "Jim Shahram …"
  For each word w in doc_content
    EmitIntermediate(w, "1");

Map(NameList, "Jim Shahram Betty …") emits: [Jim, 1], [Shahram, 1], [Betty, 1]. A hash function splits the intermediate keys (tokens) across R different reduce "Worker" processes.

Reduce(String key, Iterator values)
  // key is a word
  // values is a list of counts
  Int result = 0;
  For each v in values
    result += ParseInt(v);
  Emit(AsString(result));

Reduce("Jim", "1 1 1") emits "3".
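
The following is a minimal, single-process Python sketch of the word-count example above, not the distributed implementation; the function names (map_fn, reduce_fn, run_mapreduce) and the in-memory grouping step are illustrative assumptions standing in for the framework's shuffle phase.

from collections import defaultdict

def map_fn(doc_name, doc_content):
    # Emit an intermediate (word, "1") pair for every word in the document.
    for word in doc_content.split():
        yield (word, "1")

def reduce_fn(key, values):
    # Sum the per-occurrence counts for one word.
    return str(sum(int(v) for v in values))

def run_mapreduce(documents):
    # Grouping ("shuffle") phase: collect all values for each intermediate key.
    groups = defaultdict(list)
    for name, content in documents.items():
        for key, value in map_fn(name, content):
            groups[key].append(value)
    # Reduce phase: one call per distinct intermediate key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

docs = {"NameList": "Jim Shahram Betty Jim Shahram Jim Shahram"}
print(run_mapreduce(docs))  # {'Jim': '3', 'Shahram': '3', 'Betty': '1'}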

Other Examples
Distributed Grep:
- Map function emits a line if it matches a supplied pattern.
- Reduce function is an identity function that copies the supplied intermediate data to the output.
Count of URL accesses:
- Map function processes logs of web page requests and outputs [URL, 1] pairs.
- Reduce function adds together all values for the same URL, emitting [URL, total count] pairs.
Reverse Web-Link graph; e.g., find all URLs that link to a given target page:
- Map function outputs a [tgt, src] pair for each link to a tgt found in a page named src.
- Reduce concatenates the list of all src URLs associated with a given tgt URL and emits the pair [tgt, list(src)].
Inverted Index; e.g., all URLs with 585 as a word:
- Map function parses each document, emitting a sequence of [word, doc_ID] pairs.
- Reduce accepts all pairs for a given word, sorts the corresponding doc_IDs, and emits a [word, list(doc_ID)] pair.
- The set of all output pairs forms a simple inverted index.
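
As a hedged illustration, here are single-machine Python sketches of the map and reduce functions for two of these examples (distributed grep and the inverted index); the function names and the regular-expression pattern argument are assumptions, not the paper's API.

import re

def grep_map(doc_name, doc_content, pattern=r"MapReduce"):
    # Emit every line that matches the supplied pattern.
    for line in doc_content.splitlines():
        if re.search(pattern, line):
            yield (doc_name, line)

def grep_reduce(key, lines):
    # Identity reduce: copy the intermediate data straight to the output.
    for line in lines:
        yield line

def inverted_index_map(doc_id, doc_content):
    # Emit a (word, doc_ID) pair for every word in the document.
    for word in doc_content.split():
        yield (word, doc_id)

def inverted_index_reduce(word, doc_ids):
    # Sort (and de-duplicate) the document IDs for this word.
    return (word, sorted(set(doc_ids)))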

MapReduce
Input: R = {r1, r2, …, rn} and user-provided functions M and R.
- M(ri) → {[K1, V1], [K2, V2], …}, e.g., [Jim, 1], [Shahram, 1], [Betty, 1], …
- The framework groups all intermediate values that share a key: [Jim, "1 1 1"], [Shahram, "1 1 1"], [Betty, "1"]
- R(Ki, ValueSet) → [Ki, R(ValueSet)], e.g., [Jim, "3"], [Shahram, "3"], [Betty, "1"]

Implementation
Target environment:
- Commodity PCs connected using switched Ethernet.
- GFS manages data stored across the PCs.
- A scheduling system accepts jobs submitted by users; each job consists of a set of tasks, and the scheduler maps tasks to a set of available machines within a cluster.

Execution
Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits. Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a hash function: hash(key) mod R.
- R and the partitioning function are specified by the programmer.
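
A one-line sketch of the default partitioning step, using Python's built-in hash as a stand-in for the library's hash function; the paper also notes that users can supply their own partitioner, for example hashing only the hostname of a URL key so that all pages from one site land in the same output file.

def partition(key, R):
    # Default partitioning: pick the reduce task for an intermediate key.
    return hash(key) % R

def hostname_partition(url_key, R):
    # User-supplied variant from the paper: keep all URLs of one host together.
    from urllib.parse import urlparse
    return hash(urlparse(url_key).hostname) % R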

Output of Execution
R output files, one per reduce task, with file names specified by the programmer. Typically, programmers do not combine the R output files into one file; they pass them as input to another MapReduce call, or use them with another distributed application that is able to deal with input partitioned into multiple files.

Execution
Important details:
- The output of a Map task is stored on the local disk of the machine the task is executing on. A Map task produces R such files on its local disk, similar to your Homework 2 where R=101.
- The output of a Reduce task is stored in GFS, giving high availability via replication.
- The file name of the output produced by a reduce task is deterministic. When a reduce task completes, the reduce worker atomically renames its temporary output file to the final output file. If the same reduce task executes on multiple machines, multiple rename calls will be executed for the same output file.
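
A local-filesystem sketch of the temporary-file-plus-atomic-rename commit that a reduce worker performs (against GFS in the real system); the function name and the use of Python's tempfile/os modules are assumptions for illustration.

import os
import tempfile

def commit_reduce_output(final_path, data):
    # Write the reduce output to a private temporary file first...
    out_dir = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=out_dir)
    with os.fdopen(fd, "w") as f:
        f.write(data)
    # ...then atomically rename it to the deterministic final name.  If the
    # same reduce task was executed on several machines, each execution
    # renames onto the same path, so the final file holds the output of
    # exactly one of them.
    os.rename(tmp_path, final_path)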

Master
- Propagates the location of intermediate file regions from map tasks to reduce tasks. For each completed map task, the master stores the locations and sizes of the R produced intermediate files.
- Pushes the location of the R produced intermediate files to workers with in-progress reduce tasks.
- For each map task and reduce task, the master stores one of the possible states: idle, in-progress, or completed.
- The master takes the location of input files (in GFS) and their replicas into account. It strives to schedule a map task on a machine that contains a replica of the corresponding input file (or near it), to minimize contention for network bandwidth.
- Termination condition: all map and reduce tasks are in the "completed" state.
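
A minimal sketch of the bookkeeping the master might keep per task, assuming Python dataclasses; the class and field names are illustrative, not the paper's data structures.

from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple

class TaskState(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"

@dataclass
class MapTask:
    split_id: int
    state: TaskState = TaskState.IDLE
    worker: Optional[str] = None
    # Locations and sizes of the R intermediate files, recorded when the
    # map task reports completion and pushed to in-progress reduce tasks.
    intermediate_files: List[Tuple[str, int]] = field(default_factory=list)

@dataclass
class ReduceTask:
    partition: int
    state: TaskState = TaskState.IDLE
    worker: Optional[str] = None

def job_finished(map_tasks, reduce_tasks):
    # Termination condition: every map and reduce task has completed.
    return all(t.state is TaskState.COMPLETED
               for t in list(map_tasks) + list(reduce_tasks))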

Worker Failures
- Failure detection mechanism: the master pings workers periodically.
- An in-progress Map or Reduce task on a failed worker is reset to idle and becomes eligible for rescheduling.
- A completed Map task on a failed worker must also be re-executed, because its output is stored on that worker's local disk.

Master Failure
- Abort the MapReduce computation. The client may check for this condition and retry the MapReduce operation.
- Alternative: let the master checkpoint its data structures, enabling a new instance to resume from the last checkpointed state.

Sequential versus Parallel Execution
Are the results of a sequential execution the same as those of a parallel execution with failures?

Sequential versus Parallel Execution
Are the results of a sequential execution the same as those of a parallel execution with failures? It depends on the application. If the Map and Reduce operators are deterministic functions of their input values:
- When a map task completes, the worker sends a message to the master and includes the names of the R temporary files in the message. If the master receives a completion message for an already completed map task, it ignores the message. Otherwise, it records the names of the R files in a master data structure (for use by the reduce tasks).
- The output of a Reduce task is stored in GFS, with high availability via replication. The file name of the output produced by a reduce task is deterministic. When a reduce task completes, the reduce worker atomically renames its temporary output file to the final output file. If the same reduce task executes on multiple machines, multiple rename calls will be executed for the same output file.
- These atomic commits guarantee output equivalent to that of a sequential execution.

Sequential versus Parallel Execution
Are the results of a sequential execution the same as those of a parallel execution with failures? It depends on the application. If the Map and/or Reduce operators are NOT deterministic functions of their input values, the guarantee is weaker: the output of any individual reduce task is equivalent to that of some sequential execution, but the outputs of different reduce tasks may correspond to different sequential executions.

Load Balancing
Values of M and R are much larger than the number of worker machines, so when a worker fails, the many tasks assigned to it can be spread out across all the other workers. The master makes O(M+R) scheduling decisions and maintains O(M*R) pieces of state in memory. Practical guidelines:
- M is chosen so that each task processes roughly 16 MB to 64 MB of input data,
- R is a small multiple of the number of worker machines (e.g., R=5000 with 2000 worker machines).
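
A back-of-the-envelope sketch of these bounds, assuming the paper's typical figures of M = 200,000 and R = 5,000 and roughly one byte of master state per (map task, reduce task) pair; the numbers are ballpark, not measurements.

M = 200_000   # map tasks
R = 5_000     # reduce tasks (a small multiple of ~2,000 workers)

scheduling_decisions = M + R      # O(M + R)
state_bytes = M * R               # O(M * R), ~1 byte per (map, reduce) pair

print(scheduling_decisions)       # 205000
print(state_bytes / 2**20)        # ~953.7 MiB of master state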

Load Imbalance: Stragglers
A straggler is a machine that takes an unusually long time to complete one of the last few map or reduce tasks in the computation; load imbalance is one possible reason. When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks. A task is marked as completed whenever either the primary or the backup execution completes. This yields a significant improvement in execution time: sort takes 44% longer with the backup-task mechanism disabled.
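
A toy sketch of the backup-execution policy using Python threads; the real master launches the backup on a different machine and only near the end of the job, so treat the structure (run_with_backup, the Event-based "first one wins" rule) as an illustrative assumption.

import threading

def run_with_backup(task_fn, *args):
    # Run a primary and a backup execution of the same task; the task is
    # marked completed as soon as either execution finishes.
    done = threading.Event()
    result = {}
    lock = threading.Lock()

    def attempt(label):
        value = task_fn(*args)
        with lock:
            if not done.is_set():
                result["value"], result["winner"] = value, label
                done.set()

    for label in ("primary", "backup"):
        threading.Thread(target=attempt, args=(label,), daemon=True).start()
    done.wait()
    return result["value"], result["winner"]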

Function Shipping
The programmer may specify a "Combiner" function that does partial merging of the data produced by a Map task. The output of a combiner function is written to an intermediate file that is consumed by the reduce task. Typically, the same code is used to implement both the combiner and the reduce functions.
- Example: with the word count example, there will be many instances of ["Jim", 1] because "Jim" is more common than "Shahram". The programmer writes a "Combiner" function to enable a Map task to produce ["Jim", 55] instead. For this to work, the reduce function must be commutative and associative.
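
Continuing the earlier word-count sketch, a map function with a built-in combiner might look like the following; the use of collections.Counter and the function name are assumptions, and the "55" in the comment is just the slide's illustrative figure.

from collections import Counter

def map_with_combiner(doc_name, doc_content):
    # Partially aggregate counts inside the map task, so it emits e.g.
    # ("Jim", "55") once instead of ("Jim", "1") fifty-five times.
    for word, count in Counter(doc_content.split()).items():
        yield (word, str(count))

def reduce_fn(key, values):
    # Unchanged reduce: summing partial counts is safe because addition
    # is commutative and associative.
    return str(sum(int(v) for v in values))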

How to Debug?
How does a programmer debug his or her MapReduce application?

How to Debug?
How does a programmer debug his or her MapReduce application?
- An alternative implementation of the MapReduce library sequentially executes all of the work on the local machine. The programmer may focus on particular map tasks.
What if the input data is causing failures?

How to Debug?
How does a programmer debug his or her MapReduce application?
- An alternative implementation of the MapReduce library sequentially executes all of the work on the local machine. The programmer may focus on particular map tasks.
What if the input data is causing failures?
- An optional mode of execution lets the MapReduce library detect which records cause deterministic crashes and skip those records.
- The master knows about these records, and the programmer may retrieve them for further analysis.
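
A simplified local sketch of the skip-bad-records idea: in the real library the worker's signal handler reports the offending record to the master, which tells re-executions to skip it; here that machinery is collapsed into a try/except loop, and the function name is an assumption.

def run_map_skipping_bad_records(map_fn, records):
    # Apply map_fn to each input record; records whose map call crashes are
    # remembered (as the master would remember them) and skipped instead of
    # failing the whole task.
    outputs, skipped = [], []
    for record in records:
        try:
            outputs.extend(map_fn(*record))
        except Exception:
            skipped.append(record)   # retrievable by the programmer for further analysis
    return outputs, skipped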

Monitoring of MapReduce
It is very important to have eyes that can see into a running job: the master exports status pages (via an internal HTTP server) showing the progress of the computation, such as how many tasks have completed, how many are in progress, the bytes of input, intermediate, and output data, processing rates, and which workers have failed.

Performance Numbers
A cluster consisting of 1800 PCs:
- 2 GHz Intel Xeon processors,
- 4 GB of memory, with roughly 1 to 1.5 GB reserved for other tasks sharing the nodes,
- 320 GB of storage: two 160 GB IDE disks.
Grep through 1 TB of data looking for a pre-specified pattern (M = 15,000 input splits of roughly 64 MB each, R = 1):
- Execution time is 150 seconds.

Performance Numbers
A cluster consisting of 1800 PCs:
- 2 GHz Intel Xeon processors,
- 4 GB of memory, with roughly 1 to 1.5 GB reserved for other tasks sharing the nodes,
- 320 GB of storage: two 160 GB IDE disks.
Grep through 1 TB of data looking for a pre-specified pattern (M = 15,000 input splits of roughly 64 MB each, R = 1):
- Execution time is 150 seconds, with 1764 workers assigned. About a minute of that total is startup: the time to schedule tasks.

Startup with Grep
Startup includes:
- propagation of the program to all worker machines,
- delays interacting with GFS to open the set of 1000 input files,
- obtaining the information needed for the locality optimization.

Sort
- The Map function extracts a 10-byte sorting key from a text line and emits the key and the original text line as the intermediate key/value pair. Intermediate key/value pairs are sorted by key before being handed to the reduce operator.
- The identity function is used as the reduce operator.
- R = 4000. The partitioning function has built-in knowledge of the distribution of keys. If this information were missing, a pre-pass MapReduce could collect a sample of the keys and compute the partitioning information.
- The final sorted output is written to a set of 2-way replicated GFS files.
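
A sketch of the sort benchmark's user code in the same Python style as the earlier examples (the function names are assumptions); almost all of the work is done by the framework's partitioning, shuffling, and per-partition ordering rather than by the user functions.

def sort_map(doc_name, doc_content):
    # Extract a 10-byte sorting key from each text line and emit
    # (key, original line) as the intermediate key/value pair.
    for line in doc_content.splitlines():
        yield (line[:10], line)

def sort_reduce(key, lines):
    # Identity reduce: emit the lines unchanged; concatenating the R ordered
    # output partitions yields the globally sorted result.
    for line in lines:
        yield line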

Sort Results