MapReduce: Simplified Data Processing on Large Clusters

MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean & Sanjay Ghemawat
Appeared in: OSDI '04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004
Presented by: Hemanth Makkapati (makka@vt.edu)
MapReduce aims to provide a simplified approach to programming and running distributed applications.

Mom told Sam: “An apple a day keeps a doctor away!”

One day, Sam thought of “drinking” the apple. So he used a knife to cut the apple and a blender to make the juice. Compared to eating an apple, drinking juice is fun.

Next Day: Excited by his success at making apple juice, Sam now wanted to make juice from multiple fruits. He applied his invention to all the fruits he could find in the fruit basket, and he excelled at this for some time. Simple!

18 Years Later: Sam got his first job with the juice-making giant, Joogle, for his talent in making juice. But now it is not just one basket but whole containers of fruits, and he has to make the juice of each fruit separately. Simple??? Like any giant, be it a search giant or a juice-making giant, you will have huge numbers at hand: search, recommendations, usage analysis, PageRank, and so on. Is this representative of computations that process large-scale data? And Sam has just ONE knife and ONE blender.

Why? The operations themselves are conceptually simple: making juice, indexing, recommendations, etc. But the data to process is HUGE: Google processes over 20 PB of data every day. Sequential execution just won't scale up.

Why? Parallel execution achieves greater efficiency, but parallel programming is hard: parallelization, race conditions, debugging, fault tolerance, data distribution, and load balancing all complicate the computation we actually want to perform. So MapReduce proposes a way to separate the computation from the management of distribution.

MapReduce: “MapReduce is a programming model and an associated implementation for processing and generating large data sets.” The programming model provides abstractions to express simple computations; the library takes care of the gory stuff: parallelization, fault tolerance, data distribution, and load balancing, all of which are invisible to the user and come for free. Motivation: it was primarily meant for Google's web indexing; once the documents are crawled, several MapReduce operations are run in sequence to index them.

Programming Model
Goal: generate a set of output key-value pairs from a set of input key-value pairs: {<ki, vi>} → {<ko, vo>}
Expressed using two abstractions:
Map task: <ki, vi> → {<kint, vint>}
Reduce task: <kint, {vint}> → <ko, vo>
The library aggregates all intermediate values associated with the same intermediate key and passes the intermediate key-value pairs to the reduce function.
In the abstract: take something unstructured and derive meaningful data out of it.
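
One way to read that notation is as the shapes of the two user-supplied functions. Here is a rough Python typing sketch; the names MapFn and ReduceFn are illustrative, not part of the MapReduce library:

from typing import Callable, Iterable, Tuple, TypeVar

KIn, VIn = TypeVar("KIn"), TypeVar("VIn")
KInt, VInt = TypeVar("KInt"), TypeVar("VInt")
KOut, VOut = TypeVar("KOut"), TypeVar("VOut")

# Map:    <ki, vi>       -> { <kint, vint> }
MapFn = Callable[[KIn, VIn], Iterable[Tuple[KInt, VInt]]]

# Reduce: <kint, {vint}> -> <ko, vo>
ReduceFn = Callable[[KInt, Iterable[VInt]], Tuple[KOut, VOut]]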

Inspiration: the ‘map’ and ‘reduce/fold’ of functional programming languages.
(map f list [list2 list3 …])
(map square (1 2 3 4)) → (1 4 9 16)
(reduce f list […])
(reduce + (1 4 9 16)) → 30
(reduce + (map square (map - l1 l2)))
In functional languages, logic is expressed as expressions: there is no mutable data or state, computed values cannot depend on previously computed values (X → Y → Z), and order does not matter.
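
The same two expressions can be written in Python with the built-in map and functools.reduce; a minimal illustration of the functional inspiration, unrelated to the MapReduce library itself:

from functools import reduce

squares = list(map(lambda x: x * x, [1, 2, 3, 4]))  # [1, 4, 9, 16]
total = reduce(lambda a, b: a + b, squares)          # 1 + 4 + 9 + 16 = 30
print(squares, total)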

Example Map Reduce

Example: Word Count

map(String input_key, String input_value):
  // input_key: document name
  // input_value: document contents
  for each word w in input_value:
    EmitIntermediate(w, "1");

reduce(String output_key, Iterator intermediate_values):
  // output_key: a word
  // intermediate_values: a list of counts
  int result = 0;
  for each v in intermediate_values:
    result += ParseInt(v);
  Emit(AsString(result));

Map output: <“Sam”, “1”>, <“Apple”, “1”>, <“Sam”, “1”>, <“Mom”, “1”>, <“Sam”, “1”>, <“Mom”, “1”>
Grouped by key: <“Sam”, [“1”, ”1”, ”1”]>, <“Apple”, [“1”]>, <“Mom”, [“1”, “1”]>
Reduce output: “3”, “1”, “2”
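
The pseudocode above can be exercised locally. Here is a minimal single-process Python sketch, illustrative only (no distribution, no fault tolerance), that reproduces the Sam/Apple/Mom counts:

from collections import defaultdict

def map_fn(doc_name, doc_contents):
    # Emit <word, "1"> for every word, as in the Map pseudocode above.
    for word in doc_contents.split():
        yield word, "1"

def reduce_fn(word, counts):
    # Sum the list of counts for one word, as in the Reduce pseudocode above.
    return str(sum(int(c) for c in counts))

docs = {"doc1": "Sam Apple Sam Mom Sam Mom"}

intermediate = defaultdict(list)
for name, contents in docs.items():            # map phase
    for key, value in map_fn(name, contents):
        intermediate[key].append(value)        # group by intermediate key

for word in intermediate:                      # reduce phase
    print(word, reduce_fn(word, intermediate[word]))   # Sam 3, Apple 1, Mom 2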

Implementation: a large cluster of commodity PCs connected with switched Ethernet: x86 dual-processor machines with 2-4 GB of memory each, running Linux, with storage provided by GFS on IDE disks. A scheduling system lets users submit jobs; tasks are scheduled onto available machines in the cluster.

Google File System (GFS): a file is divided into chunks of a predetermined size, typically 16-64 MB, and each chunk is replicated by a predetermined factor, usually three replicas. Replication provides fault tolerance, availability, and reliability.

GFS Architecture

Execution
User specifies M (the number of map tasks) and R (the number of reduce tasks).
Map phase: the input is partitioned into M splits, and map tasks are distributed across multiple machines.
Reduce phase: reduce tasks are distributed across multiple machines; intermediate keys are partitioned (using a partitioning function) so that each is processed by the desired reduce task.

Execution Flow
1. The MapReduce library in the user program first splits the input files into M pieces of typically 16 to 64 megabytes (MB) per piece (controllable by the user via an optional parameter). It then starts up many copies of the program on a cluster of machines.
2. One of the copies of the program is special: the master. The rest are workers that are assigned work by the master. There are M map tasks and R reduce tasks to assign. The master picks idle workers and assigns each one a map task or a reduce task.
3. A worker who is assigned a map task reads the contents of the corresponding input split. It parses key/value pairs out of the input data and passes each pair to the user-defined Map function. The intermediate key/value pairs produced by the Map function are buffered in memory.
4. Periodically, the buffered pairs are written to local disk, partitioned into R regions by the partitioning function. The locations of these buffered pairs on the local disk are passed back to the master, who is responsible for forwarding these locations to the reduce workers.
5. When a reduce worker is notified by the master about these locations, it uses remote procedure calls to read the buffered data from the local disks of the map workers. When a reduce worker has read all intermediate data, it sorts it by the intermediate keys so that all occurrences of the same key are grouped together. The sorting is needed because typically many different keys map to the same reduce task. If the amount of intermediate data is too large to fit in memory, an external sort is used.
6. The reduce worker iterates over the sorted intermediate data and, for each unique intermediate key encountered, passes the key and the corresponding set of intermediate values to the user's Reduce function. The output of the Reduce function is appended to a final output file for this reduce partition.
7. When all map tasks and reduce tasks have been completed, the master wakes up the user program. At this point, the MapReduce call in the user program returns back to the user code.
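
Steps 5 and 6 (sort, group, and feed the user's Reduce function) can be pictured with a small Python sketch; this is an illustration of the idea, not the library's actual code:

from itertools import groupby
from operator import itemgetter

def run_reduce_worker(intermediate_pairs, reduce_fn, output_path):
    # intermediate_pairs: list of (key, value) pairs fetched from map workers.
    # Sort by intermediate key so all occurrences of a key sit together
    # (the real implementation falls back to an external sort if the data
    # does not fit in memory).
    intermediate_pairs.sort(key=itemgetter(0))
    with open(output_path, "w") as out:
        for key, group in groupby(intermediate_pairs, key=itemgetter(0)):
            values = [v for _, v in group]
            # Append one line per unique key to this reduce partition's output.
            out.write(f"{key}\t{reduce_fn(key, values)}\n")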

Sam & MapReduce: Sam implemented a parallel version of his innovation.

Master Data Structures
For each task: its state (idle, in-progress, or completed) and the identity of the worker machine it runs on.
For each completed map task: the size and location of the intermediate data it produced.
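
A rough sketch of that bookkeeping; the class and field names are my own, not from the paper:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TaskInfo:
    kind: str                        # "map" or "reduce"
    state: str = "idle"              # "idle" | "in-progress" | "completed"
    worker: Optional[str] = None     # identity of the worker machine, if assigned

@dataclass
class CompletedMapOutput:
    # For a completed map task: where its R intermediate regions live on the
    # map worker's local disk and how large they are, so the master can
    # forward these locations to the reduce workers.
    locations: List[str] = field(default_factory=list)
    sizes: List[int] = field(default_factory=list)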

Fault Tolerance
Worker failure is handled via re-execution. Failed workers are identified by their lack of response to heartbeat messages; their in-progress tasks and completed map tasks are re-scheduled, and workers executing reduce tasks are notified of the re-scheduling. Completed reduce tasks are not re-scheduled.
Master failure is rare. It can be recovered from checkpoints; otherwise all tasks abort. Sometimes a secondary master is employed.
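
A hedged sketch of the worker-failure path using heartbeats (the timeout value and data shapes are illustrative assumptions, building on the TaskInfo sketch above):

import time

HEARTBEAT_TIMEOUT = 10.0  # seconds; an illustrative value, not from the paper

def reschedule_failed_workers(last_heartbeat, tasks_by_worker, now=None):
    # last_heartbeat: worker id -> timestamp of its last heartbeat reply
    # tasks_by_worker: worker id -> list of TaskInfo objects assigned to it
    now = time.time() if now is None else now
    for worker, seen in last_heartbeat.items():
        if now - seen <= HEARTBEAT_TIMEOUT:
            continue
        for task in tasks_by_worker.get(worker, []):
            # Map tasks are rescheduled even if completed (their output sat on
            # the failed worker's local disk); completed reduce tasks are not,
            # since their output is already in the global file system.
            if task.kind == "map" or task.state == "in-progress":
                task.state = "idle"
                task.worker = None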

Disk Locality: leveraging GFS, map tasks are scheduled close to the data, on nodes that hold the input data or, failing that, on nodes near the input data (e.g., on the same network switch). This conserves network bandwidth.

Task Granularity: the number of map tasks exceeds the number of worker nodes, which gives better load balancing and better recovery. The cost is increased load on the master: more scheduling decisions and more state to save. M can be chosen with respect to the block size of the file system to leverage locality effectively; R is usually specified by users, since each reduce task produces one output file.
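
For instance, choosing M from the input size and a 64 MB block size might look like the sketch below; this is one possible policy for illustration, not the library's:

import math

def choose_m(total_input_bytes, block_size=64 * 1024 * 1024):
    # One map task per block-sized split, so each map task can read its
    # input from a local (or nearby) GFS chunk.
    return math.ceil(total_input_bytes / block_size)

# Roughly 1 TB of input at 64 MB per split gives about 15,000 map tasks,
# in line with the grep/sort numbers later in the talk.
print(choose_m(10**12))  # 14902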

Stragglers: slow workers delay completion time, whether due to bad disks with soft errors, other tasks eating up resources, or strange reasons like a processor cache being disabled. The fix is to start back-up copies of the remaining tasks as the job nears completion; the first copy to complete is the one taken.
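
The back-up-task idea can be sketched with concurrent futures: run a duplicate of a lingering task and accept whichever copy finishes first. This is an illustration only; the slower copy's result is simply discarded here:

from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_with_backup(task_fn, *args):
    # Submit the original task and one back-up copy, then return the result
    # of whichever finishes first.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(task_fn, *args) for _ in range(2)]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()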

Refinement: Partitioning Function. The partitioning function identifies the desired reduce task, given the intermediate key and R. The default is hash(key) mod R. It is important to choose a well-balanced partitioning function; otherwise some reduce tasks may delay completion.
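
A minimal sketch of the default behaviour, with Python's built-in hash standing in for the library's hash function:

def default_partition(key, num_reduce_tasks):
    # hash(key) mod R: a balanced hash spreads intermediate keys evenly
    # across the R reduce tasks.
    return hash(key) % num_reduce_tasks

R = 4
for key in ["Sam", "Apple", "Mom"]:
    print(key, "-> reduce task", default_partition(key, R))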

Refinement: Combiner Function. A mini-reduce phase run before the intermediate data is sent to the reduce tasks: when significant repetition of intermediate keys is likely, the values for each intermediate key are merged locally before being sent. The combiner is similar to the reduce function and saves network bandwidth.
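
For the word-count example, a combiner could pre-sum the "1"s on the map worker before anything crosses the network; a minimal sketch:

from collections import Counter

def combine(map_output_pairs):
    # Partial reduction on the map worker: collapse repeated keys locally
    # before the pairs are sent to the reduce tasks.
    partial = Counter()
    for word, count in map_output_pairs:
        partial[word] += int(count)
    return [(word, str(total)) for word, total in partial.items()]

pairs = [("Sam", "1"), ("Apple", "1"), ("Sam", "1"), ("Mom", "1"),
         ("Sam", "1"), ("Mom", "1")]
print(combine(pairs))  # [('Sam', '3'), ('Apple', '1'), ('Mom', '2')]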

Refinement: Skipping Bad Records. Map or reduce tasks may fail on certain records due to bugs. Ideally you debug and fix the bug, but that is not possible if the buggy code is third-party. When a worker dies on a record, the master is notified of that record; if more than one worker dies on the same record, the master re-schedules the task and asks it to skip that record.
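
A simplified local sketch of that policy: the two-failure threshold follows the slide, while the worker-to-master reporting is collapsed into a plain counter:

from collections import defaultdict

failure_counts = defaultdict(int)   # record id -> crashes observed so far

def map_records_skipping_bad(records, map_fn):
    # records: iterable of (record_id, record) pairs for one input split
    for record_id, record in records:
        if failure_counts[record_id] >= 2:
            continue                 # the master has marked this record as skippable
        try:
            yield from map_fn(record_id, record)
        except Exception:
            # In the real system the dying worker reports the offending record
            # to the master; here we just count the failure and move on.
            failure_counts[record_id] += 1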

Refinements: Others. Ordering guarantees: sorted output files. Temporary files. Local sequential execution to debug and test. Status information: input, intermediate, and output bytes processed so far, plus error and failure reports. Counters to keep track of specific events.

Performance: evaluated on two programs, grep and sort, each running on a large cluster and processing 1 TB of data. Cluster configuration: 1800 machines, each with 2 GHz Intel Xeon processors, 4 GB of memory, two 160 GB IDE disks, and a gigabit Ethernet link, all hosted in the same facility.

Grep: scans for a three-character pattern, with M = 15,000 at 64 MB per split. The entire computation takes about 150 s, of which roughly 60 s is startup overhead: propagation of the program to worker machines, GFS operations, and obtaining the information needed for the locality optimization.
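
As a quick check on those figures: 15,000 splits × 64 MB per split ≈ 960,000 MB ≈ 1 TB of input, and subtracting the roughly 60 s of startup from the 150 s total leaves about 90 s for the scan itself.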

Sort: models the TeraSort benchmark, with M = 15,000 at 64 MB per split, R = 4,000, and 1,700 workers. Evaluated over three executions: with backup tasks, without backup tasks, and with machine failures.

Sort results (figure): normal execution, without backup tasks, and with machine failures.

Experience: Google Indexing System. The indexing system was completely rewritten using MapReduce, modeled as a sequence of multiple MapReduce operations. The result is much simpler code: fewer lines of code, shorter code changes, and performance that is easier to improve.

Conclusion: MapReduce is an easy-to-use, scalable programming model for large-scale data processing on clusters. It allows users to focus on their computations and hides the issues of parallelization, fault tolerance, data partitioning, and load balancing. It achieves efficiency through disk locality and fault tolerance through re-execution and replication.

New Trend: Disk Locality Becoming Irrelevant. The disk-locality argument assumes that disk bandwidth exceeds network bandwidth, but network speeds are improving fast while disk speeds have stagnated. The next step is to attain memory locality.

Hadoop: an open-source framework for data-intensive distributed applications, inspired by Google's MapReduce and GFS. It is implemented in Java and named after the toy elephant of its creator's son.

Hadoop History: Nutch, an effort to develop an open-source search engine, soon encountered scalability issues in storing and processing large data sets. Google then published MapReduce and GFS, a re-creation was attempted for Nutch, and Yahoo! got interested and contributed.

Hadoop Ecosystem
Hadoop MapReduce, inspired by Google's MapReduce
Hadoop Distributed File System (HDFS), inspired by the Google File System
HBase, the Hadoop database, inspired by Google BigTable
Cassandra, a distributed key-value store, inspired by Amazon Dynamo and Google BigTable

References
Google File System: http://labs.google.com/papers/gfs.html
Google BigTable: http://labs.google.com/papers/bigtable.html
Apache Hadoop: http://hadoop.apache.org/
Juice example: http://esaliya.blogspot.com/2010/03/mapreduce-explained-simply-as-story-of.html

Thank you