2 Outline
MapReduce overview
Note: These notes are based on notes provided by Google
3 What is a Cloud?
Cloud = lots of storage + compute cycles nearby
4 Data-Intensive Computing
Data is typically stored at datacenters
Compute nodes nearby run the computation services
In data-intensive computing the focus is on the data; problem areas include:
  Storage
  Communication bottlenecks
  Moving tasks to data (rather than vice versa)
  Security
  Availability of data
  Scalability
6 Motivation: Large Scale Data Processing
Want to process lots of data (> 1 TB)
Want to parallelize across hundreds or thousands of CPUs
  How to parallelize
  How to distribute
  How to handle failures
Want to make this easy
7 What is MapReduce?
MapReduce is an abstraction that lets programmers specify computations that can be done in parallel
MapReduce hides the messy details needed to support those computations, e.g.:
  Distribution and synchronization
  Machine failures
  Data distribution
  Load balancing
It is widely used at Google
8 Programming Model
MapReduce simplifies programming through its library
The user of the MapReduce library expresses the computation as two functions: Map and Reduce
9 Programming Model
Map
  Takes an input pair and produces a set of intermediate key/value pairs
  Map: (key1, value1) -> list(key2, value2)
The MapReduce library groups together all intermediate values associated with the same intermediate key
Reduce
  Accepts an intermediate key and the set of values for that key
  Reduce: (key2, list(value2)) -> value3
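The two signatures above can be sketched as a single-process driver. This is only an illustration of the programming model, not the Google library: `run_mapreduce`, `length_map`, and `count_reduce` are hypothetical names, and everything runs in memory on one machine.

```python
from collections import defaultdict


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Single-process stand-in for the library: map, group by key, reduce."""
    groups = defaultdict(list)
    for key1, value1 in inputs:
        # Map: (key1, value1) -> list of (key2, value2)
        for key2, value2 in map_fn(key1, value1):
            # Grouping step: collect every value2 under its key2.
            groups[key2].append(value2)
    # Reduce: (key2, list(value2)) -> value3
    return {key2: reduce_fn(key2, values) for key2, values in sorted(groups.items())}


def length_map(name, text):
    """Hypothetical map function: emit (word length, word) for each word."""
    for word in text.split():
        yield (len(word), word)


def count_reduce(length, words):
    """Hypothetical reduce function: how many words have this length."""
    return len(words)


result = run_mapreduce([("doc1", "to be or not to be")], length_map, count_reduce)
print(result)
```

The user supplies only `length_map` and `count_reduce`; the grouping in the middle is what the library does on the user's behalf.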
10 Example: Word Frequencies in Web Pages
Determine the count of each word that appears in a document (or a set of documents)
Each file is associated with a document URL
Map function
  Key = document URL
  Value = document contents
Output of the map function is (potentially many) key/value pairs
  Output (word, “1”) once per word occurrence in the document
11 Example: Word Frequencies in Web Pages
Pseudo-code for Map
  Map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
      EmitIntermediate(w, "1");
12 Example: Word Frequencies in Web Pages
Example key/value pair:
  “document_example”, “to be or not to be”
Result of applying the map function (one pair per word occurrence):
  “to”, 1
  “be”, 1
  “or”, 1
  “not”, 1
  “to”, 1
  “be”, 1
13 Example: Word Frequencies in Web Pages
Pseudo-code for Reduce
  Reduce(String key, Iterator values):
    // key: a word, same for input and output
    // values: a list of counts
    int result = 0;
    for each v in values:
      result = result + ParseInt(v);
    Emit(AsString(result));
The function sums together all counts emitted for a particular word
14 Example: Word Frequencies in Web Pages
The MapReduce framework sorts all pairs by key
  (be, 1), (be, 1), (not, 1), (or, 1), (to, 1), (to, 1)
Pairs with the same key are then grouped
  (be, [1, 1]), (not, [1]), (or, [1]), (to, [1, 1])
The reduce function combines (sums) the values for a key
  Example: applying reduce to (be, [1, 1]) gives 2
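The word-count example can be traced stage by stage in a few lines of Python. This is a sketch under the assumption that everything fits in memory on one machine; `map_words` and `reduce_counts` are illustrative names that mirror the pseudo-code, not part of any library.

```python
from itertools import groupby
from operator import itemgetter


def map_words(key, value):
    """key: document name, value: document contents."""
    for word in value.split():
        yield (word, "1")


def reduce_counts(key, values):
    """key: a word, values: the counts emitted for that word (as strings)."""
    return sum(int(v) for v in values)


# Map stage: one (word, "1") pair per word occurrence.
pairs = list(map_words("document_example", "to be or not to be"))

# Sort stage: order the pairs by key so equal keys become adjacent.
sorted_pairs = sorted(pairs, key=itemgetter(0))

# Group stage: collect the values of each run of equal keys.
grouped = [
    (word, [value for _, value in group])
    for word, group in groupby(sorted_pairs, key=itemgetter(0))
]

# Reduce stage: sum the counts for each word.
counts = {word: reduce_counts(word, values) for word, values in grouped}
print(counts)
```

Sorting before grouping matters here: `itertools.groupby` only merges adjacent items, which is the same reason the framework sorts by key before calling Reduce.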
15 Example: Distributed Grep
Find all occurrences of a given pattern in a file (or set of files)
Input consists of (url+offset, line)
map(key=url+offset, val=line):
  If the line matches the specified pattern, emit (line, “1”)
reduce(key=line, values=uniq_counts):
  Input to reduce is essentially (line, [1, 1, 1, 1])
  Don’t do anything; just emit line
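A minimal single-machine sketch of the grep job follows. The pattern, the URL, and the log lines are made-up sample data, and the in-memory grouping stands in for what the framework would do across machines.

```python
import re
from collections import defaultdict

PATTERN = re.compile(r"error")  # hypothetical search pattern


def grep_map(key, line):
    """key: (url, offset) of the line; emit the line if it matches."""
    if PATTERN.search(line):
        yield (line, "1")


def grep_reduce(line, counts):
    """Identity reduce: ignore the counts and emit the matching line once."""
    return line


# Made-up input records of the form ((url, offset), line).
records = [
    (("http://example.com/log", 0), "disk error on node 7"),
    (("http://example.com/log", 21), "job finished"),
    (("http://example.com/log", 34), "disk error on node 7"),
]

# Stand-in for the framework's grouping of intermediate pairs by key.
groups = defaultdict(list)
for key, line in records:
    for out_key, out_value in grep_map(key, line):
        groups[out_key].append(out_value)

matches = [grep_reduce(line, counts) for line, counts in groups.items()]
print(matches)
```

Because the matching line itself is the intermediate key, identical lines collapse into one output record; a variant that needs every occurrence could key on (url, offset) instead.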
16 Example: Count of URL Access Frequency
Map function
  Input: <log of web page requests, content of log>
  Output: <URL, 1> for each request in the log
Reduce function adds together all values for the same URL
17 Example: Web structure
Simple representation of the WWW link graph
  Map
    Input: (URL, page-contents)
    Output: (URL, list-of-URLs)
Who maps to me?
  Map
    Input: (URL, list-of-URLs)
    Output: for each u in list-of-URLs, output <u, URL>
  Reduce: concatenates the list of all source URLs associated with u and emits <u, list(URL)>
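The "who maps to me?" computation can be sketched as follows. For brevity this sketch folds link extraction and inversion into one map function, whereas the slide describes them as two steps; the regular expression is a deliberately simplistic, assumed link extractor, and the page data is invented.

```python
import re
from collections import defaultdict

# Simplistic, assumed link extractor; real HTML needs a proper parser.
HREF = re.compile(r'href="([^"]+)"')


def links_map(url, page_contents):
    """For every link target u found on the page, emit (u, url)."""
    for target in HREF.findall(page_contents):
        yield (target, url)


def links_reduce(target, sources):
    """Concatenate all source URLs that link to the target."""
    return (target, sorted(set(sources)))


# Invented sample pages.
pages = [
    ("a.example", '<a href="b.example">b</a> <a href="c.example">c</a>'),
    ("b.example", '<a href="c.example">c</a>'),
]

# Stand-in for the framework's grouping of intermediate pairs by key.
incoming = defaultdict(list)
for url, contents in pages:
    for target, source in links_map(url, contents):
        incoming[target].append(source)

reverse_graph = dict(links_reduce(t, s) for t, s in incoming.items())
print(reverse_graph)
```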
18 The Infrastructure
Large clusters of commodity PCs and networking hardware
Clusters consist of hundreds or thousands of machines (failures are common)
GFS (Google File System)
  Distributed file system
  Provides replication of the data
19 The Infrastructure
Users submit jobs to a scheduling system
Possible partitions of data can be based on files, databases, file lines, database records, etc.
20 Execution
Map invocations are distributed across multiple machines by automatically partitioning the input data into a set of M splits
The input splits can be processed in parallel by different machines
Reduce invocations are distributed by partitioning the intermediate key space into R pieces using a partitioning function, e.g., hash(key) mod R
R and the partitioning function are specified by the programmer
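The two kinds of partitioning can be sketched in a few lines. The names `split_input` and `partition` are illustrative, and `zlib.crc32` is an assumed stand-in for the hash function: Python's built-in `hash()` is randomized per process for strings, so it would not give every machine the same answer.

```python
import zlib


def split_input(records, M):
    """Cut the input into at most M contiguous splits of near-equal size."""
    size = max(1, -(-len(records) // M))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def partition(key, R):
    """hash(key) mod R, using a hash that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % R


splits = split_input(list(range(10)), M=4)
print(splits)

R = 3
for word in ["to", "be", "or", "not"]:
    # Every occurrence of a word maps to the same reduce partition.
    print(word, partition(word, R))
```

The property that matters is determinism: all map tasks must agree on which of the R reduce tasks owns a given key.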
21 Execution
Workers are assigned work by the master
The master is started by the MapReduce framework
22 Execution
Workers assigned map tasks read the input, parse it, and invoke the user’s Map() method
23 Execution
Intermediate key/value pairs are buffered in memory
Periodically, buffered data is written to local disk (R files)
  The files are partitioned by a pseudo-random partitioning function, e.g., hash(k) mod R
24 Execution
The locations of the intermediate files are passed back to the master, which forwards them to the workers executing the reduce function
25 Execution
Reduce runs after all mappers are done
Workers executing Reduce are notified by the master about the location of intermediate data
26 Execution
Reduce workers use remote procedure calls to read the data from the local disks of the map workers
Each reduce worker then sorts all of its intermediate data by intermediate key
27 Execution
The reduce worker iterates over the sorted intermediate data and, for each key encountered, passes the key and the corresponding set of intermediate values to the Reduce function
28 Execution
The output of the Reduce function is appended to a final output file
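The execution steps on the preceding slides can be simulated end to end on one machine. This is only a sketch: plain Python lists stand in for the R intermediate files and the final output files, there is no master, RPC, or disk I/O, and all function names are illustrative.

```python
import zlib
from itertools import groupby
from operator import itemgetter


def partition(key, R):
    """Stable stand-in for hash(key) mod R."""
    return zlib.crc32(key.encode("utf-8")) % R


def run_map_task(split, map_fn, R):
    """Run Map over one input split; return R 'files', one per reduce task."""
    files = [[] for _ in range(R)]
    for key, value in split:
        for key2, value2 in map_fn(key, value):
            files[partition(key2, R)].append((key2, value2))
    return files


def run_reduce_task(r, all_map_outputs, reduce_fn):
    """Fetch partition r from every map task, sort by key, group, reduce."""
    pairs = sorted(pair for files in all_map_outputs for pair in files[r])
    output = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        output.append((key, reduce_fn(key, [value for _, value in group])))
    return output


def word_map(name, text):
    for word in text.split():
        yield (word, 1)


def word_reduce(word, counts):
    return sum(counts)


# M = 2 input splits and R = 2 reduce tasks.
splits = [[("doc1", "to be or not to be")], [("doc2", "to do or not to do")]]
R = 2
map_outputs = [run_map_task(split, word_map, R) for split in splits]
output_files = [run_reduce_task(r, map_outputs, word_reduce) for r in range(R)]
totals = dict(pair for output in output_files for pair in output)
print(totals)
```

Each list comprehension over `splits` or `range(R)` marks a point where the real system would run tasks in parallel on different workers.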
29 Data flow
Input and final output are stored on a distributed file system
  Scheduler tries to schedule map tasks “close” to the physical storage location of the input data
Intermediate results are stored on the local file systems of the map and reduce workers
Output can be the input to another MapReduce task
32 Coordination
Master data structures
  Task status: (idle, in-progress, completed)
  Idle tasks get scheduled as workers become available
  When a map task completes, it sends the master the locations and sizes of its R intermediate files, one for each reducer
  Master pushes this info to reducers
Master pings workers periodically to detect failures
33 Failures
Map worker failure
  Map tasks completed or in-progress at the worker are reset to idle
  Reduce workers are notified when a task is rescheduled on another worker
Reduce worker failure
  Only in-progress tasks are reset to idle
Master failure
  MapReduce task is aborted and the client is notified
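The master's bookkeeping and the worker-failure rules can be sketched as a small state machine. The class and method names are hypothetical, and the sketch ignores pinging, the map-before-reduce ordering, and notification of reduce workers. The asymmetry between map and reduce tasks follows from where results live: completed map output sits on the failed worker's local disk and is lost, while completed reduce output is already in the distributed file system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    IDLE = "idle"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"


@dataclass
class Task:
    kind: str  # "map" or "reduce"
    state: State = State.IDLE
    worker: Optional[str] = None


class Master:
    def __init__(self, tasks: List[Task]) -> None:
        self.tasks = tasks

    def assign(self, worker: str) -> Optional[Task]:
        """Give the first idle task to a worker that became available."""
        for task in self.tasks:
            if task.state is State.IDLE:
                task.state = State.IN_PROGRESS
                task.worker = worker
                return task
        return None

    def complete(self, task: Task) -> None:
        task.state = State.COMPLETED

    def worker_failed(self, worker: str) -> List[Task]:
        """Reset the failed worker's tasks that must run again."""
        reset = []
        for task in self.tasks:
            if task.worker != worker:
                continue
            lost_map_output = task.kind == "map" and task.state is State.COMPLETED
            if task.state is State.IN_PROGRESS or lost_map_output:
                task.state = State.IDLE
                task.worker = None
                reset.append(task)
        return reset


master = Master([Task("map"), Task("map"), Task("reduce")])
m0 = master.assign("w1")
master.complete(m0)
m1 = master.assign("w2")
master.complete(m1)
r0 = master.assign("w2")
master.complete(r0)

# Worker w2 dies: its completed map task is redone, its completed reduce is kept.
rescheduled = master.worker_failed("w2")
print([task.kind for task in rescheduled], r0.state)
```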
34 Locality
The MapReduce master takes the location information of input files into account and attempts to schedule a map task on a machine that contains a replica of the corresponding input data
Failing that, it schedules the map task near a replica of that task’s input data
The goal is to read most input data locally and thus reduce the consumption of network bandwidth
35 Task Granularity
M and R should be much larger than the number of available machines
  Improves dynamic load balancing
  Speeds up recovery in case of failures
R determines the number of output files
  Often constrained by users
36 Backup Tasks
Stragglers: a common reason for long computations
Schedule backup executions of the remaining in-progress tasks when the map or reduce phase nears completion
  Slightly increases the computational resources needed
  Does not increase running time, and has the potential to improve it significantly
37 Combiners
Often a map task will produce many pairs of the form (k, v1), (k, v2), … for the same key k
  E.g., popular words in Word Count
Can save network time by pre-aggregating at the mapper
  combine(k1, list(v1)) -> v2
  Usually the same as the reduce function
Works only if the reduce function is commutative and associative
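A short sketch shows both the saving and the restriction. `combine` is an illustrative local pre-aggregation for word count, and the averaging example is an added illustration of an operation that is not associative and so cannot be used as its own combiner.

```python
from collections import Counter


def map_words(name, text):
    for word in text.split():
        yield (word, 1)


def combine(pairs):
    """Pre-aggregate one map task's output before it crosses the network."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


raw = list(map_words("doc", "to be or not to be"))
combined = combine(raw)
print(len(raw), "pairs before combining,", len(combined), "after")


def mean(values):
    return sum(values) / len(values)


# Summation survives pre-aggregation; averaging does not.
whole = mean([1, 2, 3, 4, 5])
pre_aggregated = mean([mean([1, 2]), mean([3, 4, 5])])
print(whole, pre_aggregated)
```

A common workaround for averages is to have the combiner emit (sum, count) pairs, which can be added associatively, and divide only in the final reduce.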
38 Partition Function
Inputs to map tasks are created by contiguous splits of the input file
For reduce, we need to ensure that records with the same intermediate key end up at the same worker
The system uses a default partition function, e.g., hash(key) mod R
Sometimes it is useful to override it
  What if all output keys are URLs and we want all entries for a single host to end up in the same output file?
  Using hash(hostname(URL)) mod R ensures that URLs from the same host end up in the same output file
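The host-based override can be sketched with the standard library. `default_partition` and `host_partition` are illustrative names, `zlib.crc32` is an assumed stand-in for the hash function, and the URLs are sample data.

```python
import zlib
from urllib.parse import urlparse


def default_partition(key, R):
    """Default: hash(key) mod R over the whole key."""
    return zlib.crc32(key.encode("utf-8")) % R


def host_partition(url, R):
    """Override: hash(hostname(URL)) mod R, so one host maps to one file."""
    host = urlparse(url).hostname or ""
    return zlib.crc32(host.encode("utf-8")) % R


R = 8
urls = [
    "http://example.com/index.html",
    "http://example.com/docs/a.html",
    "http://example.org/index.html",
]
for url in urls:
    print(url, default_partition(url, R), host_partition(url, R))
```

With the default function the two example.com URLs may land in different output files; with the override they are guaranteed to share one, at the cost of possibly uneven partition sizes when a few hosts dominate.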
39 Summary
MapReduce: a framework for distributed computing
Distributed programs are easy to write and understand
Provides fault tolerance
Program execution can be easily monitored
It works for Google!!