MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay Ghemawat
Agenda Introduction Implementation Overview Google File System Hadoop Implementation Demo Conclusion
MapReduce Originated at Google in 2003. MapReduce is a programming framework: programmers write map and reduce functions specific to their task, and the library automatically parallelizes these functions across Google's clusters of commodity machines. This allows programmers with little experience in parallelization and clusters to quickly accomplish computationally intensive tasks over very large input sets.
What are Map Reduce functions? A Map function takes an input key/value pair and generates an intermediate set of key/value pairs. Ex: counting the occurrences of each word in a large number of documents. The input key/value pair is the document name and the document contents; the intermediate result set is (word, 1) for each word in the documents.
What are Map Reduce functions? A Reduce function merges the intermediate set of key/value pairs into a more concise set in which each key is unique. Ex: for the word-count example above, the result set would be (word, total count) for each unique word.
MapReduce Example Counting words in a large set of documents

map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));

Types: map (k1, v1) → list(k2, v2); reduce (k2, list(v2)) → list(v2)
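The word-count pseudocode above can be sketched as a small runnable Python program. The helper names (map_fn, reduce_fn, map_reduce) and the in-memory shuffle step are illustrative only; in real MapReduce the library performs the grouping across machines.

```python
from collections import defaultdict

def map_fn(doc_name, contents):
    # Emit an intermediate (word, 1) pair for every word in the document.
    for word in contents.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Merge all counts emitted for this word into a single total.
    return sum(counts)

def map_reduce(documents):
    # Shuffle: group intermediate values by key
    # (done by the MapReduce library in the real system).
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for key, value in map_fn(name, contents):
            intermediate[key].append(value)
    # Reduce: collapse each key's value list to one result.
    return {key: reduce_fn(key, values) for key, values in intermediate.items()}

counts = map_reduce({"doc1": "the quick brown fox", "doc2": "the lazy dog"})
# counts["the"] == 2, every other word appears once
```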
More Examples Count of URL Access Frequency: – The map function processes logs of web page requests and outputs (URL, 1) for each request. – The reduce function adds together all counts for the same URL and outputs (URL, total count).
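The URL-frequency example follows the same shape as word count. A sketch, assuming a made-up log format where each line's first field is the requested URL (the function names are invented for illustration):

```python
from collections import defaultdict

def map_requests(log_name, lines):
    # Emit (URL, 1) per request; assumes the URL is the first
    # whitespace-separated field of each log line (hypothetical format).
    for line in lines:
        yield (line.split()[0], 1)

def reduce_requests(url, counts):
    # Output (URL, total request count).
    return (url, sum(counts))

log = ["/index.html GET", "/about GET", "/index.html GET"]
grouped = defaultdict(list)
for url, one in map_requests("access.log", log):
    grouped[url].append(one)
totals = dict(reduce_requests(u, c) for u, c in grouped.items())
# totals == {"/index.html": 2, "/about": 1}
```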
Implementation Machines are dual-processor x86 machines running Linux, with 2-4 GB of memory each. Commodity networking hardware. A cluster consists of hundreds or even thousands of machines, so machine failure is common. Storage consists of inexpensive IDE disks attached directly to the machines. Each job is submitted to the scheduler and consists of a set of tasks.
Implementation The input data is automatically split into M splits. These splits can be accessed in parallel by different machines. The size of a split is user-specified. The output of the Map functions are partitioned into R pieces by a partitioning function. Both R and the partitioning function are specified by the user.
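The partitioning function described here is typically hash(key) mod R. A sketch (using CRC32 rather than Python's built-in hash, which is randomized across runs for strings):

```python
import zlib

R = 4  # number of reduce partitions (user-specified)

def default_partition(key, r=R):
    # hash(key) mod R: every occurrence of a given key, no matter which
    # map task emitted it, lands in the same reduce partition.
    return zlib.crc32(key.encode("utf-8")) % r
```

A user can supply a different partitioning function instead, e.g. one that partitions URLs by hostname so all pages of a site end up in the same output file.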
Implementation When the user program calls the MapReduce function, the MapReduce library first splits the input. Then, it starts up many copies of the user program on different machines in the cluster. One of these copies is special: the master, which assigns work to the rest of the copies, called workers. There are M map tasks and R reduce tasks to assign. The master picks idle workers and assigns each one a map or reduce task.
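The master's assignment step can be sketched as pairing idle workers with pending tasks; the task tuples and worker names below are invented for the sketch.

```python
M, R = 6, 2  # map and reduce task counts (illustrative)
tasks = [("map", i) for i in range(M)] + [("reduce", j) for j in range(R)]

def assign(pending, idle_workers):
    # Master pairs each idle worker with the next unassigned task;
    # leftover tasks wait until a worker becomes idle again.
    assignment = {}
    while pending and idle_workers:
        assignment[idle_workers.pop(0)] = pending.pop(0)
    return assignment

assignment = assign(tasks, ["w0", "w1", "w2"])
# three workers each get one map task; 5 tasks remain pending
```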
Implementation A worker that is assigned a map task parses the corresponding input split and passes each key/value pair to the Map function. The intermediate results produced by the Map function are buffered in local memory. Periodically, the buffered results are split into R partitions and written to local disk. The disk locations of these results are forwarded to the master, who forwards them to the reduce workers.
Implementation When a reduce worker is notified about the locations of the intermediate results, it uses remote procedure calls to read the results from the local disks of the map workers. Once the reduce worker has read all the intermediate data, it sorts the data so that pairs with the same key are grouped together. If the amount of intermediate data is too large to fit in memory, an external sort is used.
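The sort-then-group step on a reduce worker can be sketched in a few lines; the pairs below are example data, and the in-memory sort stands in for the external sort used at scale.

```python
from itertools import groupby
from operator import itemgetter

# Intermediate pairs as read from several map workers (example data).
pairs = [("fox", 1), ("the", 1), ("dog", 1), ("the", 1)]

# Sort so that pairs with the same key become adjacent, then group by key.
pairs.sort(key=itemgetter(0))
grouped = {key: [v for _, v in group]
           for key, group in groupby(pairs, key=itemgetter(0))}
# grouped == {"dog": [1], "fox": [1], "the": [1, 1]}
```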
Implementation The reduce worker iterates over the sorted intermediate data and, for every unique intermediate key, passes the key and its corresponding set of values to the reduce function. The output of this function is appended to a final output file. There is a separate output file for every reduce partition.
Implementation Once all the map and reduce tasks are finished, the master wakes up the user program. Then, the MapReduce call returns back to user code.
Google File System Goal – global view – make huge files available in the face of node failures Master Node (meta server) – Centralized, index all chunks on data servers Chunk server (data server) – File is split into contiguous chunks, typically 16-64MB. – Each chunk replicated (usually 2x or 3x). Try to keep replicas in different racks.
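The chunking and rack-aware replication above can be sketched as follows; the 64 MB chunk size, rack layout, and round-robin placement policy are illustrative choices, not GFS's actual placement algorithm.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the upper end of the 16-64 MB range

def chunk_count(file_size, chunk_size=CHUNK_SIZE):
    # Number of contiguous chunks a file is split into (round up).
    return (file_size + chunk_size - 1) // chunk_size

def place_replicas(chunk_id, racks, replicas=3):
    # racks: list of racks, each a list of chunk-server names.
    # Pick one server from each of `replicas` consecutive racks so that,
    # when there are enough racks, no two replicas share a rack.
    chosen = []
    for r in range(replicas):
        rack = racks[(chunk_id + r) % len(racks)]
        chosen.append(rack[chunk_id % len(rack)])
    return chosen
```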
Fault Tolerance Master pings workers periodically. Any machine that does not respond is considered "dead". Both map and reduce machines – Any task in progress on a dead machine is reset and becomes eligible for rescheduling. Map machines – Completed map tasks are also reset, because their results are stored on the dead machine's local disk. – Reduce machines are notified to get the data from the new machine assigned to the task.
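The master's failure-detection and rescheduling logic can be sketched as below; the timeout value and the bookkeeping structures are invented for the sketch.

```python
HEARTBEAT_TIMEOUT = 10.0  # seconds without a ping reply (illustrative)

def find_dead_workers(last_seen, now, timeout=HEARTBEAT_TIMEOUT):
    # last_seen maps worker id -> timestamp of its last successful ping reply.
    return [w for w, t in last_seen.items() if now - t > timeout]

def reschedule(dead, in_progress, completed_map):
    # Tasks in progress on a dead worker go back to the pending queue.
    # Completed MAP tasks on that worker are also reset, because their
    # output lives on the dead machine's local disk and is now unreachable.
    pending = []
    for worker in dead:
        pending += in_progress.pop(worker, [])
        pending += completed_map.pop(worker, [])
    return pending
```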
Skipping Bad Records Bugs in user code (triggered by unexpected data) can cause deterministic crashes – Optimally, fix the bug and re-run – Not possible with third-party code. When a worker dies, it sends a "last gasp" UDP packet to the master describing the record it was processing. If more than one worker dies over a specific record, the master issues yet another re-execute command and tells the new worker to skip the problem record.
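This re-execute-then-skip loop can be simulated on one machine. In the sketch below, the WorkerCrash exception stands in for a worker process dying and its "last gasp" UDP report; all names are invented for the sketch.

```python
class WorkerCrash(Exception):
    # Stands in for a worker process dying; carries the "last gasp"
    # report of which record it was processing.
    def __init__(self, record_index):
        self.record_index = record_index

def run_worker(records, process, skip):
    out = []
    for i, record in enumerate(records):
        if i in skip:
            continue  # master told this worker to skip the bad record
        try:
            out.append(process(record))
        except Exception:
            raise WorkerCrash(i)  # worker "dies" on this record
    return out

def master(records, process):
    skip, crashes = set(), {}
    while True:
        try:
            return run_worker(records, process, skip)
        except WorkerCrash as c:
            i = c.record_index
            crashes[i] = crashes.get(i, 0) + 1
            if crashes[i] > 1:   # more than one worker died on this record
                skip.add(i)      # the next re-execution skips it
```

For example, `master([10, 0, 4], lambda x: 100 // x)` crashes twice on the zero, skips it, and returns [10, 25].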
Hadoop Demo Hadoop WordCount Implementation in Java
Conclusion Provide a general-purpose model to simplify large-scale computation Allow users to focus on the problem without worrying about details