
1 MapReduce: Simplified Data Processing on Large Clusters. Authors: Jeffrey Dean and Sanjay Ghemawat. Presented by: Yang Liu, University of Michigan, EECS 582 (W16).

2 About the Authors: Jeff Dean and Sanjay Ghemawat.

3 Motivation. The challenge at Google: input data is too large for one machine, so computation must be distributed, yet most of the computations themselves are straightforward (log processing, inverted indexing): boring work. The hard part is the complexity of distributed computing: machine failures and scheduling.

4 Solution: MapReduce. MapReduce is the distributed programming infrastructure: a simple programming interface (Map + Reduce) plus a distributed implementation that hides all the messy details: fault tolerance, I/O scheduling, and parallelization.

5 Programming Model. Inspired by the map and reduce functions in Lisp and other functional programming languages. Lisp: (map #'length '(() (a) (a b) (a b c))) returns (0 1 2 3); (reduce #'+ '(0 1 2 3)) returns 6.
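For readers who don't speak Lisp, a rough single-machine Python analogue of the same two calls (illustrative only, not from the paper):

```python
# Python analogue of the Lisp example: map computes lengths, reduce sums them.
from functools import reduce

lengths = list(map(len, [(), ('a',), ('a', 'b'), ('a', 'b', 'c')]))
print(lengths)                                  # [0, 1, 2, 3]
print(reduce(lambda x, y: x + y, lengths, 0))   # 6
```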

6 Programming Model. The programmer only needs to specify two functions (a sketch follows). Map function: map(in_key, in_value) -> list(out_key, intermediate_value); processes an input key/value pair and produces a set of output key / intermediate value pairs. Reduce function: reduce(out_key, list(intermediate_value)) -> list(out_value); processes the intermediate key/value pairs, combines the intermediate values per unique key, and produces a set of merged output values (usually just one).
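As a concrete sketch of these two signatures, here is what the user-supplied pair might look like for word count in Python (my illustration; the paper's actual interface is C++ and emits pairs via an Emit() call rather than a generator):

```python
def map_fn(in_key, in_value):
    # in_key: document name (unused here); in_value: document contents.
    # Emits one (word, 1) pair per word occurrence.
    for word in in_value.split():
        yield (word, 1)

def reduce_fn(out_key, intermediate_values):
    # out_key: a word; intermediate_values: every count emitted for it.
    # Combines them into a single merged output value.
    yield sum(intermediate_values)
```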

7 Programming Model. Data flow: [input (key, value)] -> Map function -> [intermediate (key, value)] -> Shuffle (merge sort by key) -> [unique key, list of values] -> Reduce function -> output values.

8 Example: WordCount. Input: "the small brown fox / a fox speaks to another fox / brown cow cross the road". The input is split into three pieces, each map task emits a (word, 1) pair per occurrence, the shuffle groups pairs by word, and reduce sums the counts. Output: a, 1; another, 1; brown, 2; cow, 1; cross, 1; fox, 3; road, 1; small, 1; speaks, 1; the, 2; to, 1.
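A toy single-process simulation of this pipeline, using the three input lines above as splits (real MapReduce distributes the map tasks across machines and shuffles over the network; this just mimics the data flow):

```python
from collections import defaultdict

def map_fn(_, text):                 # emit (word, 1) per occurrence
    for word in text.split():
        yield (word, 1)

def reduce_fn(word, counts):         # sum all counts for one word
    return sum(counts)

splits = ["the small brown fox",
          "a fox speaks to another fox",
          "brown cow cross the road"]

# Map phase: one map task per split.
intermediate = [pair for s in splits for pair in map_fn(None, s)]

# Shuffle: group intermediate values by key (done by the framework).
groups = defaultdict(list)
for key, value in intermediate:
    groups[key].append(value)

# Reduce phase: one call per unique key.
print({k: reduce_fn(k, vs) for k, vs in sorted(groups.items())})
# {'a': 1, 'another': 1, 'brown': 2, 'cow': 1, 'cross': 1, 'fox': 3, ...}
```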

9 More Programs Based on MR. Inverted index, distributed sort, distributed grep, URL access frequency. [Graph: growth of the number of MapReduce programs in the Google source tree.]
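Distributed grep, for instance, fits the model with almost no code; a hedged Python sketch (the pattern and function names are mine, not from the paper):

```python
import re

PATTERN = re.compile(r"error")          # the rare pattern to search for

def grep_map(filename, line):
    # Emit a line only if it matches; most lines produce no output.
    if PATTERN.search(line):
        yield (filename, line)

def grep_reduce(filename, lines):
    # Identity reduce: pass matched lines straight through to the output.
    for line in lines:
        yield line
```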

10 System Implementation: Overview. Cluster characteristics: 100s/1000s of dual-CPU x86 machines with 2-4 GB of memory; limited bisection bandwidth; storage on local IDE disks. Infrastructure: GFS, the distributed file system that manages the data (SOSP'03), and a job scheduling system in which jobs are made up of tasks and the scheduler assigns tasks to machines (Borg?).

11 MapReduce Architecture. [Diagram of control and data flow: the user program forks the master and workers via the scheduler; the master assigns map tasks and reduce tasks; map workers read input splits (Split 0, 1, 2) from GFS, write intermediate results to local disk, and notify the master of the locations of those local writes; reduce workers do remote reads of the intermediate data, sort it, and write Output File 0 and Output File 1 back to GFS.]

12 Execution. [Diagram.]

13 Parallel Execution. [Diagram.]

14 Coordination. Master data structures: per-task status (idle, in-progress, completed). Idle tasks get scheduled as workers become available. When a map task completes, it sends the master the locations and sizes of its R intermediate files, one for each reducer, and the master pushes this information incrementally to reducers. The master pings workers periodically to detect failures.
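A rough Python sketch of that bookkeeping (my illustration, not the paper's code; the task counts are the typical figures cited later in the talk):

```python
from dataclasses import dataclass, field

@dataclass
class TaskState:
    status: str = "idle"        # idle | in-progress | completed
    worker: str = ""            # machine currently running the task
    # For completed map tasks: (location, size) of each of the R
    # intermediate files, one per reducer.
    outputs: list = field(default_factory=list)

map_tasks = {i: TaskState() for i in range(200_000)}
reduce_tasks = {i: TaskState() for i in range(5_000)}

def on_map_complete(task_id, worker, file_locations):
    map_tasks[task_id].status = "completed"
    map_tasks[task_id].outputs = file_locations
    # ...the master then pushes these locations incrementally to reducers.
```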

15 Fault Tolerance. Map worker failure: both completed and in-progress tasks are reset to idle. Reduce worker failure: only in-progress tasks are reset to idle. Why? (A completed map task's output lives on the failed machine's local disk, while completed reduce output is already in the global file system.) Master failure: the MapReduce job is aborted and the client is notified. Reset tasks are rescheduled on another machine.

16 Task Granularity. Many more map tasks than machines: minimizes time for fault recovery, lets shuffling be pipelined with map execution, and gives better dynamic load balancing. Often 200,000 map tasks and 5,000 reduce tasks with 2,000 machines.

17 Backup Tasks. Slow workers delay job completion time (e.g., a processor cache being disabled, or a bad disk with soft errors). As the job nears completion, start backup copies of the remaining in-progress tasks; the first copy to complete is the one whose result is used.
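The mechanism can be mimicked on one machine with a thread pool; a toy sketch (the real system schedules the duplicate on a different machine and discards the loser's output):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_with_backup(task, *args):
    # Race two copies of the same task; whichever finishes first wins.
    with ThreadPoolExecutor(max_workers=2) as pool:
        copies = [pool.submit(task, *args), pool.submit(task, *args)]
        done, _ = wait(copies, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
```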

18 Disk Locality. Map tasks are scheduled close to their data: on nodes that hold the input data if possible, otherwise on nodes nearer to the input data (e.g., behind the same network switch). This conserves network bandwidth and leverages the Google File System's replica placement.
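The preference order can be sketched as machine-local, then rack-local, then anywhere (a minimal sketch; the rack_of helper is a hypothetical lookup, not a real API):

```python
def pick_worker(replica_hosts, idle_workers, rack_of):
    # 1. Prefer a worker holding a replica of the split (no network read).
    local = [w for w in idle_workers if w in replica_hosts]
    if local:
        return local[0]
    # 2. Otherwise prefer a worker in the same rack / behind the same switch.
    racks = {rack_of(h) for h in replica_hosts}
    near = [w for w in idle_workers if rack_of(w) in racks]
    if near:
        return near[0]
    # 3. Fall back to any idle worker (remote read across the core network).
    return idle_workers[0]
```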

19 Combiner Function. A local reducer at the map worker: combine(k1, list(v1)). Saves network time by pre-aggregating at the mapper. Works only if the reduce function is commutative and associative.
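For word count, a combiner is just a local sum; a minimal sketch:

```python
from collections import Counter

def combine(map_output):
    # map_output: iterable of (word, 1) pairs from one map task.
    # Pre-aggregates locally, so e.g. two ('fox', 1) pairs leave the
    # worker as a single ('fox', 2) pair. Safe here because addition
    # is commutative and associative.
    counts = Counter()
    for word, n in map_output:
        counts[word] += n
    return list(counts.items())
```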

20 WordCount: No Combine. [Same pipeline diagram as slide 8: each map task emits one (word, 1) pair per occurrence, so every pair crosses the network in the shuffle.]

21 WordCount: Combine. [Same pipeline, but each mapper pre-aggregates locally, e.g., the split "a fox speaks to another fox" emits (fox, 2) instead of two (fox, 1) pairs, shrinking shuffle traffic.]

22 Partitioning Function. Records with the same intermediate key end up at the same reducer. Default partition function: e.g., hash(key) mod R. Sometimes useful to override: e.g., hash(hostname(URL)) mod R ensures all URLs from one host end up in the same output file.
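Both variants in a short Python sketch (urlparse is my choice for extracting the hostname):

```python
from urllib.parse import urlparse

R = 5000  # number of reduce tasks, hence output files

def default_partition(key):
    # Python's hash() is per-process; a real system uses a stable hash.
    return hash(key) % R

def url_partition(url):
    # All URLs from one host map to the same reduce partition,
    # so they end up in the same output file.
    return hash(urlparse(url).hostname) % R
```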

23 More Features. Skipping bad records, input and output types, local execution, status information, counters, etc.

24 Performance. Tests run on a cluster of 1,800 machines, each with 4 GB of memory, dual 2 GHz Intel Xeon processors, and dual 160 GB IDE disks. Two benchmarks: MR_Grep: scan 10^10 100-byte records to extract records matching a rare pattern; MR_Sort: sort 10^10 100-byte records.

25 Performance: MR_Grep. The locality optimization helps: 1,800 machines read 1 TB of data at a peak of ~31 GB/s; without it, rack switches would limit throughput to 10 GB/s. Startup overhead is significant for short jobs.

26 Performance: MR_Sort. Backup tasks reduce job completion time significantly, and the system deals well with failures.

27 Conclusion. Inexpensive commodity machines can be the basis of a large-scale, reliable system. MapReduce hides all the messy details of distributed computing and provides a simple parallel programming interface.

28 Lessons Learned. General design: a general abstraction solves many problems, hence success; a simple interface drives fast adoption, hence success. Distributed system design: the network is a scarce resource, so locality matters and you should pre-aggregate whenever possible; the master-worker architecture is simple yet powerful.

29 Influence. MapReduce is one of the most-cited systems papers: 16,648 citations as of 03/08/2016. Together with the Google File System and Bigtable, it inspired the Big Data era. What happened after MapReduce?

30 In the Open Source World: Hadoop. 2005: Doug Cutting and Michael J. Cafarella developed Hadoop to support distribution for the Nutch search engine project (inverted index). Now: [Hadoop ecosystem diagram.]

31 In Google. [Diagram.]

32 Barrier. Problem with MapReduce: the I/O barrier. Any MR algorithm can be simulated on BSP, and vice versa. [Diagram: GFS -> Map -> Reduce -> GFS, repeated; every MapReduce stage reads from and writes back to GFS, so chained jobs pay a full disk and network round trip between stages.]

33 Post-MapReduce Systems (Google / open source, by category):
MapReduce: MapReduce / Hadoop MapReduce.
DAG computing: FlumeJava / Tez, Spark.
Graph computing: Pregel / Giraph, GraphX.
Stream processing: MillWheel / Storm, Spark Streaming.
Machine learning: TensorFlow / MLib, SparkNet.
General model: DataFlow Model (Apache Beam).

34 Questions?

35 References.
MapReduce architecture: http://cecs.wright.edu/~tkprasad/courses/cs707/L06MapReduce.ppt
MapReduce presentation: http://research.google.com/archive/mapreduce-osdi04-slides/
MapReduce presentation: http://web.eecs.umich.edu/~mozafari/fall2015/eecs584/presentations/lecture15-a.pdf
Operating system support for warehouse-scale computing: https://www.cl.cam.ac.uk/~ms705/pub/thesis-submitted.pdf
Apache ecosystem picture: http://blog.agroknow.com/?cat=1
MapReduce: http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf
FlumeJava: http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/FlumeJava.pdf
MillWheel: http://www.vldb.org/pvldb/vol6/p1033-akidau.pdf
Pregel: http://web.stanford.edu/class/cs347/reading/pregel.pdf

36 References.
Giraph: http://giraph.apache.org/
Spark: http://spark.apache.org/
Tez: https://tez.apache.org/
DataFlow: http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
TensorFlow: https://www.tensorflow.org/
Apache Beam: http://incubator.apache.org/projects/beam.html
SparkNet: https://github.com/amplab/SparkNet
Caffe on Spark: http://yahoohadoop.tumblr.com/post/129872361846/large-scale-distributed-deep-learning-on-hadoop

