1 MapReduce

2 Google and MapReduce
Google searches billions of web pages very, very quickly. How? Much of the heavy offline work behind it, such as building the search index, uses a technique called “MapReduce” to distribute the work across a large number of computers and then combine the results. This has made MapReduce a very popular approach. Hadoop is an open source implementation of MapReduce; unless you work for Google, you will probably use Hadoop.
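To make the “distribute, then combine” idea concrete, here is a minimal single-process sketch of the classic word-count job, written in Scala to match the syntax used on slide 3. It only simulates the data flow. The input is pre-split into chunks as if each lived on a different computer, every line is mapped to (word, 1) pairs, the pairs are grouped by word (the “shuffle”), and each group is reduced by summing. The names `mapper`, `reducer`, and `runJob` are hypothetical and are not Hadoop's API.

```scala
// A single-process simulation of the MapReduce data flow for word counting.
// Nothing here is Hadoop's API: `mapper`, `reducer`, and `runJob` are
// hypothetical names chosen for this illustration.
object WordCountSketch {

  // Map phase: one input record (a line of text) -> zero or more (key, value) pairs.
  def mapper(line: String): Seq[(String, Int)] =
    line.toLowerCase.split("\\W+").toSeq.filter(_.nonEmpty).map(word => (word, 1))

  // Reduce phase: one key plus all values emitted for it -> a single result.
  def reducer(word: String, counts: Seq[Int]): (String, Int) =
    (word, counts.sum)

  // In a real cluster each chunk would be mapped on a different machine;
  // here the chunks are simply processed one after another.
  def runJob(chunks: Seq[Seq[String]]): Map[String, Int] = {
    val mapped: Seq[(String, Int)] = chunks.flatMap(chunk => chunk.flatMap(mapper))
    // Shuffle: gather every value emitted for the same key.
    val grouped: Map[String, Seq[Int]] =
      mapped.groupBy(_._1).map { case (k, pairs) => (k, pairs.map(_._2)) }
    grouped.map { case (k, vs) => reducer(k, vs) }
  }

  def main(args: Array[String]): Unit = {
    val chunks = Seq(
      Seq("the quick brown fox", "jumps over the lazy dog"),
      Seq("the dog barks")
    )
    val counts = runJob(chunks)
    println(counts("the")) // 3
    println(counts("dog")) // 2
  }
}
```

Only the shuffle step needs to see data from every chunk; the map calls are independent of one another, which is what lets a real framework spread them across machines.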

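3 How it works
List(a, b, c, …).map(x => f(x)) gives List(f(a), f(b), f(c), …)
List(a, b, c, …).reduce((x, y) => x ⊕ y) gives a ⊕ b ⊕ c ⊕ …, where ⊕ is some binary operator (it should be associative, so that partial results computed on different machines can be combined in any grouping)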
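Both lines on this slide are ordinary Scala collection operations. The sketch below runs them, assuming a hypothetical f(x) = x * x and using + as the operator. It also shows why the operator matters once the list is split up: reducing chunks separately and then combining the partial results (roughly what separate workers would do) matches a single reduce for +, but not for the non-associative -. The helper `chunkedReduce` is an illustrative name, not a library function.

```scala
// Plain Scala collections: `map` applies f to every element,
// `reduce` combines the elements with a binary operator.
object MapReduceOnLists {
  // Hypothetical f for illustration.
  def f(x: Int): Int = x * x

  // Reduce each chunk independently (as separate workers would),
  // then reduce the partial results. This only agrees with a single
  // left-to-right reduce when `op` is associative.
  def chunkedReduce(xs: List[Int], chunkSize: Int)(op: (Int, Int) => Int): Int =
    xs.grouped(chunkSize).map(_.reduce(op)).toList.reduce(op)

  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3, 4, 5, 6)
    val squares = xs.map(x => f(x))             // List(1, 4, 9, 16, 25, 36)
    val total = squares.reduce((x, y) => x + y) // 91
    println(squares)
    println(total)

    // + is associative: chunked and sequential reductions agree (21 == 21).
    println(chunkedReduce(xs, 2)(_ + _) == xs.reduce(_ + _)) // true
    // - is not associative: 1 versus -19.
    println(chunkedReduce(xs, 2)(_ - _) == xs.reduce(_ - _)) // false
  }
}
```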

4 Another view
http://www.cnblogs.com/sharpxiajun/p/3151395.html (in Chinese)

5 ForkJoin
How does ForkJoin differ from MapReduce? Answers from Stack Overflow:
ForkJoin recursively partitions a task into several subtasks on a single machine, taking advantage of multiple cores. MapReduce does only one big split, with no communication between the parts until the reduce step, which makes it massively scalable.
Java fork/join starts quickly and scales well for small inputs (<5MB), but it cannot process larger inputs due to the size restrictions of shared-memory, single-node architectures. MapReduce takes tens of seconds to start up, but scales well for much larger inputs (>100MB) on a compute cluster.
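The structural difference between the two answers can be sketched in Scala on one machine. `forkJoinSum` uses the JDK's `ForkJoinPool` and `RecursiveTask` to halve a range recursively, while `oneSplitSum` imitates the MapReduce shape: one up-front split into independent chunks, then a single combine step. The threshold, chunk count, and function names are assumptions for illustration, and nothing here models a cluster or the start-up costs quoted above.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

object ForkJoinVsOneSplit {
  // Hypothetical cut-off below which a task stops splitting and sums sequentially.
  val Threshold = 1000

  // Fork/join style: keep halving the range until it is small, all on one machine.
  final class SumTask(xs: Array[Long], lo: Int, hi: Int) extends RecursiveTask[Long] {
    override protected def compute(): Long =
      if (hi - lo <= Threshold) {
        var s = 0L
        var i = lo
        while (i < hi) { s += xs(i); i += 1 }
        s
      } else {
        val mid = (lo + hi) >>> 1
        val left = new SumTask(xs, lo, mid)
        val right = new SumTask(xs, mid, hi)
        left.fork()             // schedule the left half asynchronously
        val r = right.compute() // compute the right half in the current thread
        left.join() + r         // wait for the left half, then combine
      }
  }

  def forkJoinSum(xs: Array[Long]): Long =
    ForkJoinPool.commonPool().invoke(new SumTask(xs, 0, xs.length))

  // MapReduce style: one split into about `parts` independent chunks, with no
  // communication between chunks until the final combine. In a real cluster each
  // chunk would go to a different machine; here they are summed one after another.
  def oneSplitSum(xs: Array[Long], parts: Int): Long = {
    val chunkSize = math.max(1, (xs.length + parts - 1) / parts)
    val partials = xs.grouped(chunkSize).map(_.sum).toList // "map" each chunk
    partials.sum                                            // "reduce" the partials
  }

  def main(args: Array[String]): Unit = {
    val xs = Array.tabulate(100000)(i => i.toLong + 1)
    println(forkJoinSum(xs))    // 5000050000
    println(oneSplitSum(xs, 4)) // 5000050000
  }
}
```

Both functions compute the same sum; the point is the shape of the work. Fork/join decides how to split at run time, level by level, inside one shared memory, whereas the one-split version fixes its partition up front, which is what allows the parts to live on separate machines.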

6 The End

