1 Lecture 2 – MapReduce: Theory and Implementation
CSE 490h – Introduction to Distributed Computing, Winter 2008
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.

2 Last Class
How do I process lots of data?
- Distribute the work
Can I distribute the work?
- Maybe… if it's not dependent on other tasks
- Example of a dependent task: Fibonacci (each term depends on the two before it)

3 Last Class
What problems can occur?
- Large tasks
- Unpredictable bugs
- Machine failure
How do we solve / avoid these?
- Break up into small chunks?
- Restart tasks?
- Use known working solutions

4 MapReduce
- Concept from functional programming
- Implemented by Google
- Applied to a large number of problems

5 Functional Programming Review
Java:
    int fooA(String[] list) {
        return bar1(list) + bar2(list);
    }
    int fooB(String[] list) {
        return bar2(list) + bar1(list);
    }
Do they give the same result?
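In Java the answer can be no, because a method may modify the array it receives and Java evaluates the operands of + from left to right. The sketch below assumes hypothetical bodies for bar1 and bar2 (the slide does not give them), with bar1 having a side effect:

```java
// Hypothetical bar1/bar2, invented for this illustration.
final class OrderMatters {
    // Side effect: returns the length of list[0], then overwrites list[0] with "".
    static int bar1(String[] list) {
        int n = list[0].length();
        list[0] = "";
        return n;
    }

    // No side effect: only reads list[0].
    static int bar2(String[] list) {
        return list[0].length();
    }

    // Java evaluates the left operand of + before the right one.
    static int fooA(String[] list) { return bar1(list) + bar2(list); }
    static int fooB(String[] list) { return bar2(list) + bar1(list); }
}
```

With the input {"hello"}, fooA returns 5 + 0 = 5 but fooB returns 5 + 5 = 10. If bar1 and bar2 only read their argument, the two would always agree.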

6 Functional Programming Review
Functional Programming:
    fun fooA(l: int list) = bar1(l) + bar2(l)
    fun fooB(l: int list) = bar2(l) + bar1(l)
Do they give the same result?

7 Functional Programming Review
Operations do not modify data structures:
- They always create new ones
- Original data still exists in unmodified form

8 Functional Updates Do Not Modify Structures
    fun foo(x, lst) =
        let val lst' = rev lst
        in rev (x :: lst')
        end
    foo: 'a * 'a list -> 'a list
The foo() function above reverses the list, adds the new element to the front, and reverses the result, which appends x to the end of lst. But it never modifies lst!
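For comparison, a hypothetical Java counterpart of foo, written as a sketch that follows the same reverse, prepend, reverse steps. It works on a private copy, so the caller's list is never touched; the class name FunctionalUpdate is illustrative:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class FunctionalUpdate {
    // Returns a new list equal to lst with x appended.
    // Only the private copy is changed; lst itself is not.
    static <T> List<T> foo(T x, List<T> lst) {
        List<T> copy = new ArrayList<>(lst);   // lst' = rev lst (on a copy)
        Collections.reverse(copy);
        copy.add(0, x);                        // x :: lst'
        Collections.reverse(copy);             // rev (x :: lst')
        return Collections.unmodifiableList(copy);
    }
}
```

Calling foo(4, lst) on a list holding 1, 2, 3 returns a new list 1, 2, 3, 4 and leaves lst as it was. A real Java program would simply copy and call add(x); the double reverse is kept only to mirror the SML.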

9 Functions Can Be Used As Arguments
    fun DoDouble(f, x) = f (f x)
It does not matter what f does to its argument; DoDouble() will do it twice.
What is the type of this function?
    x: 'a
    f: 'a -> 'a
    DoDouble: ('a -> 'a) * 'a -> 'a
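The same higher-order function in Java, as a sketch: UnaryOperator<T> plays the role of 'a -> 'a, and the names HigherOrder and doDouble are illustrative. The generic method takes both arguments together, much like the SML tuple (f, x):

```java
import java.util.function.UnaryOperator;

final class HigherOrder {
    // fun DoDouble(f, x) = f (f x)
    // The SML type ('a -> 'a) * 'a -> 'a becomes a generic method over T.
    static <T> T doDouble(UnaryOperator<T> f, T x) {
        return f.apply(f.apply(x));
    }
}
```

doDouble(n -> n + 3, 10) returns 16, and doDouble(s -> s + "!", "hi") returns "hi!!".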

10 map (Functional Programming)
Creates a new list by applying f to each element of the input list; returns output in order.
    map: ('a -> 'b) -> 'a list -> 'b list

11 map Implementation
This implementation moves left-to-right across the list, mapping elements one at a time… But does it need to?
    fun map f [] = []
      | map f (x::xs) = (f x) :: (map f xs)
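A direct Java transcription of the recursive map above, as a sketch (the class name MapImpl is hypothetical). It computes f x before recursing on the tail, so it also works left to right, and it builds a fresh list rather than changing the input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class MapImpl {
    // fun map f [] = []
    //   | map f (x::xs) = (f x) :: (map f xs)
    static <A, B> List<B> map(Function<? super A, ? extends B> f, List<A> lst) {
        if (lst.isEmpty()) {
            return new ArrayList<>();                          // map f [] = []
        }
        B head = f.apply(lst.get(0));                          // f x, computed first
        List<B> rest = map(f, lst.subList(1, lst.size()));     // map f xs
        rest.add(0, head);                                     // (f x) :: ...
        return rest;
    }
}
```

Because add(0, head) shifts the list, this version is quadratic; it is meant to mirror the SML definition, not to be efficient.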

12 Implicit Parallelism In map
- In a purely functional setting, elements of a list being computed by map cannot see the effects of the computations on other elements
- Because each application of f is independent of the others, the order in which f is applied does not matter, so we can reorder or parallelize execution
- This is the “secret” that MapReduce exploits
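Java's parallel streams rely on the same property. In this minimal sketch (an analogy, not how MapReduce is implemented), the stream may square the elements on several threads in any order, yet the collected list keeps the input order:

```java
import java.util.List;
import java.util.stream.Collectors;

final class ParallelMap {
    // x -> x * x has no side effects, so the stream is free to apply it to
    // elements on several threads in any order; for an ordered source such as
    // a List, collect(Collectors.toList()) still returns results in input order.
    static List<Integer> squares(List<Integer> input) {
        return input.parallelStream()
                    .map(x -> x * x)
                    .collect(Collectors.toList());
    }
}
```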

13 Fold
Moves across a list, applying f to each element plus an accumulator. f returns the next accumulator value, which is combined with the next element of the list. In fold f x0 lst, x0 is the initial accumulator.
    fold: ('a * 'b -> 'b) -> 'b -> 'a list -> 'b

14 fold left vs. fold right
- Order of list elements can be significant
- Fold left moves left-to-right across the list
- Fold right moves right-to-left
SML Implementation:
    fun foldl f a [] = a
      | foldl f a (x::xs) = foldl f (f(x, a)) xs
    fun foldr f a [] = a
      | foldr f a (x::xs) = f(x, (foldr f a xs))
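The two folds can be sketched in Java with BiFunction, keeping the SML argument order f(x, acc). The loops are iterative translations of the recursive definitions, and the class name Folds is illustrative. Appending strings shows that the direction matters:

```java
import java.util.List;
import java.util.function.BiFunction;

final class Folds {
    // fun foldl f a [] = a | foldl f a (x::xs) = foldl f (f(x, a)) xs
    // The tail recursion becomes a left-to-right loop.
    static <A, B> B foldl(BiFunction<A, B, B> f, B acc, List<A> lst) {
        for (A x : lst) {
            acc = f.apply(x, acc);
        }
        return acc;
    }

    // fun foldr f a [] = a | foldr f a (x::xs) = f(x, (foldr f a xs))
    // The innermost call handles the last element, so the loop runs right to left.
    static <A, B> B foldr(BiFunction<A, B, B> f, B acc, List<A> lst) {
        for (int i = lst.size() - 1; i >= 0; i--) {
            acc = f.apply(lst.get(i), acc);
        }
        return acc;
    }
}
```

With f = (x, acc) -> acc + x and an initial "", foldl over "a", "b", "c" gives "abc" while foldr gives "cba".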

15 Example
    fun foo(l: int list) = sum(l) + mul(l) + length(l)
How can we implement this?

16 Example (Solved)
    fun sum(lst) = foldl (fn (x, a) => x + a) 0 lst
    fun mul(lst) = foldl (fn (x, a) => x * a) 1 lst
    fun length(lst) = foldl (fn (x, a) => 1 + a) 0 lst
    fun foo(l: int list) = sum(l) + mul(l) + length(l)
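The same three folds with Java's Stream.reduce, as a sketch. Java's accumulator takes (accumulator, element), the reverse of the SML (x, a) order. For length, the accumulator ignores its element, so it cannot also serve as the function that merges partial results; the three-argument reduce with Integer::sum as the combiner handles that:

```java
import java.util.List;

final class FoldExample {
    // Java's accumulator receives (acc, element), the reverse of SML's (x, a).
    static int sum(List<Integer> lst) {
        return lst.stream().reduce(0, (a, x) -> a + x);
    }

    static int mul(List<Integer> lst) {
        return lst.stream().reduce(1, (a, x) -> a * x);
    }

    // The accumulator ignores x, so a separate combiner (Integer::sum)
    // is supplied for merging partial counts.
    static int length(List<Integer> lst) {
        return lst.stream().reduce(0, (a, x) -> a + 1, Integer::sum);
    }

    static int foo(List<Integer> l) {
        return sum(l) + mul(l) + length(l);
    }
}
```

foo(List.of(1, 2, 3)) returns 6 + 6 + 3 = 15.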

17 Google MapReduce
- Input Handling
- Map Function
- Partition Function
- Compare Function
- Reduce Function
- Output Writer

18 Input Handling
- Divides up data into bite-size chunks
- Starts up tasks
- Assigns tasks to idle workers
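A single-machine sketch of these three steps, for illustration only: a fixed thread pool stands in for the worker machines, and each queued chunk runs on whichever thread is idle. The class name, the chunk size parameter, and the per-chunk task (summing line lengths) are hypothetical; in Google's system a master process hands tasks to idle worker machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class InputHandling {
    // Divides the input into chunks of at most chunkSize records (chunkSize > 0).
    static <T> List<List<T>> split(List<T> input, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < input.size(); i += chunkSize) {
            chunks.add(input.subList(i, Math.min(i + chunkSize, input.size())));
        }
        return chunks;
    }

    // Starts one task per chunk on a pool of `workers` threads; a queued task runs
    // as soon as some thread is idle. The task (summing line lengths) is a
    // placeholder for real map work. Results come back in chunk order.
    static List<Integer> run(List<String> lines, int chunkSize, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<CompletableFuture<Integer>> tasks = new ArrayList<>();
            for (List<String> chunk : split(lines, chunkSize)) {
                tasks.add(CompletableFuture.supplyAsync(
                        () -> chunk.stream().mapToInt(String::length).sum(), pool));
            }
            List<Integer> results = new ArrayList<>();
            for (CompletableFuture<Integer> task : tasks) {
                results.add(task.join());   // wait for this chunk's task
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

run(List.of("ab", "cde", "f", "ghij", "k"), 2, 3) splits the lines into three chunks and returns [5, 5, 1].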

19 Map
Input: Key, Value pair
Output: Key, Value pairs
Example: Annual Rainfall Per City

20 Map (Example)
Example: Annual Rainfall Per City
    map(String key, String value):
        // key: date
        // value: weather info
        foreach (City c in value)
            EmitIntermediate(c, c.rainfall)
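A runnable Java version of this map function, as a sketch. It assumes a hypothetical text format for the weather info (comma-separated "City:rainfall" readings) and returns the emitted pairs as a list instead of calling EmitIntermediate:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RainfallMap {
    // key: date; value: weather info in an assumed "City:rainfall, City:rainfall" format.
    // Returns the intermediate (city, rainfall) pairs that EmitIntermediate would emit.
    static List<Map.Entry<String, Integer>> map(String key, String value) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String reading : value.split(",")) {
            String[] parts = reading.split(":");
            String city = parts[0].trim();
            int rainfall = Integer.parseInt(parts[1].trim());
            emitted.add(Map.entry(city, rainfall));   // EmitIntermediate(c, c.rainfall)
        }
        return emitted;
    }
}
```

For the value "Seattle:4, Portland:1" it returns the pairs (Seattle, 4) and (Portland, 1); the date key is not used.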

21 Partition Function
Allocates map output to particular reduce tasks
Input: key, number of reduce tasks
Output: index of the desired reduce task
Typical: hash(key) % numberOfReduces
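A sketch of the typical partition function in Java (the class name Partitioner is illustrative). Math.floorMod is used because String.hashCode() can be negative, and Java's % then returns a negative result that is not a valid index. (Hadoop's default HashPartitioner reaches the same goal by masking off the sign bit.)

```java
final class Partitioner {
    // Typical partition function: hash(key) % numberOfReduces.
    // String.hashCode() may be negative and Java's % keeps the sign of the
    // dividend, so Math.floorMod is used to get an index in [0, numberOfReduces).
    static int partition(String key, int numberOfReduces) {
        return Math.floorMod(key.hashCode(), numberOfReduces);
    }
}
```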

22 Comparison
Sorts the input for each reduce by key, so that all values for the same key are grouped together
Example: Annual rainfall per city
- Groups the rainfall data for each city
- Seattle: {0, 0, 0, 1, 4, 7, 10, …}
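An in-memory sketch of this step in Java that groups intermediate (city, rainfall) pairs by key. TreeMap keeps the cities in sorted order; the values are sorted as well only so that the output matches the Seattle example. The class name Shuffle is illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class Shuffle {
    // Groups intermediate (city, rainfall) pairs by key. TreeMap iterates its keys
    // in sorted order, standing in for the sort by key before reduce.
    static TreeMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        // Extra step: sort each city's values too, to match the Seattle example.
        for (List<Integer> values : grouped.values()) {
            values.sort(null);   // null comparator = natural (ascending) order
        }
        return grouped;
    }
}
```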

23 Reduce
Input: Key, list of all values for that key
Output: Single value
Example: Annual rainfall per city


25 Reduce (Example)
Example: Annual rainfall per city
    reduce(String key, Iterator values):
        // key: city
        // values: rainfall readings
        double sum = 0;
        int count = 0;
        foreach (v in values) {
            sum += v;
            count = count + 1;
        }
        Emit(sum / count);   // mean of the readings; Emit(sum) would give the total
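The reduce example as runnable Java, as a sketch in which a return value replaces the framework's Emit call (the class name RainfallReduce is illustrative). The sum is a double so the mean is not truncated by integer division:

```java
import java.util.Iterator;
import java.util.List;

final class RainfallReduce {
    // key: city; values: that city's rainfall readings. Returns their mean.
    // sum is a double so that sum / count is not truncated by integer division.
    static double reduce(String key, Iterator<Integer> values) {
        double sum = 0;
        int count = 0;
        while (values.hasNext()) {
            sum += values.next();
            count = count + 1;
        }
        return count == 0 ? 0.0 : sum / count;   // guard only matters for empty input
    }

    // Convenience overload for calling it on an in-memory list.
    static double reduce(String key, List<Integer> values) {
        return reduce(key, values.iterator());
    }
}
```

For the readings {0, 0, 0, 1, 4, 7, 10} it returns 22 / 7, about 3.14.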

26 Output
Writes the output to storage (GFS, etc.)

27

28 MapReduce for Google Local
- Intersections
- Rendering Tiles
- Finding nearest gas stations

