
1. Map-Reduce and Its Children
- Distributed File Systems
- Map-Reduce and Hadoop
- Dataflow Systems
- Extensions for Recursion

2. Distributed File Systems
- Chunking
- Replication
- Distribution on Racks

3. Distributed File Systems
- Files are very large and are read or appended to, not overwritten in place.
- Files are divided into chunks.
  - Typically 64MB per chunk.
- Chunks are replicated at several compute nodes.
- A master (possibly replicated) keeps track of all locations of all chunks.

4. Compute Nodes
- Organized into racks.
- Intra-rack connection typically gigabit speed.
- Inter-rack connection faster by a small factor.

5. Racks of Compute Nodes (figure: file chunks spread across the nodes of several racks)

6. 3-way replication of files, with copies on different racks (figure).
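The chunking and rack-aware placement just described can be sketched as follows. The 64MB chunk size matches the slides; the round-robin rack policy, function names, and node labels are illustrative assumptions, not the actual GFS/HDFS placement algorithm.

```python
import itertools

CHUNK_SIZE = 64 * 1024 * 1024  # 64MB per chunk, as in the slides
REPLICATION = 3                # 3-way replication

def chunk_offsets(file_size, chunk_size=CHUNK_SIZE):
    """Byte offsets at which each chunk of the file starts."""
    return list(range(0, file_size, chunk_size))

def place_replicas(chunk_id, racks, replication=REPLICATION):
    """Pick `replication` nodes for one chunk, spreading the copies
    across different racks (hypothetical round-robin policy).
    `racks` is a list of racks, each a list of node names."""
    placement = []
    rack_cycle = itertools.cycle(range(len(racks)))
    # Skip ahead so different chunks start on different racks.
    for _ in range(chunk_id % len(racks)):
        next(rack_cycle)
    for _ in range(replication):
        rack = racks[next(rack_cycle)]
        placement.append(rack[chunk_id % len(rack)])
    return placement
```

A 200MB file thus yields four chunks, and each chunk's three replicas land on three distinct racks whenever at least three racks exist.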

7. Implementations
- GFS (Google File System; proprietary).
- HDFS (Hadoop Distributed File System; open source).
- CloudStore (formerly the Kosmos File System; open source).

8. Above the DFS
- Map-Reduce
- Key-Value Stores
- SQL Implementations

9. The New Stack (bottom to top)
- Distributed File System
- Map-Reduce, e.g., Hadoop
- Object store (key-value store), e.g., BigTable, HBase, Cassandra
- SQL implementations, e.g., Pig (relational algebra), Hive

10. Map-Reduce Systems
- Map-reduce (Google) and its open-source (Apache) equivalent, Hadoop.
- An important specialized parallel-computing tool.
- Copes with compute-node failures.
  - Avoids restarting the entire job.

11. Key-Value Stores
- BigTable (Google), HBase, Cassandra (Apache), Dynamo (Amazon).
  - Each row is a key plus values over a flexible set of columns.
  - Each column component can be a set of values.

12. SQL-Like Systems
- Pig – Yahoo!'s implementation of relational algebra.
  - Translates to a sequence of map-reduce operations, using Hadoop.
- Hive – an open-source (Apache) implementation of a restricted SQL, called QL, over Hadoop.

13. SQL-Like Systems – (2)
- Sawzall – Google's implementation of parallel select + aggregation.
- Scope – Microsoft's implementation of restricted SQL.

14. Map-Reduce
- Formal Definition
- Fault Tolerance
- Example: Join

15. Map-Reduce
- You write two functions, Map and Reduce.
  - Each has a special form, to be explained.
- The system (e.g., Hadoop) creates a large number of tasks for each function.
  - Work is divided among the tasks in a precise way.

16. Map-Reduce Pattern (figure: input from the DFS flows to Map tasks, which emit "key"-value pairs to Reduce tasks, whose output goes back to the DFS)

17. Map-Reduce Algorithms
- Map tasks convert inputs to key-value pairs.
  - "Keys" are not necessarily unique.
- Outputs of Map tasks are sorted by key, and each key is assigned to one Reduce task.
- Reduce tasks combine the values associated with each key.
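The Map / sort-by-key / Reduce pipeline on this slide can be sketched in a few lines of Python. This is a single-process sketch, not a distributed implementation; the word-count Map and Reduce functions are illustrative examples of the "special form" the slides mention.

```python
from collections import defaultdict

def map_fn(line):
    # Emit ("key", value) pairs; keys need not be unique.
    for word in line.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Combine all values associated with one key.
    yield (key, sum(values))

def map_reduce(inputs, map_fn, reduce_fn):
    # Map phase: each input element becomes zero or more key-value pairs.
    pairs = [kv for x in inputs for kv in map_fn(x)]
    # Shuffle phase: sort by key and group, as the system does between phases.
    groups = defaultdict(list)
    for k, v in sorted(pairs):
        groups[k].append(v)
    # Reduce phase: one call per key.
    return [out for k, vs in groups.items() for out in reduce_fn(k, vs)]
```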

18. Coping With Failures
- Map-reduce is designed to deal with compute nodes failing to execute a task.
- It re-executes failed tasks, not whole jobs.
- Failure modes:
  1. Compute-node failure (e.g., disk crash).
  2. Rack communication failure.
  3. Software failure, e.g., a task requires Java version n, but the node has only version n-1.

19. Things Map-Reduce is Good At
1. Matrix-matrix and matrix-vector multiplication.
   - One step of the PageRank iteration was the original application.
2. Relational-algebra operations.
   - We'll do an example of the join.
3. Many other "embarrassingly parallel" operations.

20. Joining by Map-Reduce
- Suppose we want to compute R(A,B) JOIN S(B,C), using k Reduce tasks.
  - I.e., find tuples with matching B-values.
- R and S are each stored in a chunked file.

21. Joining by Map-Reduce – (2)
- Use a hash function h from B-values to k buckets.
  - Bucket = Reduce task.
- The Map tasks take chunks from R and S, and send:
  - Tuple R(a,b) to Reduce task h(b); key = b, value = R(a,b).
  - Tuple S(b,c) to Reduce task h(b); key = b, value = S(b,c).

22. Joining by Map-Reduce – (3) (figure: Map tasks send R(a,b) and S(b,c) to Reduce task i whenever h(b) = i; Reduce task i produces all (a,b,c) such that h(b) = i, (a,b) is in R, and (b,c) is in S)

23. Joining by Map-Reduce – (4)
- Key point: if R(a,b) joins with S(b,c), then both tuples are sent to Reduce task h(b).
- Thus, their join (a,b,c) will be produced there and shipped to the output file.
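The hash-partitioned join of slides 21-23 can be sketched as one function. The list-of-buckets simulation of the k Reduce tasks is illustrative; in a real system each bucket would be a separate task on a separate node.

```python
def hash_join(R, S, k=4):
    """R: list of (a, b) tuples; S: list of (b, c) tuples.
    Join on the B-value using k simulated Reduce tasks."""
    h = lambda b: hash(b) % k
    buckets = [{"R": [], "S": []} for _ in range(k)]
    # Map phase: route each tuple to Reduce task h(b), keyed by b.
    for a, b in R:
        buckets[h(b)]["R"].append((a, b))
    for b, c in S:
        buckets[h(b)]["S"].append((b, c))
    # Reduce phase: within each bucket, match tuples sharing a B-value.
    out = []
    for bk in buckets:
        for a, b in bk["R"]:
            for b2, c in bk["S"]:
                if b == b2:
                    out.append((a, b, c))
    return sorted(out)
```

Because both R(a,b) and S(b,c) hash on the same b, matching tuples always land in the same bucket, which is the key point of slide 23.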

24. Dataflow Systems
- Arbitrary Acyclic Flow Among Tasks
- Preserving Fault Tolerance
- The Blocking Property

25. Generalization of Map-Reduce
- Map-reduce uses only two functions (Map and Reduce).
  - Each is implemented by a rank of tasks.
  - Data flows from Map tasks to Reduce tasks only.

26. Generalization – (2)
- The natural generalization is to allow any number of functions, connected in an acyclic network.
- Each function is implemented by tasks that feed the tasks of its successor function(s).
- Key fault-tolerance (blocking) property: tasks produce all their output at the end.

27. Many Implementations
1. Clustera – University of Wisconsin.
2. Hyracks – University of California, Irvine.
3. Dryad/DryadLINQ – Microsoft.
4. Nephele/PACT – T. U. Berlin.
5. BOOM – Berkeley.
6. epiC – National University of Singapore.

28. Example: Join + Aggregation
- Relations D(emp, dept) and S(emp, sal).
- Compute the sum of the salaries for each department.
- D JOIN S is computed by map-reduce.
  - But each Reduce task can also group its emp-dept-sal tuples by dept and sum the salaries.

29. Example: Continued
- A third function is needed to take the dept-SUM(sal) pairs from each Reduce task, organize them by dept, and compute the final sum for each department.

30. 3-Layer Dataflow (figure: Map tasks read D and S and hash tuples by emp; Join + Group tasks join on emp, group by dept, and hash their partial sums by dept; Final Group + Aggregate tasks produce the answer)
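The three layers of slides 28-30 can be sketched in one function. The relation names D and S follow slide 28; the in-process task simulation and k = 3 are illustrative assumptions.

```python
from collections import defaultdict

def dept_salary_sums(D, S, k=3):
    """D: (emp, dept) tuples; S: (emp, sal) tuples.
    Layer 1 (Map): hash tuples by emp to k join tasks.
    Layer 2 (Join + Group): join on emp, partially sum sal by dept.
    Layer 3 (Final Group + Aggregate): combine partial sums by dept."""
    h = lambda x: hash(x) % k
    tasks = [([], []) for _ in range(k)]
    for emp, dept in D:
        tasks[h(emp)][0].append((emp, dept))
    for emp, sal in S:
        tasks[h(emp)][1].append((emp, sal))
    partials = []  # each join task emits (dept, partial_sum) pairs
    for d_side, s_side in tasks:
        sal_of = dict(s_side)
        local = defaultdict(int)
        for emp, dept in d_side:
            if emp in sal_of:
                local[dept] += sal_of[emp]
        partials.extend(local.items())
    # The third function: regroup the partial sums by dept.
    final = defaultdict(int)
    for dept, s in partials:
        final[dept] += s
    return dict(final)
```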

31. Recursion
- Transitive-Closure Example
- Fault-Tolerance Problem
- Endgame Problem
- Some Systems and Approaches
Recent research ideas contributed by F. Afrati, V. Borkar, M. Carey, N. Polyzotis.

32. Applications Requiring Recursion
1. PageRank, the original map-reduce application, is really a recursion implemented by many rounds of map-reduce.
2. Analysis of Web structure.
3. Analysis of social networks.
4. PDEs (partial differential equations).

33. Transitive Closure
- Many recursive applications involving large data are similar to transitive closure:

  Nonlinear:
    Path(X,Y) :- Arc(X,Y)
    Path(X,Y) :- Path(X,Z) & Path(Z,Y)

  (Right) Linear:
    Path(X,Y) :- Arc(X,Y)
    Path(X,Y) :- Arc(X,Z) & Path(Z,Y)

34. Implementing TC on a Cluster
- Use k tasks.
- A hash function h sends each node of the graph to one of the k tasks.
- Task i receives and stores Path(a,b) if either h(a) = i or h(b) = i, or both.
- Task i must join Path(a,c) with Path(c,b) if h(c) = i.

35. TC on a Cluster – Basis
- Data is stored as the relation Arc(a,b).
- Map tasks read chunks of the Arc relation and send each tuple Arc(a,b) to recursive tasks h(a) and h(b).
  - It is treated as if it were the tuple Path(a,b).
  - If h(a) = h(b), only one task receives it.

36. TC on a Cluster – Recursive Tasks (figure). When task i receives Path(a,b):
1. Store Path(a,b) if it is new; otherwise, ignore it.
2. Look up stored Path(b,c) and/or Path(d,a) for any c and d.
3. Send each resulting Path(a,c) to tasks h(a) and h(c); send each Path(d,b) to tasks h(d) and h(b).
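The per-task logic of slide 36 can be run as a single-process semi-naive fixpoint. This is a sketch: real tasks would be partitioned by the hash function h and exchange Path facts over the network rather than through one shared set.

```python
def transitive_closure(arcs):
    """Semi-naive sketch of the recursive-task logic: store each Path(a,b)
    if new, join it both ways against stored paths, and queue the results."""
    path = set()
    pending = list(arcs)  # basis: every Arc(a,b) is treated as Path(a,b)
    while pending:
        a, b = pending.pop()
        if (a, b) in path:
            continue  # not new: ignore, as the task in the figure does
        path.add((a, b))
        # Look up Path(b,c) for any c -> new fact Path(a,c).
        pending.extend((a, c) for x, c in path if x == b)
        # Look up Path(d,a) for any d -> new fact Path(d,b).
        pending.extend((d, b) for d, y in path if y == a)
    return path
```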

37. Big Problem: Managing Failure
- Map-reduce depends on the blocking property.
- Only then can you restart a failed task without restarting the whole job.
- But any recursive task has to deliver some output and later get more input.

38. HaLoop (U. of Washington)
- Iterates Hadoop, once for each round of the recursion.
  - Like iterative PageRank.
- Similar idea: Twister (Indiana University).
- The clever piece is the way HaLoop tries to run each task in round i at a compute node where it can find its needed output from round i-1.

39. Pregel (Google)
- Views all computation as a recursion on some graph.
- Nodes send messages to one another.
  - Messages are bunched into supersteps.
- Checkpoints all compute nodes after some fixed number of supersteps.
- On failure, rolls all tasks back to the previous checkpoint.

40. Example: Shortest Paths Via Pregel (figure: node N receives the message "I found a path from node M to you of length L"; N consults its table of shortest paths and asks "Is this the shortest path from M I know about?"; if so, it updates the table and forwards messages of lengths L+3, L+5, and L+6 along its out-edges of weights 3, 5, and 6)
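The message-passing loop in the figure can be sketched superstep by superstep. This is a single-process sketch of the idea; a real Pregel partitions the nodes across workers and checkpoints between supersteps, as slide 39 describes.

```python
import math

def pregel_shortest_paths(graph, source):
    """Superstep-style single-source shortest paths.
    graph: {node: [(neighbor, edge_length), ...]}."""
    dist = {n: math.inf for n in graph}
    inbox = {n: [] for n in graph}
    inbox[source].append(0)  # the source knows a path of length 0 to itself
    supersteps = 0
    while any(inbox.values()):
        supersteps += 1
        outbox = {n: [] for n in graph}
        for node, msgs in inbox.items():
            if not msgs:
                continue
            best = min(msgs)
            # "Is this the shortest path I know about?" If so, update the
            # table and forward lengthened paths along all out-edges.
            if best < dist[node]:
                dist[node] = best
                for nbr, length in graph[node]:
                    outbox[nbr].append(best + length)
        inbox = outbox  # messages delivered in the next superstep
    return dist, supersteps
```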

41. Using Idempotence
- Some recursive applications allow restart of tasks even if they have produced some output.
- Example: TC is idempotent; you can send a task a duplicate Path fact without altering the result.
  - But if you were counting paths, the answer would be wrong.

42. Big Problem: The Endgame
- Some recursions, like TC, take a large number of rounds, but the number of new discoveries in later rounds drops.
  - T. Vassilakis (Google): searches forward on the Web graph can take hundreds of rounds.
- Problem: in a cluster, transmitting small files carries much overhead.

43. Approach: Merge Tasks
- Decide when to migrate tasks to fewer compute nodes.
- Data for several tasks at the same node is combined into a single file and distributed at the receiving end.
- Downside: old tasks have a lot of state to move.
  - Example: "paths seen so far."

44. Approach: Modify Algorithms
- Nonlinear recursions can terminate in many fewer steps than equivalent linear recursions.
- Example: TC.
  - O(n) rounds on an n-node graph for linear TC.
  - O(log n) rounds for nonlinear TC.
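The O(log n) round count of the nonlinear rule can be seen in a small sketch that joins Path with Path each round, so path lengths roughly double per round. This is naive single-process evaluation for illustration, not the cluster algorithm.

```python
def nonlinear_tc_rounds(arcs):
    """Nonlinear TC: each round applies Path(X,Y) :- Path(X,Z) & Path(Z,Y).
    Path lengths double each round, so an n-node chain closes in
    O(log n) rounds rather than the O(n) of the linear rule."""
    path = set(arcs)
    rounds = 0
    while True:
        new = {(a, c) for a, b in path for b2, c in path if b == b2}
        if new <= path:
            return path, rounds  # fixpoint reached
        path |= new
        rounds += 1
```

On a chain of 8 nodes (7 arcs), the closure has 28 facts and is reached in 3 rounds; the linear rule would need 6 rounds, since path lengths there grow by only one arc per round.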

45. Advantage of Linear TC
- The data-volume cost (= the sum of the input sizes of all tasks) for executing linear TC is generally lower than that for nonlinear TC.
- Why? Each path is discovered only once.
  - Note: distinct paths between the same endpoints may each be discovered.

46. Example: Linear TC (figure: Arc + Path = Path)

47. Nonlinear TC (figure: constructs Path + Path = Path in many ways)

48. Smart TC
- (Valduriez-Boral, Ioannidis) Construct a path from two paths:
  1. The first has a length that is a power of 2.
  2. The second is no longer than the first.

49. Example: Smart TC (figure)

50. Other Nonlinear TC Algorithms
- You can have the unique-decomposition property with many variants of nonlinear TC.
- Example: Balance constructs paths from two equal-length paths.
  - Favor the first path when the total length is odd.

51. Example: Balance (figure)

52. Incomparability of TC Algorithms
- On different graphs, any of the unique-decomposition algorithms – left-linear, right-linear, smart, balanced – could have the lowest data-volume cost.
- Other unique-decomposition algorithms are possible and could also win.

53. Extension Beyond TC
- Can you convert any linear recursion into an equivalent nonlinear recursion that requires logarithmic rounds?
- Answer: not always, without increasing arity and data size.

54. Positive Points
1. (Agrawal, Jagadish, Ness) All linear Datalog recursions reduce to TC.
2. Right-linear chain-rule Datalog programs can be replaced by nonlinear recursions with the same arity, logarithmic rounds, and the unique-decomposition property.

55. Example: Alternating-Color Paths
  P(X,Y) :- Blue(X,Y)
  P(X,Y) :- Blue(X,Z) & Q(Z,Y)
  Q(X,Y) :- Red(X,Z) & P(Z,Y)
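A naive bottom-up fixpoint for these mutually recursive rules can be sketched as follows; `blue` and `red` are edge sets standing in for the Blue and Red relations, and the function name is illustrative.

```python
def alternating_paths(blue, red):
    """Naive fixpoint evaluation of the mutually recursive rules:
    P(X,Y) :- Blue(X,Y);  P(X,Y) :- Blue(X,Z) & Q(Z,Y);
    Q(X,Y) :- Red(X,Z) & P(Z,Y)."""
    P, Q = set(blue), set()
    while True:
        # Re-derive P and Q from the current approximations.
        newP = set(blue) | {(x, y) for x, z in blue for z2, y in Q if z == z2}
        newQ = {(x, y) for x, z in red for z2, y in P if z == z2}
        if newP == P and newQ == Q:
            return P, Q  # fixpoint: no rule derives anything new
        P, Q = newP, newQ
```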

56. The Case of Reachability
  Reach(X) :- Source(X)
  Reach(X) :- Reach(Y) & Arc(Y,X)
- Takes a linear number of rounds as stated.
- You can compute nonlinear TC to get Reach in O(log n) rounds.
- But then you compute O(n^2) facts instead of O(n) facts on an n-node graph.
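The round count of the linear Reach rule can be sketched directly; this is a naive round-by-round evaluation for illustration, with a single source node rather than a Source relation.

```python
def reach_rounds(source, arcs):
    """Linear reachability: Reach(X) :- Source(X);
    Reach(X) :- Reach(Y) & Arc(Y,X).
    Counts rounds until the fixpoint; on a length-n chain this takes
    O(n) rounds, but it only ever stores O(n) Reach facts."""
    reach = {source}
    rounds = 0
    while True:
        # One round: apply the recursive rule to everything known so far.
        new = {x for y, x in arcs if y in reach} - reach
        if not new:
            return reach, rounds
        reach |= new
        rounds += 1
```

On a 6-node chain the fixpoint needs 5 rounds but stores only 6 facts, illustrating the trade-off against the O(log n)-round, O(n^2)-fact nonlinear-TC route.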

57. Reachability – (2)
- Theorem: if you compute Reach using only unary recursive predicates, then it must take Ω(n) rounds on a graph of n nodes.
  - The proof uses the ideas of Afrati, Cosmadakis, and Yannakakis from a generation ago.

58. Summary: Recursion
- The key problems are the "endgame" and the nonblocking nature of recursive tasks.
- In some applications, the endgame problem can be handled by using a nonlinear recursion that requires O(log n) rounds and has the unique-decomposition property.

59. Summary: Research Questions
1. How do you best support fault tolerance when tasks are nonblocking?
2. How do you manage tasks when the endgame problem cannot be avoided?
3. When can you replace linear recursion with nonlinear recursion requiring many fewer rounds and (roughly) the same data-volume cost?

