
1 MapReduce Theory and Practice http://net.pku.edu.cn/~course/cs402/2010/ 彭波 (Bo Peng) pb@net.pku.edu.cn School of Electronics Engineering and Computer Science, Peking University 7/15/2010 Some slides borrowed from Jimmy Lin and Aaron Kimball

2 Outline
Functional Language and MapReduce
MapReduce Basics
MapReduce Algorithm Design
Hadoop and Java Practice

3 Functional Language and MapReduce

4 What is Functional Programming?
In computer science, functional programming is a programming paradigm that treats computation as the evaluation of mathematical functions and avoids state and mutable data. It emphasizes the application of functions, in contrast with the imperative programming style that emphasizes changes in state. [1]

5 Example
Summing the integers 1 to 10 in Java:
    total = 0;
    for (i = 1; i <= 10; ++i)
        total = total + i;
The computation method is variable assignment.

6 Example
Summing the integers 1 to 10 in Haskell:
    sum [1..10]
The computation method is function application.

7 Why is it Useful?
The abstract nature of functional programming leads to considerably simpler programs; it also supports a number of powerful new ways to structure and reason about programs.

8 Functional Programming Review
Functional operations do not modify data structures: they always create new ones
Original data still exists in unmodified form
Data flows are implicit in program design
Order of operations does not matter

9 Functional Programming Review
fun foo(l: int list) = sum(l) + mul(l) + length(l)
The order of sum(), mul(), etc. does not matter; they do not modify l.

10 Functional Updates Do Not Modify Structures
fun append(x, lst) =
    let lst' = reverse lst
    in reverse (x :: lst')
The append() function above reverses the list, attaches the new element to the front, and reverses the result, which appends x to the end. But it never modifies lst!

11 Functions Can Be Used As Arguments
fun DoDouble(f, x) = f (f x)
It does not matter what f does to its argument; DoDouble() will do it twice. A function is called higher-order if it takes a function as an argument or returns a function as a result.

12 Map
map f lst: ('a -> 'b) -> ('a list) -> ('b list)
Creates a new list by applying f to each element of the input list; returns output in order.

13 Fold
fold f x0 lst: ('a * 'b -> 'b) -> 'b -> ('a list) -> 'b
Moves across a list, applying f to each element plus an accumulator. f returns the next accumulator value, which is then combined with the next element of the list.

14 fold left vs. fold right
Order of list elements can be significant
Fold left moves left-to-right across the list
Fold right moves from right-to-left
SML implementation:
fun foldl f a []      = a
  | foldl f a (x::xs) = foldl f (f(x, a)) xs
fun foldr f a []      = a
  | foldr f a (x::xs) = f(x, (foldr f a xs))

15 Example
fun foo(l: int list) = sum(l) + mul(l) + length(l)
How can we implement this with map and foldl?

16 Example (Solved)
fun foo(l: int list) = sum(l) + mul(l) + length(l)
fun sum(lst)    = foldl (fn (x, a) => a + x) 0 lst
fun mul(lst)    = foldl (fn (x, a) => a * x) 1 lst
fun length(lst) = foldl (fn (x, a) => a + 1) 0 lst
(Note: with the foldl defined above, f receives (element, accumulator), so the accumulator is the second component of the pair.)

17 map Implementation
This implementation moves left-to-right across the list, mapping elements one at a time... but does it need to?
fun map f []      = []
  | map f (x::xs) = (f x) :: (map f xs)

18 Implicit Parallelism in Map
In a purely functional setting, the elements of a list being computed by map cannot see the effects of the computations on other elements
If the order in which f is applied to the list elements does not matter, we can reorder or parallelize execution
This is the "secret" that MapReduce exploits

19 References
http://net.pku.edu.cn/~course/cs501/2008/resource/haskell/functional.ppt
http://net.pku.edu.cn/~course/cs501/2008/resource/haskell/

20 MapReduce Basics

21 Typical Large-Data Problem
Iterate over a large number of records
Extract something of interest from each
Shuffle and sort intermediate results
Aggregate intermediate results
Generate final output
Key idea: provide a functional abstraction for these two operations, Map and Reduce (Dean and Ghemawat, OSDI 2004)

22 Roots in Functional Programming
[Figure: Map applies a function f to every item of a list in parallel; Fold aggregates the results with a function g]

23 MapReduce
Programmers specify two functions:
map (k, v) → <k', v'>*
reduce (k', v') → <k', v''>*
All values with the same key are sent to the same reducer
The execution framework handles everything else...

24 [Figure: mappers consume input pairs (k1, v1) through (k6, v6) and emit intermediate pairs, e.g. (b, 1), (a, 2), (c, 3), (c, 6), (a, 5), (c, 2), (b, 7), (c, 8); shuffle and sort aggregates values by key, giving a → [1, 5], b → [2, 7], c → [2, 3, 6, 8]; reducers then emit final pairs (r1, s1), (r2, s2), (r3, s3)]

25 MapReduce
Programmers specify two functions:
map (k, v) → <k', v'>*
reduce (k', v') → <k', v''>*
All values with the same key are sent to the same reducer
The execution framework handles everything else...
What's "everything else"?

26 MapReduce "Runtime"
Handles scheduling: assigns workers to map and reduce tasks
Handles "data distribution": moves processes to data
Handles synchronization: gathers, sorts, and shuffles intermediate data
Handles errors and faults: detects worker failures and restarts
Everything happens on top of a distributed FS (later)

27 MapReduce
Programmers specify two functions:
map (k, v) → <k', v'>*
reduce (k', v') → <k', v''>*
All values with the same key are reduced together
The execution framework handles everything else...
Not quite... usually, programmers also specify:
partition (k', number of partitions) → partition for k'
Often a simple hash of the key, e.g., hash(k') mod n
Divides up key space for parallel reduce operations
combine (k', v') → <k', v'>*
Mini-reducers that run in memory after the map phase
Used as an optimization to reduce network traffic
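Hadoop's default partitioner implements exactly this hash-mod-n scheme; a minimal sketch of the idea in Java (essentially what the built-in HashPartitioner does) is:

import org.apache.hadoop.mapreduce.Partitioner;

public class SimpleHashPartitioner<K, V> extends Partitioner<K, V> {
  @Override
  public int getPartition(K key, V value, int numReduceTasks) {
    // Mask the sign bit so negative hash codes still yield a valid partition.
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}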

28 [Figure: the same data flow, extended with combine and partition steps: mappers emit (b, 1), (a, 2), (c, 3), (c, 6), (a, 5), (c, 2), (b, 7), (c, 8); combiners pre-aggregate locally, e.g. the two c values 3 and 6 become c → 9; partitioners assign keys to reducers; shuffle and sort then aggregates values by key before reduce]

29 Two more details...
Barrier between map and reduce phases, but we can begin copying intermediate data earlier
Keys arrive at each reducer in sorted order; no enforced ordering across reducers

30 "Hello World": Word Count
Map(String docid, String text):
    for each word w in text:
        Emit(w, 1);
Reduce(String term, Iterable<Int> values):
    int sum = 0;
    for each v in values:
        sum += v;
    Emit(term, sum);

31 MapReduce can refer to...
The programming model
The execution framework (aka "runtime")
The specific implementation
Usage is usually clear from context!

32 MapReduce Implementations
Google has a proprietary implementation in C++, with bindings in Java and Python
Hadoop is an open-source implementation in Java: development led by Yahoo, used in production; now an Apache project with a rapidly expanding software ecosystem
Lots of custom research implementations: for GPUs, cell processors, etc.

33 [Figure: MapReduce execution overview. (1) The user program submits the job to the master. (2) The master schedules map and reduce tasks to workers. (3) Map workers read input splits and (4) write intermediate files to local disk. (5) Reduce workers remote-read the intermediate data and (6) write the output files. Adapted from (Dean and Ghemawat, OSDI 2004)]

34 MapReduce Algorithm Design

35 "Everything Else"
The execution framework handles everything else...
Scheduling: assigns workers to map and reduce tasks
"Data distribution": moves processes to data
Synchronization: gathers, sorts, and shuffles intermediate data
Errors and faults: detects worker failures and restarts
Limited control over data and execution flow: all algorithms must be expressed in m, r, c, p
You don't know:
Where mappers and reducers run
When a mapper or reducer begins or finishes
Which input a particular mapper is processing
Which intermediate key a particular reducer is processing

36 Tools for the Programmer
Cleverly-constructed data structures: bring partial results together
Sort order of intermediate keys: control the order in which reducers process keys
Partitioner: control which reducer processes which keys
Preserving state in mappers and reducers: capture dependencies across multiple keys and values

37 Preserving State
[Figure: one Mapper object is instantiated per task, with hooks: configure (API initialization), map (one call per input key-value pair), and close (API cleanup); state persists across calls. Likewise, one Reducer object per task with configure, reduce (one call per intermediate key), and close.]

38 Scalable Hadoop Algorithms: Themes
Avoid object creation: an inherently costly operation; garbage collection
Avoid buffering: limited heap size; works for small datasets, but won't scale!

39 Importance of Local Aggregation
Ideal scaling characteristics:
Twice the data, twice the running time
Twice the resources, half the running time
Why can't we achieve this? Synchronization requires communication, and communication kills performance
Thus... avoid communication! Reduce intermediate data via local aggregation; combiners can help

40 Shuffle and Sort
[Figure: on the map side, mapper output fills a circular in-memory buffer, spills to disk, and the spills are merged into partitioned intermediate files, with the combiner running during the merge; on the reduce side, intermediate files are copied from the mappers and merged before being fed to the reducer]

41 Word Count: Baseline
What's the impact of combiners?
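The baseline pseudo-code itself is not preserved in this transcript. A minimal sketch in Hadoop's 0.20 Java API, with class names of my own choosing, would look like this:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountBaseline {
  // Emit (word, 1) for every token; no local aggregation at all.
  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        word.set(token);
        context.write(word, ONE);  // one pair per token: lots of network traffic
      }
    }
  }
  // Sum the 1s for each word.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
}

Here the reducer is associative and commutative, so it can double as the combiner; the impact is large because most emitted pairs then never cross the network.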

42 Word Count: Version 1
Are combiners still needed?
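A plausible sketch of Version 1 (assuming, as in Lin's design patterns, that it aggregates counts within each map() call so repeated words in one document yield a single pair):

// Imports as in the baseline sketch, plus java.util.HashMap and java.util.Map.
public static class PerDocumentMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Aggregate within this single call before emitting anything.
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (String token : value.toString().split("\\s+")) {
      Integer c = counts.get(token);
      counts.put(token, c == null ? 1 : c + 1);
    }
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
    }
  }
}

Combiners still help: aggregation here happens only per input record, not per map task.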

43 Word Count: Version 2
Are combiners still needed? Key: preserve state across input key-value pairs!
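A sketch of Version 2, the in-mapper combining variant: the hash map lives for the lifetime of the task (via the setup/cleanup hooks described in the Hadoop API section below), so counts accumulate across all map() calls and are emitted once at the end.

// Imports as in the baseline sketch, plus java.util.HashMap and java.util.Map.
public static class InMapperCombiningMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  private Map<String, Integer> counts;
  @Override
  protected void setup(Context context) {
    counts = new HashMap<String, Integer>();  // one map per task, not per call
  }
  @Override
  protected void map(LongWritable key, Text value, Context context) {
    for (String token : value.toString().split("\\s+")) {
      Integer c = counts.get(token);
      counts.put(token, c == null ? 1 : c + 1);
    }
  }
  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
    }
  }
}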

44 Design Pattern for Local Aggregation
"In-mapper combining": fold the functionality of the combiner into the mapper by preserving state across multiple map calls
Advantages: speed; why is this faster than actual combiners?
Disadvantages: explicit memory management required; potential for order-dependent bugs

45 Combiner Design
Combiners and reducers share the same method signature
Sometimes, reducers can serve as combiners; often, not...
Remember: combiners are optional optimizations
They should not affect algorithm correctness
They may be run 0, 1, or multiple times
Example: find the average of all integers associated with the same key

46 Computing the Mean: Version 1
Why can't we use the reducer as a combiner?
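The pseudo-code for the four versions is not preserved in this transcript, but the reason Version 1 fails is worth spelling out: the mean is not associative, so a mean of partial means is not the overall mean. For example, mean(1, 2, 3, 4, 5) = 3, but mean(mean(1, 2), mean(3, 4, 5)) = mean(1.5, 4) = 2.75.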

47 Computing the Mean: Version 2
Why doesn't this work?

48 Computing the Mean: Version 3
Fixed?

49 Computing the Mean: Version 4
Are combiners still needed?
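A sketch of the correct final design: propagate (sum, count) pairs, which can be aggregated associatively, and divide only in the reducer. PairWritable below is a hypothetical Writable holding a sum and a count (see the custom Writable discussion in the Hadoop section); the in-mapper combining pattern is applied as well.

// Imports as in the earlier sketches, plus org.apache.hadoop.io.DoubleWritable.
public static class MeanMapper
    extends Mapper<Text, LongWritable, Text, PairWritable> {
  private Map<String, long[]> partials;  // key -> {sum, count}
  @Override
  protected void setup(Context context) {
    partials = new HashMap<String, long[]>();
  }
  @Override
  protected void map(Text key, LongWritable value, Context context) {
    long[] p = partials.get(key.toString());
    if (p == null) {
      p = new long[2];
      partials.put(key.toString(), p);
    }
    p[0] += value.get();  // running sum
    p[1] += 1;            // running count
  }
  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    for (Map.Entry<String, long[]> e : partials.entrySet()) {
      context.write(new Text(e.getKey()),
                    new PairWritable(e.getValue()[0], e.getValue()[1]));
    }
  }
}
public static class MeanReducer
    extends Reducer<Text, PairWritable, Text, DoubleWritable> {
  @Override
  protected void reduce(Text key, Iterable<PairWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0, count = 0;
    for (PairWritable p : values) {
      sum += p.getSum();
      count += p.getCount();
    }
    context.write(key, new DoubleWritable((double) sum / count));
  }
}

With (sum, count) pairs as intermediate values, a separate combiner that merges pairs would also be correct; with in-mapper combining it is largely redundant.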

50 Algorithm Design: Running Example
Term co-occurrence matrix for a text collection
M = N x N matrix (N = vocabulary size)
M_ij: number of times i and j co-occur in some context (for concreteness, let's say context = sentence)
Why? Distributional profiles as a way of measuring semantic distance; semantic distance is useful for many language processing tasks

51 MapReduce: Large Counting Problems
A term co-occurrence matrix for a text collection is a specific instance of a large counting problem:
A large event space (number of terms)
A large number of observations (the collection itself)
Goal: keep track of interesting statistics about the events
Basic approach: mappers generate partial counts; reducers aggregate partial counts
How do we aggregate partial counts efficiently?

52 First Try: "Pairs"
Each mapper takes a sentence:
Generate all co-occurring term pairs
For all pairs, emit (a, b) → count
Reducers sum up counts associated with these pairs
Use combiners!

53 Pairs: Pseudo-Code
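The pseudo-code figure is missing from this transcript; a sketch of the mapper in Java, using the hypothetical TextPair key type defined later in the Hadoop section, is:

// Emit each co-occurring pair in the sentence with count 1.
protected void map(LongWritable key, Text sentence, Context context)
    throws IOException, InterruptedException {
  String[] terms = sentence.toString().split("\\s+");
  for (int i = 0; i < terms.length; i++) {
    for (int j = 0; j < terms.length; j++) {
      if (i != j) {
        context.write(new TextPair(terms[i], terms[j]), new IntWritable(1));
      }
    }
  }
}
// The reducer simply sums the counts for each (a, b) key, as in word count.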

54 "Pairs" Analysis
Advantages: easy to implement, easy to understand
Disadvantages: lots of pairs to sort and shuffle around (upper bound?); not many opportunities for combiners to work

55 Another Try: "Stripes"
Idea: group together pairs into an associative array, so that
(a, b) → 1, (a, c) → 2, (a, d) → 5, (a, e) → 3, (a, f) → 2
becomes
a → { b: 1, c: 2, d: 5, e: 3, f: 2 }
Each mapper takes a sentence:
Generate all co-occurring term pairs
For each term a, emit a → { b: count_b, c: count_c, d: count_d, ... }
Reducers perform element-wise sums of associative arrays:
  a → { b: 1, d: 5, e: 3 }
+ a → { b: 1, c: 2, d: 2, f: 2 }
= a → { b: 2, c: 2, d: 7, e: 3, f: 2 }
Key: a cleverly-constructed data structure brings together partial results

56 Stripes: Pseudo-Code
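Again the figure is missing; a sketch of the stripes mapper, using Hadoop's MapWritable as the associative array:

// For each term, build one stripe of neighbor counts per sentence.
protected void map(LongWritable key, Text sentence, Context context)
    throws IOException, InterruptedException {
  String[] terms = sentence.toString().split("\\s+");
  for (int i = 0; i < terms.length; i++) {
    MapWritable stripe = new MapWritable();
    for (int j = 0; j < terms.length; j++) {
      if (i == j) continue;
      Text neighbor = new Text(terms[j]);
      IntWritable c = (IntWritable) stripe.get(neighbor);
      stripe.put(neighbor, new IntWritable(c == null ? 1 : c.get() + 1));
    }
    context.write(new Text(terms[i]), stripe);
  }
}
// The reducer performs an element-wise sum of all stripes for a term.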

57 "Stripes" Analysis
Advantages: far less sorting and shuffling of key-value pairs; can make better use of combiners
Disadvantages: more difficult to implement; the underlying object is more heavyweight; fundamental limitation in terms of the size of the event space

58 Cluster size: 38 cores. Data source: Associated Press Worldstream (APW) of the English Gigaword Corpus (v3), which contains 2.27 million documents (1.8 GB compressed, 5.7 GB uncompressed)


60 Relative Frequencies
How do we estimate relative frequencies from counts?
f(B|A) = N(A, B) / N(A) = N(A, B) / Σ_B' N(A, B')
(the joint event count divided by the marginal)
Why do we want to do this? How do we do this with MapReduce?

61 f(B|A): "Stripes"
a → { b1: 3, b2: 12, b3: 7, b4: 1, ... }
Easy! One pass to compute (a, *), another pass to directly compute f(B|A)

62 f(B|A): "Pairs"
(a, *) → 32    (the reducer holds this value in memory)
(a, b1) → 3    becomes (a, b1) → 3 / 32
(a, b2) → 12   becomes (a, b2) → 12 / 32
(a, b3) → 7    becomes (a, b3) → 7 / 32
(a, b4) → 1    becomes (a, b4) → 1 / 32
...
For this to work:
Must emit an extra (a, *) for every bn in the mapper
Must make sure all a's get sent to the same reducer (use partitioner)
Must make sure (a, *) comes first (define sort order)
Must hold state in the reducer across different key-value pairs
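A sketch of the reducer side of this pattern, assuming the hypothetical TextPair key sorts the special right element "*" before any real term:

// The special key (a, *) arrives first, so its sum can be held as the
// marginal and used to normalize every subsequent (a, b) joint count.
// Imports as in the earlier sketches; TextPair is defined in the Hadoop section.
private long marginal = 0;
@Override
protected void reduce(TextPair key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException {
  long sum = 0;
  for (IntWritable v : values) {
    sum += v.get();
  }
  if ("*".equals(key.getRight())) {
    marginal = sum;  // remember N(a)
  } else {
    context.write(key, new DoubleWritable((double) sum / marginal));  // f(b|a)
  }
}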

63 "Order Inversion"
A common design pattern:
Computing relative frequencies requires marginal counts
But the marginal cannot be computed until you see all counts
Buffering is a bad idea!
Trick: get the marginal counts to arrive at the reducer before the joint counts
Optimization: apply the in-memory combining pattern to accumulate marginal counts
Should we apply combiners?

64 Synchronization: Pairs vs. Stripes
Approach 1: turn synchronization into an ordering problem
Sort keys into the correct order of computation
Partition the key space so that each reducer gets the appropriate set of partial results
Hold state in the reducer across multiple key-value pairs to perform the computation
Illustrated by the "pairs" approach
Approach 2: construct data structures that bring partial results together
Each reducer receives all the data it needs to complete the computation
Illustrated by the "stripes" approach

65 Secondary Sorting
MapReduce sorts input to reducers by key; values may be arbitrarily ordered
What if we want to sort the values too?
E.g., k → (v1, r), (v3, r), (v4, r), (v8, r), ...

66 Secondary Sorting: Solutions
Solution 1: buffer values in memory, then sort. Why is this a bad idea?
Solution 2: the "value-to-key conversion" design pattern: form a composite intermediate key (k, v1); let the execution framework do the sorting; preserve state across multiple key-value pairs to handle processing
Anything else we need to do? (See the sketch below.)
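One missing piece is making sure that all composite keys sharing the same natural key k still land on the same reducer. A sketch, assuming a hypothetical CompositeKey WritableComparable that sorts by (k, v):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class NaturalKeyPartitioner extends Partitioner<CompositeKey, Text> {
  @Override
  public int getPartition(CompositeKey key, Text value, int numPartitions) {
    // Partition on the natural key k only, ignoring the embedded value v.
    return (key.getNaturalKey().hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}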

67 Recap: Tools for Synchronization
Cleverly-constructed data structures: bring data together
Sort order of intermediate keys: control the order in which reducers process keys
Partitioner: control which reducer processes which keys
Preserving state in mappers and reducers: capture dependencies across multiple keys and values

68 Issues and Tradeoffs
Number of key-value pairs: object creation overhead; time for sorting and shuffling pairs across the network
Size of each key-value pair: de/serialization overhead
Local aggregation: opportunities to perform local aggregation vary; combiners make a big difference; combiners vs. in-mapper combining; RAM vs. disk vs. network

69 Debugging at Scale
Works on small datasets, won't scale... why?
Memory management issues (buffering and object creation)
Too much intermediate data
Mangled input records
Real-world data is messy!
Word count: how many unique words are in Wikipedia?
There's no such thing as "consistent data"
Watch out for corner cases
Isolate unexpected behavior, bring it local

70 Hadoop and Java Practice

71 Basic Hadoop API* (0.20.0)
Mapper:
map(KEYIN key, VALUEIN value, Mapper.Context context)
setup(Mapper.Context context)
cleanup(Mapper.Context context)
Reducer/Combiner:
reduce(KEYIN key, Iterable<VALUEIN> values, Reducer.Context context)
setup/cleanup
Partitioner:
getPartition(KEY key, VALUE value, int numPartitions)
*Note: forthcoming API changes...

72 Data Types in Hadoop
Writable: defines a de/serialization protocol. Every data type in Hadoop is a Writable.
WritableComparable: defines a sort order. All keys must be of this type (but not values).
IntWritable, LongWritable, Text, ...: concrete classes for different data types.
SequenceFile: a binary encoding of a sequence of key-value pairs.

73 Complex Data Types in Hadoop
How do you implement complex data types?
The easiest way: encode them as Text, e.g., (a, b) = "a:b", and use regular expressions to parse and extract the data. Works, but pretty hack-ish.
The hard way: define a custom implementation of WritableComparable; must implement readFields, write, and compareTo. Computationally efficient, but slow for rapid prototyping.
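A sketch of the hard way: a concrete version of the TextPair type assumed in the pairs examples earlier, implementing the three required methods:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class TextPair implements WritableComparable<TextPair> {
  private String left;
  private String right;

  public TextPair() {}  // Writables need a no-arg constructor for reflection
  public TextPair(String left, String right) {
    this.left = left;
    this.right = right;
  }
  public String getRight() { return right; }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(left);   // serialize the fields in a fixed order
    out.writeUTF(right);
  }
  @Override
  public void readFields(DataInput in) throws IOException {
    left = in.readUTF();  // deserialize in the same order
    right = in.readUTF();
  }
  @Override
  public int compareTo(TextPair o) {  // sort by left element, then right
    int c = left.compareTo(o.left);
    return c != 0 ? c : right.compareTo(o.right);
  }
}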

74 Basic Cluster Components
One of each: Namenode (NN), Jobtracker (JT)
Set of each per slave machine: Tasktracker (TT), Datanode (DN)

75 Putting everything together...
[Figure: the namenode daemon runs on the namenode, the jobtracker on the job submission node, and each slave node runs a datanode daemon and a tasktracker on top of its local Linux file system]

76 Anatomy of a Job
A MapReduce program in Hadoop = a Hadoop job
Jobs are divided into map and reduce tasks
An instance of a running task is called a task attempt
Multiple jobs can be composed into a workflow
Job submission process:
Client (i.e., driver program) creates a job, configures it, and submits it to the jobtracker
JobClient computes input splits (on the client end)
Job data (jar, configuration XML) are sent to the JobTracker
JobTracker puts job data in a shared location, enqueues tasks
TaskTrackers poll for tasks
Off to the races...
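A minimal driver sketch for the baseline word count above, using the 0.20 API; the input and output paths are placeholders passed on the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountBaseline.TokenMapper.class);
    job.setCombinerClass(WordCountBaseline.SumReducer.class);  // optional optimization
    job.setReducerClass(WordCountBaseline.SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);  // submit and wait
  }
}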

77 [Figure: an InputFormat divides the input file into InputSplits; a RecordReader turns each split into records for a Mapper, which produces intermediate data. Source: redrawn from a slide by Cloudera, cc-licensed]

78 [Figure: each Mapper's intermediate output passes through a Partitioner on its way to the Reducers (combiners omitted here). Source: redrawn from a slide by Cloudera, cc-licensed]

79 [Figure: each Reducer writes its output to an output file through a RecordWriter provided by the OutputFormat. Source: redrawn from a slide by Cloudera, cc-licensed]

80 Input and Output
InputFormat: TextInputFormat, KeyValueTextInputFormat, SequenceFileInputFormat, ...
OutputFormat: TextOutputFormat, SequenceFileOutputFormat, ...

81 Shuffle and Sort in Hadoop
Probably the most complex aspect of MapReduce!
Map side:
Map outputs are buffered in memory in a circular buffer
When the buffer reaches a threshold, its contents are "spilled" to disk
Spills are merged into a single, partitioned file (sorted within each partition): the combiner runs here
Reduce side:
First, map outputs are copied over to the reducer machine
"Sort" is a multi-pass merge of map outputs (happens in memory and on disk): the combiner runs here too
The final merge pass goes directly into the reducer

82 Q&A

83 What is Hugs?
An interpreter for Haskell, and the most widely used implementation of the language
An interactive system, well-suited for teaching and prototyping purposes
Hugs is freely available from www.haskell.org/hugs

84 The Standard Prelude
When Hugs is started, it first loads the library file Prelude.hs and then repeatedly prompts the user for an expression to be evaluated. For example:
> 2+3*4
14
> (2+3)*4
20

85 The standard prelude also provides many useful functions that operate on lists. For example:
> length [1,2,3,4]
4
> product [1,2,3,4]
24
> take 3 [1,2,3,4,5]
[1,2,3]

86 Function Application
In mathematics, function application is denoted using parentheses, and multiplication is often denoted using juxtaposition or space:
f(a,b) + c d
means "apply the function f to a and b, and add the result to the product of c and d."

87 In Haskell, function application is denoted using space, and multiplication is denoted using *:
f a b + c*d
As before, but in Haskell syntax.

