Problem-solving on large-scale clusters: theory and applications Lecture 1: Introduction and Theoretical Background.


1 Problem-solving on large-scale clusters: theory and applications Lecture 1: Introduction and Theoretical Background

2 Today’s Outline Introductions Quiz Course Objective & Administrative Info fold and map : Theory

3 Introductions Name + trivia

4 Quiz Time! Not graded; helps us calibrate how difficult to make this seminar Okay (and encouraged!) to leave questions blank

5 Course Outline Introduction to parallel programming and distributed system design –successfully decompose problems into map and reduce stages –decide whether a problem can be solved with a parallel algorithm, and evaluate its strengths and weaknesses –understand the basic tradeoffs and major issues in distributed system design –know the common pitfalls of distributed system design This seminar is light on “facts” and “recipes”, heavy on “tradeoffs”

6 Course Information (1 of 2) Lecturers: –Albert J. Wong –Hannah Tang Lab consultant: –Alden King Liaisons: –John Zahorjan –Christophe Bisciglia

7 Course Information (2 of 2) Textbook –None; see online course readings Webpage: http://www.cs.washington.edu/cse490h Mailing lists: –Course discussion: cse490h@...

8 Warning: Theory Ahead! Before we can talk about MapReduce, we need to talk about the concepts on which it is founded: –Programming languages: fold and map –Distributed systems: data dependencies

9 Digression: Function Objects (1 of 3) A function object is a function that can be manipulated as an object –Sometimes referred to as a “functor” In Java, this is usually implemented with a class that has an execute() (or similarly named) method

10 Digression: Function Objects (2 of 3) Example: Implementing the Comparator interface to use Collections.sort() The underlying idea is to pass the “greater than” operation to sort()
    // Orders strings in reverse alphabetical order
    class ReverseAlphaOrder implements Comparator<String> {
        public int compare(String s1, String s2) {
            return s2.compareTo(s1);
        }
    }
    List<String> myStrings = new ArrayList<>();
    ReverseAlphaOrder rao = new ReverseAlphaOrder();
    Collections.sort(myStrings, rao);

11 Digression: Function Objects (3 of 3) In Java, methods that take function objects are “higher-order functions” –Collections.sort() is a higher-order function Mathematically, a “higher order function” is a function which does at least one of the following: –Take one or more functions as input –Output a function Examples: –The derivative (from calculus) d/dx (x^3 + 2x) = 3x^2 + 2
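The derivative example can be sketched in Java as a method that both takes a function and returns one. This is only an illustration: the class name HigherOrderDemo, the method name derivative, and the step size h are assumptions, and the result is a numerical approximation rather than the symbolic derivative.

```java
import java.util.function.DoubleUnaryOperator;

final class HigherOrderDemo {
    // Takes a function f as input and outputs a new function that
    // approximates f' using a central difference.
    // The step size h is an illustrative choice, not from the lecture.
    static DoubleUnaryOperator derivative(DoubleUnaryOperator f) {
        final double h = 1e-5;
        return x -> (f.applyAsDouble(x + h) - f.applyAsDouble(x - h)) / (2 * h);
    }

    public static void main(String[] args) {
        DoubleUnaryOperator cubic = x -> x * x * x + 2 * x; // x^3 + 2x
        DoubleUnaryOperator slope = derivative(cubic);      // approximates 3x^2 + 2
        System.out.println(slope.applyAsDouble(2.0));       // close to 14
    }
}
```

Here derivative satisfies both conditions at once: its input is a function and so is its output.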

12 fold - Introduction fold is a family of higher-order functions that process a data structure and return a single value –Commonly, fold takes a function f and a list l, and recursively applies f to “combine” the elements of l –The return value may be “complex”, e.g. a list Example: –fold (+) [1,2,4,8] -> ??? –fold (/) [64,8,4,2] -> ???

13 fold - Directionality Remember how we said fold was “a family of functions”? –foldr (/) [64,8,4,2] -> 64 / (8 / (4/2)) -> 16 –foldl (/) [64,8,4,2] -> ((64/8) / 4) / 2 -> 1 “fold right” –recursively applies f over the right side of the list “fold left” –recursively applies f over the left side of the list [Figure: expression trees for the right fold and the left fold of ÷ over 64, 8, 4, 2]
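The two groupings above can be reproduced with a small Java sketch. The names FoldDirection, foldRight, and foldLeft are hypothetical, both methods assume a non-empty list because the slide's examples use no initial value, and List.of assumes Java 9 or later.

```java
import java.util.List;
import java.util.function.BinaryOperator;

final class FoldDirection {
    // Right fold without an initial value: x1 op (x2 op (... op xn)).
    // Assumes xs is non-empty.
    static <T> T foldRight(BinaryOperator<T> op, List<T> xs) {
        if (xs.size() == 1) {
            return xs.get(0);
        }
        return op.apply(xs.get(0), foldRight(op, xs.subList(1, xs.size())));
    }

    // Left fold without an initial value: ((x1 op x2) op x3) ... op xn.
    // Assumes xs is non-empty.
    static <T> T foldLeft(BinaryOperator<T> op, List<T> xs) {
        int last = xs.size() - 1;
        if (last == 0) {
            return xs.get(0);
        }
        return op.apply(foldLeft(op, xs.subList(0, last)), xs.get(last));
    }

    public static void main(String[] args) {
        List<Double> xs = List.of(64.0, 8.0, 4.0, 2.0);
        BinaryOperator<Double> divide = (a, b) -> a / b;
        System.out.println(foldRight(divide, xs)); // 16.0
        System.out.println(foldLeft(divide, xs));  // 1.0
    }
}
```

Division is used because it is not associative, so the direction of the fold changes the answer.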

14 fold - Questions Discussion questions: –What should the base case return? foldr (+) [] -> ??? foldr (/) [] -> ??? –Can a right fold be implemented as a loop (using tail recursion)? What about left fold? Enrichment questions: –What happens to a right fold when given an infinite list? What about left fold?

15 fold - Formal Definition fold takes a function and a list as its inputs – but it can also take more values. –In particular, fold maintains context / state across each invocation of f
    -- If the list is empty, return the initial value ‘z’
    foldr f z [] = z
    -- If the list is not empty, calculate the result of folding the
    -- rest, and apply f to the first element and to that result.
    -- The context from previous invocations of f is implicitly
    -- passed to the current invocation of f via foldr
    foldr f z (x:xs) = f x (foldr f z xs)
What is the formal definition of foldl ?

16 fold – An Intuition fold “iterates” over a data structure, and maintains one unit of state –At each iteration, f is invoked with the current element and the current state –fold ’s return value is the result of f ’s final invocation
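This intuition can be sketched as a plain Java loop that threads one state value through every call to f. The sketch is a left-to-right fold; the names FoldAsLoop and fold, and the argument order of f (state first, then element), are assumptions made for illustration.

```java
import java.util.List;
import java.util.function.BiFunction;

final class FoldAsLoop {
    // Iterates over xs while maintaining one unit of state.
    // f receives the current state and the current element and
    // returns the next state; the final state is the result.
    static <A, S> S fold(BiFunction<S, A, S> f, S initial, List<A> xs) {
        S state = initial;
        for (A x : xs) {
            state = f.apply(state, x);
        }
        return state;
    }

    public static void main(String[] args) {
        List<String> words = List.of("map", "fold", "reduce");
        // State is a running total of characters seen so far.
        Integer totalLength = fold((Integer n, String w) -> n + w.length(), 0, words);
        // State is the longest word seen so far.
        String longest = fold(
                (String best, String w) -> w.length() > best.length() ? w : best, "", words);
        System.out.println(totalLength); // 13
        System.out.println(longest);     // reduce
    }
}
```

The state type S is independent of the element type A, which is why the first call can turn a list of strings into a single integer.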

17 map - Introduction map is a higher-order function that “transforms” each element in a sequence of elements –Commonly, map takes a function f and a sequence s, and applies f to each element of s Example: –map square_root [1,4,9,16] -> ???

18 map ’s Return Value map returns a sequence –The new sequence s’ is not necessarily the same size as s –The elements of s’ do not necessarily have the same type as the elements of s

19 map ’s Return Value – Example Recall that the sum of N vectors was equal to the sum of their components. Let components() decompose a vector into its X and Y components. [Figure: vectors a, b, and a+b; map components applied to a list of vectors gives a list of (X, Y) component pairs, with the results left as ??? for the audience]

20 map - Questions Enrichment questions: –For what values of f and z will fold f z l = l ? How can you modify f such that fold f z l = map f l ? –Bonus question: can you implement map in terms of fold? –Visit foldl.com and foldr.com :)

21 map – Formal definition map takes a function and a data structure as its inputs
    -- If the list is empty, there’s nothing to do
    map f [] = []
    -- If the list is not empty, apply f to the first element and
    -- add the result to the mapping of f on all other elements
    map f (x:xs) = f x : map f xs
What is the complexity of map ? What is its runtime?
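The two equations can be transliterated into Java roughly as follows. This is a sketch rather than how a Java library would implement it: the names MapDemo and map are hypothetical, a LinkedList stands in for the cons operator, and List.of assumes Java 9 or later.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.function.Function;

final class MapDemo {
    static <A, B> LinkedList<B> map(Function<A, B> f, List<A> xs) {
        // map f [] = []
        if (xs.isEmpty()) {
            return new LinkedList<>();
        }
        // map f (x:xs) = f x : map f xs
        LinkedList<B> rest = map(f, xs.subList(1, xs.size()));
        rest.addFirst(f.apply(xs.get(0)));
        return rest;
    }

    public static void main(String[] args) {
        List<String> words = List.of("map", "fold", "reduce");
        // The element type changes from String to Integer.
        List<Integer> lengths = map(String::length, words);
        System.out.println(lengths); // [3, 4, 6]
    }
}
```

Each call to f depends only on its own element, never on the result of another call.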

22 Exercise (1 of 2) Individually: –Determine how these operations can be solved with a fold, a map, or some combination of fold and map : Given a list of vectors, add them to determine the resultant vector. Ray tracing a single ray –Ray tracing takes a list of rays that intersect the camera, and traces their path back to their respective light sources, even across their reflection over several surfaces Assuming you had access to a company’s monthly paystubs for all employees for an entire year, calculate how much annual income tax is owed per-person. Run-length encoding. –Run-length encoding takes a possibly-repetitive string and rewrites it as a sequence of (value, frequency) pairs, e.g. “aaa b ccccc dd” -> “a3 b c5 d2”. Find the smallest element in an array –Come up with some challenging problems yourself!

23 Exercise (2 of 2) In small groups, compare your answers to the above, and stump your team with the problems you came up with!

