E2E Arguments & Project Suggestions (Lecture 4, cs262a)


1 E2E Arguments & Project Suggestions (Lecture 4, cs262a)
Ion Stoica, UC Berkeley, September 7, 2016

2 Software Modularity Break system into modules:
Well-defined interfaces give flexibility: you can change the implementation of a module, or extend the functionality of the system by adding new modules. Interfaces hide information: this allows for flexibility, but can hurt performance.

3 Network Modularity Like software modularity, but with a twist:
The implementation is distributed across routers and hosts. We must decide how to break the system into modules and where those modules are implemented.

4 Layering Layering is a particular form of modularization
The system is broken into a vertical hierarchy of logically distinct entities (layers). The service provided by one layer is based solely on the service provided by the layer below. Rigid structure: easy reuse, but performance suffers.

5 The Problem Re-implement every application for every technology?
[Figure: applications (FTP, NFS, HTTP) on top; transmission media (coaxial cable, fiber optic, packet radio) below] No! But how does the Internet architecture avoid this?

6 Solution: Intermediate Layer
Introduce an intermediate layer that provides a single abstraction for the various network technologies. A new application or medium is then implemented only once. This is a variation on “add another level of indirection”. [Figure: applications (p2p, HTTP, SSH, NFS) sit on top of the intermediate layer, which sits on top of the transmission media (coaxial cable, fiber optic, packet radio)]
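
To make the idea concrete, here is a toy sketch (the class and function names are invented for illustration; they are not part of any real networking stack): applications are written once against a single Link abstraction, and each transmission medium implements that abstraction once.

class Link:
    """The single abstraction exposed by the intermediate layer."""
    def send(self, dest, payload):
        raise NotImplementedError

class FiberLink(Link):
    def send(self, dest, payload):
        print(f"fiber -> {dest}: {payload!r}")

class PacketRadioLink(Link):
    def send(self, dest, payload):
        print(f"radio -> {dest}: {payload!r}")

def http_get(link: Link, host):
    # The application (HTTP here) is written once against Link, so adding a new
    # medium never touches the application, and a new application never touches
    # the media: N apps + M media instead of N x M implementations.
    link.send(host, b"GET / HTTP/1.0\r\n\r\n")

http_get(FiberLink(), "example.com")
http_get(PacketRadioLink(), "example.com")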

7 Placing Functionality
The most influential paper about placing functionality is “End-to-End Arguments in System Design” by Saltzer, Reed, and Clark. It is the “sacred text” of the Internet: there are endless disputes about what it means, and everyone cites it as supporting their position.

8 Basic Observation Some applications have end-to-end requirements (reliability, security, etc.). Implementing these in the network is very hard: every step along the way must be fail-proof. The hosts can satisfy the requirement without the network, but cannot depend on the network to provide it.

9 Example: Reliable File Transfer
[Figure: Host A's application sends a file through the OS and network to Host B's application, which replies OK] Solution 1: make each step reliable, and then concatenate them. Solution 2: end-to-end check and retry.

10 Discussion
Solution 1 is not complete: what happens if any network element misbehaves? The receiver has to do the check anyway! Solution 2 is complete: the full functionality can be entirely implemented at the application layer, with no need for reliability from the lower layers. So is there any need to implement reliability at the lower layers?
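
To make Solution 2 concrete, here is a minimal sketch of an end-to-end check and retry (illustrative only; unreliable_copy is a stand-in for the entire network path, and the checksum choice is arbitrary):

import hashlib
import random

def unreliable_copy(data: bytes) -> bytes:
    # Stand-in for the network path: occasionally corrupts one byte.
    if data and random.random() < 0.3:
        i = random.randrange(len(data))
        data = data[:i] + bytes([data[i] ^ 0xFF]) + data[i + 1:]
    return data

def transfer(data: bytes, max_retries: int = 10) -> bytes:
    # End-to-end check: the sender's checksum is verified at the receiver;
    # on mismatch, simply retry. No step in the middle needs to be reliable.
    expected = hashlib.sha256(data).hexdigest()
    for _ in range(max_retries):
        received = unreliable_copy(data)
        if hashlib.sha256(received).hexdigest() == expected:
            return received
    raise RuntimeError("transfer failed after retries")

print(transfer(b"the whole file") == b"the whole file")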

11 Take Away Implementing this functionality in the network:
It doesn't reduce host implementation complexity, it does increase network complexity, and it probably imposes delay and overhead on all applications, even those that don't need the functionality. However, implementing functionality in the network can enhance performance in some cases, e.g., over a very lossy link.

12 Conservative Interpretation
“Don't implement a function at the lower levels of the system unless it can be completely implemented at this level.” In other words, unless you can completely relieve the hosts of the burden, don't bother.

13 Radical Interpretation
Don't implement anything in the network that can be implemented correctly by the hosts (e.g., multicast). Make the network layer absolutely minimal, and ignore performance issues.

14 Moderate Interpretation
Think twice before implementing functionality in the network. If the hosts can implement the functionality correctly, implement it at a lower layer only as a performance enhancement, and only if it does not impose a burden on applications that do not require that functionality.

15 Summary Layering is a good way to organize systems (e.g., networks)
A unified Internet layer decouples applications from networks. The E2E argument encourages us to keep the lower layers (e.g., IP) simple.

16 Project Suggestions

17 Spark, a BSP System
[Figure: two stages (super-steps), each made up of tasks (processors) operating on an RDD, separated by a shuffle]

18 Spark, a BSP System
All tasks in the same stage implement the same operations, with single-threaded, deterministic execution. Datasets (RDDs) are immutable. The barrier between stages is implicit in the data dependency introduced by the shuffle. [Figure: the same two-stage diagram, annotated with these properties]
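
For concreteness, a minimal PySpark sketch of this pattern (assumes a local Spark installation; the data is made up): every task in a stage runs the same function over its partition, and the shuffle between stages acts as the implicit barrier.

from pyspark import SparkContext

sc = SparkContext("local[*]", "bsp-sketch")

# Stage 1 (super-step): all tasks run the same map over their partitions.
pairs = sc.parallelize(range(100), numSlices=4).map(lambda x: (x % 10, x))

# Shuffle: the implicit barrier; stage 2 starts only after every stage-1 task finishes.
sums = pairs.reduceByKey(lambda a, b: a + b)

# Stage 2 (super-step): again, all tasks run the same function.
print(sums.collect())
sc.stop()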

19 Scheduling for Heterogeneous Resources
Spark assumes tasks are single-threaded: one task per slot, and typically one slot per core. Challenge: a task may call a library that is multithreaded or that runs on other computation resources, such as GPUs. Project idea: generalize Spark's scheduling model.
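
One way to picture the generalization, as a toy sketch (all names are hypothetical; this is not Spark's API): describe each task by a resource demand (cores, GPUs) rather than "one task per slot", and place tasks against per-node capacities.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    cpus: int = 1   # a multithreaded library may need several cores
    gpus: int = 0   # or a GPU

@dataclass
class Node:
    cpus: int
    gpus: int

def schedule(tasks, nodes):
    # Greedy placement: the first node with enough free resources wins.
    placement = {}
    for t in tasks:
        for i, n in enumerate(nodes):
            if n.cpus >= t.cpus and n.gpus >= t.gpus:
                n.cpus -= t.cpus
                n.gpus -= t.gpus
                placement[t.name] = i
                break
        else:
            placement[t.name] = None   # must wait for resources to free up
    return placement

print(schedule([Task("map"), Task("blas", cpus=4), Task("train", gpus=1)],
               [Node(cpus=4, gpus=0), Node(cpus=8, gpus=1)]))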

20 BSP Limitations BSP is great for data-parallel jobs
It is not the best fit for more complex computations, e.g., linear algebra algorithms (multiple inner loops) and some ML algorithms.

21 Example: Recurrent Neural Networks
[Figure: a 3-layer RNN unrolled over 5 time steps] x[t]: input vector at time t (e.g., a frame in a video); y[t]: output at time t (e.g., a prediction about the activity in the video); h_l: initial hidden state for layer l.

22 Example: Recurrent Neural Networks
[Figure: the same unrolled RNN]
for t in range(num_steps):
    h1 = rnn.first_layer(x[t], h1)
    h2 = rnn.second_layer(h1, h2)
    h3 = rnn.third_layer(h2, h3)
    y = rnn.fourth_layer(h3)
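
For readers who want to run this loop, here is a self-contained toy version (the ToyRNN class, dimensions, and random inputs are invented for illustration and are not part of the slides):

import numpy as np

class ToyRNN:
    """Toy 4-layer RNN: each layer mixes its input with its previous hidden state."""
    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = [rng.standard_normal((dim, 2 * dim)) * 0.1 for _ in range(3)]
        self.W_out = rng.standard_normal((dim, dim)) * 0.1
    def _layer(self, i, inp, h):
        return np.tanh(self.W[i] @ np.concatenate([inp, h]))
    def first_layer(self, x, h1):   return self._layer(0, x, h1)
    def second_layer(self, h1, h2): return self._layer(1, h1, h2)
    def third_layer(self, h2, h3):  return self._layer(2, h2, h3)
    def fourth_layer(self, h3):     return self.W_out @ h3

num_steps, dim = 5, 8
rnn = ToyRNN(dim)
x = [np.random.randn(dim) for _ in range(num_steps)]   # e.g., one frame per step
h1 = h2 = h3 = np.zeros(dim)
y = []
for t in range(num_steps):
    h1 = rnn.first_layer(x[t], h1)
    h2 = rnn.second_layer(h1, h2)
    h3 = rnn.third_layer(h2, h3)
    y.append(rnn.fourth_layer(h3))    # y[t]: one output per time step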

(Slides 23 through 32 step through this loop one statement at a time, with a ">" marker on the statement currently executing: at t = 0 the four layer calls run in turn to produce y[0]; at t = 1 they repeat, using the hidden states computed at t = 0, to produce y[1]; and so on.)

33 Example: Recurrent Neural Networks
[Figure: the unrolled RNN shown as a task graph] x[t]: input vector at time t (e.g., a frame in a video); y[t]: output at time t (e.g., a prediction about the activity in the video); h_l: initial hidden state for layer l. Legend: blue = task completed, red = task running; edges are marked as dependence ready or dependence unready.

(Slides 34 through 41 show the same figure at successive points in the execution, as tasks complete and new tasks start once their dependences become ready.)

42 How would BSP work? [Figure: the unrolled RNN task graph]

43 How would BSP work? BSP assumes all tasks in the same stage run the same function: not the case here!

44 How would BSP work? [Figure: the unrolled RNN task graph]

45 How would BSP work? BSP assumes all tasks in the same stage operate only on local data: not the case here!

46 Ray: Fine-grained parallel execution engine
Goal: make it easier to parallelize Python programs, in particular ML algorithms.

Python:
def add(a, b):
    return a + b
x = add(3, 4)

Ray:
@ray.remote
def add(a, b):
    return a + b
x_id = add.remote(3, 4)
x = ray.get(x_id)
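
For readers who want to run the Ray side, a minimal sketch (this assumes the ray package is installed; ray.init() starts Ray locally):

import ray

ray.init()

@ray.remote
def add(a, b):
    return a + b

x_id = add.remote(3, 4)   # returns immediately with an object id (a future)
x = ray.get(x_id)         # blocks until the task has run somewhere in the cluster
print(x)                  # 7
ray.shutdown()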

47 Another Example
import ray

@ray.remote
def f(stepsize):
    # do computation…
    return result

# Run 4 experiments in parallel
results = [f.remote(stepsize) for stepsize in [0.001, 0.01, 0.1, 1.0]]

# Get the results
ray.get(results)

48 Ray Architecture
Driver: runs a Ray program. Worker: executes Python functions (tasks). Object Store: stores Python objects, using shared memory on the same node. Global Scheduler: schedules tasks based on global state. Local Scheduler: schedules tasks locally. System State & Message Bus: stores the up-to-date control state of the entire system and relays events between components. [Figure: multiple nodes, each with drivers/workers, a local scheduler, and an object store with its object manager; one or more global schedulers; and the System State & Message Bus holding the object, task, function, and event tables]

49 Ray Architecture
Object Store: could evolve into storage for Arrow. Backend: could evolve into the RISE microkernel. [Figure: the same architecture diagram]

50 Ray System Instantiation & Interaction
[Figure: two nodes; on each, a driver and workers put/get objects in the local object store and submit tasks to the local scheduler, which either has a worker execute them or submits them to a global scheduler; the per-node object managers form a distributed object store and transfer and evict objects; the System State & Message Bus is shared and sharded]

51 Example
The driver on node N1 runs the following program (a worker runs on node N2):

@ray.remote
def add(a, b):
    return a + b

v_id = ray.put(3)
x_id = add.remote(v_id, 4)
x = ray.get(x_id)

Slides 52 through 63 step through its execution:
Slide 52: the @ray.remote definition registers add in the Function Table (fun_id -> add(a, b)).
Slide 53: ray.put(3) stores the value 3 in N1's object store; the Object Table records v_id -> N1.
Slide 54: add.remote(v_id, 4) adds an entry to the Task Table (task_id -> fun_id, v_id, 4).
Slides 55-56: the remote() invocation is non-blocking; the driver continues while the task is dispatched.
Slide 57: ray.get(x_id) blocks, waiting for the remote function to finish.
Slides 58-59: the task is scheduled, and the argument object becomes available in N2's object store as well (Object Table: v_id -> N1,N2).
Slide 60: the worker on N2 executes add(v_id, 4).
Slide 61: the result 7 is stored in N2's object store under x_id; the Object Table records x_id -> N2.
Slide 62: x_id is copied back to N1's object store (Object Table: x_id -> N2,N1).
Slide 63: ray.get(x_id) returns and the driver sees x = 7. Done!
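
Putting the walkthrough together as a runnable sketch (assumes Ray is installed): ids returned by ray.put and .remote can be passed directly to further remote calls, so tasks compose without fetching intermediate values back to the driver.

import ray

ray.init()

@ray.remote
def add(a, b):
    return a + b

v_id = ray.put(3)            # store 3 in the local object store
x_id = add.remote(v_id, 4)   # non-blocking; Ray resolves v_id to 3 for the task
y_id = add.remote(x_id, 10)  # chain another task by passing the result's id along
print(ray.get(y_id))         # 17
ray.shutdown()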

64 Project & Exam Dates
Wednesday, 9/7: Google doc with project suggestions; include other topics, such as graph streaming. Monday, 9/19: pick a partner and send your project proposal; I'll send a Google form to fill in for your project proposals. Monday, 10/12: project progress review; more details to follow. Wednesday, 10/5: midterm exam.

