Providing Resiliency to Load Variations in Distributed Stream Processing Ying Xing, Jeong-Hyon Hwang, Ugur Cetintemel, Stan Zdonik Brown University.


1 Providing Resiliency to Load Variations in Distributed Stream Processing Ying Xing, Jeong-Hyon Hwang, Ugur Cetintemel, Stan Zdonik Brown University

2 Jeong-Hyon Hwang (jhhwang@cs.brown.edu) Stream Processing. Monitoring apps: financial data streams, surveillance, network monitoring, click stream analysis, traffic monitoring, sensor networks.

3 Distributed Stream Processing

4 Roadmap: Problem Statement; Linear Load Model; Feasible Set; The Algorithm; Extensions (Lower Bound of Input Rates, Non-linear Load Model, Network Bandwidth / Communication Overhead); Experimental Results; Related Work; Conclusions

5 Problem Statement. Goal: find an operator distribution with the largest feasible set size. [Figure: an operator distribution and its input rate space with axes r_1 and r_2, split into a feasible region (the feasible set) and an infeasible region]

6 Linear Load Model. r_j: input rate of input j (tuples/sec); c_k: processing cost of operator o_k (CPU cycles/tuple); l(o_k): processing load of operator o_k (CPU cycles/sec); s_k: selectivity of operator o_k ([# of output tuples] / [# of input tuples]). [Figure: query graph with operators o_1, o_2, o_3, o_4]
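
The definitions above can be turned into a small calculation. The sketch below is only an illustration: it assumes a hypothetical query graph (input 1 feeds o1 then o2, input 2 feeds o3 then o4), made-up costs and selectivities, and operators with a single upstream stream. None of these values come from the slides. It derives, for each operator, the coefficient that multiplies each input rate r_j to give the load l(o_k).

```python
# Hypothetical query graph (an assumption, not taken from the slide):
# in1 -> o1 -> o2 and in2 -> o3 -> o4.
# Each entry: operator -> (upstream stream, cost c_k, selectivity s_k).
OPERATORS = {
    "o1": ("in1", 4.0, 0.5),
    "o2": ("o1", 6.0, 1.0),
    "o3": ("in2", 5.0, 0.5),
    "o4": ("o3", 2.0, 1.0),
}


def load_coefficients(operators, inputs):
    """Return {operator: {input: coefficient}} such that
    l(o_k) = sum_j coefficient[j] * r_j under the linear load model."""
    # rate_coef[x][j]: rate of stream x per unit rate of system input j.
    rate_coef = {name: {name: 1.0} for name in inputs}
    coefs = {}
    pending = dict(operators)
    while pending:
        progressed = False
        for name, (src, cost, sel) in list(pending.items()):
            if src not in rate_coef:
                continue  # upstream operator not resolved yet
            in_rate = rate_coef[src]
            # Load = cost per tuple * input tuple rate.
            coefs[name] = {j: cost * v for j, v in in_rate.items()}
            # Output rate = selectivity * input tuple rate.
            rate_coef[name] = {j: sel * v for j, v in in_rate.items()}
            del pending[name]
            progressed = True
        if not progressed:
            raise ValueError("operator refers to an unknown upstream stream")
    return coefs


def operator_load(coef, rates):
    """Processing load (CPU cycles/sec) for concrete input rates."""
    return sum(c * rates[j] for j, c in coef.items())
```

With these assumed numbers, o2 sees half of input 1's tuples, so its load is 6.0 * 0.5 * r_1: linear in the input rate, which is what makes the feasible set a region bounded by hyperplanes.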

7 Example Feasible Sets. [Figure: three operator distributions of o_1, o_2, o_3, o_4, each shown with its feasible set in the (r_1, r_2) plane]

8 “Ideal” Feasible Set. Theorem 1: The feasible set is maximized when the load coefficients of each input are perfectly balanced over all nodes (relative to their capacities). [Figure: an operator distribution of o_1, o_2, o_3, o_4 and feasible sets in the (r_1, r_2) plane]
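
One way to see what Theorem 1 claims is to measure a distribution's feasible set against the ideal one numerically. The following is a minimal sketch, not the authors' procedure: it assumes each node i imposes the constraint sum_j L[i][j] * r_j <= C_i, takes the ideal set to be the region where the total load stays within the total capacity, and estimates the ratio of the two sizes by random sampling. The function name and parameters are hypothetical.

```python
import random


def feasible_ratio(node_coefs, capacities, samples=20000, seed=0):
    """Estimate (feasible set size) / (ideal feasible set size).

    node_coefs[i][j]: summed load coefficient of input j on node i.
    capacities[i]:    CPU capacity of node i (cycles/sec).
    Assumes every input has a positive total load coefficient.
    """
    d = len(node_coefs[0])
    total_cap = sum(capacities)
    total_coef = [sum(row[j] for row in node_coefs) for j in range(d)]
    # Bounding box of the ideal set: r_j <= C_T / l_j on each axis.
    upper = [total_cap / l for l in total_coef]
    rng = random.Random(seed)
    in_ideal = in_feasible = 0
    for _ in range(samples):
        r = [rng.uniform(0.0, u) for u in upper]
        # Ideal set: total load within total capacity.
        if sum(l * x for l, x in zip(total_coef, r)) <= total_cap:
            in_ideal += 1
            # Feasible set: no single node is overloaded.
            if all(sum(l * x for l, x in zip(row, r)) <= cap
                   for row, cap in zip(node_coefs, capacities)):
                in_feasible += 1
    return in_feasible / in_ideal
```

For two equal nodes, a perfectly balanced split such as `[[2.0, 1.0], [2.0, 1.0]]` gives a ratio of 1, while putting all of input 1's load on one node and all of input 2's load on the other (`[[4.0, 0.0], [0.0, 2.0]]`) gives roughly one half, since each node then caps one rate on its own.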

9 Resilient Operator Distribution Algorithm. 1. Compute the ideal feasible set. 2. Sort operators based on load coefficients. 3. For each operator, determine the destination server. [Figure: ideal feasible set in the (r_1, r_2) plane]
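
The three steps can be sketched as a greedy loop. The slide does not say how the destination server is chosen, so the rule below is an assumption made for illustration: each operator goes to the node whose feasibility boundary would remain farthest from the origin, measured in coordinates normalized by the ideal feasible set. The function `rod_greedy` and its arguments are hypothetical names, and this should be read as a simplified stand-in rather than the authors' exact algorithm.

```python
import math


def rod_greedy(op_coefs, capacities):
    """Greedy placement sketch.

    op_coefs:   {operator: [load coefficient per input]}
    capacities: CPU capacity of each node.
    Returns {operator: node index}. Assumes every input has a positive
    total load coefficient.
    """
    n = len(capacities)
    d = len(next(iter(op_coefs.values())))
    total_cap = sum(capacities)
    # Step 1: the ideal feasible set is bounded by
    # sum_j total[j] * r_j = total_cap.
    total = [sum(c[j] for c in op_coefs.values()) for j in range(d)]

    # Step 2: consider operators with the largest coefficients first.
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    order = sorted(op_coefs, key=lambda o: norm(op_coefs[o]), reverse=True)

    node_load = [[0.0] * d for _ in range(n)]
    placement = {}

    def distance(i, extra):
        # Distance from the origin to node i's boundary, in coordinates
        # scaled so the ideal boundary is sum_j x_j = 1, if the
        # coefficients `extra` were added to node i.
        share = capacities[i] / total_cap
        w = [(node_load[i][j] + extra[j]) / total[j] / share
             for j in range(d)]
        length = norm(w)
        return math.inf if length == 0 else 1.0 / length

    # Step 3: send each operator to the node whose boundary stays
    # farthest from the origin (assumed selection rule).
    for op in order:
        best = max(range(n), key=lambda i: distance(i, op_coefs[op]))
        placement[op] = best
        for j in range(d):
            node_load[best][j] += op_coefs[op][j]
    return placement
```

On a two-node example with coefficients `{"o1": [4, 0], "o2": [3, 0], "o3": [0, 5], "o4": [0, 1]}`, this rule splits the operators of each input across both nodes, which moves the distribution toward the balance condition of Theorem 1.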

10 Result: R.O.D. vs Load Balancing (10 nodes, 5 input streams)

11 Result: Latency of a Network Monitoring Query

12 Extension: Network Bandwidth & Comm. Overhead. [Figure panels: Network Bandwidth; Comm. Overhead]

13 Extension: Nonlinear Load Model. Add an artificial variable. [Figure: a chain of operators o_1 ... o_u, o_{u+1} ... o_m driven by input r_1, shown before and after an artificial input variable r_2 is introduced]

14 Extension: Lower Bound of Input Rates. Use the lower bound instead of the origin. [Figure: two plots in the (r_1, r_2) plane]
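
A short sketch of what swapping the origin for the lower bound could look like, under the assumption that the quantity of interest is the point-to-hyperplane distance between a reference point and a node's feasibility boundary sum_j l_j * r_j = C. The function name and arguments are hypothetical.

```python
import math


def plane_distance(coefs, capacity, lower_bound=None):
    """Distance from a reference point to the boundary
    sum_j coefs[j] * r_j = capacity.

    The reference point is the origin unless a lower bound on the
    input rates is supplied.
    """
    ref = lower_bound if lower_bound is not None else [0.0] * len(coefs)
    # Capacity still unused when the inputs run at the reference rates.
    slack = capacity - sum(l * b for l, b in zip(coefs, ref))
    return slack / math.sqrt(sum(l * l for l in coefs))
```

With coefficients (3, 4) and capacity 10, the boundary is 2 units from the origin but only 1 unit from the lower bound (1, 0.5), so a node can look comfortable from the origin while having little headroom over the rates that actually occur.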

15 Related Work. Traditional distributed systems: load balancing and load sharing [Shivaratri92] [Diekmann97]; parallel query processing [DeWitt92]; graph partitioning [Walshaw97] [Schloegel00]. Stream processing systems (load management): Flux [Shah03], data-partitioning-based parallel continuous query processing; Medusa [Balazinska04], federated distributed stream processing.

16 Conclusion. Distributed stream processing. Resilient Operator Distribution: maximize feasible set size. Performance: much better than conventional load distribution algorithms.

17 Backup Slides

18 Computation Complexity. Computation time is determined by: n, the number of nodes; m, the number of operators; d, the number of system input streams; k, the number of samples in the load time series. [Table: static operator distribution vs. dynamic operator distribution]

19 Heuristics. Heuristic #1: choose the case where the feasibility boundaries are close on each axis. Heuristic #2: choose the case where all the feasibility boundaries are far from the origin. [Figure: four example feasible sets in the (r_1, r_2) plane]

20 Resilient vs. Optimal (2 nodes, 4 input streams)

21 Varying Bandwidth Constraints: Resilient vs. Connected-Load-Balancing

22 Varying Data Communication CPU Overhead: Resilient vs. Connected-Load-Balancing

