Scalable Distributed Stream Processing Presented by Ming Jiang.


1 Scalable Distributed Stream Processing Presented by Ming Jiang

2 Centralized stream processing review

3 The distributed setting: a federation of participating nodes in different administrative domains, which requires collaboration between the domains

4 Two complementary efforts for the situation: Aurora* (intra-participant distribution) and Medusa (inter-participant distribution)

5 Three pieces to be shared: Aurora; an overlay network for communication; algorithms for high availability

6 Three architectural issues: communications; load sharing; high availability in the presence of failure

7 Communications. Naming: (participants, entity-name). Routing: 1. a data source (DS) or an administrator registers a schema and a stream; 2. when a DS produces an event, labels

8 Communications. Message transport: multiplexing all the message streams on a single TCP connection. Remote definition: process migration is too complicated
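
To illustrate the idea of multiplexing several message streams over one connection, here is a minimal sketch in Python. The frame layout (a 4-byte stream id followed by a 4-byte payload length) and the names `encode_frame` and `demultiplex` are assumptions made for this example, not the wire format used by Aurora* or Medusa. The byte string `wire` stands in for the data carried by the single TCP connection.

```python
import struct
from collections import defaultdict

# Hypothetical frame header: 4-byte stream id, 4-byte payload length (big-endian).
HEADER = struct.Struct("!II")

def encode_frame(stream_id: int, payload: bytes) -> bytes:
    """Prefix a payload with its stream id and length so streams can share one pipe."""
    return HEADER.pack(stream_id, len(payload)) + payload

def demultiplex(data: bytes) -> dict:
    """Split a byte sequence of frames back into per-stream message lists."""
    streams = defaultdict(list)
    offset = 0
    while offset < len(data):
        stream_id, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        streams[stream_id].append(data[offset:offset + length])
        offset += length
    return dict(streams)

# Messages from two streams interleaved on the same "connection".
wire = b"".join([
    encode_frame(1, b"a1"),
    encode_frame(2, b"b1"),
    encode_frame(1, b"a2"),
])
per_stream = demultiplex(wire)
```

The receiver recovers each stream in order even though the messages were interleaved on the wire.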

9 Load Management: repartitioning Aurora networks based on loads and resources, by box sliding and box splitting

10 Box Sliding: takes a box on the edge of a sub-network on one machine and shifts it to its neighbor. Upstream box sliding
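
Box sliding can be sketched as moving one edge box between two neighboring machines. The `Machine` class, the box names, and the two helper functions below are hypothetical and only model the bookkeeping. A real system would also have to move the box's state and reroute its input and output streams.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    name: str
    # Boxes held by this machine, listed in upstream-to-downstream order.
    boxes: list = field(default_factory=list)

def slide_upstream(src: Machine, dst: Machine) -> str:
    """Move the most upstream box of src to its upstream neighbor dst."""
    box = src.boxes.pop(0)
    dst.boxes.append(box)
    return box

def slide_downstream(src: Machine, dst: Machine) -> str:
    """Move the most downstream box of src to its downstream neighbor dst."""
    box = src.boxes.pop()
    dst.boxes.insert(0, box)
    return box

# Machine "A" feeds machine "B"; the overall chain is filter, map, tumble, union.
upstream = Machine("A", ["filter", "map"])
downstream = Machine("B", ["tumble", "union"])
moved = slide_upstream(downstream, upstream)
```

After the slide, the order of boxes along the chain is unchanged. Only the boundary between the two machines has moved.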

11 Box Splitting: creates a copy of a box that is intended to run on a second machine, to offload work. Needs a filter as a router

12 Box splitting: Tumble. Merge: box splitting has to be transparent

13 Box splitting: if the predicate in the filter is B<3, then A machine: 1,2,3,4,7; B machine: 5,6. [Figure: A machine, B machine, final result after merge]
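
The split-and-merge idea can be sketched in Python as follows. The tuple values, the counting aggregate, and all function names are illustrative assumptions. The B values were chosen so that the filter B<3 sends tuples 1,2,3,4,7 to machine A and tuples 5,6 to machine B, as on the slide. The merge step sums the partial counts so that the split stays transparent to downstream consumers.

```python
from itertools import groupby

# (tuple number, A, B); the A and B values are made up for illustration.
STREAM = [(1, 1, 2), (2, 1, 1), (3, 2, 2), (4, 2, 1),
          (5, 2, 6), (6, 4, 5), (7, 4, 2)]

def tumble_count(tuples):
    """Count consecutive tuples that share the same A value."""
    return [(a, sum(1 for _ in run))
            for a, run in groupby(tuples, key=lambda t: t[1])]

def split_by_filter(tuples, predicate):
    """Router: tuples matching the predicate go to machine A, the rest to B."""
    to_a = [t for t in tuples if predicate(t)]
    to_b = [t for t in tuples if not predicate(t)]
    return to_a, to_b

def merge_counts(*partials):
    """Combine partial counts from both machines into one result per A value."""
    totals = {}
    for partial in partials:
        for a, cnt in partial:
            totals[a] = totals.get(a, 0) + cnt
    return sorted(totals.items())

machine_a, machine_b = split_by_filter(STREAM, lambda t: t[2] < 3)
merged = merge_counts(tumble_count(machine_a), tumble_count(machine_b))
unsplit = tumble_count(STREAM)
```

In this sketch `merged` equals `unsplit`, which is the transparency property: the merged output of the two copies matches what the single box would have produced. This simple merge relies on each A value forming one contiguous run in the input.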

14 Key partitioning challenges: choosing what to offload; choosing what to split; choosing filters; others…

15 High Availability: utilize the push-based nature

16 Failure detection and recovery: 1. periodically send heartbeat messages to upstream neighbors; 2. if a server does not reply within a predefined time, assume it has failed; 3. initiate the recovery phase, emulating the processing of the failed server (load shedding can be used)
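
The detection steps above can be sketched as a small timeout tracker. The class `HeartbeatMonitor`, its methods, and the server names are hypothetical. Time is passed in explicitly so the example is deterministic, and the recovery phase itself is left to the caller.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat reply per server and flags servers that time out."""

    def __init__(self, timeout: float):
        self.timeout = timeout   # predefined silence period before assuming failure
        self.last_reply = {}     # server name -> time of last heartbeat reply

    def record_reply(self, server: str, now: float) -> None:
        """Note that a server answered a heartbeat at time `now`."""
        self.last_reply[server] = now

    def failed_servers(self, now: float) -> list:
        """Return servers whose last reply is older than the timeout."""
        return sorted(s for s, t in self.last_reply.items()
                      if now - t > self.timeout)

monitor = HeartbeatMonitor(timeout=3.0)
monitor.record_reply("server-1", now=0.0)
monitor.record_reply("server-2", now=0.0)
monitor.record_reply("server-1", now=4.0)    # server-2 stays silent
suspects = monitor.failed_servers(now=5.0)   # recovery would start for these
```

Each server in `suspects` would then be handed to the recovery phase, where a neighbor takes over its processing.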

17 Thank you!

