DISTRIBUTED EVENT AGGREGATION FOR CONTENT-BASED PUBLISH/SUBSCRIBE SYSTEMS Navneet Kumar Pandey 1 Stéphane Weiss 1 Roman Vitenberg 1 Kaiwen Zhang 2 Hans-Arno.


1 DISTRIBUTED EVENT AGGREGATION FOR CONTENT-BASED PUBLISH/SUBSCRIBE SYSTEMS
Navneet Kumar Pandey (1), Stéphane Weiss (1), Roman Vitenberg (1), Kaiwen Zhang (2), Hans-Arno Jacobsen (2)
(1) University of Oslo, (2) University of Toronto

2 Motivation: Intelligent Transport System (ITS)
Information providers: road sensors, crowdsourced mobile apps
Information seekers: commuters, police, first responders, radio networks, etc.
Aggregate subscriptions:
– Count the number of cars passing a street light per hour
– Average speed of cars on a road segment per day
Non-aggregate subscriptions:
– Accident reports
– Traffic violation reports
Image: http://www.wired.com/images_blogs/autopia/2012/08/12A914.jpg

3 Aggregation in pub/sub
Pub/sub is well known for efficient content filtering and dissemination for distributed event sources and sinks. However, pub/sub does not support aggregation, which is required in emerging applications.
Our primary objective is to retain the traditional pub/sub focus on low communication cost, while adding support for aggregation.

4 Contributions: aggregation in pub/sub
– We propose a framework and baseline approaches for aggregation in content-based pub/sub systems (CBPS).
– We show how the relative performance of the baseline approaches varies with workload properties.
– We propose a per-broker distributed adaptive approach.

5 Advertisement-based pub/sub model
[Figure: brokers Bp, Bq, BI and BS with advertisement A[val,>,4], subscription S[val,>,3] and publication P[val,8]; legend: B = broker, Subscription Delivery Tree (SDT)]

6 Comparison with stream processing
Aggregation in stream processing | Aggregation in pub/sub
Requires global view of topology | Topology is not known to individual broker nodes
Requires a priori knowledge of publication sources | Publication sources and sinks are dynamic
Needs control layer | Brokers are loosely coupled
Usually has a static query plan | SDTs are dynamic and determined by the pub/sub implementation
Optimized for continuous data streams | Publications come at an irregular rate

7 Proposed aggregation framework
Publication filtering procedure (PFP)
Subscription: { RoadID = 101, speed > 10, op = ‘avg’, duration (ω) = 2 hours, shift size (δ) = 1 hour }
[Figure: time axis with ticks 0 to 3 starting at the subscription, showing notification window ranges (NWR) NWR 1, NWR 2, NWR 3 and publications Pub 1, Pub 2, Pub 3]
A single publication can participate in several NWRs, even for the same subscription.
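
To make the point about one publication belonging to several NWRs concrete, here is a minimal sketch that lists the windows a publication timestamp falls into. It assumes (the slide does not say) that windows are half-open intervals [k·δ, k·δ + ω) measured from the subscription time and numbered from 0; `matching_nwrs` is a hypothetical helper name, not part of the presented system.

```python
import math


def matching_nwrs(t, omega, delta):
    """Return the indices k of all windows [k*delta, k*delta + omega) containing t.

    Assumption: windows start at the subscription time (t = 0) and are
    half-open; the slides do not specify boundary handling.
    """
    if omega <= 0 or delta <= 0:
        raise ValueError("omega and delta must be positive")
    if t < 0:
        return []  # published before the subscription was issued
    # Last window that has already started at time t.
    k_max = math.floor(t / delta)
    # First window that has not yet closed at time t.
    k_min = max(0, math.floor((t - omega) / delta) + 1)
    return list(range(k_min, k_max + 1))


# Subscription parameters from the slide: duration 2 hours, shift 1 hour.
print(matching_nwrs(0.5, omega=2, delta=1))  # [0]
print(matching_nwrs(1.5, omega=2, delta=1))  # [0, 1]
```

With ω = 2 and δ = 1, a publication at t = 1.5 lands in two windows of the same subscription, which is the overlap the slide describes.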

8 Proposed aggregation framework
Publication filtering procedure (PFP)
Initial computation procedure (ICP)
Outgoing messages: { avg(Pub 1, Pub 2, Pub 3), avg(Pub 2, Pub 3) }
Outgoing messages: { avg(Pub 1, Pub 2), avg(Pub 2), Pub 3 }
[Figure: time axis with ticks 0 to 3 starting at the subscription, showing NWR 1, NWR 2, NWR 3, publications Pub 1, Pub 2, Pub 3 and a marker x]
Processing start time presents a trade-off between communication cost and end-to-end delay.

9 Proposed aggregation framework
Publication filtering procedure (PFP)
Initial computation procedure (ICP)
Recurrent processing procedure (RPP)
[Figure: brokers Bp and Bq send partial results avg p and avg q to broker BI, which waits for a collection delay and emits avg pq]
Collection delay is another parameter affecting the delay-communication trade-off.
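
One detail behind combining avg p and avg q into avg pq: two averages cannot be merged correctly unless each carries its count. The sketch below shows one common way to do that, by shipping (sum, count) pairs. This representation is an assumption for illustration; the slides do not state how partial results are encoded, and `PartialAvg` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialAvg:
    """Mergeable partial result for 'avg' (assumed encoding: running sum and count)."""
    total: float
    count: int

    @staticmethod
    def of(values):
        values = list(values)
        return PartialAvg(sum(values), len(values))

    def merge(self, other):
        # Sums and counts add up, so merging is associative and commutative.
        return PartialAvg(self.total + other.total, self.count + other.count)

    def value(self):
        if self.count == 0:
            raise ValueError("no publications in this window")
        return self.total / self.count


avg_p = PartialAvg.of([40, 60])   # partial result from one upstream broker
avg_q = PartialAvg.of([80])       # partial result from another upstream broker
avg_pq = avg_p.merge(avg_q)

print(avg_pq.value())                        # 60.0
print((avg_p.value() + avg_q.value()) / 2)   # 65.0, the naive (incorrect) merge
```

Operators such as min, max and count merge directly, while avg needs the extra count so that a downstream broker can weight each partial result.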

10 Late aggregation approach
[Figure: brokers Bp, Bq, BI and BS; publications P[val,9], P[val,2], P[val,5], P[val,3]; subscription S min[val,>,2]; result P[val min,3]; procedure labels PFS, ICP, RPP]
Messages exchanged in Late aggregation: 6
Late approach aggregates messages at subscriber-edge brokers.

11 Early aggregation approach
[Figure: brokers Bp, Bq, BI and BS (the slide also carries a label BA); publications P[val,9], P[val,2], P[val,5], P[val,3]; subscription S min[val,>,2]; aggregated messages P[val min,9], P[val min,3], P[val min,3], P[val min,3]; procedure labels PFS, ICP, RPP]
Messages exchanged in Early aggregation: 3
Messages exchanged in Late aggregation: 6
Early approach aggregates messages at publisher-edge brokers.

12 Early does not always outperform Late
Publications: P[val,9], P[val,2], P[val,5], P[val,3]
Subscriptions: S max[val,>,2], S count[val,>,2], S min[val,>,2]
Late aggregation, messages exchanged: 6
Early aggregation, messages exchanged: 9
[Figure: brokers Bp, Bq, BI and BS; aggregated messages P[val max,5], P[val min,3], P[val count,2]; P[val max,9], P[val min,9], P[val count,1]; P[val max,9], P[val min,3], P[val count,3]]
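
The message counts on the Early and Late slides can be reproduced with a small model. The sketch below is only an illustration under stated assumptions: a two-level tree in which publisher-edge brokers Bp and Bq connect to BI, which connects to BS; the placement of publications (9 and 2 at Bp, 5 and 3 at Bq) is inferred from the aggregated messages in the figures, not stated explicitly; all subscriptions share the predicate val > 2; and a single notification window is considered. The function names are hypothetical.

```python
def late_messages(pubs_by_edge, predicate, hops=2):
    """Late: each matching publication is forwarded over every hop to the subscriber edge.

    One forwarded copy serves all subscriptions with the same predicate,
    so the count does not depend on the number of subscriptions.
    """
    matching = sum(1 for pubs in pubs_by_edge.values() for v in pubs if predicate(v))
    return matching * hops


def early_messages(pubs_by_edge, predicate, num_subscriptions):
    """Early: each link that carries data transports one partial result per subscription."""
    active_edges = sum(
        1 for pubs in pubs_by_edge.values() if any(predicate(v) for v in pubs)
    )
    if active_edges == 0:
        return 0
    # Links from active publisher-edge brokers into BI, plus the link BI -> BS.
    return num_subscriptions * (active_edges + 1)


def matches(val):
    return val > 2


# Assumed placement of the four publications (inferred from the figures).
pubs = {"Bp": [9, 2], "Bq": [5, 3]}

print(late_messages(pubs, matches))       # 6
print(early_messages(pubs, matches, 1))   # 3  (one subscription: min)
print(early_messages(pubs, matches, 3))   # 9  (three subscriptions: max, count, min)
```

In this model Late stays at 6 messages regardless of how many overlapping subscriptions exist, while the Early count grows with the number of subscriptions, which is why neither baseline wins in every setting.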

13 Comparison between Early and Late
Several parameters affect the performance of our baselines:
Increasing parameter | Favors
Publication matching rate | Early
Matching number of NWRs | Late
Overlap among aggregate subscriptions | Late
Ratio between aggregate and regular subscriptions | Early
Reducing the communication cost requires an adaptive solution.

14 Benefits of adaptive aggregation
[Figure: brokers Bp, Bq, BI and BS (the slide also carries labels BA and BF); publications P[val,9], P[val,2], P[val,5], P[val,3]; subscriptions S[val,>,6] and S min[val,>,2]; messages shown: P[val,9], P[val min,9], P[val min,3]]
Late: 6, Early: 5

15 Benefits of adaptive aggregation
[Figure: brokers Bp, Bq, BI and BS (the slide also carries a label BA); publications P[val,9], P[val,2], P[val,5], P[val,3]; subscriptions S[val,>,6] and S min[val,>,2]; messages shown: P[val,9], P[val min,3]]
Late: 6, Early: 5, Adaptive: 4
Per-broker adaptation reduces communication cost.

16 Adaptation process (MAPE-K)
Monitor, Analyze, Plan, Execute:
– Matching publications within sampling period
– Changes in subscription set
– Compare the ratio between pubs vs. NWRs
– Estimate the notification rate
– Choose the suitable mode
– Transition between aggregate and forward mode
– Start/stop aggregation at broker
Knowledge (information at a broker):
– Registered subscriptions
– Current execution mode
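
The analyze and plan steps can be pictured as a small per-broker decision function. The following is a minimal sketch, not the decision rule from the paper: it assumes the broker compares the number of matching publications seen in a sampling period with the number of notifications (NWR results) it estimates it would emit when aggregating, and keeps its current mode on a tie. `Mode` and `choose_mode` are hypothetical names.

```python
from enum import Enum


class Mode(Enum):
    FORWARD = "forward"      # pass matching publications on unchanged
    AGGREGATE = "aggregate"  # aggregate locally and send results per NWR


def choose_mode(matching_pubs, estimated_notifications, current_mode):
    """Pick the mode expected to send fewer messages (assumed heuristic).

    matching_pubs: publications matched during the sampling period.
    estimated_notifications: aggregated results the broker would send
        in the same period if it were aggregating.
    """
    if matching_pubs > estimated_notifications:
        return Mode.AGGREGATE   # aggregation would shrink outgoing traffic
    if matching_pubs < estimated_notifications:
        return Mode.FORWARD     # aggregation would inflate outgoing traffic
    return current_mode         # tie: avoid a needless transition


mode = Mode.FORWARD
mode = choose_mode(matching_pubs=10, estimated_notifications=3, current_mode=mode)
print(mode)  # Mode.AGGREGATE
```

Keeping the current mode on a tie is a design choice made here to avoid oscillating between modes; a real deployment would likely need a more careful estimate of the notification rate.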

17 Experimental setup
Implemented in Java over the PADRES framework
Topology: 16 brokers
– Combination of publisher-edge only, subscriber-edge only and mixed brokers
Real-life datasets:
– Traffic dataset from the ONE-ITS service (1)
– Yahoo! Finance stock dataset
Metrics:
– Number of messages exchanged
– Processing overhead
– End-to-end delay
(1) http://one-its-webapp1.transport.utoronto.ca

18 Results (Stock dataset)
[Plots: varying publications per second; varying number of subscriptions]
The decision becomes more accurate when sufficient information is available.
Adaptive aggregation performs close to the better of Early and Late in all settings. Early performs better at high publication rates, whereas Late is better with a large number of subscriptions.

19 Results (Traffic dataset)
[Plots: varying publications per second; varying number of subscriptions]
Per-broker adaptation can cause individual brokers to make incorrect decisions.

20 Processing overhead (Stock)
[Plot: predicate matching cost; aggregation-related overhead]
Adaptation overhead dominates the aggregation overhead.

21 Conclusions
– We provide an aggregation framework for CBPS with baseline solutions.
– We demonstrate that neither baseline is dominant; which one performs better depends on workload parameters.
– We provide a generic adaptive aggregation framework.
– We experimentally demonstrate that our distributed adaptive solution performs close to the best baseline across all settings.

22 Thank you!
For questions and comments, contact: navneet@ifi.uio.no

23 Motivation: stock market application
Information providers: stock exchanges
Information seekers: brokers, buyers
Non-aggregate subscriptions: stock value updates
Aggregate subscriptions: stock market indicators (e.g. MACD)
Image: http://opinion-forum.com/index/wp-content/uploads/2012/08/stock_market.jpg

24 Aggregation semantics
Window parameters
– Window shift size (δ)
– Duration (ω)
Example
– Sliding window (ω = 2 hours, δ = 1 hour, so ω > δ): Moving average of the number of cars passing a street light per hour.
– Tumbling window (ω = δ = 2 hours): Average speed of cars on a road segment.
– Hopping window (ω = 2 hours, δ = 24 hours, so ω < δ): Number of cars crossing during rush hour.
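
The three window types differ only in how the duration ω compares with the shift size δ. A minimal sketch of that classification, using the parameter values from this slide (`window_kind` is a hypothetical helper, and the returned labels are chosen here for illustration):

```python
def window_kind(omega, delta):
    """Classify a window by comparing its duration (omega) with its shift size (delta)."""
    if omega <= 0 or delta <= 0:
        raise ValueError("omega and delta must be positive")
    if omega > delta:
        return "sliding"    # consecutive windows overlap
    if omega == delta:
        return "tumbling"   # consecutive windows touch without overlapping
    return "hopping"        # gaps between consecutive windows


print(window_kind(2, 1))    # sliding
print(window_kind(2, 2))    # tumbling
print(window_kind(2, 24))   # hopping
```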

25 Challenges of adaptive deployment
Data flow is hard to predict:
– Irregular event rates at the publishers
– Dynamic number of subscriptions
– Coupled with dynamic content matching
Brokers function autonomously
Compatible solution:
– Congruent to pub/sub routing standards
– Minimum impact on QoS for regular publications

26 Other experiments
– End-to-end delay
– Sensitivity to sampling period
– Sensitivity to collection delay
Please refer to our full paper.

27 Sensitivity analysis: Collection delay
Increasing the collection delay reduces the number of messages but delays the delivery of results.

28 Publication process flow
[Flowchart, node labels in slide order: Timestamp publication if not; Matched for aggregation; Is broker aggregating?; Any regular subscription matched?; Enqueue for aggregation computation; Send; Tag as aggregated; branch labels: No, Yes, No]

29 Aggregation basics: notification window ranges
[Figure: publication matching against the NWRs of sub 1, sub 2 and sub 3 on a time axis with ticks 0 to 7; window types shown: sliding window, tumbling window, sampling window]

30 Motivation
Pub/Sub is well known for efficient content filtering and dissemination for distributed event sources and sinks.
Content-based Pub/Sub does not support time-based aggregation.

31 Pub/Sub systems: a popular communication paradigm
Research in Pub/Sub has traditionally focused on performance rather than on extending functionality.
Application areas: RSS filtering [1], social interaction [2], stock-market monitoring [3], business process [4], workflow management [5], network monitoring and management [6]

32 Event distribution systems such as ITS demand aggregation filters
– Moving average of the number of cars passing a street light per hour.
– Average speed of cars on a road segment.
– Number of cars crossing a highway during rush hour.

33 Scope of our solution
– Acyclic overlay
– Broker-federated Pub/Sub
– Advertisement-based forwarding model
– Time-based aggregation

