Navneet Kumar Pandey1 Stéphane Weiss1 Roman Vitenberg1

Distributed event aggregation for content-based Publish/Subscribe systems
Navneet Kumar Pandey (1), Stéphane Weiss (1), Roman Vitenberg (1), Kaiwen Zhang (2), Hans-Arno Jacobsen (2)
(1) University of Oslo, (2) University of Toronto

Motivation: Intelligent Transport System (ITS)
Information providers: road sensors, crowdsourced mobile apps.
Information seekers: commuters, police, first responders, radio networks, etc.
Aggregate subscriptions: count the number of cars passing a street light per hour; average speed of cars on a road segment per day.
Non-aggregate subscriptions: accident reports; traffic violation reports.

Aggregation in pub/sub
Pub/sub is well known for efficient content filtering and dissemination for distributed event sources and sinks. However, pub/sub does not support aggregation, which is required in emerging applications. Our primary objective is to retain the traditional pub/sub focus on low communication cost, while adding support for aggregation.

Contributions: aggregation in pub/sub
We propose a framework and baseline approaches for aggregation in content-based pub/sub systems (CBPS). We show how the relative performance of the baseline approaches varies with workload properties. We propose a per-broker distributed adaptive approach.

Advertisement-based pub/sub model
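[Figure: broker overlay connecting publishers and a subscriber through brokers Bp, Bq, BI and BS. Labels: advertisement A[val, >, 4], publication P[val, 8], subscription S[val, >, 3], Subscription Delivery Tree (SDT).]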

Comparison with stream processing
Aggregation in stream processing | Aggregation in pub/sub
Requires global view of topology | Topology is not known to individual broker nodes
Requires a priori knowledge of publication sources | Publication sources and sinks are dynamic
Needs control layer | Brokers are loosely coupled
Usually has a static query plan | SDTs are dynamic and determined by the pub/sub implementation
Optimized for continuous data streams | Publications come at an irregular rate

Proposed aggregation framework
Publication filtering procedure (PFP)
Subscription: { RoadID = 101, speed > 10, op = 'avg', duration (ω) = 2 hours, shift size (δ) = 1 hour }
[Figure: timeline (1, 2, 3) showing publications Pub1, Pub2, Pub3 and the subscription's notification window ranges (NWR) NWR1, NWR2, NWR3.]
A single publication can participate in several NWRs, even for the same subscription.
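To illustrate why one publication can fall into several NWRs, the sketch below enumerates the windows that contain a given timestamp. It assumes that window k covers the half-open interval [k·δ, k·δ + ω) counted from the subscription's start; the slides do not state the boundary convention, and the function name nwr_indices is hypothetical.

```python
def nwr_indices(t, duration, shift):
    """Return the indices k of all windows [k*shift, k*shift + duration) containing t.

    Assumption (not from the slides): windows start at time 0 and are half-open.
    """
    if t < 0 or duration <= 0 or shift <= 0:
        raise ValueError("t must be >= 0; duration and shift must be > 0")
    indices = []
    k = int(t // shift)  # latest window that starts at or before t
    # Walk backwards through earlier windows while they still cover t.
    while k >= 0 and k * shift + duration > t:
        indices.append(k)
        k -= 1
    return sorted(indices)


# Duration = 2 hours, shift = 1 hour, as in the example subscription.
print(nwr_indices(0.5, duration=2, shift=1))  # [0]
print(nwr_indices(1.5, duration=2, shift=1))  # [0, 1]
print(nwr_indices(2.5, duration=2, shift=1))  # [1, 2]
```

With ω = 2δ, every publication after the first hour belongs to two consecutive windows, which is why the same publication may have to be counted in more than one notification.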

Publication filtering procedure (PFP)
Initial computation procedure (ICP)
[Figure: timeline (1, 2, 3) showing publications Pub1, Pub2, Pub3, the notification window ranges (NWR) NWR1, NWR2, NWR3, and a marked processing start time (x).]
Outgoing messages: { avg(Pub1, Pub2, Pub3), avg(Pub2, Pub3) }
Outgoing messages: { avg(Pub1, Pub2), avg(Pub2), Pub3 }
Processing start time presents a trade-off between communication cost and end-to-end delay.

Publication filtering procedure (PFP)
Initial computation procedure (ICP)
Recurrent processing procedure (RPP)
[Figure: brokers Bp and Bq send partial results avgp and avgq to broker BI, which produces avgpq after a collection delay.]
Collection delay is another parameter affecting the delay-communication trade-off.
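Combining partial results such as avgp and avgq into avgpq only works if each partial result carries enough state. The sketch below uses the common (sum, count) representation for averages; that representation, the class name PartialAvg and the sample values are assumptions made for illustration, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialAvg:
    """Mergeable state for an average: running sum and number of values."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        # Fold one publication value into the partial result.
        return PartialAvg(self.total + value, self.count + 1)

    def merge(self, other):
        # Combine two partial results computed at different brokers.
        return PartialAvg(self.total + other.total, self.count + other.count)

    def result(self):
        if self.count == 0:
            raise ValueError("no values aggregated")
        return self.total / self.count


# Hypothetical values: one upstream broker saw 3 and 5, another saw 9.
avg_p = PartialAvg().add(3).add(5)
avg_q = PartialAvg().add(9)
avg_pq = avg_p.merge(avg_q)

print(avg_pq.result())                        # average over 3, 5 and 9
print((avg_p.result() + avg_q.result()) / 2)  # naive average of averages differs
```

Merging the (sum, count) pairs gives the same answer as averaging all values at one place, whereas averaging the two averages directly would weight the single value from the second broker too heavily.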

Late aggregation approach
[Figure: overlay with brokers Bp, Bq, BI and BS between publishers and subscriber; legend: PFS, ICP, RPP. Labels: publications P[val,3], P[val,5], P[val,9], P[val,2]; subscription Smin[val,>,2]; notification P[valmin,3].]
Messages exchanged in Late aggregation: 6
Late approach aggregates messages at subscriber-edge brokers.

Early aggregation approach
[Figure: overlay with brokers Bp, Bq, BI and BS between publishers and subscriber, with aggregation point BA; legend: PFS, ICP, RPP. Labels: publications P[val,3], P[val,5], P[val,9], P[val,2]; subscription Smin[val,>,2]; aggregates P[valmin,3], P[valmin,9].]
Messages exchanged in Late aggregation: 6
Messages exchanged in Early aggregation: 3
Early approach aggregates messages at publisher-edge brokers.

Early does not always outperform Late
[Figure: overlay with brokers Bp, Bq, BI and BS and three subscriptions Smin[val,>,2], Smax[val,>,2], Scount[val,>,2]. Publications: P[val,3], P[val,5], P[val,9], P[val,2]. Aggregate labels: P[valmin,3], P[valmin,9], P[valmax,5], P[valmax,9], P[valcount,1], P[valcount,2], P[valcount,3].]
Late aggregation: messages exchanged: 6
Early aggregation: messages exchanged: 9
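The message counts quoted on the Early and Late slides (6 versus 3 with one subscription, 6 versus 9 with three) can be reproduced with a simple counting rule. The sketch below is inferred from those numbers and is not the authors' cost model. It assumes two publisher-edge brokers that each reach the subscriber-edge broker in two hops through a shared intermediate broker, a single NWR, aggregate subscriptions that share one filter, and that the final delivery to the subscriber is not counted. All function and variable names are hypothetical.

```python
def late_messages(matching_pubs_per_edge, hops=2):
    """Late: every matching publication is forwarded unchanged over every hop."""
    return sum(n * hops for n in matching_pubs_per_edge.values())


def early_messages(matching_pubs_per_edge, num_aggregate_subs):
    """Early: each publisher-edge broker with matches sends one partial result
    per subscription to the intermediate broker, which merges them and sends
    one result per subscription onward."""
    active_edges = sum(1 for n in matching_pubs_per_edge.values() if n > 0)
    if active_edges == 0:
        return 0
    return num_aggregate_subs * (active_edges + 1)


# Values 3 and 5 arrive at Bp, 9 and 2 at Bq; the filter val > 2 drops the 2.
matching = {"Bp": 2, "Bq": 1}

print(late_messages(matching))       # 6
print(early_messages(matching, 1))   # 3
print(early_messages(matching, 3))   # 9
```

Under these assumptions the Late count does not grow with the number of overlapping subscriptions, because each raw publication is forwarded once, while the Early count grows linearly with it.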

Comparison between Early and Late
Several parameters affect the performance of our baselines:
Increasing parameter | Favors
Publication matching rate | Early
Matching number of NWRs | Late
Overlap among aggregate subscriptions |
Ratio between aggregate and regular subscriptions |
Reducing the communication cost requires an adaptive solution.

Benefits of adaptive aggregation
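[Figure: overlay with brokers Bp, Bq, BI, BS and BF, with aggregation points BA. Labels: subscriptions Smin[val,>,2] and S[val,>,6]; publications P[val,3], P[val,5], P[val,9], P[val,2]; aggregates P[valmin,3], P[valmin,9].]
Messages exchanged: Late 6, Early 5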

Benefits of adaptive aggregation
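[Figure: same overlay under the Adaptive approach, with aggregation points BA at only some of the brokers. Labels: subscriptions Smin[val,>,2] and S[val,>,6]; publications P[val,3], P[val,5], P[val,9], P[val,2]; aggregate P[valmin,3].]
Messages exchanged: Late 6, Early 5, Adaptive 4
Per-broker adaptation reduces communication cost.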

Adaptation process (MAPE-K)
Monitor: matching publications within sampling period; changes in subscription set.
Analyze: compare the ratio between Pubs vs. NWRs; estimate the notification rate.
Plan: choose the suitable mode; transition between aggregate and forward mode.
Execute: start/stop aggregation at broker.
Knowledge (information at a broker): registered subscriptions; current execution mode.
General framework with a parametric cost model.
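A single adaptation round at one broker might look like the sketch below. The decision rule is a deliberately simplified stand-in for the parametric cost model, which the slides do not specify: it assumes forwarding costs one message per matching publication, aggregating costs one message per active NWR, and ties keep the current mode. All names (BrokerKnowledge, analyze, plan, adapt) are hypothetical.

```python
from dataclasses import dataclass

AGGREGATE = "aggregate"
FORWARD = "forward"


@dataclass
class BrokerKnowledge:
    """Knowledge kept at a broker: here only the current execution mode."""
    mode: str = FORWARD


def analyze(matching_pubs, active_nwrs):
    # Estimated outgoing messages for the sampling period under each mode.
    return {FORWARD: matching_pubs, AGGREGATE: active_nwrs}


def plan(costs, current_mode):
    # Choose the cheaper mode; on a tie, stay put to avoid needless transitions.
    if costs[AGGREGATE] < costs[FORWARD]:
        return AGGREGATE
    if costs[FORWARD] < costs[AGGREGATE]:
        return FORWARD
    return current_mode


def adapt(knowledge, matching_pubs, active_nwrs):
    """Run one monitor-analyze-plan-execute round; return True if the mode changed."""
    new_mode = plan(analyze(matching_pubs, active_nwrs), knowledge.mode)
    switched = new_mode != knowledge.mode
    knowledge.mode = new_mode  # execute: start or stop aggregating here
    return switched


k = BrokerKnowledge()
print(adapt(k, matching_pubs=10, active_nwrs=2), k.mode)  # True aggregate
print(adapt(k, matching_pubs=1, active_nwrs=3), k.mode)   # True forward
print(adapt(k, matching_pubs=2, active_nwrs=2), k.mode)   # False forward
```

Because each broker decides from its own observations only, no global view of the overlay is needed, at the price that an individual broker can occasionally pick the worse mode.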

Experimental setup
Implemented in Java over the PADRES framework.
Topology: 16 brokers; combination of publisher-edge only, subscriber-edge only and mixed brokers.
Real-life datasets: traffic dataset from the ONE-ITS service (1); Yahoo! Finance stock dataset.
Metrics: number of messages exchanged; processing overhead; end-to-end delay.
(1) http://one-its-webapp1.transport.utoronto.ca

Results (Stock dataset)
[Plots: varying publications per second; varying number of subscriptions.]
Decision becomes more accurate when available information is sufficient.
Early performs better at high publication rates, whereas Late is better with a large number of subscriptions.
Adaptive aggregation performs close to the best of Early and Late for all settings.

Results (Traffic dataset)
[Plots: varying publications per second; varying number of subscriptions.]
Per-broker adaptation can cause individual brokers to make incorrect decisions.

Processing overhead (Stock)
[Plots: predicate matching cost; aggregation-related overhead.]
Adaptation overhead dominates the aggregation overhead.

Conclusions
We provide an aggregation framework for CBPS with baseline solutions.
We demonstrate that neither baseline is dominant: which one performs better depends on workload parameters.
We provide a generic adaptive aggregation framework.
We experimentally demonstrate that our distributed adaptive solution performs close to the best baseline across all settings.

Thank you! For questions and comments, contact: navneet@ifi.uio.no
