Corona: A High Performance Publish-Subscribe System for the World Wide Web Authors: V. Ramasubramanian, R. Peterson and E.G. Sirer Cornell University Presenter:


1 Corona: A High Performance Publish-Subscribe System for the World Wide Web
Authors: V. Ramasubramanian, R. Peterson and E.G. Sirer, Cornell University
Presenter: Sara Salahi, Northwestern University
CorONA: Cornell Online News Aggregator

2 Motivation
Abundance of frequently changing information on the Web:
  –Weblogs, wikis, news sites, etc.
Increased need to notify users of updates
  –Ideally want:
      Fast update detection
      Optimal bandwidth utilization
  –Existing protocols do not provide users with automatic notification of updates

3 Background
Publish-Subscribe Systems
  –Publishers, subscribers and infrastructure
  –Topic-based vs. content-based
  –Fundamental drawbacks of preceding systems:
      Require substantial changes in the way publishers serve content
      Expect subscribers to learn sophisticated query languages
      Incompatible with the current Web architecture

4 Background
Micronews Systems
  –Micronews feeds: short descriptions of frequently updated information in XML-based formats (e.g. RSS)
  –Feed readers, cloud tag (pub-sub model)
  –Commercial services disseminate micronews updates to users
  –Main disadvantages:
      Fragile centralized servers
      Relentless polling to detect updates
  –Corona improvements:
      Shares updates between peers
      Cooperative polling reduces update latencies

5 Background
Overlay Networks
  –Large number of structured overlays that organize networks:
      Rings, hyperdimensional cubes, butterfly structures, de Bruijn graphs, skip-lists, etc.
  –Corona is easily layered on structured overlays with uniform node degree (includes all of the overlays listed above)

6 Corona: The Big Picture
Topic-based pub-sub system which interoperates with the current Web architecture (URLs = “channels”)
Cooperative polling of channels by geographically distributed nodes
  –“…n nodes polling with same polling interval and randomly distributed polling times can detect updates n times faster if they share updates with each other.”
  –Optimization problem:
      Tradeoff between update performance and network load

7 Analytical Modeling
Pastry: underlying substrate, organizes the network into a ring
  –Routing table, DAG rooted at each node
  –A node can reach another node in $\log_b N$ hops ($b$: fanout, $N$: # of nodes)
Corona assigns nodes in well-defined wedges
  –Optimal wedge size determined by analysis of the global performance-overhead tradeoff
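To make the hop bound on this slide concrete, here is a minimal sketch that evaluates $\log_b N$ for a given network size. The function name is hypothetical, and the fanout b = 16 used in the usage note is an assumption; only the formula itself comes from the slide.

```python
import math


def expected_hops(num_nodes: int, fanout: int) -> float:
    """Approximate routing hops in a prefix-routing overlay: log_b(N)."""
    return math.log(num_nodes) / math.log(fanout)


if __name__ == "__main__":
    # N = 1024 matches the simulated system size on the Evaluation slide;
    # b = 16 is an assumed fanout, not a value stated in the slides.
    print(expected_hops(1024, 16))
```

With 1024 nodes and an assumed fanout of 16, this gives roughly 2.5 hops, so in practice a lookup would take two or three hops.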

8 Analytical Modeling
Channel with polling level $L$
  –Polled by nodes with at least $L$ matching prefix digits in their identifiers (polling level 0: all nodes in the system poll for the channel)
Polling level quantifies the performance-overhead tradeoff
  –A channel with polling level $L$ has:
      $N/b^L$ nodes polling it ($\tau$: polling interval)
      Cooperatively detects updates in $(\tau/2)(b^L/N)$ time on average
      Collective load placed on the server of the channel is $\tau(N/b^L)$
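The polling-level rule and the two averages on this slide can be sketched as follows. This is an illustration under stated assumptions, not Corona's implementation: identifiers are modeled as equal-length hex strings, and all function names are hypothetical. The formulas are the slide's $N/b^L$ and $(\tau/2)(b^L/N)$.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two identifiers have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def polls_channel(node_id: str, channel_id: str, level: int) -> bool:
    """A node polls a channel if at least `level` prefix digits match."""
    return shared_prefix_len(node_id, channel_id) >= level


def expected_pollers(num_nodes: int, base: int, level: int) -> float:
    """Average number of nodes polling a channel at this level: N / b^L."""
    return num_nodes / base**level


def expected_detection_time(tau: float, num_nodes: int, base: int, level: int) -> float:
    """Average cooperative update detection time: (tau / 2) * (b^L / N)."""
    return (tau / 2) * base**level / num_nodes


if __name__ == "__main__":
    # Assumed values: base 16 identifiers, 1024 nodes, 30-minute interval.
    print(polls_channel("a3f0", "a3c1", 2))        # two matching digits
    print(expected_pollers(1024, 16, 1))           # pollers at level 1
    print(expected_detection_time(30.0, 1024, 16, 1))  # minutes
```

Raising the level by one divides the number of pollers by b and multiplies the expected detection time by b, which is the tradeoff the optimization slides work with.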

9 Analytical Models
Corona Lite
  –Minimize average update detection time
  –Bound load placed on content servers
  –Overall update performance = average of the update detection time of each channel, weighted by # of clients subscribed to the channel
  –Target network load = the total # of subscriptions in the system
Corona Fast
  –Achieve target average update detection time
  –Minimize load placed on content servers
  –Maintains stable performance through changes in workload
Corona Fair
  –Minimize average update detection time w.r.t. expected update frequency
  –Bound load on content servers
  –Incorporates update rate of channels into the tradeoff to achieve a fairer distribution of update performance between channels
  –Defines a modified update performance metric as the ratio of the update detection time and the polling interval of the channel
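The Corona Lite performance metric (update detection time averaged over channels, weighted by subscriber count) can be computed as in the sketch below. The function name and the `(subscribers, level)` input layout are assumptions made for illustration; the per-channel detection time reuses $(\tau/2)(b^L/N)$ from the Analytical Modeling slide.

```python
def lite_performance(channels, num_nodes, base, tau):
    """Subscriber-weighted average update detection time.

    channels: list of (subscribers, polling_level) pairs (hypothetical layout).
    """
    total_subscribers = sum(q for q, _ in channels)
    weighted_delay = sum(
        q * (tau / 2) * base**level / num_nodes for q, level in channels
    )
    return weighted_delay / total_subscribers


if __name__ == "__main__":
    # Toy numbers: a popular channel at level 0 and a small one at level 1.
    print(lite_performance([(3, 0), (1, 1)], num_nodes=16, base=4, tau=32.0))
```

Because popular channels carry more weight, lowering their polling level improves this metric more than lowering the level of a rarely subscribed channel.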

10 Decentralized Optimization
Honeycomb: determines optimal polling levels
$f_i(l)$ and $g_i(l)$ define performance and cost for channel $i$ as a function of polling level $l$
NP-hard, so an approximate solution is used
  –Lagrange multiplier method
  –Due to monotonicity, the optimal solution $L^*$ is bounded by the same minima as the approximate solutions $L_d^*$ and $L_u^*$
Honeycomb aggregates global tradeoff factors
  –Channels grouped in tradeoff clusters by $f_i/g_i$
  –# of clusters per polling level is limited by a constant (Tradeoff_Bins)
  –Cluster aggregation overhead (memory state, network bandwidth) limited by the size of the routing table
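As a rough illustration of the Lagrange multiplier idea, the sketch below picks polling levels for a Corona Lite style problem by bisecting on a multiplier λ. It is a centralized toy, not Honeycomb: there is no decentralized aggregation or tradeoff clustering, and the function names, the default parameters, and the choice $f_i(l) = q_i(\tau/2)(b^l/N)$, $g_i(l) = N/b^l$ (with $q_i$ the subscriber count) are assumptions for this example.

```python
def best_level(subscribers, lam, num_nodes, base, tau, max_level):
    """Level minimizing f(l) + lam * g(l) for a single channel."""
    def lagrangian(level):
        f = subscribers * (tau / 2) * base**level / num_nodes  # weighted delay
        g = num_nodes / base**level                            # pollers (load)
        return f + lam * g
    return min(range(max_level + 1), key=lagrangian)


def choose_levels(subs, budget, num_nodes=1024, base=16, tau=30.0,
                  max_level=2, iters=60):
    """Bisect on lam until the total number of pollers fits the budget."""
    def solve(lam):
        levels = [best_level(q, lam, num_nodes, base, tau, max_level)
                  for q in subs]
        load = sum(num_nodes / base**level for level in levels)
        return levels, load

    lo, hi = 0.0, 1e9  # assumed upper bound on lam, large enough here
    levels, load = solve(hi)
    if load > budget:
        raise ValueError("budget too small even at the highest polling level")
    for _ in range(iters):
        mid = (lo + hi) / 2
        mid_levels, mid_load = solve(mid)
        if mid_load <= budget:
            hi, levels, load = mid, mid_levels, mid_load  # feasible: relax lam
        else:
            lo = mid                                      # infeasible: raise lam
    return levels, load


if __name__ == "__main__":
    # Three channels with 1000, 10 and 1 subscribers; at most 1100 pollers.
    print(choose_levels([1000, 10, 1], budget=1100))
```

In this toy instance the most popular channel ends up at the lowest polling level (most pollers), which is the behavior the tradeoff is meant to produce. A larger λ penalizes load more heavily and pushes channels toward higher levels.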

11 System Management
Each channel has a unique identifier and one or more owner nodes managing it
  –Primary owner is the Corona node whose identifier is numerically closest to the channel’s identifier
  –Additional owners are the F closest neighbors
      Tolerates failures
  –As in other P2P systems, a problem occurs if more than F adjacent nodes fail at once
      Recoverable because users can easily re-subscribe
Owners inform subscribers of updates and keep track of channel-specific factors that affect performance tradeoffs
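The ownership rule can be sketched as a closest-identifier lookup on a circular ID space. This is a simplification under stated assumptions: identifiers are small integers, and the additional owners are taken to be the next-closest nodes to the channel identifier rather than being read from the primary owner's overlay neighbor set. The function names are hypothetical.

```python
def ring_distance(a: int, b: int, id_space: int) -> int:
    """Distance between two identifiers on a circular ID space."""
    d = abs(a - b) % id_space
    return min(d, id_space - d)


def channel_owners(channel_id, node_ids, extra_owners, id_space):
    """Primary owner first, followed by `extra_owners` (F) additional owners."""
    ranked = sorted(
        node_ids,
        key=lambda n: (ring_distance(n, channel_id, id_space), n),
    )
    return ranked[: extra_owners + 1]


if __name__ == "__main__":
    # Toy 8-bit ID space with four nodes and F = 1.
    print(channel_owners(5, [10, 50, 200, 250], extra_owners=1, id_space=256))
```

With F additional owners, the channel state survives as long as at least one of the F + 1 owners stays up, which matches the failure condition on the slide.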

12 System Management
Cooperative Polling
  –Optimization Phase
      Corona nodes apply the optimization algorithm on tradeoff data
  –Maintenance Phase
      Changes to polling levels are communicated to peer nodes in the routing table via maintenance messages
  –Aggregation Phase
      Enables nodes to receive new aggregates of tradeoff factors
Polls for a channel at different nodes are randomly distributed over time

13 Update Dissemination
Version numbers
Deltas
  –Studies show that the amount of change in a content update is typically tiny: 6.8%
  –Difference engine used to identify new information
  –When a delta is generated by a node, all other nodes in the channel’s polling wedge are updated
“Simultaneously” detected deltas
  –Primary owner makes sure the latest delta is used and ignores redundant deltas
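A minimal sketch of delta generation and redundant-delta filtering, assuming line-based text content and Python's standard `difflib` as a stand-in for the difference engine. The `PrimaryOwner` class, its version-number check, and the function names are hypothetical illustrations, not Corona's actual protocol.

```python
import difflib


def new_lines(old: str, new: str) -> list:
    """Lines present in `new` but not in `old` (a crude delta)."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


class PrimaryOwner:
    """Accepts one delta per version and ignores redundant ones."""

    def __init__(self):
        self.version = 0
        self.deltas = []

    def submit(self, base_version: int, delta: list) -> bool:
        # A delta built against an older version was already superseded.
        if base_version != self.version or not delta:
            return False
        self.version += 1
        self.deltas.append(delta)
        return True


if __name__ == "__main__":
    old_feed = "item A\nitem B"
    new_feed = "item A\nitem B\nitem C"
    delta = new_lines(old_feed, new_feed)

    owner = PrimaryOwner()
    # Two pollers detect the same update "simultaneously".
    print(owner.submit(0, delta))  # first report is accepted
    print(owner.submit(0, delta))  # second report is redundant
```

Tagging each delta with the version it was computed against lets the owner discard duplicates without comparing content.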

14 User-Interface http://www.cs.cornell.edu/people/egs/beehive/corona/

15 Implementation
Layered on Pastry
Corona handles orphan channels
  –Tradeoff factors are aggregated into a slack cluster prior to optimization
Reliance on IM
  –Can’t log in from all nodes simultaneously
Prevent malicious nodes from generating spurious updates
  –Publish digitally signed certificates
  –Use threshold cryptography to generate a certificate for content

16 Evaluation
Compare Corona performance against legacy RSS performance
Real-life RSS traces are used
The tradeoff parameters are extrapolated to a larger scale:
  –1024 nodes
  –100,000 channels
  –5,000,000 subscribers
Polling interval: 30 minutes

17 Evaluation
[Figures: Network Load on Content Servers; Average Update Detection Time; Number of Pollers per Channel; Update Detection Time per Channel]

18 Evaluation
[Figure: Update Detection Time per Channel]
Overall summary

19 Deployment
A set of 60 PlanetLab nodes
Corona-Lite scheme is used
7500 RSS feeds from www.syndic8.com
150,000 subscriptions
Polling interval: 30 minutes

20 Deployment Results
[Figures: Average Update Detection Time; Total Polling Load on Servers]

21 Conclusions/Future Work
Corona is a topic-based pub-sub system which interoperates with the current Web architecture and network overlays
Fast update detection time achieved by:
  –Cooperative polling of channels by geographically distributed nodes
  –Shared updates between peers
Do all updates need to be shared?
Measure average time to deliver updates to subscribers?
Maybe optimize the polling interval depending on the rate of updates in a channel?
Need to run a better simulation with the IM interface to see the true overhead of having multiple nodes logged in at once

22 Thank you!

