Presentation is loading. Please wait.

Presentation is loading. Please wait.

Gil EinzigerRoy Friedman Computer Science Department Technion.

Similar presentations


Presentation on theme: "Gil EinzigerRoy Friedman Computer Science Department Technion."— Presentation transcript:

1 Gil EinzigerRoy Friedman Computer Science Department Technion

2 Background: Publish/Subscribe Publisher: any entity that wishes to publish some event Look at my new hairstyle

3 Background: Publish/Subscribe Subscriber: any entity that wishes to be notified about events that match its interests (also called subscription) I want to know everything about hairstyles I want to know everything about Ariana Grande I only care about science fiction

4 Background: Publish/Subscribe The system’s goal is to deliver notifications about events to interested subscribers and only to them – Decoupling of information producers from consumers Applications: – Social networking (Twitter and the likes) – Stock quotes – Control systems – Data-center management – Etc.

5 Background: P2P Decentralized systems in which (most of) the communication is performed directly between the end nodes of the system – the peers Often, peers are donated users’ machines – But can also be set-top boxes, routers, a large datacenter’s servers, etc. Famous example applications: – Skype, Bittorrent (and other file sharing), IPTV, Bitcoin

6 It’s a Brave New World Out There Most users access online content through their mobiles – Intermittent connectivity – Limited bandwidth – Limited battery life – Limited resources We need to decouple the devices used to access the data and the ones serving the P2P network

7 A Vision for Future P2P Solutions Whether ran as a true P2P network between donated machines or inside a data-center: – Decoupling between devices that consume services and the ones providing the service Incentives might be in the form of revenue share with advertisers or paid subscribers – P2P machines are used for providing multiple services Not feasible to optimize the P2P overlay for a specific service

8 Problem Statement A scalable and efficient pub/sub system for self- sustained P2P networks – The challenge Subscribers might not be present much of the time Short client sessions Use the existing overlay – Quality Goals High delivery rate Efficient publication delivery Reasonable latency High churn resilience

9 Our Solution: Overview

10 Our Solution: Subscribing to Mailbox Clients that are aware of mailboxes serving their topics, simply notify these mailboxes about it A client that is unaware of such mailboxes, initiates multiple biased random walks – Each mailbox distributes a hint (Bloom filter) with the topics it subscribes to its overlay neighbors up to some distance – The random walks favor visiting nodes whose hints include a match – The random walks continue for a given TTL trying to find as many matching mailbox as possible If none is found, then the home node becomes a mailbox for these topics

11 Our Solution: Mailboxes An overloaded mailbox can refuse to accept new clients and topics Mailboxes disappear naturally due to churn or when they are underutilized The important objective is load sharing rather than load balancing

12 Our Solution: Dissemination Spanning tree among mailboxes that know each other Random walks to discover new mailboxes and disseminate to them

13 Our Solution: Dissemination Spanning tree among mailboxes that know each other Random walks to discover new mailboxes and disseminate to them

14 Our Solution: Dissemination Spanning tree + random walks between mailboxes Normally, a mailbox pushes events to corresponding registered clients Additionally, out-of-band gossip between mailboxes and clients – Clients poll their set of known mailboxes – Exchange list of known events with each polled mailbox – Occurs periodically plus after re-connection

15 Implementation – Written in Java – Open source project All code including testing available online – Can be run on top of real IP networks as well as the PeerSim simulator In the real networks case, executed on top of the OpenKAD implementation of the Kademlia DHT – Measurements confirm similarity between results with similar size networks Simulations can be used to explore scalability Real networks can be used to validate simulation results

16 Evaluation: Methodology Traces: – Synthetic traces Subscriptions are spread to clients/home nodes uniformly Topic publication distribution is Zipf-like with α=0.9 – Twitter traces Metrics: – Delivery rate – Communication load – Mailbox subscription pattern vs. users’ – Effects of churn

17 Results: Synthetic Workload Time (minutes) Delivery Rate Delivery rate over time vs. network size 1) Delivery rate approachs 100% after a few minutes 2) For 1500 nodes, simulation ~ real runs

18 Results: Twitter Traces Time (minutes) Delivery Rate Delivery rate over time

19 Results: Load Distribution # node Total Handled Messages Load Distribution (Twitter) 1)Almost all load goes to mailboxes only 2)Even most loaded need to handle fewer than 10 messages per second

20 Results: Subscription Pattern # node Subscription Number Twitter (Feb. 7 2010 19:00-20:20) Subscription pattern of mailboxes much more uniform than of clients => balanced dissemination trees

21 Results: Subscription Pattern # node Subscription Number 1000 nodes, 3000 clients, 3 topics each

22 Results: Subscription Pattern Client Subscriptions Mailbox Subscriptions #Registered Clients/#Registered Mailboxes Only a small number of mailboxes register even to the most popular topics => the dissemination trees are relatively small

23 Results: Single 10% Churn Event Time (minutes) Miss Rate Churn Recovery Time

24 Results: Repetitive 10% Churn Time (minutes) Miss Rate Churn Recovery Over Time

25 Results: Repetitive 100% Churn Time (minutes) Miss Rate Churn Recovery Time (Loosing All Mailboxes)

26 Summary The concept of elastic mailboxes – Self electing, self evaporating, self organizing Complementing delivery mechanisms – Spanning tree – Random walks – Out-of-band gossip through clients interaction End result: – Mailboxes dramatically reduce the scalability problem – Highly efficient, highly effective, highly robust to failures and churn

27 Open Issues Exploit subscription similarity Privacy

28 Q&A Thanks for listening…

29 Client applications poll mailboxes in order to satisfy all their subscriptions. 4 Conformity Architecture: Mailbox A rendezvous point between publications and subscriptions. Mailboxes use load sharing in order to adjust their number to match the demand for publish/subscribe service. When busy mailboxes refuse or fail to serve client applications, these applications create additional mailboxes. A publish/subscribe architecture pushes content to the mailboxes.


Download ppt "Gil EinzigerRoy Friedman Computer Science Department Technion."

Similar presentations


Ads by Google