
PODC 2007 © 2007 IBM Corporation Constructing Scalable Overlays for Pub/Sub With Many Topics Problems, Algorithms, and Evaluation G. Chockler, R. Melamed,


1 PODC 2007 © 2007 IBM Corporation
Constructing Scalable Overlays for Pub/Sub With Many Topics: Problems, Algorithms, and Evaluation
G. Chockler, R. Melamed, Y. Tock, IBM Haifa Research Lab
R. Vitenberg, University of Oslo

2 Publish/Subscribe (Pub/Sub)
[Figure: nodes N1 to N5 attached to a Message Bus. Subscription(N1) = {B,C,D}, N2: {A,B,C,E}, N3: {A,D}, N4: {A,B,X}, N5: {A,X}. Publish(M1, A) sends message M1 on topic A.]
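The message-bus picture on this slide can be sketched in a few lines of Python. This is only an illustration of topic-based delivery; the `MessageBus` class and its method names are hypothetical and do not come from the talk.

```python
from collections import defaultdict


class MessageBus:
    """Hypothetical in-memory bus: maps each topic to its subscribers."""

    def __init__(self):
        self.subscribers = defaultdict(set)

    def subscribe(self, node, topics):
        # Register the node under every topic it is interested in.
        for topic in topics:
            self.subscribers[topic].add(node)

    def publish(self, message, topic):
        # Return the nodes that would receive the message, in sorted order.
        return sorted(self.subscribers.get(topic, ()))


# Subscriptions as listed on the slide.
bus = MessageBus()
bus.subscribe("N1", {"B", "C", "D"})
bus.subscribe("N2", {"A", "B", "C", "E"})
bus.subscribe("N3", {"A", "D"})
bus.subscribe("N4", {"A", "B", "X"})
bus.subscribe("N5", {"A", "X"})

receivers = bus.publish("M1", "A")
```

With these subscriptions, publishing M1 on topic A reaches every node except N1, which is not subscribed to A.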

3 Scalability of Pub/Sub
Most traditional pub/sub systems are geared towards small-scale deployment
  – E.g., Isis MDS, TIB, MQSeries, Gryphon
New generation of applications…
  – Large data centers: Amazon, Google, Yahoo, eBay, …
  – RSS, feed/news readers, online stock trading and banking
  – Web 2.0, Second Life
…drive dramatic growth in scale
  – 10,000s of nodes, 1000s of topics, Internet-wide distribution
Emerging systems address this trend using P2P techniques

4 Overlay-Based Pub/Sub
[Figure: overlay connecting N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}; figure labels: (M1, A), Relay]
SCRIBE, Corona, Feedtree, Sub-2-Sub, TERA, ...

5 Overlay Topologies for Pub/Sub
A “good” overlay will allow for efficient and simple publication routing
  – Small routing tables, low load on relays, low latency
Ideally, the overlay is topic-connected: i.e., one connected component for each topic-induced sub-graph
  – Most existing implementations construct topic-connected overlays

6 Topic-Connectivity
Topics B, C, X, E are connected
Topics A and D are disconnected
[Figure: overlay on N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
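Topic-connectivity can be checked mechanically: for each topic, restrict the overlay to the nodes subscribed to it and test whether that sub-graph is connected. The sketch below does this with a breadth-first search. The edge list is a hypothetical one chosen to reproduce the connected/disconnected split stated on the slide; the edges actually drawn in the figure are not recoverable from the transcript.

```python
from collections import deque

# Subscriptions from the running example.
INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}

# Hypothetical overlay (an assumption, not the edges from the figure).
EDGES = [("N1", "N2"), ("N2", "N4"), ("N4", "N5")]


def is_topic_connected(interest, edges, topic):
    """True if the sub-graph induced by the topic's subscribers is connected."""
    members = {v for v, topics in interest.items() if topic in topics}
    if len(members) <= 1:
        return True
    # Keep only edges whose two endpoints both subscribe to the topic.
    adjacency = {v: set() for v in members}
    for u, v in edges:
        if u in members and v in members:
            adjacency[u].add(v)
            adjacency[v].add(u)
    start = next(iter(members))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adjacency[u] - seen:
            seen.add(w)
            queue.append(w)
    return seen == members


connected_topics = [t for t in "ABCDEX" if is_topic_connected(INTEREST, EDGES, t)]
```

Note that an edge only helps a topic when both endpoints subscribe to it: the path from N3 to any other subscriber of A would have to stay inside A's subscribers.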

7 Topic-Connectivity: Simple Solution
[Figure: overlay on N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
⇒ Node degree grows linearly with the subscription size
⇒ Roughly twice as big as the average subscription size for rings/trees
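The transcript does not spell out the simple solution, but a later slide compares against "ring-per-topic", so the sketch below assumes it means building a separate ring over each topic's subscribers and taking the union of the edges. The function names and the choice to order ring members alphabetically are assumptions made for illustration.

```python
# Subscriptions from the running example.
INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def ring_per_topic(interest):
    """Union of one ring per topic; ring order is alphabetical (an assumption)."""
    topics = sorted(set().union(*interest.values()))
    edges = set()
    for topic in topics:
        members = sorted(v for v in interest if topic in interest[v])
        # Link consecutive members; a single subscriber needs no edge.
        for a, b in zip(members, members[1:]):
            edges.add((a, b))
        # Close the ring only when it has at least three members.
        if len(members) > 2:
            edges.add((members[0], members[-1]))
    return edges


def average_degree(node_count, edges):
    # Each undirected edge contributes to the degree of two nodes.
    return 2 * len(edges) / node_count


ring_edges = ring_per_topic(INTEREST)
ring_avg = average_degree(len(INTEREST), ring_edges)
```

The exact edge count depends on the order in which each ring visits its members, because rings for different topics may or may not share edges. In every ordering, a node gains up to two neighbors per subscribed topic, which is why degree tracks subscription size.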

8 Scalability of the Simple Solution
Negative impact on performance due to
  – CPU load: neighbor monitoring, message processing
  – Connection maintenance and header overhead
  – Memory overhead: per-link state associated with routing and/or compression schemes being used, etc.
⇒ Scalability barrier for large systems offering a wide range of subscription choices
Can we do better?

9 The Min-TCO Problem
Minimum Topic-Connected Overlay (Min-TCO) problem:
  – For a set of nodes V, a set of topics T, and Interest: V × T → {true, false}
  – Construct a topic-connected overlay G with the minimum possible number of edges (or average degree)
TCO (decision version):
  – Decide whether there is a topic-connected overlay consisting of k edges (for a given k)
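For very small inputs, both versions of the problem can be solved by exhaustive search, which makes the definitions concrete. The sketch below is a brute-force illustration only (exponential in the number of node pairs), not an algorithm from the talk; the function names `tco` and `min_tco` are hypothetical, and the interest function is represented as a dict from node to its set of topics.

```python
from itertools import combinations

# Subscriptions from the running example.
INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def connected(members, edges):
    """True if `members` form one component using only edges inside `members`."""
    members = set(members)
    if len(members) <= 1:
        return True
    reach = {next(iter(members))}
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            # Grow the reached set across edges with exactly one reached endpoint.
            if u in members and v in members and (u in reach) != (v in reach):
                reach |= {u, v}
                changed = True
    return reach == members


def tco(interest, k):
    """Decision version: return some topic-connected overlay with k edges, or None."""
    nodes = sorted(interest)
    topics = set().union(*interest.values())
    groups = [[v for v in nodes if t in interest[v]] for t in topics]
    for edges in combinations(combinations(nodes, 2), k):
        if all(connected(group, edges) for group in groups):
            return edges
    return None


def min_tco(interest):
    """Optimization version: try k = 0, 1, 2, ... until an overlay exists."""
    k = 0
    while True:
        overlay = tco(interest, k)
        if overlay is not None:
            return overlay
        k += 1


best = min_tco(INTEREST)
```

The loop in `min_tco` always terminates because the complete graph is topic-connected. On the running example no 4-edge overlay exists: topics C, D and X each force one specific edge, and topic A then needs two more.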

10 Complexity of TCO
Lemma: TCO(V, T, Interest, k) ∈ NP
Proof: Topic-connectivity is verifiable in polynomial time
Lemma: TCO(V, T, Interest, k) is NP-hard
Proof:
1. Define an auxiliary problem, Single Node TCO (SN-TCO), which is to decide if there is a topic-connected overlay in which the degree of a single given node is ≤ d
2. Set Cover is polynomially reducible to SN-TCO
3. SN-TCO is polynomially reducible to TCO
Theorem: TCO is NP-complete
[Figure: example instance with nodes N1 to N5 and subscription labels {B,C,D}, {A,B}, {A,D}, {A,C}, {A,B,C,D}]

11 Approximating Min-TCO
The idea: exploiting subscription overlaps
  – Connecting the nodes with overlapping interests improves connectivity of several topics at once
Greedy Merge (GM) algorithm:
  – Start from a singleton connected component for each (v, t) ∈ V × T
  – At each iteration: add an edge that reduces the number of connected components for the biggest number of topics
  – Stop once there is a single connected component for each topic
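The three GM steps above can be sketched directly, without any of the optimizations. The version below keeps one union-find structure per topic and rescans all node pairs at every iteration. Tie-breaking (first pair in alphabetical order) is an assumption; the slides do not say how ties are resolved.

```python
from itertools import combinations

# Subscriptions from the running example.
INTEREST = {
    "N1": {"B", "C", "D"},
    "N2": {"A", "B", "C", "E"},
    "N3": {"A", "D"},
    "N4": {"A", "B", "X"},
    "N5": {"A", "X"},
}


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def greedy_merge(interest):
    """Unoptimized Greedy Merge; returns the chosen edges in selection order."""
    nodes = sorted(interest)
    topics = sorted(set().union(*interest.values()))
    # One singleton component per (node, topic) pair the node subscribes to.
    parent = {t: {v: v for v in nodes if t in interest[v]} for t in topics}
    edges = []
    while True:
        best_edge, best_gain = None, 0
        for u, v in combinations(nodes, 2):
            # Gain = number of topics whose component count this edge reduces.
            gain = sum(
                1
                for t in interest[u] & interest[v]
                if find(parent[t], u) != find(parent[t], v)
            )
            if gain > best_gain:  # strict '>' keeps the first pair on ties
                best_edge, best_gain = (u, v), gain
        if best_edge is None:  # no edge merges anything: every topic is connected
            return edges
        u, v = best_edge
        for t in interest[u] & interest[v]:
            parent[t][find(parent[t], u)] = find(parent[t], v)
        edges.append(best_edge)


overlay = greedy_merge(INTEREST)
```

On the running example this picks five edges, for an average degree of 2, and the first edge chosen is N1 to N2, which merges components for both B and C.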

12 Greedy Merge
[Figure: overlay on N1 {B,C,D}, N2 {A,B,C,E}, N3 {A,D}, N4 {A,B,X}, N5 {A,X}]
Topic: # of conn. comps
A: 4, B: 3, C: 2, D: 2, X: 2, E: 1

13 Greedy Merge
[Figure: same nodes, next edge added]
Topic: # of conn. comps
A: 4, B: 2, C: 1, D: 2, X: 2, E: 1

14 Greedy Merge
[Figure: same nodes, next edge added]
Topic: # of conn. comps
A: 3, B: 1, C: 1, D: 2, X: 2, E: 1

15 Greedy Merge
[Figure: same nodes, next edge added]
Topic: # of conn. comps
A: 2, B: 1, C: 1, D: 2, X: 1, E: 1

16 Greedy Merge
[Figure: same nodes, next edge added]
Topic: # of conn. comps
A: 2, B: 1, C: 1, D: 1, X: 1, E: 1

17 Greedy Merge
[Figure: same nodes, final overlay]
Topic: # of conn. comps
A: 1, B: 1, C: 1, D: 1, X: 1, E: 1
⇒ Average degree of 2 vs. almost 3 for ring-per-topic!

18 GM Running Time
O(|V|^4 · |T|)
  – At most |V|^2 iterations
  – At most |V|^2 edges inspected at each iteration
  – At most |T| steps to inspect an edge
Can be optimized to run in O(|V|^2 · |T|)
  – For each e ∈ V × V, weight(e) = the number of connected components merged by e
  – At each iteration, output the heaviest edge and adjust the other edge weights accordingly
  – Stop once there are no more edges with weight > 0
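The unoptimized bound is simply the product of the three factors listed on this slide, written out here for reference:

```latex
\[
\underbrace{|V|^{2}}_{\text{iterations}}
\cdot
\underbrace{|V|^{2}}_{\text{edges inspected per iteration}}
\cdot
\underbrace{|T|}_{\text{steps per edge}}
\;=\; |V|^{4}\,|T|
\quad\Longrightarrow\quad O\!\left(|V|^{4}\,|T|\right).
\]
```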

19 Approximability Results
Lemma:
1. The number of edges in the overlay constructed by GM ≤ log(|V| · |T|) · OPT
   Proof: Similar to that of the approximation ratio of the greedy algorithm for Set Cover
2. There exists an input on which GM’s output meets this ratio
Theorem: No polynomial-time algorithm can approximate Min-TCO within a constant factor (unless P=NP)
Proof: The existence of such an algorithm would imply the existence of a constant-factor approximation for Set Cover, which is known to be impossible (unless P=NP)

20 Practical Benefits

21 More Overlay Design Problems
Filtering: Given an upper bound d on the node degree, minimize the number of relays used to connect each topic
  – Captures the cases when full topic-connectivity is infeasible because of resource constraints
Diameter: Given an upper bound d on the node degree, minimize the diameter of each topic in the overlay
  – Latency-optimal routing under resource constraints
…

22 Conclusions
Initiated formal study of the problem of designing efficient and scalable overlay topologies for pub/sub
Defined a representative problem (Min-TCO) capturing the cost of constructing topic-connected overlays
  – NP-completeness, polynomial approximation, inapproximability results
Empirical evaluation showed effectiveness of our approximation algorithm on practical inputs

23 Future Directions
Study dynamic case
Investigate other overlay design problems
Study distributed case
  – Partial knowledge of other nodes’ interests
  – Dynamically changing interest assignments

24 Thank You!

