1 MIDDLEWARE SYSTEMS RESEARCH GROUP
Scaling Construction of Low Fan-out Overlays for Topic-based Publish/Subscribe Systems
Chen Chen (1), joint work with Roman Vitenberg (3) and Hans-Arno Jacobsen (1,2)
(1) Department of Electrical and Computer Engineering, University of Toronto
(2) Department of Computer Science, University of Toronto
(3) Department of Informatics, University of Oslo
ICDCS 2011

2 Example: pub/sub
[Figure labels: "Interests: IBM"; "Interests: Microsoft"]

3 Pub/Sub
A communication paradigm
–Subscribers express their interests
–Publishers disseminate messages
Many applications and industry standards
–Application integration, financial data dissemination, RSS feed distribution, business process management
–WS-Notification, WS-Eventing, OMG's Data Distribution Service for Real-Time Systems
Topic-based pub/sub
–TIBCO RV
–Google's GooPS

4 Two directions for pub/sub
Design of routing protocols
–Designing protocols so that publications and subscriptions are sent most efficiently across the overlay network
–G. Li et al., ICDCS'08; M. Castro et al., JSAC'02
Construction of the overlay
–Constructing the overlay topology such that network traffic is minimized
–Chockler et al., PODC'07; Onus et al., INFOCOM'09

5 Desirable properties for overlays
–Low average node degree
–Low maximum node degree
–Low diameter
–Topic-connectivity
–Efficiency to construct
–Adaptability to churn
–Ease of distributed implementation
[Figure: example overlay over five nodes V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}]

6 Our contributions
Previous greedy algorithm
–High runtime cost
–Full knowledge requirement
–Centralized operation (difficult to decentralize)
Our divide-and-conquer algorithm
–Low runtime cost
–Partial knowledge requirement
–Centralized operation (easy to decentralize)

7 Topic-connected overlay (TCO)
[Figure: an overlay G over nodes V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}. Suboverlay G_a (nodes V2, V4, V5) is topic-connected; suboverlay G_b (nodes V1, V3, V4) is NOT topic-connected.]
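
The topic-connectivity property on this slide can be checked mechanically. The sketch below is an illustration only: the node interests follow the slide's five-node example, but the edge list is hypothetical (the figure's edges are not recoverable from the transcript), and the names `topic_connected` and `is_tco` are made up for this example. A topic counts as connected when the nodes interested in it form a connected subgraph using only edges among themselves.

```python
from collections import defaultdict, deque

def topic_connected(interests, edges, topic):
    """True if the nodes interested in `topic` induce a connected subgraph."""
    members = {v for v, ts in interests.items() if topic in ts}
    if len(members) <= 1:
        return True
    # Keep only edges whose two endpoints are both interested in the topic.
    adj = defaultdict(set)
    for u, v in edges:
        if u in members and v in members:
            adj[u].add(v)
            adj[v].add(u)
    # Breadth-first search from an arbitrary member.
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == members

def is_tco(interests, edges):
    """True if the overlay is topic-connected for every topic."""
    topics = set().union(*interests.values())
    return all(topic_connected(interests, edges, t) for t in topics)

# Interests from the slide; the edges are a hypothetical overlay.
interests = {
    "v1": {"b", "c", "d"}, "v2": {"a"}, "v3": {"b", "d"},
    "v4": {"a", "b"}, "v5": {"a", "c"},
}
edges = [("v2", "v5"), ("v5", "v4"), ("v1", "v3"), ("v1", "v5")]
print(topic_connected(interests, edges, "a"))  # topic a is connected
print(topic_connected(interests, edges, "b"))  # v4 is cut off for topic b
```

With these assumed edges, adding the link (v3, v4) would repair topic b and make the whole overlay a TCO.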

8 MinMax-TCO
[Figure: two overlays over the same five nodes V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}. In one overlay, V5 has 3 edges; in the other, V1 has 4 edges.]

9 MinMax-TCO problem and GM-M algorithm [Onus, 2009]
Minimum Maximum Degree Topic-Connected Overlay (MinMax-TCO) problem
–Given a set of nodes V, a set of topics T, and Interest: V × T → {true, false}, construct a topic-connected overlay G with minimum maximum degree.
Theorem: MinMax-TCO is NP-complete
GM-M algorithm (MinMax-ODA)
–always greedily adds an edge that 1) has the largest edge contribution, and 2) increases the maximum node degree minimally
–logarithmic approximation ratio
–time complexity
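
The greedy rule on this slide can be sketched as follows. This is a naive illustration under stated assumptions, not the published GM-M implementation: edge contribution is taken to be the number of topics for which the edge joins two previously separate components, the degree criterion is applied first with contribution as the tie-breaker (the slide does not state the priority order), and the function name `greedy_minmax_tco` is hypothetical. It rescans every node pair on each step, which is exactly the kind of cost that motivates the divide-and-conquer approach.

```python
from itertools import combinations

def greedy_minmax_tco(interests):
    """Greedily add edges until every topic is connected (naive sketch)."""
    nodes = sorted(interests)
    topics = set().union(*interests.values())
    # One union-find forest per topic, over the nodes interested in it.
    parent = {t: {v: v for v in nodes if t in interests[v]} for t in topics}

    def find(t, v):
        p = parent[t]
        while p[v] != v:
            p[v] = p[p[v]]  # path halving
            v = p[v]
        return v

    def contribution(u, v):
        # Topics shared by u and v for which they are still disconnected.
        return sum(1 for t in interests[u] & interests[v]
                   if find(t, u) != find(t, v))

    degree = {v: 0 for v in nodes}
    edges = []
    while True:
        cur_max = max(degree.values(), default=0)
        best = None
        for u, v in combinations(nodes, 2):
            c = contribution(u, v)
            if c == 0:
                continue
            new_max = max(cur_max, degree[u] + 1, degree[v] + 1)
            key = (new_max, -c)  # assumed order: degree first, then contribution
            if best is None or key < best[0]:
                best = (key, u, v)
        if best is None:
            return edges  # no edge helps any topic: overlay is topic-connected
        _, u, v = best
        edges.append((u, v))
        degree[u] += 1
        degree[v] += 1
        for t in interests[u] & interests[v]:
            parent[t][find(t, u)] = find(t, v)
```

Each added edge has positive contribution, so the loop ends after at most one edge per merged component across all topics.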

10 Why divide-and-conquer
GM-M's runtime cost is high
–time complexity
–487 minutes: |V|=1000, |T|=100, uniform distribution*
* under the uniform distribution, every node is equally likely to be interested in a given topic
The number of nodes is the dominant factor
To improve running time
–Reduce the size of the node set
–Divide-and-conquer based on node set V

11 Divide-and-conquer (DC)
- Divide the overlay based on V
- Conquer each sub-TCO by GM-M
- Combine via cross-TCO links
[Figure: example with nodes V0 through V14, each labeled with a subset of the topics {a,b,c,d}]

12 Challenges for divide
Divide the MinMax-TCO problem into several sub-overlay construction problems
Node clustering: nodes with similar interests are placed together
–High runtime cost
–Not trivial to decentralize
–Outputs with varying sizes
Random partitioning: each node flips a coin and gets assigned to one of the partitions
–Fast
–Easy to tune
–Straightforward to decentralize
However,
–May lose correlation among nodes due to randomness
–Maximum node degree is very sensitive to random partitioning
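
Random partitioning as described on this slide amounts to each node independently drawing a partition index. A minimal sketch, where the function name `random_partition` and the `seed` parameter are assumptions added for reproducibility:

```python
import random

def random_partition(nodes, p, seed=None):
    """Assign each node to one of p partitions uniformly at random."""
    rng = random.Random(seed)
    parts = [[] for _ in range(p)]
    for v in nodes:
        # The "coin flip": an independent uniform draw per node.
        parts[rng.randrange(p)].append(v)
    return parts

parts = random_partition([f"v{i}" for i in range(15)], p=3, seed=1)
for i, part in enumerate(parts):
    print(i, part)
```

Because the draw ignores interests, partition sizes and topic overlap within a partition vary from run to run, which is the sensitivity the slide warns about.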

13 Bad case for random partitioning
[Figure: a hierarchy of nodes v_all, v_a1, v_a2, v_b1 to v_b4, and v_1 to v_8, shown under two partitionings, with topic sets {t1,...,t8}; {t1,...,t4}, {t5,...,t8}; {t1,t2}, {t3,t4}, {t5,t6}, {t7,t8}; and {t1}, ..., {t8}]
Random partitioning may increase the degrees of individual nodes by a factor of

14 Poor performance of DC for MinMax-TCO

15 Pub/sub workloads
The number of nodes |V|: from 1000 to 8000
The number of topics |T|: from 100 to 1000
The subscription size: from 50 to 150 on average
Topic popularity
–Uniform: [Chockler, 2007]
–Zipf: feed popularity distribution in RSS [Liu, 2005]
–Exponential: stock popularity in NYSE [Tock, 2005]

16 Learn from workloads
Observations
Increased maximum node degree occurs when a node subscribes to a large number of topics
"Pareto 80-20" rule:
–most nodes subscribe to a relatively small number of topics
–only a relatively small number of nodes might be interested in a large number of topics
Basic idea
Special treatment for those nodes interested in many topics

17 Bulk nodes
Given (V, T, Int), the bulk node set B is the subset of V whose nodes satisfy a condition on T_v and η, where T_v is the set of topics subscribed to by node v and η is the bulk subscriber threshold
The lightweight node set is L = V – B
The bulk subscriber threshold η can be determined based on historical results
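
The split into bulk and lightweight nodes can be sketched as a simple filter. The slide's exact inequality did not survive in this transcript, so the version below assumes that η is an absolute subscription count and that a node is bulk when |T_v| ≥ η; both the comparison and the name `split_bulk` are assumptions for illustration.

```python
def split_bulk(interests, eta):
    """Split nodes into (bulk, lightweight) by subscription size.

    Assumption: a node is bulk when it subscribes to at least `eta` topics.
    """
    bulk = {v for v, topics in interests.items() if len(topics) >= eta}
    light = set(interests) - bulk  # L = V - B
    return bulk, light

interests = {
    "v1": {"b", "c", "d"}, "v2": {"a"}, "v3": {"b", "d"},
    "v4": {"a", "b"}, "v5": {"a", "c"},
}
bulk, light = split_bulk(interests, eta=3)
print(sorted(bulk), sorted(light))
```

If η were instead defined as a fraction of |T|, only the comparison inside the set comprehension would change.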

18 Challenges for combine
Combine multiple sub-TCOs into one by adding cross-TCO links as bridges
Not all nodes need to participate
How to select node subsets for cross-TCO links?
–small: increasing node degrees
–large: degrading time efficiency

19 Representative set
Given a TCO (V, T, Int, E), a representative set (rep set) is a subset of V that covers all of V's topics λ times.
[Figure: a topic-connected overlay over nodes V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}. {v3, v5} is a 1-rep set, which covers all topics {a,b,c,d}. {v1, v2, v3, v5} is a 2-rep set; {a,b,c,d} is covered twice.]

20 Representative nodes
Representative nodes (rep-nodes)
–Represent the interests of all the nodes
–Can function as bridges to determine cross-TCO links
–Coverage factor λ: for tuning the size of the rep set
Observation
For typical pub/sub workloads and sufficiently large partitions, minimal rep sets tend to be several times smaller than the total number of nodes.
How to find a minimal rep set R_λ for (V, T, Int)?
–Linearly reducible to the classic set cover problem: NP-complete
–Greedy algorithm: always add a node with the largest number of topics that are not yet λ-covered
  a logarithmic approximation ratio
  efficiently implemented
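
The greedy rule for rep sets can be sketched as a multi-cover variant of greedy set cover. This is an illustration under assumptions: the name `greedy_rep_set` is hypothetical, ties are broken by node name, and a topic with fewer than λ subscribers is treated as covered once all of its subscribers are chosen (the slide does not say how that case is handled).

```python
def greedy_rep_set(interests, lam=1):
    """Greedily pick nodes until every topic is covered `lam` times."""
    topics = set().union(*interests.values())
    # need[t]: remaining coverage for topic t, capped by its subscriber count.
    need = {
        t: min(lam, sum(1 for ts in interests.values() if t in ts))
        for t in topics
    }
    remaining = set(interests)
    rep = []
    while any(need.values()):
        # Pick the node covering the most topics that are not yet lam-covered.
        best = max(
            sorted(remaining),
            key=lambda v: sum(1 for t in interests[v] if need[t] > 0),
        )
        rep.append(best)
        remaining.remove(best)
        for t in interests[best]:
            if need[t] > 0:
                need[t] -= 1
    return rep

interests = {
    "v1": {"b", "c", "d"}, "v2": {"a"}, "v3": {"b", "d"},
    "v4": {"a", "b"}, "v5": {"a", "c"},
}
print(greedy_rep_set(interests, lam=1))
print(greedy_rep_set(interests, lam=2))
```

On the five-node example from the rep set slide, λ = 2 yields the same node set as the slide's 2-rep set, while λ = 1 yields a different but equally small 1-rep set, since minimal rep sets are not unique.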

21 Divide-and-Conquer with Bulk and Lightweight Rep-nodes (DCBR-M)
[Figure: example with nodes V0 through V20, each labeled with a topic set over {a,...,h}; most sets contain three or four topics, and three contain six or seven: {a,b,c,e,f,g}, {a,b,c,d,f,g}, {a,b,c,e,f,g,h}]

22 Design of DCBR-M algorithm
Different parameters for tuning the algorithm:
–The bulk subscriber threshold η (divide, combine): bulk nodes vs. lightweight nodes
–The coverage factor λ (combine): time efficiency vs. the quality of the TCO
–The number of lightweight partitions p (divide, conquer)
  p = |L| (one node per partition): combine only
  p = 1 (all nodes in one partition): conquer only
How to decentralize DCBR-M
–Nodes autonomously organize themselves into random partitions
–Different partitions construct inner edges in parallel
–Different partitions compute rep sets in parallel
–Bulk nodes and rep-nodes communicate and compute outer edges

23 Theoretical analysis of DCBR-M
DCBR-M generates a TCO whose maximum node degree is asymptotically the same as that of the TCO output by GM-M, under a realistic assumption for typical pub/sub workloads.
The running time of DCBR-M is
Considerable speedup when |B| and |R| are small

24 Evaluation for DCBR-M (1)

25 Evaluation for DCBR-M (2)

26 Evaluation for DCBR-M (3)

27 Conclusion
Algorithm   Running time     Max degree   Avg degree   Required information   Potential to decentralize
RingPT      good             poor: 168    poor: 92     full knowledge         good
GM-M        poor: 487 min    good: 5      good: 3.88   full knowledge         poor
DCBR-M      good: 13.6 sec   good: 6      good: 4.29   partial knowledge      good

28 BACKUP

29 Related work
Construction of the overlay
–MinAvg-TCO, Chockler et al., PODC'2007
–MinMax-TCO, Onus et al., INFOCOM'2009
–Low-TCO, Onus et al., ICDCS'2010
–DC for MinAvg-TCO, Chen et al., ICDCS'2010
Design of routing protocols
–G. Li et al., ICDCS'2008
–M. Castro et al., JSAC'2002

30 Minimal Number of Links
A typical pub/sub system combines a number of protocols, many of which maintain per-link state
–A node must constantly monitor the availability of each of its neighbors (heartbeats and keep-alive state)
–If the links are maintained using TCP, there is the cost of connection state for each link
–The more links there are, the fewer topics can be routed over each individual link, thereby diminishing cross-topic aggregation benefits
–If a sequential-diff-based compression scheme is used, there is an extra cost associated with a history table

31 DCBR-M vs DC: MinMax-TCO vs MinAvg-TCO
Fundamentally different problems
–Average node degree is a "global" property; maximum node degree possesses both "global" and "local" properties.
–DC for MinAvg-TCO does not directly apply to MinMax-TCO.
–MinMax-TCO is more sensitive to divide, conquer and combine.
–Different algorithm design, theoretical analysis, and experiments.

