MIDDLEWARE SYSTEMS RESEARCH GROUP MSRG.ORG Distributed Ranked Data Dissemination in Social Networks Joint work with: Mo Sadoghi Vinod Muthusamy Hans-Arno.

Presentation transcript:

Slide 1: Distributed Ranked Data Dissemination in Social Networks
Kaiwen Zhang
Joint work with: Mo Sadoghi, Vinod Muthusamy, Hans-Arno Jacobsen
University of Toronto
ICDCS, July 9-11, 2013
MIDDLEWARE SYSTEMS RESEARCH GROUP, MSRG.ORG

Slide 2: Top-k & publish/subscribe for social networks
[Figure: broker overlay in which brokers match and forward messages along the advertisement, subscription, and publication paths.]
Labels in the figure:
- publisher: name = 'John Doe', location = 'New York'
- subscriber: name = 'John Doe'
- subscriber: location = 'America', k = 1, W = 2, closest to Philadelphia
- name = 'John Doe', location = 'USA'
- name = 'John Doe', location = 'Philadelphia'

Slide 3: Use cases
- Event-heavy applications require top-k
  - Social networks (news feeds, homepage)
  - Location-based applications (online games)
- Efficient support for top-k in pub/sub
  - Top-k publications for a subscription
  - Mixed subscriptions (top-k and regular)
  - Topology is provided

Slide 4: Outline
- Top-k model for publish/subscribe
- Related work
- Current late vs. naive early approach
- Proposed window chunking solution
- Evaluation

Slide 5: Top-k processing
- Regular broker operation
- Count-based window parameters supplied by the subscription:
  - k is the number of publications
  - W is the window size
  - δ is the shift size
- Each publication is scored and the top-k are extracted
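
A minimal Python sketch of the count-based window semantics on this slide. It assumes the stream is available as a list and that `score` is supplied by the caller (the slides do not fix a scoring function). The names `sliding_windows` and `top_k_per_window` are illustrative and are not taken from PADRES.

```python
import heapq
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(stream: Sequence[T], W: int, delta: int) -> List[List[T]]:
    """Return every full count-based window of W publications, shifted by delta."""
    return [list(stream[i:i + W]) for i in range(0, len(stream) - W + 1, delta)]


def top_k_per_window(stream: Sequence[T], k: int, W: int, delta: int,
                     score: Callable[[T], float]) -> List[List[T]]:
    """Score each publication and extract the k best of every window."""
    return [heapq.nlargest(k, window, key=score)
            for window in sliding_windows(stream, W, delta)]
```

With δ smaller than W the windows overlap, so the same publication can appear in the top-k of several consecutive windows.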

Slide 6: Related work
- Top-k computation for pub/sub
  - Defining scoring functions
  - Data structures for storing top-k results
  - Approximate solutions based on histograms
- Top-k processing in databases
  - Reverse problem: find existing data for a query
- No work on top-k dissemination in pub/sub
  - Top-k computation occurs within a single broker
  - The entire stream is collected at the edge

Slide 7: Current late approach
- The edge broker maintains the top-k processing and converts the request into a regular subscription
- Submit a top-k subscription {k = 2, W = 4, δ = 1}
- The rest of the topology is agnostic to the top-k semantics: the other brokers simply forward matching events
- The entire matching stream is collected at the edge and processed to determine the top-k
- Example: publisher streams (1,2,3,4) and (a,b,c,d) arrive interleaved as (1,2,a,b,3,c,4,d) => [1,2,a,b] [2,a,b,3] [a,b,3,c] [b,3,c,4] [3,c,4,d]
- This approach is not efficient! Low-scoring publications are propagated to the edge and then filtered out
- Can we reduce intra-network traffic by pushing the top-k computation upstream?
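
The late approach on this slide can be sketched as follows. The code reproduces the slide's interleaved stream and its five windows for {k = 2, W = 4, δ = 1}. The `SCORES` table is purely hypothetical (the slides give no scores), and `late_approach` is an illustrative name rather than a PADRES function.

```python
import heapq
from typing import Callable, List, Set, Tuple


def late_approach(arrivals: List[str], k: int, W: int, delta: int,
                  score: Callable[[str], float]) -> Tuple[List[List[str]], Set[str], int]:
    """Window the full matching stream at the subscriber edge broker.

    Returns the windows, the set of publications delivered to the subscriber,
    and the number of publications that had to cross the network.
    """
    windows = [arrivals[i:i + W] for i in range(0, len(arrivals) - W + 1, delta)]
    delivered: Set[str] = set()
    for window in windows:
        delivered.update(heapq.nlargest(k, window, key=score))
    # Every matching publication reaches the edge, whatever its score.
    return windows, delivered, len(arrivals)


# Hypothetical scores, chosen only for illustration.
SCORES = {"1": 4, "2": 3, "3": 2, "4": 1, "a": 7, "b": 10, "c": 9, "d": 8}
arrivals = ["1", "2", "a", "b", "3", "c", "4", "d"]  # interleaving shown on the slide
windows, delivered, forwarded = late_approach(
    arrivals, k=2, W=4, delta=1, score=lambda p: SCORES[p])
```

Under these assumed scores all eight publications are forwarded to the edge but only four distinct ones are ever delivered, which is the inefficiency the slide points out.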

Slide 8: Naive early approach
- Fill windows at the publisher edge and compute the top-k publications
- Only disseminate top-k publications
- Merge the top-k streams to obtain the final results
- Example: [1,2,3,4] => (1,2) and [a,b,c,d] => (b,c), merged into (1,2,b,c)
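
A short sketch of the naive early approach, under the assumption that each publisher edge broker sees only its own stream and that the merge is a simple concatenation (one of several possible merge orders). Function names and the score table are hypothetical; the scores are chosen so that the local top-2 results match the slide's example.

```python
import heapq
from typing import Callable, List


def local_top_k(stream: List[str], k: int, W: int, delta: int,
                score: Callable[[str], float]) -> List[str]:
    """Publisher edge broker: keep publications that are top-k in a local window."""
    selected = set()
    for start in range(0, len(stream) - W + 1, delta):
        selected.update(heapq.nlargest(k, stream[start:start + W], key=score))
    return [pub for pub in stream if pub in selected]  # preserve per-source order


def naive_early(streams: List[List[str]], k: int, W: int, delta: int,
                score: Callable[[str], float]) -> List[str]:
    """Concatenate the locally filtered streams."""
    merged: List[str] = []
    for stream in streams:
        merged.extend(local_top_k(stream, k, W, delta, score))
    return merged


# Hypothetical scores, chosen only for illustration.
SCORES = {"1": 4, "2": 3, "3": 2, "4": 1, "a": 7, "b": 10, "c": 9, "d": 8}
result = naive_early([["1", "2", "3", "4"], ["a", "b", "c", "d"]],
                     k=2, W=4, delta=1, score=lambda p: SCORES[p])
```

Only four of the eight publications leave the publisher edge brokers, but no window mixing publications from both sources is ever evaluated.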

Slide 9: Correctness criterion
- Goal: same result as the late approach
  - No false positives or false negatives
- Stream reconstruction criterion
  - A stream of top-k publications is correct if there is a reconstructed stream of all publications, possible under the ordering semantics, that yields the same result when processed centrally
- Ordering guarantees?
  - Consider per-publisher FIFO ordering
  - Multiple interleavings of publications are possible

Slide 10: Naive counter-example
- k = 2, W = 4, δ = 1
- Fill windows at the publisher edge brokers
- Forward local top-k results
- Publications are delivered according to per-source ordering
- Reconstructing the stream: we fail to consider windows such as [b,c,d,1], which are "overlapping"
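
The stream reconstruction criterion can be checked by brute force on a small example. The sketch below makes two simplifying assumptions that are not in the slides: the "result" is reduced to the set of publications delivered to the subscriber, and the scores are hypothetical, chosen so that a and d outscore 1 and 2. It enumerates every interleaving that respects per-publisher FIFO order and asks whether any of them, processed centrally, yields the candidate result.

```python
import heapq
from itertools import combinations
from typing import Callable, Iterator, List, Set


def interleavings(xs: List[str], ys: List[str]) -> Iterator[List[str]]:
    """Yield every merge of xs and ys that preserves each list's own order."""
    n = len(xs) + len(ys)
    for x_slots in combinations(range(n), len(xs)):
        slots = set(x_slots)
        xi, yi = iter(xs), iter(ys)
        yield [next(xi) if i in slots else next(yi) for i in range(n)]


def central_result(stream: List[str], k: int, W: int, delta: int,
                   score: Callable[[str], float]) -> Set[str]:
    """Late (centralized) processing: union of the top-k of every window."""
    delivered: Set[str] = set()
    for start in range(0, len(stream) - W + 1, delta):
        delivered.update(heapq.nlargest(k, stream[start:start + W], key=score))
    return delivered


def is_reconstructible(candidate: Set[str], xs: List[str], ys: List[str],
                       k: int, W: int, delta: int,
                       score: Callable[[str], float]) -> bool:
    """True if some FIFO-respecting interleaving gives exactly `candidate`."""
    return any(central_result(s, k, W, delta, score) == candidate
               for s in interleavings(xs, ys))


# Hypothetical scores, chosen only for illustration.
SCORES = {"1": 4, "2": 3, "3": 2, "4": 1, "a": 7, "b": 10, "c": 9, "d": 8}
P1, P2 = ["1", "2", "3", "4"], ["a", "b", "c", "d"]
naive_ok = is_reconstructible({"1", "2", "b", "c"}, P1, P2, 2, 4, 1, lambda p: SCORES[p])
late_ok = is_reconstructible({"a", "b", "c", "d"}, P1, P2, 2, 4, 1, lambda p: SCORES[p])
```

Under these assumed scores the naive output {1,2,b,c} is not reproduced by any of the 70 interleavings, because every interleaving contains a window holding a or d without both b and c, while the set obtained from the slide's interleaving (1,2,a,b,3,c,4,d) passes the check.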

Slide 11: Overlapping windows problem
- Key idea: send a few more publications
  - Enough to prevent false negatives
  - Fewer than all publications, to remain efficient
- Key insight: computing overlapping windows
  - Publishers compute windows for their own publications
  - Windows which contain publications from different source brokers can only be computed downstream
  - Full knowledge of the publications in such windows is needed

Slide 12: Window chunking technique
- Hybrid solution
  - Send all publications for overlapping windows
  - Reduce the occurrence of overlapping windows
  - Early top-k filtering of local source windows
  - Late top-k filtering of overlapping windows
- Each chunk contains publications from a single source broker
- Chunks contain a stream of top-k publications for successive windows
- Left and right guards are full windows of publications that start and end a chunk
- The subscriber edge broker must fully process one chunk before choosing another chunk to process
- Overlapping windows can only occur in the inter-chunk region (where one chunk ends and the next begins), which is covered by guard windows that can be processed downstream
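
One way to read the chunk structure on this slide is sketched below. This is an interpretation under stated assumptions, not the authors' implementation: a chunk is taken to be a given list of publications from one source broker (the slides do not say how chunk boundaries are chosen), the first and last W publications act as left and right guards and are forwarded unfiltered, and interior publications are forwarded only if they are in the top-k of some local window. `build_chunk` is a hypothetical name.

```python
import heapq
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def build_chunk(pubs: Sequence[T], k: int, W: int, delta: int,
                score: Callable[[T], float]) -> List[T]:
    """Filter one chunk of publications from a single source broker.

    Guard positions (first and last W publications) are always kept;
    interior publications survive only if they are top-k in some window.
    """
    n = len(pubs)
    if n < W:
        raise ValueError("a chunk needs at least one full window")
    keep = set(range(W)) | set(range(n - W, n))  # left and right guard positions
    for start in range(0, n - W + 1, delta):
        positions = range(start, start + W)
        keep.update(heapq.nlargest(k, positions, key=lambda i: score(pubs[i])))
    return [pubs[i] for i in sorted(keep)]  # preserve the source order


# Ten publications identified by their (hypothetical) scores.
chunk = build_chunk([5, 1, 4, 2, 9, 3, 8, 7, 6, 0], k=1, W=3, delta=1, score=lambda p: p)
```

In this example the two low-scoring interior publications are dropped early, while the guards travel in full so that windows spanning two chunks can still be evaluated downstream.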

Slide 13: Evaluation summary
- Setup
  - PADRES implementation
  - SciNet: cluster of ~1000 cores
- Main metric: throughput reduction
  - Normalized to the late approach
- Sensitivity analysis
  - Top-k semantics, workload, etc.
- Performance analysis
  - Traces from Twitter and Facebook

Slide 14: Timing sensitivity
- Does not scale when mixed

Slide 15: Offset chunks
- A publication is filtered for S1 but is part of the guard of S2: it must be forwarded
- A publication can only be filtered if it is not part of any guards
- Future work: solve the issue by synchronizing chunks adaptively
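
The forwarding rule on this slide can be sketched as a simple predicate. The `ChunkView` structure and the publication ids are hypothetical: each subscription is assumed to know which publications fall inside its guards and which were selected by early top-k filtering.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class ChunkView:
    """What one subscription needs from the stream (hypothetical structure)."""
    guards: FrozenSet[int]  # ids of publications inside this subscription's guards
    top_k: FrozenSet[int]   # ids selected by early top-k filtering


def must_forward(pub_id: int, views: Dict[str, ChunkView]) -> bool:
    """A publication may be dropped only if no subscription needs it."""
    return any(pub_id in view.guards or pub_id in view.top_k
               for view in views.values())


# S2's chunks are offset relative to S1's, so their guards do not line up.
views = {
    "S1": ChunkView(guards=frozenset({0, 1, 6, 7}), top_k=frozenset({3})),
    "S2": ChunkView(guards=frozenset({4, 5, 10, 11}), top_k=frozenset({8})),
}
```

Here publication 4 is filtered from S1's point of view but sits in a guard of S2, so it is still forwarded; publication 2 is needed by neither subscription and can be dropped.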

Slide 16: Social workload properties
- Use of popularity as scoring function

Slide 17: Social workload properties
- Offset chunks are present: windows are filled at different times
- Top-k "cuts" the long tail of the Facebook popularity function: unpopular publications are filtered
- Twitter has a wider tail: a wider variety of publications is found in the top-k results

Slide 18: Conclusions
- Top-k support for publish/subscribe
  - For event-heavy applications (social networks)
- Efficient top-k distribution
  - Reduce intra-network traffic
  - Maintain correctness
- Proposed hybrid chunking solution
  - Early top-k computation of local windows
  - Late top-k computation of overlapping windows
- Evaluation observations
  - Need for chunk synchronization
  - Topic popularity in social networks is beneficial

Slide 19: Thank you! Questions?
padres.msrg.org

Slide 20: Scoring function sensitivity
- Same top-k for every subscription: maximizes pruning
- Uniform distribution: does not scale, as all publications are selected by at least one subscriber
- Zipfian distribution: traffic reduction even at 1000 subscriptions

Slide 21: Impact of deduplication
- Non-buffering solution is even worse!
- Deduplication is essential
- Constant traffic reduction (best-case scalability)

Slide 22: Latency comparison
- Similar latency: computation overhead is not considered