Partitioning Social Networks for Fast Retrieval of Time-dependent Queries Mindi Yuan, David Stein, Berenice Carrasco, Joana Trindade, Yi Lu University.


Partitioning Social Networks for Fast Retrieval of Time-dependent Queries
Mindi Yuan, David Stein, Berenice Carrasco, Joana Trindade, Yi Lu
University of Illinois at Urbana-Champaign
Presenter: Sameh Elnikety, Microsoft Research
Questions, praise, and complaints to Prof. Yi Lu

Online Social Networking (OSN)
OSNs differ significantly from traditional web applications
– Handle highly personalized content: retrieved pages differ from user to user and also vary over short time intervals when new activities occur
– Highly interconnected data: immediate neighborhood and extended neighborhood (friends’ friends)
– Community: strong community structure among end users, albeit varying with time
Scaling OSNs is a problem
– Interconnected nature
– Astounding growth rate: Twitter grew by 10 times in one month and was forced to redesign and re-implement its architecture several times

Existing Approaches
Conventional vertical scaling
– Upgrade existing hardware
– Replicate fully on each machine
Problems
– Expensive because of the cost of high-performance servers
– Sometimes infeasible: Facebook requires multiple hundreds of terabytes of memory
Horizontal scaling
– Use a larger number of cheap commodity servers
– Partition the load among them
– Good for stateless traditional web applications
Problematic for a data back-end layer that maintains state
– OSN data cannot be partitioned into disjoint components
– Leads to costly inter-server communication

Inter-server communication
Distributed queries have been shown to reduce performance compared to local queries [Pujol et al. 2010][Curino et al. 2010]
Curino et al. reported that local queries double throughput
Question to consider: how far can we reduce inter-server communication, and at what cost?

Overview
1. Problem
2. Previous work
3. Our approach
4. Evaluation

Two extremes of the spectrum
Cassandra
– Random partitioning with distributed hashing
– Consistency is easy to maintain since only one replica is available
– Inter-server traffic increases at retrieval
– Slows down responses
– Leads to the “multi-get” hole problem seen at, e.g., Facebook, where increasing the number of servers actually reduces throughput
Replicating all users’ data on multiple servers
– Eliminates inter-server traffic but increases replication overhead
– Impossible to hold all data in memory on each server
– Delay in writing due to updates of all replicas
– Network traffic for maintaining consistency across replicas

Little Engines
The little engine(s) that could: scaling online social networks. Pujol et al., SIGCOMM
Assumption
– Most operations are based on the data of a user and her neighbors
– The assumption is debatable: true for Twitter data, not true for Facebook data, where a two-hop network is accessed for most operations (expanded on next slide)
Guarantees that for all users in an OSN, their direct neighbors’ data are co-located on the same server
– Application developers can assume local semantics
– Scalability achieved with servers with low memory and network I/O
Drawback
– Popular data replicated on all servers
– Extension to a two-hop network will require a much larger number of replicas

Two-hop network
SPAR, proposed in Little Engines, considers a one-hop network
– Mainly focuses on Twitter-type models: a user follows all content of a group of senders
– SPAR was tested on Facebook messages, but they are still modeled as a user following all his friends, instead of the real-world scenario where a user can access related activities of friends’ friends
Two-hop networks
– For instance, Joana can access all activities between Jona and Nandana
[Figure: example network with users Adarsh, Jona, Nandana, Joana, Naseer]
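
The two-hop neighborhood described on this slide can be illustrated with a minimal sketch. The friendship edges below are hypothetical, since the slide's figure does not survive in the transcript, and the function name two_hop_neighborhood is an illustrative choice, not something from the talk.

```python
from typing import Dict, Set


def two_hop_neighborhood(friends: Dict[str, Set[str]], user: str) -> Set[str]:
    """Return every user reachable within two friendship hops, excluding `user`."""
    one_hop = set(friends.get(user, set()))
    reachable = set(one_hop)
    for friend in one_hop:
        # Add friends of each direct friend (the second hop).
        reachable |= friends.get(friend, set())
    reachable.discard(user)
    return reachable


# Hypothetical friendship edges (an assumption for illustration only).
friends = {
    "Joana": {"Jona"},
    "Jona": {"Joana", "Nandana"},
    "Nandana": {"Jona"},
}

joana_view = two_hop_neighborhood(friends, "Joana")
```

Under these assumed edges, Joana's two-hop neighborhood contains both Jona (a friend) and Nandana (a friend's friend), which is why activity between the two of them is visible to her.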

A Caveat
A two-hop network can be transformed into a one-hop network if each message is saved with both the sender and the receiver
However
– Data storage is immediately inflated by a factor of 2
– Popular receivers will experience a large number of writes
– Replicas only worsen the situation: popular receivers are likely to have more neighbors and hence more replicas, which entail an even larger number of writes to maintain consistency
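
The cost argument on this slide can be checked with a small counting sketch. The message list, the user name "star", and the replica count are made-up illustration values, and the model (one write per stored copy per server holding that user) is a simplifying assumption rather than the authors' cost model.

```python
from collections import Counter


def copies_per_user(messages, store_with_receiver):
    """Count stored message copies per user for a list of (sender, receiver) pairs."""
    copies = Counter()
    for sender, receiver in messages:
        copies[sender] += 1
        if store_with_receiver:
            # Saving with the receiver as well doubles the stored copies.
            copies[receiver] += 1
    return copies


def writes_with_replicas(copies, replicas):
    """Assume each stored copy is written to the master plus every replica."""
    return {user: n * (1 + replicas.get(user, 0)) for user, n in copies.items()}


# Hypothetical traffic: "star" is a popular receiver.
messages = [("a", "star"), ("b", "star"), ("c", "star"), ("star", "a")]
sender_only = copies_per_user(messages, store_with_receiver=False)
both = copies_per_user(messages, store_with_receiver=True)
writes = writes_with_replicas(both, {"star": 3})  # assumed 3 replicas of "star"
```

In this toy input, total storage doubles when messages are saved with both endpoints, and the popular receiver's write count grows again once its replicas are taken into account.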

Overview
1. Problem
2. Previous work
3. Our approach
4. Evaluation

Our Approach
Goal: have most requests handled locally with far fewer replicas than the approach in “Little Engines”
– To have an approach that scales with two-hop networks
– To reduce the cost of maintaining consistency among replicas
Main idea
– Instead of partitioning the social network, which contains all users who could potentially send a message, we focus on the activity network, which contains the users actually sending messages
– Partitioning is based on actual activities in the network so that most requests can be served locally, which can be achieved with far fewer replicas than a strong locality guarantee requires

Activity Network 1/2
Contains far fewer edges than the social network
– More amenable to partitioning and replication
In a two-hop neighborhood, even if a user does not participate in much activity with his direct neighbors, there can still be abundant activity in the two-hop neighborhood
Facebook data for Dec 2006, New Orleans network
– While most user pairs have only 1 post, the top 8% of users receive more than 100 posts in their two-hop neighborhood

Activity Network 2/2
There is strong time correlation among wall posts between the same pairs of users
– This motivates a local, dynamic partitioning algorithm that adapts to the temporal correlation
Computation of auto-correlation
– Gather all messages in the period: m(1), …, m(k)
– Compute the auto-correlation function
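
The slide's own formula is not preserved in the transcript. As a hedged illustration only, the sketch below uses the standard sample auto-correlation of a series of per-period message counts for one user pair; treating m(1), …, m(k) as per-period counts and the example series are both assumptions.

```python
def autocorrelation(series, lag):
    """Standard sample auto-correlation of `series` at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # constant series: correlation is undefined, report 0
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Hypothetical per-period message counts between one pair of users:
# a burst of activity surrounded by silence.
bursty = [0, 0, 3, 4, 3, 0, 0, 0]
lag_one = autocorrelation(bursty, 1)
```

A positive value at small lags means that a period with many messages tends to be followed by another busy period, which is the kind of temporal correlation the slide refers to.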

Activity Prediction Graph (APG)
The example has 6 users in the network
There is a message node between each pair of user nodes
Each edge is assigned a weight that reflects
– The read frequency of a user (assuming the data has been collected)
– The likelihood of retrieving a message from this particular edge when the most recent N messages are retrieved from the two-hop neighborhood
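
One possible way to assign such weights is sketched below. The slide does not give the exact formula, so everything here is an assumption: the weight is taken as read frequency times the share of a reader's N most recent visible messages that belong to a given user pair, and a message counts as visible when either endpoint is the reader or one of the reader's direct friends. The name build_apg and all input data are hypothetical.

```python
from collections import Counter


def build_apg(friends, messages, read_freq, n_recent):
    """Return {(reader, pair): weight} for a toy activity prediction graph.

    friends:   user -> set of direct friends
    messages:  list of (timestamp, sender, receiver)
    read_freq: user -> how often the user reads (assumed to be collected)
    """
    edges = {}
    for reader, direct in friends.items():
        circle = direct | {reader}
        # Messages with at least one endpoint in the reader's circle.
        visible = [m for m in messages if m[1] in circle or m[2] in circle]
        recent = sorted(visible, key=lambda m: m[0], reverse=True)[:n_recent]
        if not recent:
            continue
        # One message node per unordered user pair.
        counts = Counter(frozenset((s, r)) for _, s, r in recent)
        for pair, count in counts.items():
            likelihood = count / len(recent)
            edges[(reader, pair)] = read_freq.get(reader, 0.0) * likelihood
    return edges


# Hypothetical inputs for illustration.
friends = {
    "Joana": {"Jona"},
    "Jona": {"Joana", "Nandana"},
    "Nandana": {"Jona"},
}
messages = [(1, "Jona", "Nandana"), (2, "Nandana", "Jona"), (3, "Joana", "Jona")]
read_freq = {"Joana": 2.0, "Jona": 1.0, "Nandana": 0.5}
apg = build_apg(friends, messages, read_freq, n_recent=2)
```

The resulting weighted edges are the kind of input a graph partitioner would cut along: heavy edges connect readers to the message nodes they are most likely to fetch.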

Partitioning Algorithm
Periodic partitioning
– Use K-Metis to repartition the graph at fixed time epochs
– Between epochs, new users and new interactions are added to the graph to minimize partitioning cost
Local adaptive partitioning
– Takes advantage of the strong time correlation between messages
– Triggered when a retrieval results in remote accesses
– Keeps movements of nodes across partitions small
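
The trigger-and-move behavior of local adaptive partitioning can be sketched as follows. The slide does not specify the actual move rule, so this is one hypothetical rule: when a retrieval touches remote nodes, pull at most max_moves of them into the reader's partition. Load balancing is deliberately ignored, and the function name and parameters are illustrative assumptions.

```python
def adapt_on_retrieval(assignment, reader_partition, retrieved_nodes, max_moves=1):
    """Move at most `max_moves` remote nodes into the reader's partition.

    assignment: dict mapping node -> partition id (modified in place)
    Returns the list of nodes that were moved.
    """
    remote = [n for n in retrieved_nodes if assignment[n] != reader_partition]
    if not remote:
        return []  # purely local retrieval: nothing is triggered
    moved = remote[:max_moves]  # cap the number of moves to keep churn small
    for node in moved:
        assignment[node] = reader_partition
    return moved


# Hypothetical placement of three message nodes on two partitions.
assignment = {"m1": 0, "m2": 1, "m3": 1}
first = adapt_on_retrieval(assignment, 0, ["m1", "m2", "m3"], max_moves=1)
second = adapt_on_retrieval(assignment, 0, ["m1", "m2"], max_moves=1)
```

Because of the time correlation between messages, a node pulled in after one remote access is likely to be read again soon, so even a small move budget can raise the share of local retrievals.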

Overview
1. Problem
2. Previous work
3. Our approach
4. Evaluation

Evaluation Scenario
Each retrieval involves the most recent 6 messages in a two-hop neighborhood
We chose 6 because the available dataset is relatively small
– A total of messages with 8640 active users
We consider the following algorithms
– Random hashing based on sender (hash_p1)
– Random hashing based on sender-receiver pair (hash_p2)
– Periodic partitioning
– Local adaptive partitioning
– Retrospective algorithm: instead of using the APG, which uses past data to predict the activity for the current month, use the actual activity in the month. It is the best one can do given the limitations of the partitioning algorithm
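
The two hashing baselines and the locality metric used in the results can be sketched as below. The use of MD5, the key format, and the function signatures are assumptions for illustration; the talk names only hash_p1 and hash_p2 and does not describe their implementation.

```python
import hashlib


def _bucket(key, num_partitions):
    """Deterministically map a string key to a partition id."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def hash_p1(sender, receiver, num_partitions):
    """Place a message by hashing its sender only."""
    return _bucket(sender, num_partitions)


def hash_p2(sender, receiver, num_partitions):
    """Place a message by hashing the (ordered) sender-receiver pair."""
    return _bucket(f"{sender}|{receiver}", num_partitions)


def fraction_within(retrievals, place, num_partitions, max_partitions=1):
    """Share of retrievals whose messages touch at most `max_partitions` partitions."""
    if not retrievals:
        return 0.0
    hits = 0
    for msgs in retrievals:
        touched = {place(s, r, num_partitions) for s, r in msgs}
        if len(touched) <= max_partitions:
            hits += 1
    return hits / len(retrievals)


# Hypothetical retrieval: two messages from the same sender.
retrievals = [[("a", "b"), ("a", "c")]]
local_share = fraction_within(retrievals, hash_p1, num_partitions=8)
```

With max_partitions=1 the metric corresponds to fully local retrievals, and with max_partitions=3 it corresponds to retrievals that touch at most 3 partitions.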

Results
Proportion of retrievals that access only one partition for all 6 messages
The periodic algorithm performs much better than random hashing
It is close to the retrospective algorithm (retro), so the APG does provide relatively accurate prediction

Results
Proportion of retrievals that access at most 3 partitions for all 6 messages
Most retrievals touch at most 3 partitions

Results
The local adaptive algorithm further improves over periodic partitioning
– At least 6.4 times more local queries than random hashing
– Without causing large movements of nodes as in the periodic case

Results
We also compare periodic partitioning on the APG with a graph that uses weights from the one-hop neighborhood only
– The retrievals are performed in the two-hop neighborhood
– There is clearly an advantage in considering activities in the two-hop neighborhood when constructing the graph
– The advantage is more significant as the number of partitions increases

Conclusion
We proposed a dynamic, adaptive algorithm to predict activities on an OSN and partition the network for fast retrieval of data
The algorithm scales when information needs to be retrieved from the two-hop neighborhood
Further work is under way to improve data locality for most retrievals with a small number of replicas