Partitioning Social Networks for Time-dependent Queries Berenice Carrasco, Yi Lu and Joana M. F. da Trindade - University of Illinois - EuroSys11 – Workshop.


1 Partitioning Social Networks for Time-dependent Queries Berenice Carrasco, Yi Lu and Joana M. F. da Trindade - University of Illinois - EuroSys11 – Workshop on Social Network Systems

2 My colleague’s Facebook home page!

3 What is visible to Joana? – Messages in a two-hop network (Figure: users Adarsh, Jona, Nandana, Joana, Naseer)

4 Why is partitioning important? Different types of queries in Social Networks – photo tags, marketplace, news feed Retrieve small records (personalized content) Multiple records from different users Time-dependent – Home page refresh at Facebook Most common query

5 Existing approaches Partition based on friendship solely (1-hop network) – Power-law degree distribution Highly interconnected data Small fraction of nodes with very large degrees – General approach: Horizontal partitioning + Replication

6 Existing approaches Hash-based horizontal partitioning Key: user name (Figure: users Adarsh, Jona, Nandana, Joana, Naseer hashed across partitions p1, p2, p3) – Multiple records in different servers – Bad response time – Inefficient network usage – High packet overhead for such small data
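The hash-based scheme on this slide can be illustrated with a minimal sketch. It assumes the key is the user name and that partitions are numbered from 0. The choice of MD5 and the function names `hash_partition` and `partitions_touched` are illustrative and do not come from the slides.

```python
import hashlib


def hash_partition(user: str, num_partitions: int) -> int:
    """Map a user name to a partition index with a stable hash.

    MD5 is an arbitrary choice here; any stable hash would do.
    """
    digest = hashlib.md5(user.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def partitions_touched(users, num_partitions):
    """Return the set of partitions holding records for the given users."""
    return {hash_partition(u, num_partitions) for u in users}


# Users from the slide's figure, spread over three partitions (p1..p3).
users = ["Adarsh", "Jona", "Nandana", "Joana", "Naseer"]
touched = partitions_touched(users, 3)
print(f"A query over {len(users)} users touches {len(touched)} partition(s)")
```

Because the hash ignores who is friends with whom, records that are read together can land on different servers, which is the locality problem the slide points out.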

7 Existing approaches Replication – Great amount of extra storage

8 Existing approaches Query-based partitioning – Assumes queries do not change with time Curino et al., “SCHISM: A workload-driven approach to database replication and partitioning”, 2010

9 The challenge for Social Networks Friendship-based or query-based partitioning does not work well Underlying network varies over time – Added/deleted friends – Interaction level changes Only 30% of Facebook user pairs interact consistently from one month to the next

10 Our approach Partitioning not only the friendship network but also along the time dimension – Interaction: activity network weighted links: strong vs. weak power-law with much lighter tail – Maximal degree around 100 – This partitioning results in: Fewer cross-edges Reduced need for replication – Goal: Provide frequent users with high data locality Faster response to queries
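The "fewer cross-edges" goal can be made concrete with a small sketch that sums the weight of activity edges crossing partition boundaries. The edge list, the weights, and the name `cut_weight` are hypothetical values chosen for illustration. They are not data from the talk.

```python
def cut_weight(edges, assignment):
    """Sum the weights of edges whose endpoints sit in different partitions.

    edges: iterable of (u, v, weight) tuples from an activity network.
    assignment: dict mapping each node to its partition index.
    """
    return sum(w for u, v, w in edges if assignment[u] != assignment[v])


# Hypothetical activity weights: one strong link and two weak links.
edges = [
    ("Joana", "Jona", 5.0),
    ("Jona", "Adarsh", 0.5),
    ("Joana", "Naseer", 0.25),
]

keeps_strong_link = {"Joana": 0, "Jona": 0, "Adarsh": 1, "Naseer": 1}
cuts_strong_link = {"Joana": 0, "Jona": 1, "Adarsh": 1, "Naseer": 0}

print(cut_weight(edges, keeps_strong_link))
print(cut_weight(edges, cuts_strong_link))
```

An assignment that keeps strongly interacting users together cuts only weak links, so fewer queries need remote data.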

11 Our algorithm 1. Construct an Activity Prediction Graph (APG) 2. Compute cost of local partitions 3. Partition the APG with KMETIS 4. Greedy algorithm for partitioning the current period. Differentiate between: 1) the period used for prediction and 2) the current period to partition. Look at the interaction and predict the strength of the relationship. Then, look at this strength and determine what data can be accessed together. The APG identifies links from past traces and captures relationships with strong activity. Assign a cost that determines how costly it would be to cut one edge or another.

12 Our algorithm We propose a way to compute weights in this APG User nodes Message nodes Two-hop network

13 Our algorithm We propose a way to compute weights in this APG Message node weights User node weights Decay factor # msg exchanged
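The transcript names a decay factor and the number of messages exchanged as inputs to the weights, but the formula itself did not survive. The sketch below therefore assumes a plain exponential decay over past periods. It is one plausible reading, not the authors' definition. The name `decayed_weight` and the default `decay=0.5` are illustrative.

```python
def decayed_weight(msgs_per_period, decay=0.5):
    """Combine per-period message counts into one weight.

    msgs_per_period: counts ordered from oldest to newest period.
    decay: assumed factor in (0, 1]; older periods are multiplied by
           decay raised to their age, so recent activity counts more.
    """
    n = len(msgs_per_period)
    return sum(
        count * decay ** (n - 1 - i)
        for i, count in enumerate(msgs_per_period)
    )


# Four messages two periods ago, two last period, one in the latest period.
print(decayed_weight([4, 2, 1], decay=0.5))
```

With this assumed form, a pair that stopped interacting long ago ends up with a small weight, so cutting the link between them is cheap.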

14 Our algorithm Cost of local partitions Message node weights User node weights Edge weights Msg accessible to user X Remote msg weights (Figure: Partition 1, Partition 2)

15 Evaluation: Graph Partitioning Data set: – Facebook New Orleans network Jan2005 to Dec2006 8643 users and 69836 wall posts APG: Jan2005 to Nov2006 Fixed period: Dec-2006, with 13948 wall posts

16 Evaluation of Data Locality We mimic real Facebook page downloads for all wall posts in Dec2006 – Each query requests the 6 most recent wall posts in the user’s two-hop network We compare our algorithm to two hash-based horizontal partitioning algorithms – Hash_p1 – Hash_p1_p2 Number of partitions used: up to 20
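The query described on this slide can be sketched as follows. The sketch assumes that a post counts as part of the two-hop network when its wall owner is within two friendship hops of the requesting user. The data layout and the names `two_hop_network` and `partitions_for_query` are hypothetical.

```python
def two_hop_network(friends, user):
    """Return the user, their friends, and their friends of friends."""
    one_hop = friends.get(user, set())
    reachable = {user} | set(one_hop)
    for friend in one_hop:
        reachable |= friends.get(friend, set())
    return reachable


def partitions_for_query(user, friends, posts, placement, k=6):
    """Return the partitions read to fetch the k most recent visible posts.

    posts: list of (timestamp, post_id, wall_owner) tuples.
    placement: dict mapping post_id to a partition index.
    """
    visible = two_hop_network(friends, user)
    recent = sorted((p for p in posts if p[2] in visible), reverse=True)[:k]
    return {placement[post_id] for _, post_id, _ in recent}


# Hypothetical toy data, using names from the earlier slides.
friends = {
    "Joana": {"Jona"},
    "Jona": {"Joana", "Adarsh"},
    "Adarsh": {"Jona"},
    "Naseer": set(),
}
posts = [(1, "m1", "Adarsh"), (2, "m2", "Naseer"), (3, "m3", "Jona")]
placement = {"m1": 0, "m2": 1, "m3": 0}

print(partitions_for_query("Joana", friends, posts, placement))
```

Running one such query per page download and recording how many partitions each one touches gives the raw numbers behind the locality comparison.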

17 Evaluation of Data Locality Proportion of queries that access only 1 partition

18 Evaluation of Data Locality Proportion of queries that access at most 3 partitions
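The two metrics on these slides (queries answered from exactly 1 partition, and from at most 3) reduce to the same calculation. A minimal sketch, using made-up per-query partition counts rather than results from the talk:

```python
def proportion_within(partition_counts, max_partitions):
    """Fraction of queries that touched at most max_partitions partitions."""
    if not partition_counts:
        return 0.0
    hits = sum(1 for c in partition_counts if c <= max_partitions)
    return hits / len(partition_counts)


# Hypothetical counts: number of partitions touched by each of five queries.
counts = [1, 1, 2, 3, 5]
print(proportion_within(counts, 1))  # share served by a single partition
print(proportion_within(counts, 3))  # share served by at most three
```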

19 Conclusion and Future Work Our algorithm partitions social network data according to interaction levels at different times Our activity prediction graph significantly improved data locality compared to hashing Future work: placement of data across different periods

20 Backup Slides

21 Existing approaches Hash-based horizontal partitioning – Gizzard: range partitioning – Cassandra: consistent hashing – Dynamo: modified consistent hashing

22 Our approach Replication with time-dependency

23 Our approach Replication with time-dependency

24 Greedy Algorithm Use a greedy algorithm for messages in the non-predicted month, Dec2006. Three cases: – Both the initiator and the receiver of the message exist in the APG but have no previous interaction – Exactly one of the initiator and the receiver of the message exists in the APG – Neither the initiator nor the receiver exists in the APG
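The slide lists the three cases but the transcript does not say which partition is chosen in each. The sketch below fills that gap with assumed rules, marked in the comments, so it should be read as one possible greedy policy rather than the authors' algorithm. The name `place_message` and its arguments are hypothetical.

```python
def place_message(initiator, receiver, user_partition, load):
    """Pick a partition for a new message and update the load counters.

    user_partition: dict mapping users already in the APG to a partition.
    load: list with the number of messages currently on each partition.
    """
    has_initiator = initiator in user_partition
    has_receiver = receiver in user_partition

    if has_initiator and has_receiver:
        # Case 1 (assumed rule): both users are known but never interacted,
        # so take the less loaded of their two partitions.
        target = min(
            user_partition[initiator],
            user_partition[receiver],
            key=lambda p: load[p],
        )
    elif has_initiator or has_receiver:
        # Case 2 (assumed rule): co-locate with the one known user.
        known = initiator if has_initiator else receiver
        target = user_partition[known]
    else:
        # Case 3 (assumed rule): no information, use the least loaded partition.
        target = min(range(len(load)), key=lambda p: load[p])

    load[target] += 1
    return target


user_partition = {"Joana": 0, "Jona": 1}
load = [5, 2, 0]
first = place_message("Joana", "Jona", user_partition, load)
second = place_message("Joana", "Nandana", user_partition, load)
third = place_message("Adarsh", "Naseer", user_partition, load)
print(first, second, third, load)
```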

