Masoud Valafar †, Reza Rejaie †, Walter Willinger ‡ † University of Oregon ‡ AT&T Labs-Research WOSN’09 Barcelona, Spain Beyond Friendship Graphs: A Study.

Slides:



Advertisements
Similar presentations
Defending against large-scale crawls in online social networks Mainack Mondal Bimal Viswanath Allen Clement Peter Druschel Krishna Gummadi Alan Mislove.
Advertisements

Mojtaba Torkjazi, Reza Rejaie, Walter Willinger University of Oregon AT&T Labs-Research WOSN09 Barcelona, Spain.
Measurement and Analysis of Online Social Networks 1 A. Mislove, M. Marcon, K Gummadi, P. Druschel, B. Bhattacharjee Presentation by Shahan Khatchadourian.
Mobile Communication Networks Vahid Mirjalili Department of Mechanical Engineering Department of Biochemistry & Molecular Biology.
Network Matrix and Graph. Network Size Network size – a number of actors (nodes) in a network, usually denoted as k or n Size is critical for the structure.
Active Learning for Streaming Networked Data Zhilin Yang, Jie Tang, Yutao Zhang Computer Science Department, Tsinghua University.
Finding your friends and following them to where you are by Adam Sadilek, Henry Kautz, Jeffrey P. Bigham Presented by Guang Ling 1.
Stelios Lelis UAegean, FME: Special Lecture Social Media & Social Networks (SM&SN)
Λ14 Διαδικτυακά Κοινωνικά Δίκτυα και Μέσα Strong and Weak Ties Chapter 3, from D. Easley and J. Kleinberg book.
Practical Recommendations on Crawling Online Social Networks
Walter Willinger AT&T Research Labs Reza Rejaie, Mojtaba Torkjazi, Masoud Valafar University of Oregon Mauro Maggioni Duke University HotMetrics’09, Seattle.
Assessing the Effects of a Soft Cut-off in the Twitter Social Network Niloy Ganguly, Saptarshi Ghosh.
Enabling the Social Web Krishna P. Gummadi Networked Systems Group Max Planck Institute for Software Systems.
Topology Generation Suat Mercan. 2 Outline Motivation Topology Characterization Levels of Topology Modeling Techniques Types of Topology Generators.
TDTS21: Advanced Networking Lecture 8: Online Social Networks Based on slides from P. Gill Revised 2015 by N. Carlsson.
Link creation and profile alignment in the aNobii social network Luca Maria Aiello et al. Social Computing Feb 2014 Hyewon Lim.
Flickr Information propagation in the Flickr social network Meeyoung Cha Max Planck Institute for Software Systems With Alan Mislove.
MassConf: Automatic Configuration Tuning By Leveraging User Community Information Computer Science Wei Zheng, Ricardo Bianchini, Thu Nguyen Rutgers University.
Flickr Tags Network Mustafa Kilavuz. Tags A tag is a keyword Search, spam detection, reputation systems, personal organization and metadata.
UNDERSTANDING VISIBLE AND LATENT INTERACTIONS IN ONLINE SOCIAL NETWORK Presented by: Nisha Ranga Under guidance of : Prof. Augustin Chaintreau.
Application Identification in information-poor environments Charalampos Rotsos 02/02/20101 What is application identification Current status My work Future.
CS 728 Lecture 4 It’s a Small World on the Web. Small World Networks It is a ‘small world’ after all –Billions of people on Earth, yet every pair separated.
 We developed a fast and tunable crawler, Cruiser.  Cruiser uses a master-slave architecture, parallel crawling, and leverages the two-tier topology.
Measurement and Analysis of Online Social Networks By Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, Bobby Bhattacharjee Attacked.
A Measurement -driven Analysis of Information Propagation in the Flickr Social Network Offense April 29.
1 Characterizing Files in the Modern Gnutella Network: A Measurement Study Shanyu Zhao, Daniel Stutzbach, Reza Rejaie University of Oregon SPIE Multimedia.
Measurement and Analysis of Online Social Networks Alan Mislove,Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, Bobby Bhattacharjee Presented.
Minas Gjoka, UC IrvineWalking in Facebook 1 Walking in Facebook: A Case Study of Unbiased Sampling of OSNs Minas Gjoka, Maciej Kurant ‡, Carter Butts,
Web Archive Information Retrieval Miguel Costa, Daniel Gomes (speaker) Portuguese Web Archive.
Are consumers really networked? And, if they are, should you care? Jim Jansen Senior Fellow Pew Internet & American Life Project (they are and you should)
A Measurement-driven Analysis of Information Propagation in the Flickr Social Network WWW09 报告人: 徐波.
CSE390 Advanced Computer Networks Lecture 15: Online Social Networks (The network is people) Based on slides by A. Mislove, F. Schneider, and W. Willinger.
Multigraph Sampling of Online Social Networks Minas Gjoka, Carter Butts, Maciej Kurant, Athina Markopoulou 1Multigraph sampling.
. Social Media for Social Good. Where Are We Now? Does Your Club Have a Website? Does Your Club Have a Facebook Page? Do you have a personal Facebook.
1 Speaker : 童耀民 MA1G Authors: Ze Li Dept. of Electr. & Comput. Eng., Clemson Univ., Clemson, SC, USA Haiying Shen ; Hailang Wang ; Guoxin.
Authors: Xu Cheng, Haitao Li, Jiangchuan Liu School of Computing Science, Simon Fraser University, British Columbia, Canada. Speaker : 童耀民 MA1G0222.
Exploring the dynamics of social networks Aleksandar Tomašević University of Novi Sad, Faculty of Philosophy, Department of Sociology
Modeling Relationship Strength in Online Social Networks Rongjing Xiang: Purdue University Jennifer Neville: Purdue University Monica Rogati: LinkedIn.
Suggesting Friends using the Implicit Social Graph Maayan Roth et al. (Google, Inc., Israel R&D Center) KDD’10 Hyewon Lim 1 Oct 2014.
Using Transactional Information to Predict Link Strength in Online Social Networks Indika Kahanda and Jennifer Neville Purdue University.
Data Analysis in YouTube. Introduction Social network + a video sharing media – Potential environment to propagate an influence. Friendship network and.
Accessing the Deep Web Bin He IBM Almaden Research Center in San Jose, CA Mitesh Patel Microsoft Corporation Zhen Zhang computer science at the University.
WALKING IN FACEBOOK: A CASE STUDY OF UNBIASED SAMPLING OF OSNS junction.
Understanding Cross-site Linking in Online Social Networks Yang Chen 1, Chenfan Zhuang 2, Qiang Cao 1, Pan Hui 3 1 Duke University 2 Tsinghua University.
Poking Facebook: Characterization of OSN Applications Minas Gjoka, Michael Sirivianos, Athina Markopoulou, Xiaowei Yang University of California, Irvine.
Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.
Tie Strength, Embeddedness & Social Influence: Evidence from a Large Scale Networked Experiment Sinan Aral, Dylan Walker Presented by: Mengqi Qiu(Mendy)
Understanding Crowds’ Migration on the Web Yong Wang Komal Pal Aleksandar Kuzmanovic Northwestern University
How far removed are you? Scalable Privacy-Preserving Estimation of Social Path Length with Social PaL Marcin Nagy joint work with Thanh Bui, Emiliano De.
A Data-Reachability Model for Elucidating Privacy and Security Risks Related to the Use of Online Social Networks S. Creese, M. Goldsmith, J. Nurse, E.
A measurement-driven Analysis of Information Propagation in the Flickr Social Network Meeyoung Cha Alan Mislove Krisnna P. Gummadi.
Sampling Techniques for Large, Dynamic Graphs Daniel Stutzbach – University of Oregon Reza Rejaie – University of Oregon Nick Duffield – AT&T Labs—Research.
Similarity & Recommendation Arjen P. de Vries CWI Scientific Meeting September 27th 2013.
Retain H o Refute hypothesis and model MODELS Explanations or Theories OBSERVATIONS Pattern in Space or Time HYPOTHESIS Predictions based on model NULL.
Tagging Systems and Their Effect on Resource Popularity Austin Wester.
Stefanos Antaris A Socio-Aware Decentralized Topology Construction Protocol Stefanos Antaris *, Despina Stasi *, Mikael Högqvist † George Pallis *, Marios.
We.b : The web of short URLs Demetris Antoniades, lasonas Polakis, Gerogios Kontaxis, Elias Athansapoulos, Sotiris loannidis, Evangelos P.Markatos, Thomas.
A Latent Social Approach to YouTube Popularity Prediction Amandianeze Nwana Prof. Salman Avestimehr Prof. Tsuhan Chen.
Scalable Data Scale #2 site on the Internet (time on site) >200 billion monthly page views Over 1 million developers in 180 countries.
Stefanos Antaris Distributed Publish/Subscribe Notification System for Online Social Networks Stefanos Antaris *, Sarunas Girdzijauskas † George Pallis.
Review CS File Systems - Partitions What is a hard disk partition?
CHARACTERIZING CLOUD COMPUTING HARDWARE RELIABILITY Authors: Kashi Venkatesh Vishwanath ; Nachiappan Nagappan Presented By: Vibhuti Dhiman.
1 Patterns of Cascading Behavior in Large Blog Graphs Jure Leskoves, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst SDM 2007 Date:2008/8/21.
Measuring User Influence in Twitter: The Million Follower Fallacy Meeyoung Cha Hamed Haddadi Fabricio Benevenuto Krishna P. Gummadi.
CREATED BY : ARCHANA L. TULSANI.  What is a Social networking site ?  Different Social networking sites(SNS)  Uses of SNS  Reasons for increasing.
Stochastic Models of User-Contributory Web Sites
Summary Presented by : Aishwarya Deep Shukla
Slow Social Network Intelligence Systems
The likelihood of linking to a popular website is higher
SAS/Graph to help data Dose/Concentration consistency review
Presentation transcript:

Masoud Valafar †, Reza Rejaie †, Walter Willinger ‡ † University of Oregon ‡ AT&T Labs-Research WOSN’09 Barcelona, Spain Beyond Friendship Graphs: A Study of User Interactions in Flickr

 What does an inferred friendship graph really say about the Online Social Network (OSN) in question?  Represents a static, incomplete, inaccurate snapshot of the system  Aggregates information over some time period  What is the active portion of an OSNs inferred friendship graph  Requires a notion of “user interaction” and/or of “active user”  Inherently dynamic  Challenges when moving from inferred friendship to inferred interaction graphs  Little (no) incentives for OSNs to make user activity data available  Information on user interactions is in general hard to obtain Motivation

This Study  Main focus is on characterizing user interactions in Flickr  (Indirect) fan-owner interactions through photos shared among users  Based on representative snapshots of fan-owner interactions  More specifically, we focus on  Extent of user interactions  Locality (and reciprocation) of interaction  Relationship between user interaction & user friendship  Temporal patterns of interactions  Related studies  Chun et al.’08  Viswanath et al.’09 – WOSN’09

Friend list: User_id 1 User_id 2 … Profile: Name User id Number of photos Favorite Photos list: Photo_id 1 Photo_id 2 … Photo list Profile: Title Post date Fan list: User_id 1, time … Favorite Photos list: Alice Bob Bob, time Alice photo id User Interactions in Flickr

 Users interactions/relations are indirect  Through photos  Users as owners  Photo list (photos they post)  “Favored photos” (photos they post with at least 1 fan)  Users as fans  Photos they declare as their “favorites”  Favorite photo list FansOwners Photos

Data Collection in Flickr  Flickr-specific issues  Provides well-documentes API  Imposes a rate limit for querying the server of 10 queries/second  Has well-known user ID format (e.g.,  Data collection method 1 (crawling owned photo lists)  Query server for IDs of all photos owned by a user  Separate query to server for each photo to obtain IDs of all its fans plus associated timing info  Obtain fan-owner interactions from the owner side  Data collection method 2 (crawling favorite photo lists)  Query server for IDs of all favorite photos of a user along with the IDs of their associated owners with no timing info  Obtain fan-owner interactions from the fan side

 Dataset І (Interactions of random users)  Leveraged known user ID format  Identified about 122K random users  Extracted user-specific information  Profile, friend list  Favorite photo list  Photo list, photo profiles (timing info)  Photo fan lists (timing info)  Number of queries needed is on the order of number of photos (slow and inefficient)  Dataset I provides a (relatively small) representative sample of detailed fan-owner interactions in Flickr (with timing info) Data Collection: Method 1

 Dataset II (Interactions of users in main component of friendship graph)  Used 122K sampled users as seeds  Crawled their friendship graph via their friend lists  Identified main component (MC) of the friendship graph  Collect list of favorite photos and their owners for all MC users and any new user we encounter as an owner of a favorite photo  Miss negligible fraction of interactions with singleton users/fans or unreachable fans within MC  Number of queries needed is on the order of number of users (efficient and fast)  Dataset II provides a large snapshot of indirect fan-owner interactions within MC without any timing info Data Collection: Method 2

# photos#favored#favorite#users#fans#owners Singletons835,9703,73424,078101,2102,6381,230 MC users2,646,139142,391532,33321,1274,0535,075 # favorite photos # users# fans# owners Interaction in MC 31,495,8694,140,007821,8511,044,055  Dataset I: small, yet detailed  Most of the randomly selected users are inactive singletons  MC users are more active than singleton users  Dataset II: large, but less detailed  Estimate of total user population in Flickr  Dataset I: 1 out of 6 of our randomly selected users are in MC  Dataset II: Est. total Flickr population = 6*4.14M = 25M (as of mid-08)

 Extent of overall fan-owner interactions  More than 95% of fan-owner interactions occur among users in the MC of the Flickr friendship graph  Extent of fan-owner interactions in MC  The most active users in Flickr form a core in the interaction graph and are responsible for the vast majority of fan-owner interactions  Temporal properties of fan-owner interactions  There exists no strong correlation between age and popularity of a photo  The majority of fans of a photo arrives during the first week after the photo is posted  Note: The results are typically based on Dataset I and are validated (where possible) using Dataset II

Extent of Fan-Owner Interactions (I)  Posted photos  Only about 20% of singleton users post 1 or more photos  About 50% of MC users post 1 or more photos  “Active” photos (at least 1 fan)  More than 99% of photos owned by singleton users have no fans  About 95% of photos owned by MC users have no fans

Extent of Fan-Owner Interactions (II)  Users in their roles as owners or fans of photos  “Active” as an owner  At least one posted photo with a fan  More the 97% of fan-owner interactions are associated with active MC owners  “Active” as a fan  At least 1 favorite photo owned by another user  More than 95% of fan-owner interactions are associated with active MC fans  Vast majority (>95%) of interactions in Flickr are among active users in the MC of the friendship graph

Extent of Fan-Owner Interactions in MC(I)  More detailed view of active users  Order owners by indegree  Order fans by outdegree  Order photos by indegree  Top 10% of fans are responsible for 80% of interactions  Top 10% of owners are responsible for 90% of interactions  Top 10% of photos are responsible for only about 50% of interactions  The top 10% fans/owners are responsible for most interactions

 On the overlap between top active fans and top active owners?  E.g., 30% of the top 1K fans are among the top 1K owners  Percentage of overlap reaches max of around 57% for top 200K fans  On the correlation between the level of activity of a user as a fan and as a owner?  The most active fans are more likely to be among the most active owners, and conversely. Extend of Fan-Owner Interactions in MC (II)  The top active users form a core of the Flickr interaction graph

Temporal Properties of Interactions (I)  Age of a photo vs. popularity  Range of popularity widens with age  Distribution of photo age does not the photo’s popularity  The distribution of the popularity of a photo does not depend on its age  Explanation?

Temporal Properties of Interactions (II)  In terms of fan arrival rate of photos, what matters is not the age of the photo …  Age of the photo does not have much effect on the distribution of fan arrival rate  … but when during the photo’s lifetime the fans arrived  Fan arrival rate in the first week is an order of magnitude larger than during other periods  Most photos receive most of their fans during the first week after their posting

Conclusions  Discussed 2 measurement methodologies for collecting fan-owner interactions in the Flickr OSN  Presented initial study of fan-owner interaction in Flickr  Most of the users are inactive (as defined in this work)  More than 95% of interactions occur in MC of the friendship graph  Top 10% of owners (fans) in MC cause 90% (80%) of all interactions  There is significant overlap between the top owners and top fans and these users form a core of the Flickr interaction graph  Most photos receive most of their fans early on (during first week)  Bad news – good news  Inferred friendship graphs say little about user interaction/dynmaics  Observed concentration of “activity” is promising for measurements and studying dynamics

 Leverage the observed concentration in the user interaction graph for measurements  Characterization of other types of interactions in other OSNs  Messaging in Twitter  Video-tagging in YouTube  More detailed study of user interaction patterns and their dynamics  Multi-scale (in time and space) analysis of interaction graphs  Idea: slow (temporal) dynamics at coarse (spatial) scales  Understanding underlying causes for observed interaction patterns

Thanks! Questions ? Website Contact for code and data: Masoud Valafar