COQ301 The Anatomy of the Facebook Social Graph Paper Presentation

Presented By: Agambeer Singh Brar (2015CH10079), Om Patel (CS10241), Mayank Aneja (CS10237), Yash Gautam (CS10268)

Authors
- Johan Ugander (Cornell University)
- Brian Karrer (University of Michigan)
- Lars Backstrom
- Cameron Marlow
All authors belonged to Facebook, Palo Alto, CA, USA.
Paper published on 18 November 2011 at arxiv.org (Cornell).

Abstract
- Structural study of the social graph of active Facebook users
- Social network nearly fully connected: 99.91% of individuals belong to a single large connected component
- Six degrees of separation globally
- Whole graph sparse, neighbourhoods dense
- Common structural network characteristics similar to smaller social networks

Content
- Introduction
- Material and Method
- Degree Distribution
- Path Lengths
- Component Sizes
- Clustering Coefficient and Degeneracy
- Friends of Friends
- Degree Correlations
- Site Engagement Correlation
- Other Mixing Patterns

Introduction
- Social networks are used to study the structure of human relationships
- Large-scale, detailed social structures
- Individuals as vertices, relationships as edges
- Unifying structural properties:
  - Homophily
  - Clustering
  - Small-world effect
  - Heterogeneous distributions of friends
  - Community structure


Facebook Social Network
- The more people included, the more accurate the representation of relationships
- 721 million active users in May 2011; world population 6.9 billion, so ~10% of the world population
- Active user:
  - Logged in at least once in the last 28 days
  - Has at least one friend
- Facebook accounts reliably correspond to people
- Only reciprocal Facebook friendships considered

Facebook Social Network: Data
- Users must be 13 years old or above to be eligible for a Facebook account
- 721 million individuals, 68.7 billion edges ⇒ ~190 friends per user
- Within the US:
  - 149 million users
  - 260 million people aged 13 years or older
  - More than half of that population on Facebook
  - 15.9 billion edges
  - ~214 friends per user
  - Higher adoption of Facebook compared to the world
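The "friends per user" figures follow from the edge and user counts: in an undirected graph every edge gives one friend to each of two users. A minimal sketch of that arithmetic, using the rounded U.S. figures from this slide (the function name is only illustrative):

```python
def average_degree(num_users: int, num_edges: int) -> float:
    """Mean friend count in an undirected graph: each edge is counted at both ends."""
    return 2 * num_edges / num_users


# Rounded U.S. figures quoted on the slide: 149 million users, 15.9 billion edges.
us_mean = average_degree(149_000_000, 15_900_000_000)
print(round(us_mean, 1))  # 213.4, close to the ~214 quoted from unrounded counts
```

The small gap to 214 is what one would expect from using rounded inputs.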

Facebook Social Network: Study
Previous studies:
- Subsets of the Facebook population, e.g. the social network of university students
- Communication patterns and activity amongst segments
- Sampling, crawling, etc. used to collect data for large-scale network properties
- None distinguished between active and stale accounts
This paper:
- Advances collective knowledge of social networks and relationships
- Gives a realistic representation of relationships using graph algorithms and network analysis tools

Materials and Methods
Calculations:
- Hadoop cluster with 2250 machines
- Hadoop/Hive data analysis framework developed at Facebook
Network neighbourhoods:
- 5000 users randomly selected, using reservoir sampling, for each of 100 log-spaced neighbourhood sizes
- Total of 500,000 users for analysis
Component structure of network:
- Newman-Ziff (NZ) algorithm
- Single computer with 64 GB of RAM
Path length calculations:
- HyperANF algorithm
- Single 24-core machine with 72 GB of RAM
- Average of 10 runs
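Reservoir sampling, mentioned above for picking users, draws a uniform random sample of fixed size from a stream in a single pass without knowing the stream length in advance. The following is a minimal single-machine sketch of the classic scheme, not the Hadoop/Hive job used in the paper; the function name and the `rng` parameter are illustrative assumptions.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i survives with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


users = reservoir_sample(range(1_000_000), 5, random.Random(42))
print(len(users))  # 5
```

If the stream holds fewer than k items, the sketch simply returns all of them.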

Newman-Ziff (NZ) algorithm
- A type of Union-Find algorithm with path compression
- Disjoint-set data structure: keeps track of a set of elements partitioned into disjoint subsets
- Find: determine which subset a particular element is in (to see whether two elements are in the same subset)
- Union: join two subsets into a single subset
- Records the component structure dynamically as edges are added
- The full component structure is available once all edges have been added
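The Find and Union operations described above can be sketched as follows. This is a generic disjoint-set structure with path compression and union by size, shown only to illustrate the idea; it is not the paper's implementation, and the class and function names are assumptions.

```python
class DisjointSet:
    """Union-Find over the integers 0..n-1 with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Locate the root of x's subset.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def component_sizes(n, edges):
    """Add edges one at a time, then report component sizes, largest first."""
    ds = DisjointSet(n)
    for u, v in edges:
        ds.union(u, v)
    roots = {ds.find(v) for v in range(n)}
    return sorted((ds.size[r] for r in roots), reverse=True)


print(component_sizes(6, [(0, 1), (1, 2), (3, 4)]))  # [3, 2, 1]
```

Because the structure is updated edge by edge, the component sizes can be read off at any point while edges are still being added.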

HyperANF Algorithm
- The neighbourhood function N_G(t) of a graph G gives, for each t, the number of pairs of nodes <x, y> such that y is reachable from x in at most t hops.
- The ANF algorithm (approximate neighbourhood function) was proposed to approximate N_G(t) on large graphs.
- HyperANF uses HyperLogLog counters (which approximate the number of distinct elements in a multiset).
- Overdecomposition is used to exploit multi-core parallelism.
- Computes the neighbourhood function of graphs with billions of nodes in a few hours.

Degree Distribution
The degree of an individual is the number of friends that individual has in the social graph. The degree distribution is the fraction of individuals in the network who have exactly k friends. The notation is k for the degree and p_k for the degree distribution. The paper reports the degree distributions of active Facebook users for both the U.S. and the global population. The figure is given on the next page.
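As a small illustration of the definition, the degree distribution can be computed from an edge list as below. This is a sketch on a toy graph; the function name is illustrative, and only vertices that appear in at least one edge are counted, mirroring the requirement that an active user has at least one friend.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to p_k, the fraction of vertices with exactly k friends."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    return {k: c / n for k, c in sorted(counts.items())}


# Toy graph: "a" is friends with "b", "c" and "d".
print(degree_distribution([("a", "b"), ("a", "c"), ("a", "d")]))  # {1: 0.75, 3: 0.25}
```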

Observations
- The distribution for the U.S. is quite similar to that of the entire population, so we focus on the global distribution.
- The distribution decreases almost monotonically, except for an anomaly near 20 friends, which arises because Facebook encourages people with a low friend count to make more friends.
- There is a clear cutoff at 5000 friends, because Facebook imposed a limit on the number of friends at the time of these measurements.
- Most individuals have a moderate degree, up to roughly 200, while far fewer have many hundreds or even thousands of friends.
- The median friend count was found to be 99.
- The distribution is right-skewed (the mean lies to the right of the median) with a high variance; the next graph (log-log scale) shows a curvature that is important.

Observations
- Measurements of networks often show degree distributions that follow a power law, $p_k \propto k^{-\alpha}$ for some positive $\alpha$.
- Power laws are straight lines on a log-log plot, and the observed distribution is clearly not straight. Hence power-law models fail for Facebook's degree distribution.
- Users who are not direct friends may still be connected to each other through more hops. Hence we move to the next topic...
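To see why a power law would show up as a straight line, take logarithms of the power-law form. Here $C$ is an assumed normalising constant that is not named on the slide:

```latex
p_k = C\,k^{-\alpha}
\quad\Longrightarrow\quad
\log p_k = \log C - \alpha \log k
```

So plotting $\log p_k$ against $\log k$ gives a line of slope $-\alpha$, and visible curvature on such a plot is evidence against a single power law.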

Path Lengths
In studies of network structure, the distribution of distances between vertices is another important quantity for analysing a network. The paper characterises it through the neighbourhood function and the average pairwise distance of the global and U.S. Facebook networks.

The neighbourhood function N(h) is defined as the number of pairs of vertices (u, v) such that u is reachable from v along a path in the network with h edges or fewer. The diameter of the graph is the maximum distance between any pair of vertices in the graph. The vast majority of the Facebook network consists of one large connected subgraph. The following figure shows the neighbourhood function calculated for Facebook users in the U.S. as well as for all users.

Observations
- The average distance between pairs of users is 4.7 for all Facebook users and 4.3 for U.S. users.
- 92% of all pairs of Facebook users were within 5 degrees and 99.6% were within 6 degrees.
- For the U.S., 96% were within 5 degrees and 99.7% were within 6 degrees.
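On a small graph the neighbourhood function and the average distance can be computed exactly with one breadth-first search per vertex. The sketch below does that; it is not HyperANF, which approximates the same quantity, and exact BFS is not feasible at Facebook scale. As an assumption of this sketch, pairs are counted as ordered pairs of distinct, mutually reachable vertices, and the function names are illustrative.

```python
from collections import defaultdict, deque


def neighbourhood_function(edges):
    """Return {h: number of ordered pairs of distinct vertices at distance <= h}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    pairs_at = defaultdict(int)  # distance d -> ordered pairs exactly d apart
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    pairs_at[dist[w]] += 1
                    queue.append(w)
    result, running = {}, 0
    for h in sorted(pairs_at):
        running += pairs_at[h]
        result[h] = running
    return result


def average_distance(nf):
    """Mean distance over connected pairs, recovered from the neighbourhood function."""
    total_pairs = max(nf.values())
    weighted, previous = 0, 0
    for h in sorted(nf):
        weighted += h * (nf[h] - previous)
        previous = nf[h]
    return weighted / total_pairs


nf = neighbourhood_function([(0, 1), (1, 2), (2, 3)])  # path 0-1-2-3
print(nf)  # {1: 6, 2: 10, 3: 12}
```

The largest h in the result is the diameter of the toy graph, and the "within 5 degrees" percentages above correspond to the ratio N(5) / N(max h).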

Component Sizes
To show that the previous path length results are representative of the entire Facebook network, the component structure of the graph has to be investigated. We saw that most users are within a few degrees of each other. To confirm this, we find the connected components of the Facebook graph.

Definition
A connected component is a set of individuals in which each pair of individuals is connected by at least one path through the network. The neighbourhood function computed earlier only counts distances between pairs of users within the same connected component. The next figure shows the distribution of component sizes on log-log scales.

Observations
There are many connected components, but most of them are small. The second-largest connected component has just over 2000 individuals, while the largest component covers nearly all of the users. This shows that connections exist between nearly every pair of Facebook users.

Local Clustering Coefficient - Definition
Clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. The local clustering coefficient of a vertex (node) in a graph quantifies how close its neighbours are to being a clique (complete graph). For user i, it is the fraction of possible friendships that actually exist between the users in the vertex-induced subgraph consisting of the friends of user i. Writing $k_i$ for the number of friends of user i and $e_i$ for the number of friendships among those friends, it is mathematically defined as
$$C_i = \frac{2 e_i}{k_i (k_i - 1)}$$
To understand local clustering in more detail, we need the notion of the neighbourhood graph.

Neighbourhood graph
The neighbourhood graph for user i, sometimes called the ego graph or the 1-ball, is the vertex-induced subgraph consisting of the users who are friends with user i.
Let $\lambda(v)$ be the number of triangles on vertex v in the undirected graph G. That is, $\lambda(v)$ is the number of subgraphs of G with 3 edges and 3 vertices, one of which is v. Let $\tau(v)$ be the number of triples on v. That is, $\tau(v)$ is the number of subgraphs (not necessarily induced) with 2 edges and 3 vertices, one of which is v and such that v is incident to both edges. Then we can also define the clustering coefficient as
$$C(v) = \frac{\lambda(v)}{\tau(v)}$$
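The triangle and triple counts above translate directly into code. A minimal sketch, assuming the graph is stored as a dictionary mapping each vertex to the set of its friends (the representation and function name are illustrative choices):

```python
from itertools import combinations


def local_clustering(adj, v):
    """Triangles on v divided by triples centred on v (0.0 if v has fewer than 2 friends)."""
    friends = adj[v]
    k = len(friends)
    if k < 2:
        return 0.0
    # A triangle on v is a pair of v's friends who are also friends with each other.
    triangles = sum(1 for a, b in combinations(friends, 2) if b in adj[a])
    # A triple on v is any pair of v's friends: k choose 2.
    triples = k * (k - 1) // 2
    return triangles / triples


adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj, 1))  # 1.0
```

In the toy graph, vertex 0 has three friends of whom only one pair (1 and 2) are friends, so its coefficient is 1/3.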

Local Clustering Coefficient

Observations about local clustering coefficient
We see that the local clustering coefficient is very large regardless of the degree, compared to the percentage of possible friendships in the network as a whole. For example, for users with 100 friends, the average local clustering coefficient is 0.14, indicating that for a median user, 14% of all their friend pairs are themselves friends. This is approximately five times greater than the clustering coefficient found in a 2008 study analyzing the graph of MSN Messenger correspondences, for the same neighbourhood size.

Observations about local clustering coefficient (cont.)
The analysis shows that the clustering coefficient decreases monotonically with degree, consistent with the earlier MSN messenger study and other studies. In particular, the clustering coefficient drops rapidly for users with close to 5000 friends, indicating that these users are likely using Facebook for less coherently social purposes and friending users more indiscriminately.

Friends of Friends
An important property of graphs to consider when designing algorithms is the number of vertices that are within two hops of an initial vertex. This property determines the extent to which graph traversal algorithms, such as breadth-first search, are feasible. The non-unique friends-of-friends count corresponds to the number of length-two paths starting at an initial vertex and not returning to that vertex. The unique friends-of-friends count corresponds to the number of unique vertices reachable at the end of a length-two path. In Figure 5 of the paper, the average counts of both unique and non-unique friends-of-friends are computed as a function of degree.
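The two counts can be illustrated on a toy graph as follows. This sketch assumes the graph is a dictionary mapping each vertex to the set of its friends; following the definitions above, the starting vertex is excluded, while a direct friend who is also reachable in two hops is still counted. The function name is illustrative.

```python
def friends_of_friends(adj, v):
    """Return (unique, non_unique) friends-of-friends counts for vertex v."""
    non_unique = 0
    unique = set()
    for friend in adj[v]:
        for fof in adj[friend]:
            if fof != v:  # length-two paths must not return to the start
                non_unique += 1
                unique.add(fof)
    return len(unique), non_unique


# Square 0-1-3-2-0: vertex 3 is reachable from 0 via 1 and via 2.
square = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
print(friends_of_friends(square, 0))  # (1, 2)
```

The gap between the two numbers measures how much the two-hop neighbourhood overlaps with itself.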

Friends of friends

Observations
A naive estimate would have the number of friends-of-friends grow quadratically with the number of friends. In reality, the number of non-unique friends-of-friends grows only moderately faster than linear, and the number of unique friends-of-friends grows very close to linear, with a linear fit producing a slope of 355 unique friends-of-friends per additional friend. It is important to observe from the figure that the absolute amounts are unexpectedly large: a user with 100 friends has 27,500 unique friends-of-friends and 40,300 non-unique friends-of-friends.

Degree Correlations
The number of friends (degree) of a person on Facebook is correlated with the number of friends that person's neighbours have. Similar correlations have been found in studies of various other social networks. The relation is: your neighbours' degrees tend to be large when your degree is large, and small when it is small. This is called degree assortativity.

Pearson Correlation Coefficient
This degree correlation can be quantified by computing the Pearson correlation coefficient, r, between the degrees at the two ends of an edge. The Pearson coefficient is a measure of the linear correlation between two variables X and Y. It has a value between +1 and −1, where +1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation. It can be calculated as follows:
$$r = \frac{\sum_{j,k} jk\,(e_{jk} - q_j q_k)}{\sigma_q^2}$$
where $e_{jk}$ is the joint excess-degree probability for excess degrees j and k (the excess degree, also known as remaining degree, of a node is its degree minus one), $q_k$ is the normalized distribution of the excess degree, and $\sigma_q$ is the standard deviation of $q_k$.
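In practice r can be computed straight from an edge list as the Pearson correlation of the degrees at the two ends of each edge. The sketch below uses plain degrees rather than excess degrees; subtracting one from every value shifts both variables by a constant, which leaves the Pearson coefficient unchanged. Each undirected edge is counted in both directions so the measure is symmetric. The function name is illustrative, and the sketch assumes a graph that is not regular (otherwise the variance is zero).

```python
from collections import Counter


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of an edge."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    xs, ys = [], []
    for u, v in edges:
        # Count each undirected edge once in each direction.
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys contain the same values, so they share a mean
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var


# A star is maximally disassortative: the hub only meets leaves.
print(degree_assortativity([(0, 1), (0, 2), (0, 3)]))  # -1.0
```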

Degree Correlation in Facebook
The degree correlation r found for Facebook is positive. This is in line with results from earlier analyses of other social networks, such as academic coauthorship and actor collaboration networks.

Degree Correlations
A more detailed measure is the average number of friends of a neighbour of an individual with k friends. On Facebook, this value is about 300 for low-degree individuals and rises to 820 for individuals with 1000 friends. This affirms the positive assortativity of the network. For a randomly chosen edge, the expected number of friends at its end is 625; the curve would stay constant at this value if degrees were uncorrelated.

Degree Correlation
Another interesting aspect comes up while analysing these graphs: on average, our neighbours have more friends than we do. 83.6% of users have fewer friends than the median friend count of their neighbours, and 92.7% of users have fewer friends than the average friend count of their neighbours.
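The two percentages above can be reproduced on any graph by comparing each user's degree with the mean and median degree of their friends. A minimal sketch, assuming a dictionary of friend sets in which every user has at least one friend (as in the paper's definition of an active user); the function name is illustrative.

```python
from statistics import mean, median


def fraction_below_friends(adj):
    """Return (share below friends' mean degree, share below friends' median degree)."""
    deg = {v: len(friends) for v, friends in adj.items()}
    below_mean = below_median = 0
    for v, friends in adj.items():
        friend_degrees = [deg[f] for f in friends]
        below_mean += deg[v] < mean(friend_degrees)
        below_median += deg[v] < median(friend_degrees)
    n = len(adj)
    return below_mean / n, below_median / n


# Star graph: the three leaves each have fewer friends than their only friend, the hub.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(fraction_below_friends(star))  # (0.75, 0.75)
```

The effect arises because high-degree users appear in many friend lists, so they are over-represented among "friends".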

Site Engagement Correlation
The correlation between the login activity of users and that of their friends was also studied. The trends were the same as for degree correlation, and even more pronounced.

As the figure shows, the average number of days on which a user's friends logged in during the past 28 days lies well above the diagonal for most of the range. So on average, your friends log in more than you do if you log in on up to about 70% of the days in a month. This shows a positive correlation in site engagement, which suggests that higher user activity may be induced by the activity of friends.

Here the login activity is plotted against the user's degree. There is a correlation between a user's number of friends and his/her login activity. This trend holds not only for the mean user (solid line) but also for the 25/75 and 5/95 percentile users.

Reasoning
Such trends can be explained by how the Facebook feed works. A person's feed consists of updates, links, pictures, videos and other content from his/her friends. If a person's neighbours are more active, there is more content on the feed and therefore more reason to log in. Likewise, a person with more friends has a fuller feed, which increases login activity. Thus you can waste your friends' time just by wasting yours :D

Other mixing patterns
Several other traits were used to characterize user behaviour and friendship on Facebook:
- Age
- Gender
- Country of origin
The data collected was used to characterize homophily with respect to each trait.

Age Correlation
We start by considering friendship patterns amongst individuals of different ages, and compute the conditional probability p(t′|t) that a random neighbour of an individual with age t has age t′. Again, random neighbour means that each edge connected to a vertex with age t is given equal probability of being followed. The resulting distribution peaks at t′ = t and is asymmetric about it. The probability of friendship with older people falls exponentially from the mode.
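The conditional probability p(t′|t) can be estimated by following every edge from both ends and tallying the attribute seen at the far end. The sketch below does this for a toy graph; the function name and the `attribute` dictionary (vertex to age) are illustrative assumptions, and the same tally works for any categorical attribute.

```python
from collections import Counter, defaultdict


def mixing_probabilities(edges, attribute):
    """Return p[t][t2]: probability that a random neighbour of a t-vertex has value t2."""
    counts = defaultdict(Counter)
    for u, v in edges:
        # Each edge end is followed with equal probability, so tally both directions.
        counts[attribute[u]][attribute[v]] += 1
        counts[attribute[v]][attribute[u]] += 1
    return {
        t: {t2: c / sum(row.values()) for t2, c in row.items()}
        for t, row in counts.items()
    }


ages = {"a": 20, "b": 20, "c": 30}
p = mixing_probabilities([("a", "b"), ("a", "c"), ("b", "c")], ages)
print(p[20])  # {20: 0.5, 30: 0.5}
```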

Age Correlation

Gender Correlation
The authors compute the conditional probability p(g′|g) that a random neighbour of an individual with gender g has gender g′, where M denotes male and F denotes female. For friends of male users, p(F|M) exceeds p(M|M), and for friends of female users, p(F|F) exceeds p(M|F): in both cases a random neighbour is more likely to be female. Although there are fewer active female users than male users, the average degree of female users is higher than that of male users.

Geo-based Correlation
Next, the relationship between friendship and geographic location, determined from IP addresses, is examined. 84.2% of edges are within countries, so the network divides fairly cleanly along country lines. The graph on the next page is a heatmap of edges between countries with at least a million active users and more than 50% of the population internet-enabled.

Geo based Correlation Normalized Country Adjacency Matrix

Interesting Revelations
Many of the tightly connected groups of countries fall within the same continent, such as South America, Africa and Europe. A more curious grouping arises from shared history and culture, such as that of the UK, Ghana and South Africa.

Thank You !
