CS 728 Lecture 4 It's a Small World on the Web.


2 Small World Networks
It is a 'small world' after all
– Billions of people on Earth, yet every pair separated by "six degrees" of acquaintance relationships
– Notion popularized by experimental psychologist Stanley Milgram (a different study from his more infamous obedience experiment)
Mathematically
– Sparse: linear number of edges
– Diameter: small, on the order of log N
– Clustering is high: neighbors of a vertex tend to be neighbors of each other

3 Small World = Small Diameter + Clustering
Defined by two measures:
– Characteristic path length L: number of edges in the shortest path between two vertices, averaged over all vertex pairs
– Clustering coefficient C: take a vertex v with k > 1 neighbors; there are at most k(k-1)/2 edges among those neighbors
  C(v) = fraction of these k(k-1)/2 edges that are present
  C = average of C(v) over all vertices
Small world: C >> C_random and L ≈ L_random
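The two measures above can be sketched in a few lines of Python. This is only an illustrative sketch on a hypothetical toy graph (`toy` and all function names are made up here); it assumes an undirected graph stored as a dict of neighbor sets, sets C(v) = 0 for vertices with fewer than two neighbors (a convention chosen for this example), and averages L over connected pairs only.

```python
from collections import deque
from itertools import combinations

def clustering_coefficient(adj, v):
    """C(v): fraction of the k(k-1)/2 possible edges among v's k neighbors that exist."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: C(v) is undefined for k < 2, use 0
    present = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return present / (k * (k - 1) / 2)

def average_clustering(adj):
    """C: average of C(v) over all vertices."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def characteristic_path_length(adj):
    """L: shortest-path length averaged over all connected vertex pairs."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

# Hypothetical toy graph: triangle a-b-c with a pendant vertex d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
C = average_clustering(toy)          # (1 + 1 + 1/3 + 0) / 4
L = characteristic_path_length(toy)  # 8 edge-steps over 6 unordered pairs
```

Running one BFS per vertex is fine for small graphs; for Web-scale graphs L is normally estimated from a sample of source vertices instead.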

4 The small world of the Web
Empirical study of the Web graph reveals the small-world property
– Sparse graph
– Average distance d in simulated web: d = 0.35 + 2.06 log(n), e.g. n = 10^9 gives d ≈ 19
– Diameter properties inferred from sampling, since computing the maximum diameter exactly is computationally demanding for large n
– Clustering unknown
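As a quick check of the arithmetic, the fitted formula can be evaluated directly. The slide does not state the base of the logarithm; the sketch below assumes base 10, which is the choice that reproduces the quoted d ≈ 19 for n = 10^9. The function name `avg_distance` is made up for this example.

```python
import math

def avg_distance(n):
    """Fitted average distance d = 0.35 + 2.06 * log10(n); base 10 is an assumption."""
    return 0.35 + 2.06 * math.log10(n)

# For a billion pages: 0.35 + 2.06 * 9 = 18.89, which rounds to the quoted 19.
d_billion = avg_distance(10**9)
```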

5 Implications for Web
Logarithmic scaling of diameter keeps future navigation of the web manageable
– A 10-fold increase in web pages adds only about 2 'clicks', but ...
– Users may not take the shortest path; they may use bookmarks or just get distracted on the way
– Search engines play a crucial role; how can they use this small-world link structure?
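The "2 more clicks" figure follows from the fitted formula d(n) = 0.35 + 2.06 log(n) on the previous slide, assuming the logarithm is base 10 (the slide does not say so explicitly):

```latex
\begin{align*}
d(10n) - d(n) &= 2.06\,\bigl(\log_{10}(10n) - \log_{10} n\bigr) \\
              &= 2.06\,\log_{10} 10 \;=\; 2.06 \;\approx\; 2 .
\end{align*}
```

The constant term 0.35 cancels, so the increase does not depend on the current size n.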

6 Small World in Real World of Hollywood: The Kevin Bacon Game
Goal: connect any actor to Kevin Bacon by linking actors who have acted in the same movie.
The Oracle of Bacon website uses the Internet Movie Database to find the shortest link between any two actors. Created by students at the Univ. of Virginia.
(Image: boxed version of the Kevin Bacon Game)


8 The Hollywood Network
– Total # of actors in database: ~550,000
– Most actors are within three links of each other!
– Average path length to Kevin Bacon: 2.79
– Actor closest to "center": Rod Steiger (2.53)
– Rank of Kevin Bacon in closeness to center: 876th
Center of Hollywood?
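The "center" statistics above are average shortest-path lengths from one actor to everyone else. A minimal sketch of that calculation, using hypothetical cast lists (the movie and actor names below are placeholders, not IMDb data) and assuming the co-star graph is connected:

```python
from collections import deque
from itertools import combinations

# Hypothetical toy cast lists; real data would come from a movie database.
casts = {
    "Movie1": ["A", "B"],
    "Movie2": ["B", "C"],
    "Movie3": ["C", "D"],
    "Movie4": ["D", "E"],
}

def build_costar_graph(casts):
    """Link two actors whenever they appear in the same movie."""
    adj = {}
    for actors in casts.values():
        for a in actors:
            adj.setdefault(a, set())
        for a, b in combinations(actors, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def average_distance_from(adj, source):
    """Average number of links from source to every other reachable actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    others = [d for v, d in dist.items() if v != source]
    return sum(others) / len(others)

adj = build_costar_graph(casts)
avg = {actor: average_distance_from(adj, actor) for actor in adj}
center = min(avg, key=avg.get)  # actor with the smallest average distance
```

In this toy chain A-B-C-D-E the middle actor C is the center; the 2.79 and 2.53 figures on the slide are the same quantity computed on the full actor graph.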

9 Math Citation Network: Erdős Number
– Number of links required to connect scholars to Erdős via co-authorship of papers
– Erdős wrote 1500+ papers with 507 co-authors
– Jerry Grossman's (Oakland Univ.) website allows mathematicians to compute their Erdős numbers
– Connecting path lengths, among mathematicians only: average is 4.65, maximum is 13
(Image: Paul Erdős, 1913-1996)

10 My number is 3
(Figure: co-authorship chain with nodes labeled Erdős, Arny Rosenberg, Fred Annexstein, Fan Chung)
– Erdős and Rényi showed that the average path length between connected nodes in a random graph is logarithmic
– But degree sequences in social networks like the Web and Hollywood are not Poisson
– Back to power laws
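A small experiment can make the random-graph claim concrete. The sketch below builds a G(n, p) random graph (each possible edge present independently with probability p) and compares the observed mean path length with ln(n) / ln(mean degree), a commonly used rough estimate rather than an exact result. The values n = 300, p = 0.04 and the seed are arbitrary choices for illustration.

```python
import math
import random
from collections import deque

def gnp(n, p, seed=0):
    """Undirected G(n, p): include each of the n(n-1)/2 edges with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def mean_path_length(adj):
    """Shortest-path length averaged over all connected ordered pairs."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

n, p = 300, 0.04
g = gnp(n, p)
mean_degree = sum(len(nbrs) for nbrs in g.values()) / n  # close to p * (n - 1)
observed = mean_path_length(g)
estimate = math.log(n) / math.log(mean_degree)  # heuristic, not an exact formula
```

Degrees in G(n, p) are binomially distributed and concentrate around the mean (approximately Poisson for large n), which is exactly the property the slide says the Web and Hollywood graphs lack.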

11 Classes of small-world networks
– Single-scale: connectivity distribution decays exponentially (e.g., Poisson, random graphs)
– Scale-free: power-law distribution of connectivity over the entire range
– Broad-scale: power law over a "broad range" followed by an abrupt cut-off

12 Bow-tie Structure of Web
A large-scale study (AltaVista crawls) reveals another interesting property of the web: "symmetric asymmetry"
– Study of 200 million nodes & 1.5 billion links
– Small-world property not applicable to the entire web: some parts are unreachable, others have long paths
– Power-law connectivity holds though: page indegree (exponent = 2.1), outdegree (exponent = 2.72)

13 Bow-tie Components
– Strongly Connected Component (SCC): core with small-world property
– Upstream (IN): can reach the core, but the core can't reach IN
– Downstream (OUT): reachable from the core, but OUT can't reach the core
– Disconnected (Tendrils)
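The component definitions above translate directly into two graph searches from any node known to lie in the core: one along the links and one against them. This is a minimal sketch, not the procedure used in the crawl study; the function names and the toy edge list are made up, and it assumes `core_node` really is in the giant SCC. Everything outside SCC, IN and OUT (tendrils and disconnected pieces) is lumped into `OTHER`.

```python
from collections import deque

def reachable(adj, sources):
    """All nodes reachable from sources by following edges in adj."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def bow_tie(edges, core_node):
    """Split nodes into SCC, IN, OUT and OTHER relative to core_node's SCC."""
    fwd, rev, nodes = {}, {}, set()
    for u, v in edges:
        fwd.setdefault(u, set()).add(v)
        rev.setdefault(v, set()).add(u)
        nodes.update((u, v))
    descendants = reachable(fwd, [core_node])  # core can reach these
    ancestors = reachable(rev, [core_node])    # these can reach the core
    scc = descendants & ancestors
    return {
        "SCC": scc,
        "IN": ancestors - scc,
        "OUT": descendants - scc,
        "OTHER": nodes - ancestors - descendants,
    }

# Hypothetical toy web: i1 -> {s1 <-> s2} -> o1, a tendril t1 off i1, and an island x -> y.
edges = [("i1", "s1"), ("s1", "s2"), ("s2", "s1"), ("s2", "o1"),
         ("i1", "t1"), ("x", "y")]
parts = bow_tie(edges, "s1")
```

The intersection of forward and backward reachability gives the SCC because a node is strongly connected to the core exactly when paths exist in both directions.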

14 Component Properties
– Each component is roughly the same size: ~50 million nodes
– Tendrils are not connected to the SCC, but are reachable from IN or can reach OUT
– Tubes: directed paths IN -> Tendrils -> OUT
– Disconnected components: because of them, the maximal and average diameter of the whole graph is infinite

15 Empirical Numbers for Bow-tie
– Diameter (maximum over connected pairs of the shortest-path length): 28 for SCC, 500 for entire graph
– Probability that a path exists between any 2 nodes: ~1/4 (0.24)
– Average path length: 16 when a directed path exists, 7 when links are treated as undirected
– Shortest directed path between 2 nodes in SCC: 16-20 links on average
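The path probability and the average length "when a directed path exists" can both be read off one breadth-first search per source node. The sketch below does this exhaustively on a hypothetical four-node graph; on a crawl of hundreds of millions of pages one would sample source nodes instead. The function name `path_statistics` is made up for this example.

```python
from collections import deque

def path_statistics(adj):
    """Return (fraction of ordered pairs joined by a directed path,
    average shortest-path length over those joined pairs)."""
    nodes = set(adj) | {w for nbrs in adj.values() for w in nbrs}
    connected = total = 0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj.get(u, ()):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        connected += len(dist) - 1   # targets reachable from s, excluding s
        total += sum(dist.values())
    ordered_pairs = len(nodes) * (len(nodes) - 1)
    return connected / ordered_pairs, total / connected

# Hypothetical toy graph: directed cycle a -> b -> c -> a with an exit c -> d.
toy_web = {"a": {"b"}, "b": {"c"}, "c": {"a", "d"}}
prob, avg_len = path_statistics(toy_web)  # 9 of 12 ordered pairs are connected
```

Unreachable pairs are excluded from the average rather than counted as infinite, which matches the way the slide reports an average only for pairs where a directed path exists.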

16 Next Time: Models for the Web Graph
Stochastic models that can explain, or at least partially reproduce, the properties of the web graph.
Goals of model
– Power-law distribution properties
– Maintain the small-world property
– Bow-tie structure

