 # CS728 Lecture 5 Generative Graph Models and the Web.



Importance of Generative Models

Generative models give insight into the graph formation process:
– Anomaly detection: abnormal behavior, evolution
– Predictions: predicting the future from the past
– Simulations and evaluation of new algorithms
– Graph sampling: many real-world graphs, like the web, are too large and complex to deal with
– Goal: generating graphs with the small-world property, clustering, power laws, and other naturally occurring structures

Graph Models: Waxman Models

Used for models of clustering in Internet-like topologies and networks with long and short edges. The vertices are distributed at random in a plane, and an edge is added between each pair of vertices (u, v) with probability

$$p(u,v) = \alpha \exp\left(-\frac{d}{\beta L}\right), \qquad 0 < \alpha, \beta \le 1.$$

– L is the maximum distance between any two nodes.
– Increasing α increases the number of edges in the graph.
– Increasing β increases the number of long edges relative to short edges.
– In Waxman-1, d is the Euclidean distance from u to v.
– In Waxman-2, d is a random number in [0, L].
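A minimal Python sketch of the Waxman-1 variant may make the roles of α and β concrete. It assumes the vertices are placed uniformly in the unit square and uses the edge probability α · exp(−d / (β · L)); the function name `waxman_graph` and its parameters are illustrative, not taken from the lecture.

```python
import math
import random

def waxman_graph(n, alpha, beta, seed=None):
    """Waxman-1 sketch: n >= 2 points placed uniformly in the unit square (an assumption)."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    # L is the maximum distance between any two nodes.
    L = max(math.dist(pos[u], pos[v]) for u, v in pairs)
    edges = []
    for u, v in pairs:
        d = math.dist(pos[u], pos[v])  # Euclidean distance (Waxman-1)
        if rng.random() < alpha * math.exp(-d / (beta * L)):
            edges.append((u, v))
    return pos, edges
```

For Waxman-2, the line computing `d` would instead draw a random number in [0, L]. The sketch checks every pair of vertices, so it takes quadratic time in the number of nodes.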

Graph Models: Configuration Model

Random graph from a given degree sequence.
Problem: given a degree sequence d_1, d_2, d_3, …, d_n, generate a random graph with that degree sequence.
Solution: place d_i stubs on vertex i, then choose pairs of stubs at random and join each pair into an edge.
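The stub-matching idea can be sketched in a few lines of Python. Shuffling the list of stubs and pairing neighbours in the shuffled order is one common way to choose pairs at random; the function name `configuration_model` is illustrative. The result may contain self-loops and repeated edges.

```python
import random

def configuration_model(degrees, seed=None):
    """Pair up stubs uniformly at random; may produce loops and multi-edges."""
    if sum(degrees) % 2:
        raise ValueError("the degree sum must be even")
    rng = random.Random(seed)
    # Vertex v contributes degrees[v] stubs.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list are joined into an edge.
    return list(zip(stubs[0::2], stubs[1::2]))
```

Every vertex ends up with exactly its requested degree, provided a self-loop is counted twice.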

Problem: we may construct graphs with loops and multi-edges. To prevent this, there must be enough “absorbing” residual degree capacity.

Algorithm: maintain a list of nodes sorted by residual degree d(v). Repeat until all nodes have been chosen:
– pick an arbitrary vertex v
– add edges from v to the d(v) vertices of highest residual degree
– update the residual degrees

To randomize further, we can start with a realization and repeatedly 2-swap pairs of edges (u,v), (s,t) to (u,t), (s,v).

This works OK, but is there a more ‘natural’ generative model?
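The residual-degree algorithm and the 2-swap randomization might look as follows in Python. This is a sketch under stated assumptions: the slide allows any vertex to be picked, while this version always picks the vertex of largest residual degree, and it re-sorts on each step rather than maintaining a sorted list. The names `havel_hakimi` and `edge_swap` are illustrative.

```python
import random

def havel_hakimi(degrees):
    """Build a simple graph (edge list) with the given degrees, or raise ValueError."""
    residual = dict(enumerate(degrees))
    edges = []
    while residual:
        # Pick a vertex (here: the one with the largest residual degree).
        v = max(residual, key=residual.get)
        d = residual.pop(v)
        # Connect v to the d remaining vertices of highest residual degree.
        targets = sorted(residual, key=residual.get, reverse=True)[:d]
        if len(targets) < d or any(residual[t] == 0 for t in targets):
            raise ValueError("degree sequence is not graphical")
        for t in targets:
            residual[t] -= 1
            edges.append((v, t))
    return edges

def edge_swap(edges, n_swaps, seed=None):
    """Attempt n_swaps swaps (u,v),(s,t) -> (u,t),(s,v); needs at least two edges."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (u, v), (s, t) = edges[i], edges[j]
        # Skip swaps that would create a loop or a multi-edge.
        if u == t or s == v or frozenset((u, t)) in present or frozenset((s, v)) in present:
            continue
        present -= {frozenset((u, v)), frozenset((s, t))}
        present |= {frozenset((u, t)), frozenset((s, v))}
        edges[i], edges[j] = (u, t), (s, v)
    return edges
```

Each accepted swap keeps every vertex degree unchanged, so the degree sequence of the starting realization is preserved.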

Generative Graph Models: Preferential Attachment

– Price’s model: physics citations, “cumulative advantage”
– Herb Simon [50’s]: Nobel and Turing Awards, political scientist
– “Rich get richer” (Pareto)
– Matthew effect / Matilda effect: sociology
– Barabasi and Albert 99: preferential attachment:
  – Add a new node, create d out-links
  – The probability of linking to a node is proportional to its current degree
– Simple explanation of power-law degree distributions

Issues with Preferential Attachment and Power Laws

– The Barabasi model fixes a constant m for the out-degree.
– Price’s model is directed, with mean out-degree m.
– The probability of adding a new edge to a node is proportional to its (in-)degree k. This is a problem at the start, when the degree is 0; Price’s model uses a probability proportional to degree + 1.
– Analysis: the probability that a node has degree k is
  – $p_k \sim k^{-3}$ (Barabasi model)
  – $p_k \sim k^{-(2+1/m)}$, a power law with exponent between 2 and 3 (Price)

Exercise: give pseudocode that generates such a graph in linear time.
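One possible approach to the exercise, sketched in Python for a Price-style directed model: keep a list (called `urn` here, a made-up name) in which every node appears once plus once per in-link. A uniform draw from that list then selects a node with probability proportional to in-degree + 1, without scanning all nodes. The sketch assumes a fixed out-degree m rather than a mean out-degree, and redraws to avoid duplicate targets.

```python
import random

def price_graph(n, m, seed=None):
    """Directed preferential attachment with P(target) proportional to in-degree + 1."""
    rng = random.Random(seed)
    urn = []    # node v appears in_degree(v) + 1 times
    edges = []  # (new node, cited node)
    for v in range(n):
        k = min(m, v)  # early nodes cannot link to more nodes than exist
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(urn))  # uniform draw from the urn
        for t in sorted(targets):
            edges.append((v, t))
            urn.append(t)  # one more copy for each new in-link
        urn.append(v)      # the "+1" copy for the new node
    return edges
```

Each draw takes constant time, so for fixed m the running time is expected to grow roughly linearly with the number of edges; the redraws for duplicate targets mainly matter for the first few nodes.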

Variations on the PA Theme

– Clustering, Small-World and Ageing
– Copying Model
– Alpha and Beta Models
– Temporal Evolution
– Densification

Graph Models: Copying Model

Copying model [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 99]:
– Add a node and choose the number of edges to add
– Choose a random vertex and “copy” its links (neighbors)

Also generates power-law degree distributions.
Generates communities (clustering).
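A rough Python sketch of the copying idea follows. The slide does not say how copied links and fresh random links are mixed, so the parameter `copy_prob` is an assumption introduced for illustration, as are the function name `copying_model` and the choice of a fixed number d of link attempts per new node.

```python
import random

def copying_model(n, d, copy_prob=0.5, seed=None):
    """Each new node copies some out-links of a random earlier node (the prototype)."""
    rng = random.Random(seed)
    out = {0: []}  # out-links of each node; node 0 starts with none
    for v in range(1, n):
        proto = rng.randrange(v)  # random existing vertex to copy from
        links = set()
        for i in range(d):
            if i < len(out[proto]) and rng.random() < copy_prob:
                links.add(out[proto][i])     # copy the prototype's i-th link
            else:
                links.add(rng.randrange(v))  # otherwise link to a uniformly random node
        out[v] = sorted(links)
    return out
```

Because duplicate targets are merged, a node can end up with fewer than d out-links. Nodes that are already linked to are more likely to be copied again, which is the intuition behind the power-law degree distribution.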

Graph Models: The Alpha Model

Watts (1999) α model: add edges to nodes, as in random graphs, but make links more likely when two nodes have a common friend.

For a range of α values:
– The world is small (average path length is short), and
– Groups tend to form (high clustering coefficient).

[Figure: probability of linkage as a function of the number of mutual friends; α is 0 in the upper-left curve, 1 on the diagonal, and ∞ in the bottom-right curve.]

Graph Models: The Beta Model

Watts and Strogatz (1998): “link rewiring”.
– β = 0: people know their neighbors. Clustered, but not a “small world”.
– β = 0.125: people know their neighbors, and a few distant people. Clustered and “small world”.
– β = 1: people know others at random. Not clustered, but “small world”.
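The link-rewiring procedure can be sketched in Python as below. It assumes the usual starting point of a ring in which each node is linked to its k nearest neighbours (k even), and rewires each ring edge with probability β; the function name `watts_strogatz` and the adjacency-set representation are illustrative choices.

```python
import random

def watts_strogatz(n, k, beta, seed=None):
    """Ring lattice (k nearest neighbours, k even), each edge rewired with probability beta."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            w = (v + j) % n
            adj[v].add(w)
            adj[w].add(v)
    for j in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + j) % n
            if w in adj[v] and rng.random() < beta:
                # Move the far end of (v, w) to a node that is neither v nor a neighbour of v.
                choices = [x for x in range(n) if x != v and x not in adj[v]]
                if not choices:
                    continue
                new = rng.choice(choices)
                adj[v].remove(w)
                adj[w].remove(v)
                adj[v].add(new)
                adj[new].add(v)
    return adj
```

With `beta=0` the ring is returned unchanged, with `beta=1` every edge is rewired, and small values in between keep most neighbour links while adding a few long-range shortcuts.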

Graph Models: The Beta Model

– The first five random links reduce the average path length of the network by half, regardless of N!
– Both the α and β models reproduce the short-path results of random graphs, but also allow for clustering.
– Small-world phenomena occur at the threshold between order and chaos.

[Figure, Watts and Strogatz (1998): clustering coefficient (C) and normalized average path length (L) plotted against β.]

Other Related Work

– Hybrid models: Beta + Waxman on grid
– Huberman and Adamic, 1999: Growth dynamics of the world wide web
  – Argue against the Barabasi model for its age dependence
– Kumar, Raghavan, Rajagopalan, Sivakumar and Tomkins, 1999: Stochastic models for the web graph
– Watts, Dodds, Newman, 2002: Identity and search in social networks
– Medina, Lakhina, Matta, and Byers, 2001: BRITE: An Approach to Universal Topology Generation
– …

Statistics

Statistics of common networks (N: nodes, K: degree, D: distance, C: clique fraction):

| Network | N | K | D | C |
|---|---|---|---|---|
| Actors | 225,226 | 61 | 3.65 | 0.79 |
| Power grid | 4,941 | 2.67 | 18.7 | 0.08 |
| C. elegans | 282 | 14 | 2.65 | 0.28 |

Large k = large c? Small c = large d?

Modeling Ageing and Temporal Evolution

– N(t): nodes at time t
– E(t): edges at time t
– Suppose that N(t+1) = 2 * N(t)
– Q: what is your guess for E(t+1)? Is it 2 * E(t)?
– A: more than doubled (over-doubled)

Temporal Evolution of Graphs

Densification Power Law:
– networks appear denser over time
– the number of edges grows faster than the number of nodes: the average degree is increasing

$$E(t) \propto N(t)^{a}$$

where a is the densification exponent, or equivalently

$$\log E(t) = a \log N(t) + \text{const}.$$

Graph Densification

Densification Power Law, with densification exponent 1 ≤ a ≤ 2:
– a = 1: linear growth, constant out-degree (assumed in the literature so far)
– a = 2: quadratic growth, clique

Let’s see the real graphs!

Densification – ArXiv citation graph in Physics

Citations among physics papers:
– 1992: 1,293 papers, 2,717 citations
– 2003: 29,555 papers, 352,807 citations

For each month M, create a graph of all citations up to month M.

[Figure: E(t) plotted against N(t); densification exponent 1.69.]

Densification – Patent Citations

Citations among patents granted:
– 1975: 334,000 nodes, 676,000 edges
– 1999: 2.9 million nodes, 16.5 million edges

Each year is a datapoint.

[Figure: E(t) plotted against N(t); densification exponent 1.66.]

Densification – Internet Autonomous Systems

Graph of the Internet:
– 1997: 3,000 nodes, 10,000 edges
– 2000: 6,000 nodes, 26,000 edges

One graph per day.

[Figure: E(t) plotted against N(t); densification exponent 1.18.]
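A densification exponent of the kind quoted for these datasets can be estimated as the slope of a straight-line fit of log E(t) against log N(t), since E(t) ∝ N(t)^a. The Python sketch below uses ordinary least squares; the function name `densification_exponent` and the snapshot values are made up for illustration and are not the data behind the slides.

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope of log E(t) against log N(t)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical snapshots in which the edge count grows as N^1.5.
nodes = [100, 200, 400, 800]
edges = [n ** 1.5 for n in nodes]
a = densification_exponent(nodes, edges)
growth = 2 ** a  # factor by which E grows when N doubles
```

With an exponent above 1, doubling the number of nodes multiplies the number of edges by `2 ** a`, which is more than 2; this is the "over-doubled" behavior.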

Evolution of the Diameter

Prior work on power-law graphs hints at a slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)

What is happening in real data? The diameter shrinks over time:
– As the network grows, the distances between nodes slowly decrease

Diameter – ArXiv citation graph

Citations among physics papers, 1992–2003. One graph per year.

[Figure: diameter plotted against time in years.]

Diameter – “Patents”

Patent citation network, 25 years of data.

[Figure: diameter plotted against time in years.]

Diameter – Autonomous Systems

Graph of the Internet, 1997–2000. One graph per day.

[Figure: diameter plotted against number of nodes.]
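To track a diameter across snapshots of a growing graph, one simple option is a breadth-first search from every node. The Python sketch below computes the plain diameter of an undirected, unweighted graph given as an adjacency dictionary; this definition is an assumption, and the measurements behind the plots may use a different one. It takes time proportional to nodes times edges, so it only suits small graphs.

```python
from collections import deque

def diameter(adj):
    """Longest shortest-path length over all reachable pairs of nodes."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best

# A path on four nodes, then the same nodes with one extra edge closing a cycle.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

Here `diameter(path)` is 3 and `diameter(cycle)` is 2, a toy case of added edges shortening distances.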

Next Time: Densification – Possible Explanations

Generative models to capture the Densification Power Law and shrinking diameters. Two proposed models:
– Community Guided Attachment: obeys Densification
– Forest Fire model: obeys Densification, Shrinking diameter (and Power Law degree distribution)