Presentation is loading. Please wait.

Presentation is loading. Please wait.

Basic Network Properties and the Random Graph Model

Similar presentations


Presentation on theme: "Basic Network Properties and the Random Graph Model"— Presentation transcript:

1 Basic Network Properties and the Random Graph Model
CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

2 Announcement: Recitations
Review of basic probability: Today, Thu 9/29 In Gates B01, 4-6pm Review of basic linear algebra: Tomorrow, Fri 9/30 Gates B03, 4-6pm Next week: Intro to SNAP (Gates B01, 4-6pm on Thu 10/6) Intro to NetworkX (Gates B03, 4-6pm on Fri 10/7) 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

3 Structure of Networks Recall from the last lecture: This class:
1) We took a real system: the Web 2) We represented it as a directed graph 3) We used the language of graph theory Strongly Connected Components 4) We designed a computational experiment: Find In- and Out-components of a given node v 5) We learned something about the structure of the Web This class: Define basic terminology and properties that you can compute on networks v Out(v) 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

4 Undirected vs. Directed Networks
JOKE Sex network is undirected Undirected Links: undirected (symmetrical) Undirected links: Collaborations Friendship on Facebook Directed Links: directed (arcs) Directed links: Phone calls Following on Twitter A B D C L M F G H I D B C E G A F 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

5 Adjacency Matrix Aij=1 if there is a link between node i and j
4 1 2 3 4 1 2 3 Aij=1 if there is a link between node i and j Aij=0 if nodes i and j are not connected to each other Note that for a directed graph (right) the matrix is not symmetric. 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

6 Node Degrees Node degree, ki: the number of links connected to node i
j i Undirected Avg. degree: In directed networks we define an in-degree and out-degree. The (total) degree of a node is the sum of in- and out-degree. D B C Directed E G A F Source: A node with kin = 0 Sink: A node with kout = 0 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

7 Complete Graph The maximum number of edges in an undirected graph on N nodes is A graph with the number of edges E=Emax is a complete graph, and its average degree is N-1 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

8 Networks are Sparse Graphs
Most real-world networks are sparse E << Emax (or k << N-1) WWW (Stanford-Berkeley): N=319,717 k=9.65 Social networks (LinkedIn): N=6,946,668 k=8.87 Communication (MSN IM): N=242,720,596 k=11.1 Coauthorships (DBLP): N=317,080 k=6.62 Internet (AS-Skitter): N=1,719,037 k=14.91 Roads (California): N=1,957,027 k=2.82 Protein (S. Cerevisiae): N=1,870 k=2.39 (Source: Leskovec et al., Internet Mathematics, 2009) Consequence: Adjacency matrix is filled with zeros! (Density (E/N2): WWW=1.5110-5, MSN IM = 2.2710-8) 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

9 More Types of Graphs: Unweighted (undirected) Weighted (undirected) 3
1 4 2 3 1 4 2 Friendships, WWW Call graph, graph 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

10 More Types of Graphs: Self-edges (undirected) Multigraph (undirected)
3 1 4 2 3 1 4 2 WWW, Social networks, collaboration networks 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

11 Network Representations
WWW > Facebook friendships > Citation networks > Collaboration networks > Mobile phone calls > Protein Interactions > > directed multigraph with self-interactions > undirected, unweighted > unweighted directed acyclic > undirected multigraph or weighted > directed, (weighted?) multigraph > undirected, unweighted with self-interactions 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

12 Bipartite Graph Bipartite graph is a graph whose nodes can be divided into two disjoint sets U and V such that every link connects a node in U to one in V; that is, U and V are independent sets. Examples: Authors-to-papers Movies-to-Actors Users-to-Movies “Folded” networks Author collaboration networks Actor collaboration networks U V 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

13 Network Properties: How to Characterize a Network?

14 Degree Distribution Degree distribution P(k): Probability that a randomly chosen node has degree k Nk = # nodes with degree k P(k) = Nk / N ➔ plot k P(k) 0.6 0.5 0.4 0.3 0.2 0.1 1 2 3 4 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

15 Paths in a Graph A path is a sequence of nodes in which each node is linked to the next one Path can intersect itself and pass through the same edge multiple times E.g.: ACBDCDEG In a directed graph a path can only follow the direction of the “arrow” C A B D E H F G 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

16 Number of Paths Number of paths between nodes u and v :
Could do a better job ILLUSTRATION to illustrate how the stuff is really the matrix powering Number of paths between nodes u and v : Length h=1: If there is a link between u and v, Auv=1 else Auv=0 Length h=2: If there is a path of length two between u and v then Auk Akv=1 else Auk Akv=0 Length h: If there is a path of length h between u and v then Auk .... Akv=1 else Auk .... Akv=0 So, the no. of paths of length h between u and v is (holds for both directed and undirected graphs) 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

17 Distance in a Graph Distance (shortest path, geodesic) between a pair of nodes is defined as the number of edges along the shortest path connecting the nodes. *If the two nodes are disconnected, the distance is defined as infinite In directed graphs paths need to follow the direction of the arrows. Consequence: Distance is not symmetric: h(A,C) ≠ h(C,A) B A D C B A D C 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

18 Finding Shortest Paths
Breath-First Search: Start with node u, mark it to be at distance hu(u)=0, add u to the queue While queue not empty: Take node v off the queue, put it’s unmarked neighbor w into the queue and mark hu(w)=hu(v)+1 3 4 3 u 2 4 3 2 1 1 3 4 2 3 3 4 1 4 4 3 4 4 2 2 3 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

19 Network Diameter Diameter: the maximum (shortest path) distance between any pair of nodes in the graph Average path length/distance for a connected graph (component) or a strongly connected (component of a) directed graph Many times we compute the average only over the connected pairs of nodes (i.e., we ignore “infinite” paths) where hij is the distance from node i to node j 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

20 Clustering Coefficient
Consequence of Clustering: -- communities -- clustering is correlaed with network density – need a NULL model to estimate how surprised are we by the CCF. Clustering coefficient: What portion of i’s neighbors are connected? Node i with degree ki Ci  [0,1] Average Clustering Coefficient: where ei is the number of edges between the neighbors of node i i i i Ci=0 Ci=1/3 Ci=1 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

21 Clustering Coefficient
What portion of i’s neighbors are connected? Node i with degree ki where ei is the number of edges between the neighbors of node i C A B D E H F G kB=2, eB=1, CB=2/2 = 1 kD=4, eD=2, CD=4/12 = 1/3 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

22 Key Network Properties
Degree distribution: P(k) Path length: h Clustering coefficient: C 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

23 Regular Lattice: 1D P(k) = (k-4) k=4 for each node
Better explanation of the alternative calculation -- picture P(k) = (k-4) k=4 for each node C = ½ for each node if N>6 Path length: The average path-length is Constant degree, constant clustering coefficient. Alternative calculation: 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

24 Regular Lattice: 2D P(k) = (k-6) C = 6/15 for inside nodes
k=6 for each inside node C = 6/15 for inside nodes Path length: In general, for lattices: average path-length is Constant degree, constant clustering coefficient 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

25 3-way Cayley Tree Degree: C = 0 Path length: k=3 for non-leaves
Animation to use a better use of the side calculation. Explain the idea that we want to find h_max such that sum is =N Degree: k=3 for non-leaves k=1 for leaves C = 0 Path length: Distances vary logarithmically with N. Constant degree, no clustering. 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

26 Erdös-Renyi Random Graph Model

27 Simplest model of graphs?
Erdös-Renyi Random Graph [Erdös-Renyi, ‘60] Two variants: Gn,p: undirected graph on n nodes and each edge (u,v) appears i.i.d. with probability p Gn,m : undirected graph with n nodes, and m uniformly at random picked edges What kinds of networks does such model produce? 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

28 Random Graph Model n and p do not uniquely define the graph!
We can have many different realizations. How many? n = 10 p= 1/6 The probability of Gnp to form a particular graph G(N,E) is That is, each concrete graph G(N,E) appears with probability P(G(N,E)). 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

29 Random Graph Model: Edges
How many likely is a graph on E edges? P(E): the probability that a given Gnp generates a graph on exactly E edges: where Emax=n(n-1)/2 is the maximum possible number of edges Binomial distribution >>> 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

30 Node Degrees in a Random Graph
What is expected degree of a node? Let Xv be a random var. measuring the degree of the node v: Linearity of expectation: For any random variables Y1,Y2,…,Yk If Y=Y1+Y2+…Yk, then E[Y]= i E[Yi] Easier way: Decompose Xv in Xv= Xv1+Xv2+…+Xvn-1 where Xvu is a {0,1}-random variable which tells if edge (v,u) exists or not How to think about this? Prob. of node u linking to node v is p u can link (flips a coin) for all other (n-1) nodes Thus, the expected degree of node u is: p(n-1) 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

31 Degree Distribution Degree distribution of Gnp is Binomial.
Let P(k) denote a fraction of nodes with degree k: P(k) Probability of missing n-1-k edges Select k nodes from n-1 Probability of having k edges As the network size increases, the distribution becomes increasingly narrow—we are increasingly confident that the degree of a node is in the vicinity of k. 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

32 Clustering Coefficient of Gnp
Since edges in Gnp appear i.i.d with probability p Clustering coefficient of a random graph is small. For a fixed degree C decreases with the graph size N. 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

33 Side-note: Configuration Model
Assume a degree sequence k1, k2, … kN Useful for as a “null” model of networks We can compare the real network G and a “random” graph G’ which has the same degree sequence as G A C A C B D B D A B C D Randomly pair up “mini”-n0des Nodes with spokes Resulting graph 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

34 END 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

35 Random k-Regular Graphs
Assume each node has d spokes (half-edges): k=1: k=2: k=3: Randomly pair them up Graph is a set of pairs Graph is a set of cycles Arbitrarily complicated graphs 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

36 Definitions: Expansion
Graph G(V, E) has expansion α: if S V: # of edges leaving S  α min(|S|,|V\S|) Or equivalently: S V \ S 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

37 One more figure of expansion
One more figure of expansion. Basically have the Si -- >Si+1 the idea is that from a set of Si nodes there are alpha Si edges pointing out – this is really expansion. Gives lower bound on how quickly we expect to expand out when doing a BFS. 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

38 Expansion Measures Robustness
Expansion is measure of robustness: To disconnect l nodes, we need to cut  α l edges Low expansion: High expansion: Social networks: “Communities” 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

39 Expansion: k-Regular Graphs
k-regular graph (every node has degree k): Expansion is at most k (when S is 1 node) Is there a graph on n nodes (n), of fixed max deg. k, so that expansion α remains const? Examples: nn grid: k=4: α =2n/(n2/4)0 (S=n/2  n/2 square in the center) Complete binary tree: α 0 for|S|=(n/2)-1 Fact: For a random 3-regular graph on n nodes, there is some const α (α>0, independent. of n) such that w.h.p. the expansion of the graph is  α S S 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

40 Diameter of 3-Regular Rnd Graph
Fact: In a graph on n nodes with expansion α for all pairs of nodes s and t there is a path of O((log n) / α) edges connecting them. Proof: Let Sj be a set of all nodes found within j steps of BFS from s. Then: s S0 S1 S2 Expansion Edges can “collide” 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

41 Diameter of 3-Regular Rnd. Graph
Proof (continued): In how many steps of BFS we reach >n/2 nodes? Need j so that: Let’s set: Then: In O(log n) steps |Sj| grows to Θ(n). So, the diameter of G is O(log(n)/ α) s t In log(n) steps, we reach >n/2 nodes In log(n) steps, we reach >n/2 nodes Note for the numbers we care about (n>>0, α<k) 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

42 Network Properties of Gnp
Degree distribution: Path length: O(log n) Clustering coefficient: C=p=k/n 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

43 “Evolution” of the Gnp 1/13/2019
Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

44 Back to Node Degrees of Gnp
Remember, expected degree We want E[Xv] be independent of n So let: p=c/(n-1) Observation: If we build random graph Gnp with p=c/(n-1) we have many isolated nodes Why? By definition: Use substitution e 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

45 No Isolated Nodes How big do we have to make p before we are likely to have no isolated nodes? We know: P[v has degree 0] = e-c Event we are asking about is: I = some node is isolated where Iv is the event that v is isolated We have: Union bound Ai 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

46 No Isolated Nodes We just learned: P(I) = n e-c Let’s try: So if:
c = ln n then: n e-c = n e-ln n =n1/n= 1 c = 2 ln n then: n e-2 ln n = n1/n2 = 1/n So if: p = ln n then: P(I) = 1 p = 2 ln n then: P(I) = 1/n  0 as n 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

47 “Evolution” of a Random Graph
Graph structure of Gnp as p changes: Emergence of a Giant Component: avg. degree k=2E/n or p=k/(n-1) k=1-ε: all components are of size Ω(ln n) k=1+ε: 1 component of size Ω(n), others have size Ω(ln n) 1 p 1/(n-1) Giant component appears c/(n-1) Avg. deg const. Lots of isolated nodes. ln(n)/(n-1) Fewer isolated nodes. 2*ln(n)/(n-1) No isolated nodes. Empty graph Complete graph 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,

48 Show Show a few phase transisitions.
Good to move quickly to social media and web appliations – connect the 6-degres of separation with MSN and edge locality examples. Show 1/13/2019 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis,


Download ppt "Basic Network Properties and the Random Graph Model"

Similar presentations


Ads by Google