Lecture 4: Network Measures CS 765: Complex Networks

Slides are modified from Networks: Theory and Application by Lada Adamic

Characterizing networks: Is everything connected?

Network metrics: components
If there is a path from every vertex in a network to every other vertex, the network is connected; otherwise, it is disconnected. Component: a maximal subset of vertices such that there is at least one path from each member of the subset to every other member, and no other vertex in the network is connected to any vertex in the subset. A singleton vertex that is not connected to any other vertex forms a component of size one. Every vertex belongs to exactly one component.

Connectedness

network metrics: size of giant component
If the largest component encompasses a significant fraction of the graph, it is called the giant component.
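Here is a minimal sketch of how components and the giant-component fraction can be computed. It assumes an undirected graph stored as a hypothetical dict that maps each vertex to the set of its neighbors, with symmetric adjacency. A breadth-first search from each unvisited vertex collects exactly one component, so every vertex lands in exactly one component.

```python
from collections import deque

def connected_components(adj):
    """Components of an undirected graph given as {vertex: set(neighbors)}."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every vertex reachable from `start`.
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        components.append(comp)
    return components

def giant_component_fraction(adj):
    """Fraction of all vertices that belong to the largest component."""
    return max(len(c) for c in connected_components(adj)) / len(adj)

# Hypothetical graph: a 4-vertex component, a single edge, and a singleton.
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3},
    5: {6}, 6: {5},
    7: set(),
}
print(sorted(len(c) for c in connected_components(adj)))  # [1, 2, 4]
print(giant_component_fraction(adj))                      # 4/7, about 0.571
```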

components in directed networks
Weakly connected components: every node can be reached from every other node by following links in either direction.
[Figure: example directed graph on nodes A to H with its weakly connected components]
Strongly connected components: each node within the component can be reached from every other node in the component by following directed links.
[Figure: the same graph with its strongly connected components]

components in directed networks
Every strongly connected component of more than one vertex contains at least one cycle. Out-component: the set of all vertices reachable via directed paths starting at a specific vertex v. The out-components of all members of a strongly connected component are identical. In-component: the set of all vertices from which there is a directed path to a vertex v. The in-components of all members of a strongly connected component are identical.
[Figure: example directed graph on nodes A to H]
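The definitions above can be checked with a short sketch. It assumes a hypothetical dict-of-sets successor map. The out-component is forward reachability, and the in-component is the same search run on the reversed graph. A vertex's strongly connected component is the intersection of the two, which is also why every member of an SCC has the same out-component.

```python
from collections import deque

def reachable(succ, v):
    """All vertices reachable from v by following directed edges (v included)."""
    seen = {v}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in succ.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def reverse(succ):
    """Reverse every edge, so that `reachable` on the result gives in-components."""
    pred = {u: set() for u in succ}
    for u, targets in succ.items():
        for w in targets:
            pred.setdefault(w, set()).add(u)
    return pred

def out_component(succ, v):
    return reachable(succ, v)

def in_component(succ, v):
    return reachable(reverse(succ), v)

def scc_of(succ, v):
    # u is in v's SCC iff v reaches u and u reaches v.
    return out_component(succ, v) & in_component(succ, v)

# Hypothetical graph: A -> B -> C -> A is a cycle, C -> D, and E -> A.
succ = {"A": {"B"}, "B": {"C"}, "C": {"A", "D"}, "D": set(), "E": {"A"}}
print(sorted(scc_of(succ, "A")))         # ['A', 'B', 'C']
print(sorted(out_component(succ, "B")))  # ['A', 'B', 'C', 'D']
print(sorted(in_component(succ, "C")))   # ['A', 'B', 'C', 'E']
```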

Bowtie model of the Web
The Web is a directed graph: webpages link to other webpages. The connected components tell us which sets of pages can be reached from any other just by surfing, without ‘jumping’ around by typing in a URL or using a search engine. Broder et al. crawled over 200 million pages and 1.5 billion links:
SCC: 27.5%
IN and OUT: 21.5% each
Tendrils and tubes: 21.5%
Disconnected: 8%
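The bowtie regions can be sketched from two searches around a chosen core vertex. This assumes the caller already knows one vertex of the giant SCC (the `core_node` parameter is hypothetical). The core is everything that both reaches and is reached from that vertex. IN reaches the core without being reachable from it, and OUT is the reverse. This simplified sketch lumps tendrils, tubes and disconnected pieces together as OTHER.

```python
from collections import deque

def reach(graph, start):
    """Vertices reachable from `start` (inclusive) in a dict-of-sets graph."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in graph.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def bowtie(succ, core_node):
    """Classify vertices relative to the SCC that contains `core_node`."""
    pred = {}
    for u, targets in succ.items():
        for w in targets:
            pred.setdefault(w, set()).add(u)
    forward = reach(succ, core_node)   # pages reachable from the core
    backward = reach(pred, core_node)  # pages that can reach the core
    scc = forward & backward
    nodes = set(succ) | {w for ts in succ.values() for w in ts}
    return {
        "SCC": scc,
        "IN": backward - scc,
        "OUT": forward - scc,
        # tendrils, tubes and disconnected pieces are lumped together here
        "OTHER": nodes - forward - backward,
    }

# Hypothetical toy "web": c1 <-> c2 is the core, in1 links in, out1 links out,
# t1 is a tendril hanging off in1, and iso is disconnected.
succ = {
    "in1": {"c1", "t1"}, "c1": {"c2"}, "c2": {"c1", "out1"},
    "out1": set(), "t1": set(), "iso": set(),
}
parts = bowtie(succ, "c1")
for name in ("SCC", "IN", "OUT", "OTHER"):
    print(name, sorted(parts[name]))
```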

Characterizing networks: How far apart are things?

Network metrics: paths
A path is any sequence of vertices such that every consecutive pair of vertices in the sequence is connected by an edge in the network. In a directed network, each edge must be traversed in its own direction. A path may visit the same vertex or edge more than once; self-avoiding paths do not intersect themselves. The length r of a path is the number of edges on it, also called the number of hops.

Paths
A path between nodes $i_0$ and $i_n$ is an ordered list of n links, $P = \{(i_0, i_1), (i_1, i_2), (i_2, i_3), \ldots, (i_{n-1}, i_n)\}$. The length of the path is n. The path shown in orange in (a) follows the route 1→2→5→7→4→6, hence its length is n = 5. The shortest path between nodes 1 and 7 is the path with the fewest links connecting node 1 to node 7; its length is the distance $d_{17}$. There can be multiple shortest paths of the same length, as illustrated by the two paths shown in orange and grey. The network diameter is the largest distance in the network; here $d_{max} = 3$.
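Distances and the number of distinct shortest paths can be found with breadth-first search. This sketch assumes an unweighted, undirected graph given as a hypothetical neighbor map. BFS reaches vertices in order of distance, so when a vertex is first discovered its distance is final. Every later discovery at the same distance is just another shortest route into it.

```python
from collections import deque

def bfs_shortest_paths(adj, source):
    """Distances (in hops) and number of distinct shortest paths from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:              # first discovery: one hop further than u
                dist[w] = dist[u] + 1
                count[w] = count[u]
                queue.append(w)
            elif dist[w] == dist[u] + 1:   # another shortest route into w
                count[w] += count[u]
    return dist, count

# Hypothetical graph: a square 1-2-4-3-1 with a tail 4-5.
adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3, 5}, 5: {4}}
dist, count = bfs_shortest_paths(adj, 1)
print(dist[5], count[5])  # 3 2: two different shortest paths of length 3
```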

Edge-independent paths: paths that share no common edge.
Vertex-independent paths: paths that share no common vertex except their start and end vertices. Vertex-independent paths are also edge-independent. Independent paths are also called disjoint paths, and a set of them is not necessarily unique. Connectivity of a pair of vertices: the maximum number of independent paths between them. It is used to identify bottlenecks and resilience to failures.

Cut Sets and Maximum Flow
A cut set is a set of vertices (or edges) whose removal disconnects a specified pair of vertices. A minimum cut set is the smallest such set; it need not be unique. Menger’s theorem: if there is no cut set of size less than n between a pair of vertices, then there are at least n independent paths between the same vertices. This implies that the size of the minimum cut set equals the maximum number of independent paths, for both edge and vertex independence. If every edge has the same capacity, the maximum flow between a pair of vertices is the number of edge-independent paths times that edge capacity.
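By Menger's theorem, the number of edge-independent paths can be counted as a maximum flow with capacity 1 on every edge. This sketch uses the Edmonds-Karp algorithm, chosen here for brevity. It assumes an undirected graph, with each edge modelled as two opposite arcs of capacity 1. Vertex independence would additionally need each vertex split into an in-node and an out-node, which is not shown.

```python
from collections import defaultdict, deque

def max_flow(capacity, s, t):
    """Edmonds-Karp max flow; `capacity` is {u: {v: c}} for directed arcs."""
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0  # ensure the reverse arc exists in the residual graph
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path s -> t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Walk back from t, find the bottleneck, then push that much flow.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

def edge_independent_paths(edges, s, t):
    """Number of edge-independent s-t paths in an undirected graph (unit capacities)."""
    capacity = defaultdict(dict)
    for u, v in edges:
        capacity[u][v] = 1
        capacity[v][u] = 1
    return max_flow(capacity, s, t)

# Hypothetical graph: two routes s-a-t and s-b-t plus a cross edge a-b.
edges = [("s", "a"), ("a", "t"), ("s", "b"), ("b", "t"), ("a", "b")]
print(edge_independent_paths(edges, "s", "t"))  # 2, the size of the minimum edge cut at s
```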

Network metrics: paths

Network metrics: shortest paths
[Figure: example graph on nodes A to E, labeled with shortest-path distances 1, 2, 3, 3]

Structural metrics: Average path length
For a connected graph with N nodes, the average path length L and the diameter D satisfy 1 ≤ L ≤ D ≤ N − 1.
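Both quantities follow from one BFS per node. This sketch assumes a connected, unweighted, undirected graph given as a hypothetical neighbor map; unreachable pairs are simply not counted. It checks the bounds on two extremes. A path graph reaches D = N − 1, while a complete graph has L = D = 1.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every vertex it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def path_stats(adj):
    """Average shortest-path length L and diameter D over all reachable pairs."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter

path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}                     # 1-2-3-4
complete = {v: {u for u in range(4) if u != v} for v in range(4)}  # K4
print(path_stats(path))      # L = 10/6, about 1.667, and D = 3 = N - 1
print(path_stats(complete))  # (1.0, 1)
```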

Shortest paths: the average shortest-path length ⟨d⟩ is expected to grow as log(N). Prediction: ⟨d⟩ = 17.5 for 200 million nodes; actual: ⟨d⟩ = 16 for reachable pairs.

Paths: path, shortest path, diameter, average path length, cycle, Eulerian path, Hamiltonian path

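Node degree
Undirected graph: $k_i = \sum_j A_{ij}$
For a directed graph with adjacency matrix A, where $A_{ij} = 1$ if there is an edge from node i to node j:
Outdegree: $k_i^{out} = \sum_j A_{ij}$ (sum of row i)
Indegree: $k_j^{in} = \sum_i A_{ij}$ (sum of column j)
[Figure: example directed graph on nodes 1 to 5 and its adjacency matrix A]
Example: the outdegree of node 3 is 2, obtained by summing the non-zero entries in the 3rd row; the indegree of node 3 is 1, obtained by summing the non-zero entries in the 3rd column.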
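Row and column sums translate directly into code. This sketch assumes the convention that `A[i][j] = 1` means an edge from i to j, so row sums are outdegrees. The 5-node matrix is hypothetical, not the one in the slide's figure. It was built so that node 3 (index 2) again has outdegree 2 and indegree 1.

```python
def out_degrees(A):
    # k_out(i): number of non-zero entries in row i
    return [sum(row) for row in A]

def in_degrees(A):
    # k_in(j): number of non-zero entries in column j
    n = len(A)
    return [sum(A[i][j] for i in range(n)) for j in range(n)]

# Hypothetical directed graph: 1->2, 2->3, 3->4, 3->5, 4->1 (rows/columns = nodes 1..5)
A = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(out_degrees(A))  # [1, 1, 2, 1, 0]
print(in_degrees(A))   # [1, 1, 1, 1, 1]
```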

Node degrees

Degree sequence and Degree frequency
Degree sequence: an ordered list of the (in, out) degree of each node
In-degree sequence: [2, 2, 2, 1, 1, 1, 1, 0]
Out-degree sequence: [2, 2, 2, 2, 1, 1, 1, 0]
(Undirected) degree sequence: [3, 3, 3, 2, 2, 1, 1, 1]
Degree frequency: a frequency count of the occurrences of each degree, as (degree, count) pairs
In-degree frequency: [(2,3), (1,4), (0,1)]
Out-degree frequency: [(2,4), (1,3), (0,1)]
(Undirected) degree frequency: [(3,3), (2,2), (1,3)]

Degree distribution
The degree distribution is a function P(k) that gives the probability that a randomly chosen node of the graph has degree k. For the 8-node example above, P(k) = frequency / 8:
In-degree: P(0) = 0.13, P(1) = 0.50, P(2) = 0.38, P(3) = 0.00
Out-degree: P(0) = 0.13, P(1) = 0.38, P(2) = 0.50, P(3) = 0.00
Degree: P(0) = 0.00, P(1) = 0.38, P(2) = 0.25, P(3) = 0.38
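Degree sequences, frequencies and distributions can all be derived from an edge list. This sketch assumes a hypothetical 4-node directed graph, because the slide's figure is not available. "Undirected degree" is taken to mean the number of distinct neighbors when direction is ignored, so a reciprocated pair counts once.

```python
from collections import Counter

def directed_degrees(nodes, edges):
    """In- and out-degree of every node; an edge (u, v) means u -> v."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg

def undirected_degrees(nodes, edges):
    """Degree ignoring direction: number of distinct neighbors."""
    nbrs = {v: set() for v in nodes}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return {v: len(s) for v, s in nbrs.items()}

def degree_sequence(deg):
    return sorted(deg.values(), reverse=True)

def degree_frequency(deg):
    # (degree, count) pairs, highest degree first
    return sorted(Counter(deg.values()).items(), reverse=True)

def degree_distribution(deg):
    # P(k) = (number of nodes with degree k) / (number of nodes)
    n = len(deg)
    return {k: count / n for k, count in degree_frequency(deg)}

nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 1), (1, 3), (3, 4)]  # hypothetical example graph
indeg, outdeg = directed_degrees(nodes, edges)
print(degree_sequence(outdeg))      # [2, 1, 1, 0]
print(degree_frequency(outdeg))     # [(2, 1), (1, 2), (0, 1)]
print(degree_distribution(outdeg))  # {2: 0.25, 1: 0.5, 0: 0.25}
print(degree_sequence(undirected_degrees(nodes, edges)))  # [2, 2, 1, 1]
```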

Structural Metrics: Degree distribution

Degree Distribution Plot
The x-axis represents the degree and the y-axis represents the fraction of nodes having that degree. On social networking sites, there are many users with few connections and a handful of users with very large numbers of friends.
[Figure: Facebook degree distribution]

Degree distributions
Imagine a graph with 1000 nodes but no links. Now start adding links randomly, one by one. After 10 random additions, what do you expect the degree distribution to be? What will the average node degree be after 1000 additions?
When links are added completely at random, we get the standard situation: with n nodes and m randomly added edges, the degree distribution peaks at 2m/n, the average degree. For a randomly picked node, the most likely degree is the average one, and the probabilities drop quickly on either side.
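The thought experiment can be simulated directly. This sketch makes one assumption: each addition picks two distinct nodes uniformly at random, and the same pair may be drawn twice. Each edge adds 2 to the total degree, so the average degree is exactly 2m/n. After 10 additions to 1000 nodes, at most 20 nodes have non-zero degree. After 1000 additions, the histogram clusters around the average of 2.

```python
import random
from collections import Counter

def add_random_edges(n, m, seed=0):
    """Degrees after adding m edges between uniformly random pairs of distinct nodes.

    Assumption: the same pair may be drawn twice (multi-edges are allowed).
    """
    rng = random.Random(seed)
    degree = [0] * n
    for _ in range(m):
        u, v = rng.sample(range(n), 2)  # two distinct endpoints
        degree[u] += 1
        degree[v] += 1
    return degree

n = 1000
for m in (10, 1000):
    degree = add_random_edges(n, m)
    print(m, "edges: average degree", sum(degree) / n)  # 2m/n
    print("   histogram", sorted(Counter(degree).items()))
```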

Degree Distributions Protein interactions of yeast

Degree distribution (power law with exponent α): indegree α ≈ 2.1, outdegree α ≈ 2.4
source: Pennock et al.: Winners don't take all: Characterizing the competition for links on the web PNAS April 16, 2002 vol. 99 no

Characterizing networks: How dense are they?

network metrics: graph density
Of the connections that may exist between n nodes:
directed graph: Lmax = n(n − 1)
undirected graph: Lmax = n(n − 1)/2
What fraction are present? density = L / Lmax. In real networks, L ≪ Lmax. For example, out of 12 possible connections, this graph has 7, giving it a density of 7/12 ≈ 0.583.
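Density is a one-line computation once the maximum number of links is known. The slide's count of 12 possible connections matches Lmax for a directed graph on 4 nodes (4 · 3 = 12), and this sketch assumes that reading.

```python
def density(n, num_links, directed=False):
    """Fraction of the possible links between n nodes that are present."""
    l_max = n * (n - 1) if directed else n * (n - 1) // 2
    return num_links / l_max

print(round(density(4, 7, directed=True), 3))  # 0.583
print(density(4, 3))                           # 0.5 (3 of the 6 undirected pairs)
```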

Graph density
Would this measure be useful for comparing networks of different sizes (different numbers of nodes)? As n → ∞, a graph whose density tends to 0 is a sparse graph, and a graph whose density tends to a nonzero constant is a dense graph.

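Transitivity
A relation ∘ is said to be transitive if a ∘ b and b ∘ c together imply a ∘ c. Perfect transitivity in a network → every component is a clique. Partial transitivity: if u knows v and v knows w, then u is not guaranteed to know w, but is more likely to. It is measured by the clustering coefficient
$$C = \frac{\text{number of closed paths of length two}}{\text{number of paths of length two}} = \frac{3 \times \text{number of triangles}}{\text{number of connected triplets}}$$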

Local Clustering Coefficient
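The clustering coefficient measures transitivity in undirected graphs. The local clustering coefficient measures transitivity at the node level and is commonly employed for undirected graphs. It computes how strongly the neighbors of a node v (the nodes adjacent to v) are themselves connected:
$$C(v) = \frac{\text{number of pairs of neighbors of } v \text{ that are connected}}{\text{number of pairs of neighbors of } v}$$
In an undirected graph, the denominator can be rewritten as $\binom{k_v}{2} = k_v(k_v - 1)/2$, where $k_v$ is the degree of v.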

Local Clustering Coefficient: Example
Thin lines depict connections to neighbors, dashed lines are the missing connections among neighbors, and solid lines indicate connected neighbors. When all neighbors are connected, C = 1; when none of the neighbors are connected, C = 0.
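The local coefficient counts linked pairs among a node's neighbors. This sketch assumes an undirected graph stored as a hypothetical neighbor map. It reproduces three cases: all neighbors connected, exactly one pair connected, and none connected. Nodes with fewer than two neighbors have an undefined coefficient; reporting 0 for them is an assumed convention.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Connected pairs of v's neighbors divided by all pairs of v's neighbors."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: C(v) is undefined for k < 2, reported as 0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)

def make_graph(edges):
    """Symmetric neighbor map from an undirected edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

spokes = [("v", "a"), ("v", "b"), ("v", "c")]
full = make_graph(spokes + [("a", "b"), ("b", "c"), ("a", "c")])
one = make_graph(spokes + [("a", "b")])
empty = make_graph(spokes)
print(local_clustering(full, "v"))   # 1.0
print(local_clustering(one, "v"))    # 1/3
print(local_clustering(empty, "v"))  # 0.0
```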

Clustering Coefficient

Structural Metrics: Clustering coefficient

Clustering Coefficient and Triples
Triple: an ordered set of three nodes connected by two edges (open triple) or three edges (closed triple). A triangle can miss any one of its three edges, so a triangle contains 3 triples. $v_i v_j v_k$ and $v_j v_k v_i$ are different triples: they have the same members, but the first is missing edge $e(v_k, v_i)$ and the second is missing $e(v_i, v_j)$. $v_i v_j v_k$ and $v_k v_j v_i$ are the same triple.

[Global] Clustering Coefficient
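Count paths of length two and check whether the third edge exists:
$$C = \frac{\text{number of closed paths of length two}}{\text{number of paths of length two}}$$
When counting triangles, since every triangle has 6 closed paths of length two:
$$C = \frac{6 \times \text{number of triangles}}{\text{number of paths of length two}}$$
Or we can rewrite it as
$$C = \frac{3 \times \text{number of triangles}}{\text{number of connected triples}}$$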

[Global] Clustering Coefficient: Example
The average clustering coefficient and the global clustering coefficient are different; in some extreme cases they can differ considerably.
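The difference between the two measures shows up even on a tiny graph. This sketch assumes a hypothetical triangle with one pendant vertex. The global coefficient pools all triangles and connected triples, while the average coefficient takes the mean of the per-node values, with degree-1 nodes counted as 0 by assumption. Here they give 3/5 = 0.6 and 7/12 ≈ 0.583.

```python
from itertools import combinations

def linked_neighbor_pairs(adj, v):
    """Number of pairs of v's neighbors that are themselves adjacent."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])

def global_clustering(adj):
    """3 x triangles / connected triples."""
    closed = sum(linked_neighbor_pairs(adj, v) for v in adj)  # equals 3 x triangles
    triples = sum(len(adj[v]) * (len(adj[v]) - 1) // 2 for v in adj)
    return closed / triples

def average_clustering(adj):
    """Mean of local coefficients; nodes with degree < 2 count as 0 (assumed)."""
    total = 0.0
    for v in adj:
        k = len(adj[v])
        if k >= 2:
            total += linked_neighbor_pairs(adj, v) / (k * (k - 1) / 2)
    return total / len(adj)

# Hypothetical graph: triangle 0-1-2 plus a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(global_clustering(adj))   # 0.6
print(average_clustering(adj))  # 7/12, about 0.583
```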

Clustering

Clustering & motifs
Clustering coefficient ~ 0.11 (at the site level)
Source: Milo et al., “Superfamilies of evolved and designed networks”, Science 303 (5663), 2004.

Local Clustering and Redundancy
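$$C_i = \frac{\text{connected pairs of neighbors of } i}{\text{pairs of neighbors of } i}, \qquad C_{WS} = \frac{1}{n}\sum_{i=1}^{n} C_i$$
Redundancy $R_i$: the mean number of connections from a neighbor of i to other neighbors of i.
$$C_i = \frac{R_i}{k_i - 1}, \qquad R_i = C_i (k_i - 1)$$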

Reciprocity
“If you become my friend, I’ll be yours.” Reciprocity is a simplified version of transitivity: it considers closed loops of length 2. If node v is connected to node u, then u, by connecting back to v, exhibits reciprocity.

Reciprocity
How likely is it that the node you point to will point to you as well? For a directed network with m edges,
$$r = \frac{1}{m}\sum_{i,j} A_{ij} A_{ji} = \frac{1}{m}\operatorname{Tr} A^2$$
where the trace of a matrix is the sum of its diagonal elements.
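The trace formula can be evaluated literally by squaring the adjacency matrix. The diagonal entry $(A^2)_{ii}$ counts closed walks of length 2 through i, which are exactly the reciprocated edges. This sketch uses a hypothetical 3-node graph in which only v1 and v2 point at each other. The result does not depend on whether rows or columns denote the edge source.

```python
def mat_mul(X, Y):
    """Plain matrix product of two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def reciprocity(A):
    """r = Tr(A^2) / m for a directed 0/1 adjacency matrix with m edges."""
    m = sum(map(sum, A))
    A2 = mat_mul(A, A)
    trace = sum(A2[i][i] for i in range(len(A)))  # equals sum_ij A_ij A_ji
    return trace / m

# Hypothetical graph: v1 <-> v2 reciprocated, plus v2 -> v3 and v3 -> v1.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
]
print(reciprocity(A))  # 0.5: 2 of the 4 directed edges are reciprocated
```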

Reciprocity: Example Reciprocal nodes: 𝑣1, 𝑣2

Cocitation and Bibliographic coupling
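Cocitation of two vertices i and j is the number of vertices that have outgoing edges to both. With the convention $A_{ij} = 1$ if there is an edge from j to i:
$$C_{ij} = \sum_{k=1}^{n} A_{ik} A_{jk} = \sum_{k=1}^{n} A_{ik} A^{T}_{kj}, \qquad C = A A^{T}$$
Bibliographic coupling of i and j is the number of vertices to which both point:
$$B_{ij} = \sum_{k=1}^{n} A_{ki} A_{kj} = \sum_{k=1}^{n} A^{T}_{ik} A_{kj}, \qquad B = A^{T} A$$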
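The two matrix products can be computed directly. This sketch assumes the convention under which these formulas hold, namely `A[i][j] = 1` when there is an edge from j to i, and uses a hypothetical 5-node citation graph. The diagonal entries have a natural reading: $C_{ii}$ is the indegree of i and $B_{ii}$ is the outdegree of i.

```python
def mat_mul(X, Y):
    """Matrix product of lists of lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def adjacency(n, edges):
    # A[i][j] = 1 if there is an edge from j to i (convention assumed by the formulas)
    A = [[0] * n for _ in range(n)]
    for src, dst in edges:
        A[dst][src] = 1
    return A

# Hypothetical citations: 0 and 1 both cite 2 and 3; 4 cites 2.
edges = [(0, 2), (1, 2), (0, 3), (1, 3), (4, 2)]
A = adjacency(5, edges)
C = mat_mul(A, transpose(A))  # cocitation
B = mat_mul(transpose(A), A)  # bibliographic coupling
print(C[2][3])  # 2: vertices 0 and 1 point to both 2 and 3
print(B[0][1])  # 2: 0 and 1 both point to 2 and 3
```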

Signed Edges and Structural balance
Friends / enemies:
Friend of my friend → friend
Enemy of my enemy → friend
Structural balance: only loops with an even number of negative links. A structurally balanced network can be partitioned into groups where internal links are positive and links between groups are negative.

Social Balance Theory
Social balance theory describes consistency in friend/foe relationships among individuals. Informally, friend/foe relationships are consistent when the friend of my friend is my friend, the friend of my enemy is my enemy, the enemy of my enemy is my friend, and the enemy of my friend is my enemy. In the network, positive edges represent friendships ($w_{ij} = 1$) and negative edges represent being enemies ($w_{ij} = -1$). A triangle of nodes i, j, and k is balanced if and only if
$$w_{ij} \, w_{jk} \, w_{ki} > 0$$
where $w_{ij}$ denotes the value of the edge between nodes i and j.

Social Balance Theory: Possible Combinations
For any cycle, if the product of its edge values is positive, then the cycle is socially balanced.
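The product rule gives a direct test for any cycle, and scanning all triangles locates the unbalanced ones. This sketch assumes a hypothetical signed graph stored as a dict from `frozenset({u, v})` to +1 (friends) or −1 (enemies). The triangle scan is brute force over node triples, which is fine for small examples.

```python
from itertools import combinations
from math import prod

def is_balanced(sign, cycle):
    """True if the product of the edge signs around `cycle` is positive."""
    hops = zip(cycle, cycle[1:] + cycle[:1])  # consecutive pairs, wrapping around
    return prod(sign[frozenset(hop)] for hop in hops) > 0

def unbalanced_triangles(sign):
    """All triangles whose three signed edges multiply to a negative value."""
    nodes = sorted({v for edge in sign for v in edge})
    result = []
    for i, j, k in combinations(nodes, 3):
        sides = [frozenset((i, j)), frozenset((j, k)), frozenset((k, i))]
        if all(s in sign for s in sides) and not is_balanced(sign, [i, j, k]):
            result.append((i, j, k))
    return result

# Hypothetical signed graph: +1 = friends, -1 = enemies.
sign = {
    frozenset(("a", "b")): 1,   # a and b are friends
    frozenset(("b", "c")): -1,
    frozenset(("a", "c")): -1,  # a and b share the enemy c: balanced
    frozenset(("c", "d")): -1,
    frozenset(("a", "d")): -1,  # a, c, d are mutual enemies: unbalanced
}
print(unbalanced_triangles(sign))  # [('a', 'c', 'd')]
```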

Keith Collins, Loubna Mrie - Quartz

Similarity
What interaction patterns are common? → Reciprocity and transitivity; balance and status
Who are the like-minded users, and how can we find these similar individuals? → Similarity
Who are the central figures (influential nodes) in the network? → Centrality

Structural Equivalence
We look at the neighborhood shared by two nodes; the size of this shared neighborhood defines how similar the two nodes are. Example: two brothers have sisters, mother, father, grandparents, etc. in common, which shows that they are similar.

Structural Equivalence: Definitions
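Vertex similarity: $\sigma(v_i, v_j) = |N(v_i) \cap N(v_j)|$. The neighborhood N(v) often excludes the node v itself.
Issue: connected nodes that do not share a neighbor will be assigned zero similarity.
Solution: we can assume nodes are included in their own neighborhoods.
Normalize?
Jaccard similarity: $\sigma_{Jaccard}(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{|N(v_i) \cup N(v_j)|}$
Cosine similarity: $\sigma_{Cosine}(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{\sqrt{|N(v_i)|\,|N(v_j)|}}$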
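Both normalizations are set operations on neighborhoods. This sketch uses a hypothetical graph and a hypothetical `include_self` flag that implements the suggested fix of adding each node to its own neighborhood. The pair 1–2 is adjacent but shares no neighbor, so it scores 0 unless the flag is set.

```python
from math import sqrt

def neighborhood(adj, v, include_self=False):
    """N(v), optionally with v itself added."""
    return set(adj[v]) | {v} if include_self else set(adj[v])

def jaccard(adj, i, j, include_self=False):
    ni, nj = neighborhood(adj, i, include_self), neighborhood(adj, j, include_self)
    union = ni | nj
    return len(ni & nj) / len(union) if union else 0.0

def cosine(adj, i, j, include_self=False):
    ni, nj = neighborhood(adj, i, include_self), neighborhood(adj, j, include_self)
    denom = sqrt(len(ni) * len(nj))
    return len(ni & nj) / denom if denom else 0.0

# Hypothetical undirected graph (symmetric neighbor map).
adj = {1: {2}, 2: {1}, 3: {5, 6}, 4: {5, 6, 7}, 5: {3, 4}, 6: {3, 4}, 7: {4}}
print(jaccard(adj, 3, 4), cosine(adj, 3, 4))  # 2/3 and 2/sqrt(6), about 0.667 and 0.816
print(jaccard(adj, 1, 2))                     # 0.0: adjacent but no shared neighbor
print(jaccard(adj, 1, 2, include_self=True))  # 1.0
```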

Similarity: Example

Similarity Significance
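Measuring similarity significance: compare the calculated similarity value with its expected value when vertices pick their neighbors at random. For vertices $v_i$ and $v_j$ with degrees $d_i$ and $d_j$, this expectation is $d_i d_j / n$: each of the $d_j$ neighbors that $v_j$ selects has a $d_i/n$ chance of also being a neighbor of $v_i$. We can rewrite the neighborhood overlap minus this expectation as
$$\sum_k A_{ik} A_{jk} - \frac{d_i d_j}{n} = \sum_k (A_{ik} - \bar{A}_i)(A_{jk} - \bar{A}_j), \qquad \bar{A}_i = \frac{d_i}{n}$$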


Normalized Similarity, cont.
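The sum $\sum_k (A_{ik} - \bar{A}_i)(A_{jk} - \bar{A}_j)$ is n times the covariance between rows $A_i$ and $A_j$ of the adjacency matrix, where $\bar{A}_i = d_i/n$ is the mean of row i. Normalizing the covariance by the product of the standard deviations (the square roots of the variances) gives the Pearson correlation coefficient, which ranges over [−1, 1]:
$$r_{ij} = \frac{\sum_k (A_{ik} - \bar{A}_i)(A_{jk} - \bar{A}_j)}{\sqrt{\sum_k (A_{ik} - \bar{A}_i)^2}\,\sqrt{\sum_k (A_{jk} - \bar{A}_j)^2}}$$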
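The significance and the Pearson coefficient can be computed from two rows of the adjacency matrix. The 5-node symmetric matrix in this sketch is hypothetical. For rows 0 and 1, the overlap is 2 and the random expectation is 2 · 3 / 5 = 1.2, giving a significance of 0.8. Dividing by the root of the (n-scaled) variances, 1.2 each, yields r = 2/3. The Pearson formula is undefined if a row is all zeros or all ones.

```python
from math import sqrt

def significance(A, i, j):
    """Common neighbors minus their random expectation d_i * d_j / n."""
    n = len(A)
    overlap = sum(A[i][k] * A[j][k] for k in range(n))
    return overlap - sum(A[i]) * sum(A[j]) / n

def pearson_similarity(A, i, j):
    """Pearson correlation of rows i and j of the adjacency matrix."""
    n = len(A)
    mean_i, mean_j = sum(A[i]) / n, sum(A[j]) / n
    cov_n = sum((A[i][k] - mean_i) * (A[j][k] - mean_j) for k in range(n))  # n * covariance
    var_i = sum((A[i][k] - mean_i) ** 2 for k in range(n))  # n * variance of row i
    var_j = sum((A[j][k] - mean_j) ** 2 for k in range(n))  # n * variance of row j
    return cov_n / sqrt(var_i * var_j)  # the factors of n cancel

# Hypothetical undirected graph: edges 0-2, 0-3, 1-2, 1-3, 1-4.
A = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
print(significance(A, 0, 1))        # about 0.8
print(pearson_similarity(A, 0, 1))  # about 0.667
```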

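Similarity
Structural equivalence: nodes share many of the same neighbors.
Jaccard similarity: $\sigma_{ij} = \frac{n_{ij}}{|n_i \cup n_j|}$, where $n_i$ is the neighborhood of i and $n_{ij} = |n_i \cap n_j|$
Cosine similarity: $\sigma_{ij} = \frac{n_{ij}}{\sqrt{k_i k_j}}$
Pearson coefficient ($r_{ij}$): given the degrees of two nodes, how many common neighbors they have compared with the number expected at random
Euclidean distance: $d_{ij} = \sum_k (A_{ik} - A_{jk})^2$
Regular equivalence: the neighbors are themselves similar, $\sigma_{ij} = \alpha \sum_{k,l} A_{ik} A_{jl} \sigma_{kl}$
Katz similarity: $\boldsymbol{\sigma} = \alpha A \boldsymbol{\sigma} + I$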

In regular equivalence, we do not look at neighborhoods shared between individuals, but at how similar the neighborhoods themselves are. Example: athletes are similar not because they know each other in person, but because they know similar individuals, such as coaches, trainers, other players, etc.

Regular Equivalence
$v_i$ and $v_j$ are similar when their neighbors $v_k$ and $v_l$ are similar:
$$\sigma(v_i, v_j) = \alpha \sum_{k,l} A_{ik} A_{jl} \, \sigma(v_k, v_l)$$

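Regular Equivalence
$v_i$ and $v_j$ are similar when $v_j$ is similar to $v_i$’s neighbors $v_k$:
$$\sigma(v_i, v_j) = \alpha \sum_k A_{ik} \, \sigma(v_k, v_j)$$
In vector format, $\boldsymbol{\sigma} = \alpha A \boldsymbol{\sigma}$. A vertex is highly similar to itself; we guarantee this by adding an identity matrix to the equation:
$$\boldsymbol{\sigma} = \alpha A \boldsymbol{\sigma} + I \quad\Rightarrow\quad \boldsymbol{\sigma} = (I - \alpha A)^{-1}$$
When $\alpha < 1/\lambda_{max}$, the matrix $I - \alpha A$ is invertible.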

Regular Equivalence: Example
The largest eigenvalue of A is 2.43. Set α = 0.3 < 1/2.43 ≈ 0.41. Any row or column of the resulting similarity matrix shows the similarity of a vertex to the other vertices. Vertex 1 is most similar (other than to itself) to vertices 2 and 3. Nodes 2 and 3 have the highest similarity (regular equivalence).
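The matrix $(I - \alpha A)^{-1}$ can be approximated without an explicit inverse by iterating $\sigma \leftarrow \alpha A \sigma + I$ from $\sigma = I$. This sums the series $I + \alpha A + (\alpha A)^2 + \dots$, which converges when $\alpha < 1/\lambda_{max}$. The fixed-point iteration is a design choice made here for simplicity. The 5-node graph is hypothetical; its maximum degree is 3, and since $\lambda_{max}$ never exceeds the maximum degree, the same α = 0.3 as in the example is safe.

```python
def regular_equivalence(A, alpha, tol=1e-12, max_iter=10_000):
    """Fixed point of sigma = alpha*A*sigma + I, i.e. sigma = (I - alpha*A)^(-1)."""
    n = len(A)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    sigma = [row[:] for row in eye]
    for _ in range(max_iter):
        new = [[alpha * sum(A[i][k] * sigma[k][j] for k in range(n)) + eye[i][j]
                for j in range(n)] for i in range(n)]
        change = max(abs(new[i][j] - sigma[i][j]) for i in range(n) for j in range(n))
        sigma = new
        if change < tol:
            return sigma
    raise RuntimeError("no convergence; alpha may be >= 1/lambda_max")

# Hypothetical undirected graph: edges 0-1, 0-2, 1-2, 1-3, 2-4 (max degree 3).
A = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
]
sigma = regular_equivalence(A, alpha=0.3)
for row in sigma:
    print([round(x, 3) for x in row])  # row i: similarity of vertex i to every vertex
```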

Homophily and Assortative Mixing
Assortativity: the tendency to be linked with nodes that are similar in some way.
Humans: age, race, nationality, language, income, education level, etc.
Citations: papers tend to cite papers in similar fields more than others.
Web pages: language.
Disassortativity: the tendency to be linked with nodes that are different in some way.
Network providers: providers tend to link to end users rather than to other providers.
Assortative mixing can be based on an enumerative characteristic or a scalar characteristic.

Assortativity: An Example
The friendship network in a US high school in 1994. Colors represent races: white for whites, grey for blacks, light grey for Hispanics, and black for others. There is high assortativity between individuals of the same race.

Assortativity Significance
Assortativity significance is the difference between the measured assortativity and the expected assortativity; the higher this difference, the more significant the observed assortativity. Example: in a school, half the population is white and the other half is Hispanic. We expect 50% of the connections to be between members of different races. If all connections are between members of different races, then we have a significant finding.

Modularity (enumerative)
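Modularity measures the extent to which nodes are connected to like nodes in the network: it is positive if there are more edges between nodes of the same type than expected, and negative otherwise.
$$Q = \frac{1}{2m}\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j)$$
where $\delta(c_i, c_j)$ is 1 if $c_i$ and $c_j$ are of the same type, and 0 otherwise. Equivalently,
$$Q = \sum_r \left(e_{rr} - a_r^2\right)$$
where $e_{rr}$ is the fraction of edges that join two vertices of type r, and $a_r$ is the fraction of ends of edges attached to vertices of type r.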

Assortative coefficient (enumerative)
Modularity is almost always less than 1, hence we can normalize it by its maximum value $Q_{max}$:
$$r = \frac{Q}{Q_{max}} = \frac{\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right)\delta(c_i, c_j)}{2m - \sum_{ij}\frac{k_i k_j}{2m}\delta(c_i, c_j)}$$
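The two forms of Q and the normalized coefficient r can be cross-checked numerically. This sketch assumes a simple undirected graph (no self-loops, no repeated edges) with a hypothetical type label on every node. On two triangles joined by one edge, it gives Q = 5/14 ≈ 0.357 both ways and r = 5/7 ≈ 0.714.

```python
def modularity_and_r(edges, label):
    """Q from the A_ij - k_i k_j / 2m form, Q from sum_r (e_rr - a_r^2), and r = Q / Q_max."""
    m = len(edges)
    k = {v: 0 for v in label}
    linked = set()
    for u, v in edges:
        k[u] += 1
        k[v] += 1
        linked.update({(u, v), (v, u)})
    nodes = list(label)
    same = [(u, v) for u in nodes for v in nodes if label[u] == label[v]]
    observed = sum(1 for u, v in same if (u, v) in linked)   # sum_ij A_ij delta
    expected = sum(k[u] * k[v] / (2 * m) for u, v in same)   # sum_ij k_i k_j / 2m delta
    q = (observed - expected) / (2 * m)

    types = set(label.values())
    e = {t: 0.0 for t in types}  # fraction of edges joining two vertices of type t
    a = {t: 0.0 for t in types}  # fraction of edge ends attached to type-t vertices
    for u, v in edges:
        if label[u] == label[v]:
            e[label[u]] += 1 / m
        a[label[u]] += 1 / (2 * m)
        a[label[v]] += 1 / (2 * m)
    q_by_type = sum(e[t] - a[t] ** 2 for t in types)

    r = (observed - expected) / (2 * m - expected)
    return q, q_by_type, r

# Hypothetical graph: two triangles joined by the edge 2-3, labelled by triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
label = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
q, q_by_type, r = modularity_and_r(edges, label)
print(round(q, 4), round(q_by_type, 4), round(r, 4))  # 0.3571 0.3571 0.7143
```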

Assortative coefficient (scalar)
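$$r = \frac{\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right) x_i x_j}{\sum_{ij}\left(k_i \delta_{ij} - \frac{k_i k_j}{2m}\right) x_i x_j}$$
r = 1: perfectly assortative; r = −1: perfectly disassortative; r = 0: non-assortative. Usually node degree is used as the scalar, i.e. $x_i = k_i$:
$$r = \frac{\sum_{ij}\left(A_{ij} - \frac{k_i k_j}{2m}\right) k_i k_j}{\sum_{ij}\left(k_i \delta_{ij} - \frac{k_i k_j}{2m}\right) k_i k_j}$$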
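Expanding the double sums gives simple aggregates. The numerator is $\sum_{ij} A_{ij} x_i x_j - (\sum_i k_i x_i)^2 / 2m$ and the denominator is $\sum_i k_i x_i^2 - (\sum_i k_i x_i)^2 / 2m$. This sketch assumes a simple undirected edge list and falls back to degree as the scalar when no values are given. A star graph is perfectly disassortative (r = −1). Two separate edges, each joining nodes with equal values, are perfectly assortative (r = 1). The coefficient is undefined when the denominator is 0, for example when every node has the same value.

```python
def scalar_assortativity(edges, x=None):
    """Assortativity coefficient r for a scalar x_i on a simple undirected graph.

    If x is None, the degree k_i is used as the scalar (degree assortativity).
    """
    k = {}
    for u, v in edges:
        k[u] = k.get(u, 0) + 1
        k[v] = k.get(v, 0) + 1
    if x is None:
        x = k
    two_m = 2 * len(edges)
    # sum_ij A_ij x_i x_j: each undirected edge appears as (u, v) and (v, u)
    s_adj = sum(2 * x[u] * x[v] for u, v in edges)
    s1 = sum(k[i] * x[i] for i in k)        # sum_i k_i x_i
    s2 = sum(k[i] * x[i] ** 2 for i in k)   # sum_ij k_i delta_ij x_i x_j
    numerator = s_adj - s1 * s1 / two_m
    denominator = s2 - s1 * s1 / two_m
    return numerator / denominator

star = [(0, 1), (0, 2), (0, 3)]
print(scalar_assortativity(star))  # -1.0: perfectly disassortative

pairs = [(0, 1), (2, 3)]
print(scalar_assortativity(pairs, x={0: 1, 1: 1, 2: 5, 3: 5}))  # 1.0: perfectly assortative
```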

Modularity Example: the number of edges between nodes of the same color is less than the expected number of edges between them, so the modularity is negative.

Assortativity Coefficient of Various Networks
M.E.J. Newman. Assortative mixing in networks

Measuring Assortativity for Ordinal Attributes
A common measure for analyzing the relationship between ordinal values is covariance, which describes how two variables change together. In our case, we have a network, and we are interested in how the values assigned to connected nodes (nodes joined by an edge) are correlated.

Covariance Variables
The value assigned to node $v_i$ is $x_i$. We construct two variables $X_L$ and $X_R$: for any edge $(v_i, v_j)$, we assume that $x_i$ is observed from variable $X_L$ and $x_j$ is observed from variable $X_R$. In other words, $X_L$ represents the ordinal values associated with the left node (the first node) of each edge, and $X_R$ represents the values associated with the right node (the second node). We need to compute the covariance between $X_L$ and $X_R$.

Covariance Variables: Example
List of edges: (A, C), (C, A), (C, B), (B, C)
$X_L$: (18, 21, 21, 20)
$X_R$: (21, 18, 20, 21)

Normalizing Covariance
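Pearson correlation $\rho(X, Y)$ is the normalized version of covariance:
$$\rho(X_L, X_R) = \frac{E\left[(X_L - E[X_L])(X_R - E[X_R])\right]}{\sigma(X_L)\,\sigma(X_R)}$$
In our case, each edge appears in both directions, so $X_L$ and $X_R$ have the same distribution and $\sigma(X_L) = \sigma(X_R) = \sigma$, with $\sigma^2 = E\left[(X - E[X])^2\right]$.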
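The edge-list example can be worked through numerically. The node values A = 18, B = 20, C = 21 are read off the $X_L$/$X_R$ lists. Both variables have mean 20 and variance 1.5, and the covariance is (−2 − 2 + 0 + 0)/4 = −1, so ρ = −1/1.5 ≈ −0.667. This sketch uses population (divide-by-N) statistics, matching the expectation-based formulas.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Population covariance E[(X - E[X])(Y - E[Y])]."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def correlation(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return covariance(xs, ys) / (covariance(xs, xs) ** 0.5 * covariance(ys, ys) ** 0.5)

value = {"A": 18, "B": 20, "C": 21}
edges = [("A", "C"), ("C", "A"), ("C", "B"), ("B", "C")]
x_left = [value[u] for u, _ in edges]   # X_L = (18, 21, 21, 20)
x_right = [value[v] for _, v in edges]  # X_R = (21, 18, 20, 21)
print(covariance(x_left, x_right))   # -1.0
print(correlation(x_left, x_right))  # about -0.667
```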

Correlation Example

Several other graph metrics
Measures and Metrics
Knowing the structure of a network, we can calculate various useful quantities or measures that capture particular features of the network topology. Most such measures have their basis in social network analysis. So far: average path length, diameter, degree distribution, density, assortativity, connectedness, clustering coefficient, and centrality (degree, eigenvector, Katz, PageRank, hubs, closeness, betweenness, …).

Outline: Network metrics can help us characterize networks. This has its roots in graph theory. Today there are many network analysis tools to choose from.
