Fan Chung (金芳蓉), UCSD: Graph Theory, Spectral Graph Theory, Markov Chain Theory
A.-L. Barabási (Northeastern), Mark Newman (Michigan): Mean-Field Theory, Phase Transitions, Percolation Theory
Jon Kleinberg (Cornell)
Liu Y.-Y., Slotine J.-J., Barabási A.-L. Controllability of complex networks. Nature, 2011, 473(7346).
Sinan Aral (MIT)
Lev Muchnik, Sinan Aral, and Sean J. Taylor. Social Influence Bias: A Randomized Experiment. Science, 9 August 2013: 341(6146). Restaurant reviews on Dianping (大众点评), movie ratings on Douban (豆瓣), product reviews on shopping sites: how large is the influence of paid negative reviewers? The experiment randomly gave online articles an initial up-vote or down-vote and compared them with a control group. An initial up-vote was herding: it raised the probability that readers up-voted by 32%. An initial down-vote was not: it had almost no effect on an article's final score.
[Slide figures: Facebook friendship network; Facebook love network]
Many networks have a unique giant component
Suppose you have two large components, each occupying roughly 1/2 of the graph. How many random edges do you need to add so that the probability that the two components join into one giant component is greater than 0.9? (a) 1-5 edge additions (b) 6-10 edge additions (c) edge additions (d) edge additions
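A back-of-the-envelope check of the quiz, assuming each random edge has both endpoints chosen uniformly, so it bridges the two halves with probability ~1/2:

```python
# Each random edge bridges the two half-graphs with probability ~1/2
# (one endpoint in each half). The components merge as soon as one
# bridging edge appears, so P(merged after k edges) = 1 - (1/2)**k.
k = 1
while 1 - 0.5 ** k <= 0.9:
    k += 1
print(k)  # smallest k with P(merged) > 0.9  ->  4
```

Four random edges already suffice, which falls in range (a).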
Globally coupled net: k = N − 1, M ~ O(N²). Practical net: ⟨k⟩ ≪ N − 1, M ≪ O(N²).
Sparse real networks: ⟨k⟩ ≪ N − 1, M ≪ O(N²)
Facebook: N = 721 million, M = 68.7 billion, ⟨k⟩ = 190, density ≈ 0.3 × 10⁻⁶
WWW (ND sample): N = 325,729; M_max ≈ 10¹²; ⟨k⟩ = 4.51
Protein (S. cerevisiae): N = 1,870; M = 4,470; M_max ≈ 10⁷; ⟨k⟩ = 2.39
Coauthorship (Math): N = 70,975; ⟨k⟩ = 3.9
Movie Actors: N = 212,250; ⟨k⟩ = 28.78
(Source: Albert & Barabási, RMP 2002)
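As a sanity check on numbers like these, average degree and density follow directly from N and M (a minimal sketch; the Facebook counts are the 2011 figures quoted above):

```python
# <k> = 2M/N, and density = M / M_max with M_max = N(N-1)/2.
def avg_degree(N, M):
    return 2 * M / N

def density(N, M):
    return 2 * M / (N * (N - 1))

# Facebook: N = 721 million nodes, M = 68.7 billion links.
N, M = 7.21e8, 6.87e10
print(avg_degree(N, M))  # ~190
print(density(N, M))     # ~2.6e-7: far below 1, i.e. very sparse
```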
Dense: the density M/M_max tends to a nonzero constant (N → ∞). Sparse: the density tends to zero (N → ∞).
Densification power law: E(t) ∝ N(t)^a with a > 1. Autonomous Systems: a = 1.18; Affiliation network: a = 1.15.
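The exponent a can be read off as the slope of log E versus log N across snapshots. A sketch on synthetic data (the snapshot counts are made up for illustration; only the slope matters):

```python
import math

# Estimate the densification exponent a in E(t) ~ N(t)^a as the
# least-squares slope of log E vs log N.
N = [1000, 2000, 4000, 8000]
E = [n ** 1.18 for n in N]  # synthetic snapshots with a = 1.18

logN = [math.log(n) for n in N]
logE = [math.log(e) for e in E]
mN = sum(logN) / len(logN)
mE = sum(logE) / len(logE)
a = (sum((x - mN) * (y - mE) for x, y in zip(logN, logE))
     / sum((x - mN) ** 2 for x in logN))
print(round(a, 2))  # recovers 1.18
```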
Many networks densify over time, but are still sparse
Shortest path: distance d_ij. Diameter: D = max_{i,j} d_ij. Average path length (characteristic path length, average distance): L = (1/(N(N−1))) Σ_{i≠j} d_ij.
Disconnected graphs have d_ij = ∞ for some pairs, so L diverges. Three remedies:
Method 1: only consider the giant component.
Method 2: only consider connected pairs.
Method 3: compute the harmonic mean: 1/L = (1/(N(N−1))) Σ_{i≠j} 1/d_ij, taking 1/d_ij = 0 for disconnected pairs.
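A sketch of methods 2 and 3 on a toy disconnected graph (a path 0-1-2 plus an isolated edge 3-4), using BFS for hop distances:

```python
from collections import deque

# BFS distances from one source on an undirected graph (adjacency dict).
def bfs_dist(adj, src):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
pairs = [(i, j) for i in adj for j in adj if i != j]
dists = {i: bfs_dist(adj, i) for i in adj}

# Method 2: average over connected pairs only.
finite = [dists[i][j] for i, j in pairs if j in dists[i]]
L2 = sum(finite) / len(finite)

# Method 3: harmonic mean; disconnected pairs contribute 1/d = 0.
inv = [1 / dists[i][j] if j in dists[i] else 0.0 for i, j in pairs]
L3 = len(inv) / sum(inv)

print(L2, L3)  # 1.25 and 20/7: the harmonic mean penalizes disconnection
```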
In most real networks, the distance between two randomly selected nodes is small.
Local clustering coefficient C_i: the probability that two randomly selected friends of node i are friends with each other. With E_i = the number of edges among the k_i friends of i: C_i = 2E_i / (k_i (k_i − 1)). C_i measures the network's local density: the more densely interconnected the neighborhood of node i, the higher C_i. Network clustering coefficient: the average of C_i over all nodes.
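The definition translates directly into code; a minimal sketch on a hypothetical 4-node graph (a triangle 0-1-2 plus a pendant node 3):

```python
# Local clustering C_i = 2*E_i / (k_i*(k_i - 1)) on an adjacency dict of sets.
def clustering(adj, i):
    nbrs = list(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; use 0 by convention
    # E_i: number of edges among the neighbors of i
    E = sum(1 for a in range(k) for b in range(a + 1, k)
            if nbrs[b] in adj[nbrs[a]])
    return 2 * E / (k * (k - 1))

adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering(adj, 0))  # 1 edge among 3 neighbors -> 1/3
print(clustering(adj, 1))  # neighbors 0 and 2 are linked -> 1.0
```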
Many real networks have a much higher clustering coefficient than expected for a completely random network with the same number of nodes and links. High-degree nodes tend to have a smaller clustering coefficient than low-degree nodes.
Radicchi F., et al. Defining and identifying communities in networks. Proc. Natl Acad. Sci. USA, 2004, 101. Edge clustering coefficient: the number of triangles to which a given edge belongs, divided by the number of triangles that might potentially include it, given the degrees of the adjacent nodes.
S. Pajevic, D. Plenz. The organization of strong links in complex networks. Nature Physics, March 2012. Relative neighbourhood overlap of an edge: O = n_C / n_T, where n_C is the number of common neighbours of the edge's end nodes and n_T is the total number of neighbouring nodes, excluding the end nodes.
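A sketch of the overlap measure, assuming the stated definition (common neighbours over all neighbours of either end node, with the end nodes themselves excluded):

```python
# Relative neighbourhood overlap of an edge (i, j):
# n_C = |N(i) ∩ N(j)|, n_T = |N(i) ∪ N(j)| with i, j removed.
def edge_overlap(adj, i, j):
    ni = adj[i] - {j}
    nj = adj[j] - {i}
    n_C = len(ni & nj)
    n_T = len(ni | nj)
    return n_C / n_T if n_T else 0.0

adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(edge_overlap(adj, 0, 1))  # common: {2}; total: {2, 3} -> 0.5
```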
P(k): the probability that the degree of a randomly selected node is k, i.e. the fraction of nodes in the network with degree k.
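Computing an empirical P(k) is a one-liner over the degree sequence; a sketch on a small hypothetical graph:

```python
from collections import Counter

# Empirical degree distribution: P(k) = fraction of nodes with degree k.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
degs = [len(nbrs) for nbrs in adj.values()]  # 3, 2, 2, 1
N = len(degs)
P = {k: c / N for k, c in sorted(Counter(degs).items())}
print(P)  # {1: 0.25, 2: 0.5, 3: 0.25}
```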
Many real networks are scale-free in the sense that the degree distribution deviates significantly from the Poisson distribution
Height follows a normal distribution: average height is a meaningful statistic. Wealth follows a scale-free distribution: average wealth is not a meaningful statistic! After Bill Gates ($50 billion) enters the arena, the average income of the public rises to ~$1,000,000.
Signatures of a power law: high skew (asymmetry); a straight line on a log-log plot.
Least-squares regression, problem 1: some data exhibit a power law only in the tail, so one must select a k_min (> 0) where the power-law behavior starts.
Least-squares regression, problem 2: noise in the tail skews the regression result and gives values of the exponent α that are too low.
Not every network is power-law distributed: an apparently straight tail could also be a lognormal or a double exponential…
A. Clauset, C.R. Shalizi, and M.E.J. Newman, SIAM Review 51(4), (2009) Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. For implementation codes, see:
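A minimal sketch of the maximum-likelihood side of that framework: the continuous-case estimator α̂ = 1 + n / Σ ln(x_i/x_min) from the paper, checked on synthetic power-law draws (inverse-transform sampling; the k_min selection and goodness-of-fit machinery are omitted here):

```python
import math
import random

# Draw n samples from p(x) ∝ x^(-alpha) for x >= x_min, via the inverse
# CDF: x = x_min * (1 - u)^(-1/(alpha - 1)) with u uniform on [0, 1).
def sample_powerlaw(alpha, x_min, n, rng):
    return [x_min * (1 - rng.random()) ** (-1 / (alpha - 1)) for _ in range(n)]

# Continuous MLE of Clauset, Shalizi & Newman (2009).
def mle_alpha(xs, x_min):
    tail = [x for x in xs if x >= x_min]
    return 1 + len(tail) / sum(math.log(x / x_min) for x in tail)

rng = random.Random(0)
xs = sample_powerlaw(2.5, 1.0, 100_000, rng)
print(mle_alpha(xs, 1.0))  # close to the true alpha = 2.5
```

Unlike the least-squares fit, this estimator is unbiased in the large-n limit and comes with a standard error ≈ (α − 1)/√n.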
Given p(x): if for any given constant a there is a constant g(a) such that p(ax) = g(a)·p(x) (scale-free), then there are constants C and r such that p(x) = C·x^(−r) (power law).
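A one-line derivation of why scale invariance forces a power law, assuming p is differentiable:

```latex
% Differentiate p(ax) = g(a)\,p(x) with respect to a, then set a = 1:
x\,p'(x) = g'(1)\,p(x)
\;\Longrightarrow\;
p(x) = p(1)\,x^{g'(1)} = C\,x^{-r}, \qquad r = -g'(1).
```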