Xiaofan Wang 2014 Network Science: An Introduction.


1 Xiaofan Wang 2014 Network Science: An Introduction

2

3 Network Science: Measuring Data, Discovering Property, Modeling Network, Analysis Behavior, Design Performance

4 Collect enough information so that you can: Describe (correctly), Quantify (properly), Formulate (mathematically), Predict (reasonably), Control (powerfully)

5 Communication, Transportation, Power Grid, Social, Economical, Biological; Network Spreading

6 Spreading on Social Networks (video): Virus, Fashion, Behavior, Rumor, Belief, Opinion, Obesity, Suicide

7 Theoretical Science: Graph Theory, Game Theory, Statistical Physics, Computer Science. Applied Science: Communication Science, Power Engineering, Life Science, Social Science.

8 For every technology, the first ten years is the development, and the second ten years is when the market follows.

9 Science, in general, is a lot better at breaking complex things into tiny parts than it is at figuring out how tiny parts turn into complex things.

10 Interdisciplinary nature, data-driven nature, quantitative nature, computational nature

11 Interdisciplinary nature, data-driven nature, quantitative nature, computational nature

12 Graph Theory, Spectral Graph Theory, Markov Chain Theory. Fan Chung (金芳蓉), UCSD.

13 A.-L. Barabási (Northeastern), Mark Newman (Michigan). Mean-Field Theory, Phase Transition, Percolation Theory.

14 Jon Kleinberg Cornell

15 Liu Y Y, Slotine J J, Barabási A L. Nature, 2011, 473(7346):

16 Sinan Aral MIT

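17 Lev Muchnik, Sinan Aral, and Sean J. Taylor, Social Influence Bias: A Randomized Experiment, Science 9 August 2013: 341 (6146). Restaurant reviews on Dianping, film reviews on Douban, product reviews on shopping sites: how much influence do professional negative reviewers have? Online articles were randomly given an initial positive or negative rating in advance and compared with a control group. An initial positive rating is leading: the probability that readers give a positive rating rose by 32%. An initial negative rating is not leading: it had almost no effect on the article's final score.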

18

19

20

21 Unweighted undirected, weighted undirected, unweighted directed, weighted directed

22 Nodes: 101, edges: 144. Nodes: 101, edges: 242.

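23 You want to study the social network among the faculty and students of Jiao Tong University based on email records. What factors need to be considered when constructing the network? How do you decide whether two nodes are connected by an edge?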

24 No multi-edges, no self-edges

25

26 Facebook friendship network: 99.9%. Facebook love network: 32%.

27 Many networks have a unique giant component

28 Suppose you have 2 large components, each occupying roughly 1/2 of the graph. How many random edges do you need to add so that the probability that the two components join into one giant component is greater than 0.9? (a) 1-5 edge additions (b) 6-10 edge additions (c) edge additions (d) edge additions
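A rough way to reason about the question above, as a sketch under stated assumptions rather than the lecturer's own solution: assume each added edge picks both endpoints independently and uniformly at random, so it bridges the two halves with probability 2 × (1/2) × (1/2) = 1/2. The function name `min_edges_to_join` and the parameter `cross_prob` are hypothetical.

```python
def min_edges_to_join(target_prob: float, cross_prob: float = 0.5) -> int:
    """Smallest m such that 1 - (1 - cross_prob)**m exceeds target_prob.

    Assumption: every added edge independently bridges the two components
    with probability cross_prob (1/2 for two equal halves).
    """
    m = 0
    p_joined = 0.0
    while p_joined <= target_prob:
        m += 1
        # Probability that at least one of the m edges is a bridge.
        p_joined = 1.0 - (1.0 - cross_prob) ** m
    return m


m = min_edges_to_join(0.9)
print(m, 1.0 - 0.5 ** m)
```

Under this simplified model a handful of edges is enough, which is the point of the quiz: large components merge very easily.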

29 The giant weakly connected component (GWCC) has a bow-tie structure: strongly connected core (SCC), IN component (IN), OUT component (OUT), tendrils (TENDRILS)

30 The giant weakly connected component (GWCC) has a bow-tie structure: strongly connected core (SCC), IN component (IN), OUT component (OUT), tendrils (TENDRILS)
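The bow-tie parts can be separated with two reachability searches. The sketch below is illustrative only: it assumes a node known to lie in the strongly connected core is supplied as `core_node` (a hypothetical parameter), and it lumps tendrils together with anything else outside SCC, IN, and OUT under the label `OTHER`.

```python
from collections import deque


def reachable(adj, sources):
    """Return the set of nodes reachable from `sources` by breadth-first search."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bow_tie(edges, core_node):
    """Split nodes into SCC / IN / OUT / OTHER relative to core_node's SCC."""
    nodes = set()
    fwd, rev = {}, {}
    for u, v in edges:
        nodes.update((u, v))
        fwd.setdefault(u, []).append(v)   # follow edges forward
        rev.setdefault(v, []).append(u)   # follow edges backward
    descendants = reachable(fwd, [core_node])  # nodes the core can reach
    ancestors = reachable(rev, [core_node])    # nodes that can reach the core
    scc = descendants & ancestors
    return {
        "SCC": scc,
        "IN": ancestors - scc,
        "OUT": descendants - scc,
        "OTHER": nodes - ancestors - descendants,  # tendrils and the like
    }


edges = [("a", "b"), ("b", "c"), ("c", "a"),          # core cycle
         ("in1", "a"), ("c", "out1"), ("in1", "t")]   # t hangs off IN
parts = bow_tie(edges, "a")
print(parts)
```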

31 Average degree

32 Out-Degree In-Degree

33 A property that does not hold for a single node can still hold for the whole system! Whole network: total out-degree equals total in-degree. Single node: out-degree may not equal in-degree.
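A small check of this statement on a made-up directed edge list (the edges are illustrative, not data from the lecture): every edge contributes exactly one unit of out-degree and one unit of in-degree, so the totals must agree even when individual nodes do not balance.

```python
from collections import Counter

# Each pair (u, v) is a directed edge u -> v.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]

out_deg = Counter(u for u, _ in edges)  # edges leaving each node
in_deg = Counter(v for _, v in edges)   # edges entering each node

total_out = sum(out_deg.values())
total_in = sum(in_deg.values())

print("node a:", out_deg["a"], "out,", in_deg["a"], "in")
print("totals:", total_out, "out,", total_in, "in")
```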

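34 Every debt has its debtor; follower counts are unreliable. Total number of accounts followed across Weibo = ? Total number of followers across Weibo. Prof. Xiaofan Wang: 349 following; followers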

35 Globally coupled network: $k = N-1$, $M \sim O(N^2)$. Practical network: $\langle k \rangle \ll N-1$, $M \ll O(N^2)$.

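36 Sparse real networks: $\langle k \rangle \ll N-1$, $M \ll O(N^2)$
Facebook: N = 721 million; M = 68.7 billion; $\langle k \rangle$ = 190; density = 0.3*
WWW (ND sample): N = 325,729; M = …; $M_{\max}$ = $10^{12}$; $\langle k \rangle$ = 4.51
Protein (S. cerevisiae): N = 1,870; M = 4,470; $M_{\max}$ = $10^{7}$; $\langle k \rangle$ = 2.39
Coauthorship (math): N = 70,975; M = …; $M_{\max}$ = …; $\langle k \rangle$ = 3.9
Movie actors: N = 212,250; M = …; $M_{\max}$ = …; $\langle k \rangle$ = 28.78
(Source: Albert, Barabási, RMP 2002)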

37 Dense: the density tends to a nonzero constant as $N \to \infty$. Sparse: the density tends to zero as $N \to \infty$.
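To make "sparse" concrete, the sketch below recomputes the average degree and density from the Facebook node and edge counts quoted on the previous slide. It assumes a simple undirected network, for which the average degree is 2M/N and the density is 2M/(N(N-1)); the function names are illustrative.

```python
def average_degree(n: int, m: int) -> float:
    """Average degree of an undirected network with n nodes and m edges."""
    return 2.0 * m / n


def density(n: int, m: int) -> float:
    """Fraction of the n(n-1)/2 possible undirected edges that are present."""
    return 2.0 * m / (n * (n - 1))


# Facebook figures from the slide: 7.21 x 10^8 nodes, 6.87 x 10^10 edges.
n_fb, m_fb = 721_000_000, 68_700_000_000
print(average_degree(n_fb, m_fb))
print(density(n_fb, m_fb))
```

The average degree comes out close to the 190 quoted on the slide, while the density is many orders of magnitude below 1.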

38 Densification: $E(t) \sim N(t)^{a}$. Autonomous Systems: $a = 1.18$. Affiliation network: $a = 1.15$. (Plots of $E(t)$ versus $N(t)$.)

39 Many networks densify over time, but are still sparse

40 Shortest path and distance $d_{ij}$; diameter $D$; average path length $L$ (also called characteristic path length or average distance)
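For an unweighted network these three quantities can all be read off breadth-first search. The sketch below assumes a connected, undirected graph stored as an adjacency dictionary; the helper names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_average_path_length(adj):
    """Return (D, L); assumes a connected graph with at least two nodes."""
    total, pairs, diameter = 0, 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1          # ordered pairs; the ratio is unaffected
                diameter = max(diameter, d)
    return diameter, total / pairs


# Path graph 1 - 2 - 3 - 4.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
D, L = diameter_and_average_path_length(adj)
print(D, L)
```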

41 Method 1: Consider only the giant component. Disconnected network: $L \to \infty$.

42 Method 2: Consider only connected pairs. Disconnected network: $L \to \infty$.

43 Method 3: Compute the "harmonic mean" of the distances. Disconnected network: $L \to \infty$.
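The three workarounds can be compared on a small disconnected graph. This is a sketch under assumptions: the graph is undirected with at least one edge, and the harmonic-mean variant treats the reciprocal distance of a disconnected pair as 0 (so the pair contributes nothing to the sum). The function and key names are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_length_variants(adj):
    nodes = list(adj)
    dist = {s: bfs_distances(adj, s) for s in nodes}
    # The keys of each BFS result form that node's component.
    giant = max((set(d) for d in dist.values()), key=len)
    connected = [dist[u][v] for u, v in combinations(nodes, 2) if v in dist[u]]
    in_giant = [dist[u][v] for u, v in combinations(nodes, 2)
                if u in giant and v in giant]
    n_pairs = len(nodes) * (len(nodes) - 1) // 2
    # Disconnected pairs add 0 to the sum of reciprocal distances.
    mean_inverse = sum(1.0 / d for d in connected) / n_pairs
    return {
        "giant_only": sum(in_giant) / len(in_giant),          # Method 1
        "connected_pairs": sum(connected) / len(connected),   # Method 2
        "harmonic_mean": 1.0 / mean_inverse,                  # Method 3
    }


# Two components: the path 1 - 2 - 3 and the single edge 4 - 5.
adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
result = path_length_variants(adj)
print(result)
```

Only the harmonic-mean variant is penalized by the missing connections between the two components; the first two simply ignore them.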

44

45 In most real networks, there are small distances between two randomly selected nodes.

46

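47 There are two ways to solve this problem: the clever person's way and the simple person's way. The clever way: follow the algorithms textbook and implement the shortest-path algorithm with the rather large time complexity, the one whose name sounds like "di-di-da-di-da" (a playful rendering of Dijkstra). The simple way has the lowest time complexity: take a bunch of threads, tie them into a net that follows the structure of the graph, hold one vertex in each hand, and pull them apart. The path that is pulled straight in the middle is the shortest path.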
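For comparison with the thread-pulling method, here is a minimal sketch of the textbook approach, Dijkstra's algorithm with a binary heap. It assumes nonnegative edge weights and an adjacency dictionary mapping each node to a list of `(neighbor, weight)` pairs; the example graph is made up.

```python
import heapq


def dijkstra(adj, source, target):
    """Return (distance, path) from source to target; weights must be >= 0."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue            # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in adj.get(u, ()):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


adj = {
    "A": [("B", 1), ("C", 4)],
    "B": [("A", 1), ("C", 2), ("D", 5)],
    "C": [("A", 4), ("B", 2), ("D", 1)],
    "D": [("B", 5), ("C", 1)],
}
length, path = dijkstra(adj, "A", "D")
print(length, path)
```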

48 Your clustering coefficient: the probability that two randomly selected friends of yours are friends with each other. $E_i$: number of edges among your $k_i$ friends. $C_i$ measures the network's local density: the more densely interconnected the neighborhood of node $i$, the higher $C_i$ is. Network clustering coefficient.
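The probabilistic definition above translates into dividing the number of edges among a node's neighbors by the number of neighbor pairs, k(k-1)/2. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets, assigns 0 to nodes with fewer than two neighbors, and takes the network coefficient to be the plain average over nodes; these are common conventions, not necessarily the ones used on the slide.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Fraction of pairs of i's neighbors that are themselves connected."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for degree 0 or 1
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Triangle 1-2-3 with a pendant node 4 attached to node 1.
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(local_clustering(adj, 1), average_clustering(adj))
```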

49 Many real networks have a much higher clustering coefficient than expected for a completely random network with the same number of nodes and links. High-degree nodes tend to have a smaller clustering coefficient than low-degree nodes.

50

51 Radicchi F., et al. Defining and identifying communities in networks. Proc. Natl Acad. Sci. USA 2004;101: The number of triangles to which a given edge belongs, divided by the number of triangles that might potentially include it, given the degrees of the adjacent nodes.

52 S Pajevic, D Plenz. The organization of strong links in complex networks. Nature Physics, 2012, March. The relative neighbourhood overlap: $n_C$ is the number of common neighbours; $n_T$ is the total number of neighbouring nodes, excluding the end nodes.
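The formula itself did not survive in the transcript. The sketch below assumes the overlap of a link is the ratio of common neighbours to all neighbours of its two end nodes, with the end nodes themselves excluded; treat both the ratio and the function name as assumptions rather than the paper's exact definition.

```python
def neighbourhood_overlap(adj, i, j):
    """Assumed overlap n_C / n_T for the link between nodes i and j."""
    ni = adj[i] - {i, j}   # neighbours of i, end nodes excluded
    nj = adj[j] - {i, j}   # neighbours of j, end nodes excluded
    n_common = len(ni & nj)
    n_total = len(ni | nj)
    return n_common / n_total if n_total else 0.0


adj = {1: {2, 3, 4}, 2: {1, 3, 5}, 3: {1, 2}, 4: {1}, 5: {2}}
overlap = neighbourhood_overlap(adj, 1, 2)
print(overlap)
```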

53 $P(k)$: the probability that a randomly selected node has degree $k$, i.e. the fraction of nodes in the network with degree $k$
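Read as a fraction of nodes, the degree distribution is a normalized histogram of degrees. A minimal sketch for an undirected graph stored as a dictionary of neighbor sets (the graph is a made-up example):

```python
from collections import Counter


def degree_distribution(adj):
    """Map each degree k to the fraction of nodes having that degree."""
    counts = Counter(len(neighbors) for neighbors in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
p = degree_distribution(adj)
print(p)
```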

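54 Weibo: The Past and Present of the Normal Distribution

55 Common probability distributions $P(k)$ include the hypergeometric, binomial, and Poisson distributions. Under certain conditions each of them can be treated as approximately normal. All of these distributions are approximately bell-shaped.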

56

57 Many real networks are scale-free in the sense that the degree distribution deviates significantly from the Poisson distribution

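58 Height follows a normal distribution: average height is a meaningful characteristic. Wealth follows a scale-free distribution: average wealth is not a meaningful characteristic! $50 billion: after Bill enters the arena, the average income of the public is ~1,000,000.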

59

60

61 High skew (asymmetry). Straight line on a log-log plot.

62

63 Least-squares regression: some data exhibit a power law only in the tail, so one needs to select a $k_{\min}$ (> 0) where the power law starts.

64 Least-squares regression: noise in the tail skews the regression result and gives values of the exponent α that are too low.

65 Smoothing; correlation

66 … but could also be a lognormal or double exponential. Not every network is power-law distributed.

67

68 A. Clauset, C.R. Shalizi, and M.E.J. Newman, SIAM Review 51(4), (2009) Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. For implementation codes, see:
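As a rough illustration of the maximum-likelihood step only, the sketch below draws synthetic samples from a continuous power law and recovers the exponent with the continuous-case estimator, 1 + n / Σ ln(x_i / x_min). It assumes x_min is known and skips the goodness-of-fit test and likelihood-ratio comparison that the framework also calls for; the function names, sample size, and seed are arbitrary choices for the example.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Inverse-transform samples from p(x) proportional to x**(-alpha), x >= x_min."""
    # 1 - rng.random() lies in (0, 1], so the power is always well defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def fit_alpha_mle(data, x_min):
    """Continuous maximum-likelihood estimate of the exponent for x >= x_min."""
    tail = [x for x in data if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


rng = random.Random(0)                      # fixed seed for repeatability
data = sample_power_law(20000, 2.5, 1.0, rng)
alpha_hat = fit_alpha_mle(data, 1.0)
print(alpha_hat)
```

With 20,000 samples the estimate should land close to the true exponent of 2.5, since the standard error of this estimator is roughly (α − 1)/√n.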

69 Given $p(x)$: if, for any given constant $a$, there is a constant $g(a)$ such that $p(ax) = g(a)\,p(x)$ (scale-free), then there are constants $C$ and $r$ such that $p(x) = C x^{-r}$ (power law).
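A short derivation sketch of this statement, under the added assumptions that $p$ is differentiable and $p(1) \neq 0$ (neither is stated on the slide):

```latex
\begin{align*}
p(ax) &= g(a)\,p(x) \\
\text{set } x = 1:\quad p(a) &= g(a)\,p(1)
  \;\Rightarrow\; g(a) = \frac{p(a)}{p(1)} \\
p(ax) &= \frac{p(a)\,p(x)}{p(1)} \\
\text{differentiate in } a \text{ and set } a = 1:\quad
  x\,p'(x) &= \frac{p'(1)}{p(1)}\,p(x) \\
\ln p(x) &= \frac{p'(1)}{p(1)}\,\ln x + \ln p(1) \\
p(x) &= C\,x^{-r}, \qquad C = p(1), \quad r = -\frac{p'(1)}{p(1)}
\end{align*}
```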

70

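71 Finite mean but no finite variance: the exponents of most real networks with power-law degree distributions fall in this range. Is a probability distribution. Has a finite mean. Has a finite second moment.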
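The exponent ranges attached to these statements did not survive in the transcript. As a hedged reconstruction, assume a continuous power law $P(k) = C k^{-\gamma}$ for $k \ge k_{\min} > 0$ (the symbol $\gamma$ is an assumption); the standard moment calculation then gives:

```latex
\begin{align*}
\langle k^m \rangle
  &= \int_{k_{\min}}^{\infty} k^m \, C k^{-\gamma}\, dk
   = \frac{C\, k_{\min}^{\,m+1-\gamma}}{\gamma - m - 1}
   \quad \text{(finite only if } \gamma > m + 1\text{)} \\
m = 0:&\quad \gamma > 1 \;\Rightarrow\; P(k) \text{ can be normalized} \\
m = 1:&\quad \gamma > 2 \;\Rightarrow\; \text{finite mean} \\
m = 2:&\quad \gamma > 3 \;\Rightarrow\; \text{finite second moment} \\
2 < \gamma \le 3:&\quad \text{finite mean, divergent second moment (no finite variance)}
\end{align*}
```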

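72 WWW: $\langle k \rangle = 7 \pm \infty$. Internet: $\langle k \rangle = 3.5 \pm \infty$. Metabolic: $\langle k \rangle = 7.4 \pm \infty$. Phone call: $\langle k \rangle = 3.16 \pm \infty$. The average values are not meaningful, as fluctuations are too large!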

73 The probability of having a node with degree larger than $K_{\max}$

74 Complex Networks & Control Lab, SJTU Xiaofan Wang Shanghai Jiao Tong University

