Xiaofan Wang 2014 Network Science: An Introduction.

Xiaofan Wang xfwang@sjtu.edu.cn 2014 Network Science: An Introduction

Network Science: Measuring Data, Discovering Properties, Modeling Networks, Analyzing Behavior, Designing Performance

Collect enough information so that you can: Describe (correctly), Quantify (properly), Formulate (mathematically), Predict (reasonably), Control (powerfully).

Network Spreading: Communication, Transportation, Power Grid, Social, Economic, Biological

Spreading on Social Networks (video): Virus, Fashion, Behavior, Rumor, Belief, Opinion, Obesity, Suicide

Theoretical Science: Graph Theory, Game Theory, Statistical Physics, Computer Science. Applied Science: Communication Science, Power Engineering, Life Science, Social Science.

For every technology, the first ten years is the development, and the second ten years is when the market follows.

Science, in general, is a lot better at breaking complex things into tiny parts than it is at figuring out how tiny parts turn into complex things.

Interdisciplinary nature, data-driven nature, quantitative nature, computational nature

Graph Theory, Spectral Graph Theory, Markov Chain Theory. Fan Chung (金芳蓉), UCSD.

A.-L. Barabási, Northeastern; Mark Newman, Michigan. Mean-Field Theory, Phase Transition, Percolation Theory.

Jon Kleinberg Cornell

Liu Y Y, Slotine J J, Barabási A L. Nature, 2011, 473(7346): 167-173.

Sinan Aral MIT

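Lev Muchnik, Sinan Aral, and Sean J. Taylor, Social Influence Bias: A Randomized Experiment, Science 9 August 2013: 341 (6146), 647-651. Restaurant ratings on Dianping, movie ratings on Douban, product ratings on shopping sites: how large is the influence of paid negative reviewers? In the experiment, 101,281 online articles were randomly given a positive or a negative rating in advance and compared with a control group. A prior positive rating was influential: the probability that readers gave a positive rating rose by 32%. A prior negative rating was not influential: it had almost no effect on an article's final score.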

Unweighted Undirected, Weighted Undirected, Unweighted Directed, Weighted Directed

Nodes: 101, edges: 144. Nodes: 101, edges: 242.

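You want to study the social network among SJTU faculty and students based on email records. Which factors need to be considered when generating the network? How do you decide whether two nodes are connected by an edge?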

No Multi-edge No Self-edge

Facebook Friendship network: 99.9%. Facebook Love network: 32%.

Many networks have a unique giant component

Suppose you have two large components, each occupying roughly 1/2 of the graph. How many random edges do you need to add so that the probability that the two components join into one giant component is greater than 0.9? (a) 1-5 edge additions (b) 6-10 edge additions (c) 11-15 edge additions (d) 16-20 edge additions
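One rough way to reason about this question, under assumptions that are not stated on the slide: each added edge picks its two endpoints uniformly at random and independently of the other edges, and each component holds exactly half of the nodes. A single random edge then bridges the two components with probability about 1/2, so after m edges they are joined with probability 1 - (1/2)^m. The function names below are illustrative only.

```python
def join_probability(m: int, bridge_prob: float = 0.5) -> float:
    """Probability that at least one of m random edges bridges the two components.

    bridge_prob = 0.5 is the assumed chance that one random edge has its
    endpoints in different halves of the graph.
    """
    return 1.0 - (1.0 - bridge_prob) ** m


def edges_needed(target: float, bridge_prob: float = 0.5) -> int:
    """Smallest m for which the join probability strictly exceeds target."""
    m = 0
    while join_probability(m, bridge_prob) <= target:
        m += 1
    return m


if __name__ == "__main__":
    m = edges_needed(0.9)
    print(m, join_probability(m))
```

Under these assumptions very few random edges are enough, which is one intuition for why two giant components rarely coexist.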

The giant weakly connected component (GWCC) has a bow-tie structure: a strongly connected core (SCC), an IN component, an OUT component, and TENDRILS.

Average degree

Out-Degree In-Degree

A property that does not hold for a single node can still hold for the whole system! Whole network: total out-degree equals total in-degree. Single node: out-degree may not equal in-degree.
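A minimal sketch of this observation on a toy directed edge list (the edge list and function name are made up for illustration): every edge contributes exactly one unit of out-degree and one unit of in-degree, so the totals must agree even when individual nodes are unbalanced.

```python
from collections import Counter


def degree_counts(edges):
    """Return (out_degree, in_degree) counters for a list of (source, target) edges."""
    out_deg = Counter()
    in_deg = Counter()
    for src, dst in edges:
        out_deg[src] += 1  # each edge leaves exactly one node
        in_deg[dst] += 1   # and enters exactly one node
    return out_deg, in_deg


if __name__ == "__main__":
    toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]
    out_deg, in_deg = degree_counts(toy_edges)
    print("total out:", sum(out_deg.values()), "total in:", sum(in_deg.values()))
    print("node a out/in:", out_deg["a"], in_deg["a"])
```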

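Every grievance has its source and every debt its debtor, but follower counts are unreliable. Total number of accounts followed on Weibo = ? Total number of followers on Weibo. Prof. Xiaofan Wang: 349 following, 54,785 followers.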

Globally coupled network: $k = N-1$, $M \sim O(N^2)$. Practical network: $\langle k \rangle \ll N-1$, $M \ll O(N^2)$.

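Sparse real networks: $\langle k \rangle \ll N-1$, $M \ll O(N^2)$
Facebook: $N$ = 721 million, $M$ = 68.7 billion, $\langle k \rangle = 190$, density $= 0.3 \times 10^{-7}$
WWW (ND Sample): $N = 325{,}729$; $M = 1.4 \times 10^{6}$; $M_{max} = 10^{12}$; $\langle k \rangle = 4.51$
Protein (S. Cerevisiae): $N = 1{,}870$; $M = 4{,}470$; $M_{max} = 10^{7}$; $\langle k \rangle = 2.39$
Coauthorship (Math): $N = 70{,}975$; $M = 2 \times 10^{5}$; $M_{max} = 3 \times 10^{10}$; $\langle k \rangle = 3.9$
Movie Actors: $N = 212{,}250$; $M = 6 \times 10^{6}$; $M_{max} = 1.8 \times 10^{13}$; $\langle k \rangle = 28.78$
(Source: Albert, Barabási, RMP 2002)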
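As a hedged illustration of how such numbers relate, the sketch below uses the usual conventions for an undirected simple network: average degree 2M/N and density 2M/(N(N-1)). These conventions are an assumption here, since the slide does not state which one each row uses. The Facebook values are the ones quoted above; the four-node complete graph is a made-up example.

```python
def average_degree(n: int, m: int) -> float:
    """Average degree of an undirected network: every edge adds 2 to the degree sum."""
    return 2.0 * m / n


def density(n: int, m: int) -> float:
    """Fraction of the N(N-1)/2 possible undirected edges that are present."""
    return 2.0 * m / (n * (n - 1))


if __name__ == "__main__":
    # Facebook figures quoted in the text: N = 721 million, M = 68.7 billion.
    print(average_degree(721_000_000, 68_700_000_000))
    # Toy globally coupled network: 4 nodes, all 6 possible edges present.
    print(average_degree(4, 6), density(4, 6))
```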

Dense: the density tends to a nonzero constant as $N \to \infty$. Sparse: the density tends to zero as $N \to \infty$.

Densification over time: $E(t) \sim N(t)^{a}$ with $1 < a < 2$. Autonomous Systems: $a = 1.18$. Affiliation Network: $a = 1.15$.

Many networks densify over time, but are still sparse

Shortest path and distance $d_{ij}$; diameter $D$; average path length (also called characteristic path length or average distance)
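A minimal sketch of these three quantities for an unweighted, undirected, connected network, using breadth-first search. The adjacency-dict representation and the function names are assumptions made for this example, and the four-node path graph is a toy input.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_average_path_length(adj):
    """Return (D, L) for a connected network: the largest and the mean pairwise distance."""
    total = 0
    pairs = 0
    diameter = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    # Each unordered pair is counted twice, which leaves the mean unchanged.
    return diameter, total / pairs


if __name__ == "__main__":
    path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(diameter_and_average_path_length(path_graph))
```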

Method 1 (for a disconnected network): only consider the giant component.

Method 2 (for a disconnected network): only consider connected pairs.

Method 3 (for a disconnected network): compute the "harmonic mean" of the distances.
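The harmonic-mean approach can be sketched as follows. The exact formula is not shown in the text, so this uses a common definition as an assumption: average 1/d over all ordered node pairs, counting unreachable pairs (infinite distance) as 0, then take the reciprocal. Function names and the toy graph are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def harmonic_mean_path_length(adj):
    """Reciprocal of the mean inverse distance over all ordered pairs of distinct nodes."""
    n = len(adj)
    inverse_sum = 0.0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                inverse_sum += 1.0 / d  # unreachable pairs simply add nothing
    if inverse_sum == 0.0:
        return float("inf")  # no pair is connected at all
    efficiency = inverse_sum / (n * (n - 1))
    return 1.0 / efficiency


if __name__ == "__main__":
    # Two separate edges: the ordinary average distance would be infinite.
    two_pairs = {0: [1], 1: [0], 2: [3], 3: [2]}
    print(harmonic_mean_path_length(two_pairs))
```

The result stays finite on a disconnected network, which is the point of this method.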

In most real networks, there are small distances between two randomly selected nodes.

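There are two ways to solve this problem: the clever person's way and the simple person's way. The clever way: follow the explanation in an algorithms textbook and implement that shortest-path algorithm with the rather large time complexity, the one whose name sounds like "di-di-da-di-da". The simple way has the lowest time complexity: take a bundle of threads, tie them into a net following the structure of the graph, hold one vertex in each hand, and pull them apart. The path that is pulled taut in the middle is the shortest path.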
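For comparison with the thread-pulling trick, here is a minimal sketch of one textbook shortest-path algorithm, Dijkstra's, for networks with non-negative edge weights. Whether this is the algorithm the anecdote has in mind is an assumption; the graph representation (node mapped to a list of (neighbor, weight) pairs) and the toy graph are also made up for the example.

```python
import heapq


def dijkstra(adj, source):
    """Shortest weighted distances from source; edge weights must be non-negative."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter route to u was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


if __name__ == "__main__":
    toy_graph = {
        "a": [("b", 1.0), ("c", 4.0)],
        "b": [("a", 1.0), ("c", 2.0)],
        "c": [("a", 4.0), ("b", 2.0)],
    }
    print(dijkstra(toy_graph, "a"))
```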

Your clustering coefficient: the probability that two randomly selected friends of yours are friends with each other, $C_i = \frac{2E_i}{k_i(k_i-1)}$. $E_i$: number of edges among your $k_i$ friends. $C_i$ measures the network's local density: the more densely interconnected the neighborhood of node $i$, the higher $C_i$ is. Network clustering coefficient.
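A minimal sketch of the local clustering coefficient as described above: the number of edges among a node's friends divided by the number of friend pairs. Two choices here are assumptions rather than something the slide states: nodes with fewer than two neighbors get coefficient 0, and the network-level value is taken as the plain average over nodes. The toy graph is a triangle with one extra node attached.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Fraction of pairs of i's neighbors that are themselves connected."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here for nodes with fewer than two neighbors
    edges_among = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * edges_among / (k * (k - 1))


def average_clustering(adj):
    """Network clustering coefficient taken as the mean of the local values (assumed)."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 0.
    toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print([local_clustering(toy, i) for i in toy], average_clustering(toy))
```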

Many real networks have a much higher clustering coefficient than expected for a completely random network with the same number of nodes and links. High-degree nodes tend to have a smaller clustering coefficient than low-degree nodes.

Radicchi F., et al. Defining and identifying communities in networks. Proc. Natl Acad. Sci. USA 2004;101:2658-2663. The number of triangles to which a given edge belongs, divided by the number of triangles that might potentially include it, given the degrees of the adjacent nodes.
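A sketch that follows the verbal description above literally: triangles through the edge are counted as common neighbors of its endpoints, and the potential number of triangles is taken as min(k_i - 1, k_j - 1). The function name, the zero-denominator convention, and the toy graph are assumptions for this example; the cited paper's exact formula may differ in detail.

```python
def edge_clustering(adj, i, j):
    """Triangles containing edge (i, j) divided by the most the endpoint degrees allow."""
    triangles = len(adj[i] & adj[j])  # each common neighbor closes one triangle
    possible = min(len(adj[i]) - 1, len(adj[j]) - 1)
    if possible <= 0:
        return 0.0  # an endpoint of degree 1 cannot be part of any triangle
    return triangles / possible


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 0.
    toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(edge_clustering(toy, 0, 1), edge_clustering(toy, 0, 3))
```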

S Pajevic, D Plenz. The organization of strong links in complex networks. Nature Physics, 2012, March. The relative neighbourhood overlap: $n_C$ is the number of common neighbours; $n_T$ is the total number of neighbouring nodes, excluding the end nodes.
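A minimal sketch of the overlap of an edge's two neighbourhoods. The ratio itself is not spelled out in the text, so reading it as n_C / n_T is an assumption, as are the function name, the zero-denominator convention, and the toy graph.

```python
def neighbourhood_overlap(adj, i, j):
    """Common neighbours of i and j divided by all their neighbours, end nodes excluded."""
    n_common = len(adj[i] & adj[j])
    n_total = len((adj[i] | adj[j]) - {i, j})
    if n_total == 0:
        return 0.0  # the edge's endpoints have no other neighbours
    return n_common / n_total


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 0.
    toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(neighbourhood_overlap(toy, 0, 1), neighbourhood_overlap(toy, 1, 2))
```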

$P(k)$: the probability that the degree of a randomly selected node is $k$, i.e., the fraction of nodes in the network with degree $k$.
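A minimal sketch of the empirical degree distribution, computed as the fraction of nodes having each degree. The adjacency-dict representation, the function name, and the toy graph are assumptions made for illustration.

```python
from collections import Counter


def degree_distribution(adj):
    """Map each degree k to the fraction of nodes that have exactly k neighbors."""
    n = len(adj)
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


if __name__ == "__main__":
    # Triangle 0-1-2 with a pendant node 3 attached to node 0.
    toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(degree_distribution(toy))
```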

Weibo: The Past and Present of the Normal Distribution

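Common probability distributions $P(k)$, including the hypergeometric, binomial, and Poisson distributions, can all be regarded as approximately normal under certain conditions. These probability distributions all have an approximately bell-shaped form.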

Many real networks are scale-free in the sense that the degree distribution deviates significantly from the Poisson distribution

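Height follows a normal distribution: average height is a meaningful characteristic. Wealth follows a scale-free distribution: average wealth is not a meaningful characteristic! $50 billion. After Bill enters the arena, the average income of the public is ~ 1,000,000.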

High skew (asymmetry); straight line on a log-log plot.

Least-squares regression: some data exhibit a power law only in the tail, so one needs to select a $k_{min}$ (> 0) where the power law starts.

Least-squares regression: noise in the tail skews the regression result and will give values of the exponent α that are too low.

Smoothing; correlation

...but could also be a lognormal or double exponential… Not every network is power-law distributed.

A. Clauset, C.R. Shalizi, and M.E.J. Newman, SIAM Review 51(4), 661-703 (2009) Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Here we present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. Our approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. For implementation codes, see: http://tuvalu.santafe.edu/~aaronc/powerlaws/
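As a hedged illustration of the maximum-likelihood side of this approach, the sketch below uses the standard continuous-data estimator, alpha = 1 + n / sum(ln(x_i / x_min)), with approximate standard error (alpha - 1) / sqrt(n). It is not the authors' released code, it treats x_min as given, and it omits the goodness-of-fit tests. The synthetic sampler and all names are assumptions made for the example.

```python
import math
import random


def powerlaw_mle_alpha(data, xmin):
    """Continuous maximum-likelihood estimate of the exponent for the tail x >= xmin."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    alpha = 1.0 + n / log_sum
    sigma = (alpha - 1.0) / math.sqrt(n)  # approximate standard error
    return alpha, sigma


def sample_powerlaw(alpha, xmin, size, rng):
    """Draw from p(x) ~ x^(-alpha), x >= xmin, by inverting the cumulative distribution."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(0)
    synthetic = sample_powerlaw(2.5, 1.0, 20000, rng)
    print(powerlaw_mle_alpha(synthetic, 1.0))
```

Node degrees are integers, so for degree data this continuous formula is only an approximation.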

Given $p(x)$: if, for any given constant $a$, there is a constant $g(a)$ such that $p(ax) = g(a)\,p(x)$ (scale-free), then there are constants $C$ and $r$ such that $p(x) = C x^{-r}$ (power law).
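A short sketch of why this holds, under the added assumptions that $p$ is differentiable and $p(1) \neq 0$ (neither is stated on the slide):

```latex
\begin{align*}
  &\text{Set } x = 1: & p(a) &= g(a)\,p(1)
      \;\Rightarrow\; g(a) = \frac{p(a)}{p(1)}, \\
  &\text{so} & p(ax) &= \frac{p(a)\,p(x)}{p(1)}. \\
  &\text{Differentiate with respect to } a: & x\,p'(ax) &= \frac{p'(a)\,p(x)}{p(1)}. \\
  &\text{Set } a = 1: & x\,p'(x) &= \frac{p'(1)}{p(1)}\,p(x). \\
  &\text{Integrate: } & \ln p(x) &= \frac{p'(1)}{p(1)}\,\ln x + \ln p(1), \\
  &\text{hence} & p(x) &= C\,x^{-r}, \qquad C = p(1), \quad r = -\frac{p'(1)}{p(1)}.
\end{align*}
```

The constant of integration is fixed by evaluating at $x = 1$, which gives $C = p(1)$.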

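Power-law exponent between 2 and 3: the distribution has a finite mean but no finite variance. The exponents of most real networks with power-law degree distributions fall in this range. Exponent greater than 1: needed for it to be a probability distribution. Exponent greater than 2: finite mean. Exponent greater than 3: finite second moment.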

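WWW: $\langle k \rangle = 7 \pm \infty$. Internet: $\langle k \rangle = 3.5 \pm \infty$. Metabolic: $\langle k \rangle = 7.4 \pm \infty$. Phone call: $\langle k \rangle = 3.16 \pm \infty$. The average values are not meaningful, as fluctuations are too large!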

The probability of having a node with degree larger than $K_{max}$

Complex Networks & Control Lab, SJTU Xiaofan Wang Shanghai Jiao Tong University xfwang@sjtu.edu.cn
