
Complex networks in nature PHYSBIO 2007 Imre Derényi Dept. of Biological Physics, Eötvös University, Budapest Complex systems are often made of many non-identical.


1 Complex networks in nature. PHYSBIO 2007. Imre Derényi, Dept. of Biological Physics, Eötvös University, Budapest. Complex systems are often made of many non-identical elements connected by diverse interactions; such systems can be described as networks (graphs).

2 Outline
• Lectures 1-3: Graph theoretical basics, examples of real networks, basic models (Erdős-Rényi, small-world, scale-free graphs) and their properties, examples.
• Lecture 4: Dynamics on networks: error and attack tolerance, disease spreading, metabolic networks.
• Lecture 5: Network motifs and communities.

3 Graph theory basics. A graph, usually denoted as G(V,E), consists of a set of vertices (or nodes) V together with a set of edges (or links) E. Every edge connects its two end vertices. The order of a graph (denoted by N) is the number of its vertices. A graph is a simple graph if it has no multiple edges or loops. Unless stated otherwise, a graph is assumed to be simple.

4 Two vertices are adjacent (or neighbors of each other) if there is an edge connecting them. Every graph can be represented by its adjacency matrix A, which is an N × N symmetric binary matrix with elements $A_{ij} = A_{ji} = 1$ if vertex i is adjacent to vertex j and $A_{ij} = A_{ji} = 0$ otherwise. The degree $k_i$ of vertex i is the number of its neighbors (or edges): $k_i = \sum_j A_{ij}$. The sum of the degrees of all the vertices is twice the number M of the edges of the graph: $\sum_i k_i = 2M$.
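A minimal Python sketch of these definitions, assuming vertices are labeled 0..N-1 and the graph is given as an edge list without duplicates (the helper names `adjacency_matrix` and `degrees` are illustrative, not from the slides):

```python
def adjacency_matrix(n, edges):
    """Symmetric N x N binary adjacency matrix of a simple graph on vertices 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            raise ValueError("a simple graph has no loops")
        a[i][j] = a[j][i] = 1  # A_ij = A_ji = 1 for adjacent vertices
    return a


def degrees(a):
    """k_i = sum_j A_ij: the number of neighbors of each vertex."""
    return [sum(row) for row in a]


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]  # a triangle plus a pendant vertex
A = adjacency_matrix(4, edges)
k = degrees(A)
M = len(edges)
print(k, sum(k) == 2 * M)  # [2, 2, 3, 1] True
```

Every edge contributes a 1 to two rows of A, which is why the degrees add up to 2M.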

5 A sequence of adjacent vertices is a walk. A walk is closed if its first and last vertices are the same, and open if they are different. A walk in which no edge occurs more than once is known as a trail. A closed trail is called a tour or circuit. A walk in which no vertex occurs more than once is known as a path. A cycle can be defined as a closed path, i.e., a closed walk in which only the first and last vertices coincide. Two vertices are reachable from each other if there exists a path between them. A graph is connected if any of its vertices can be reached from any other. A path or cycle is Hamiltonian if it uses all vertices exactly once. A trail or circuit is Eulerian if it uses all edges exactly once.
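These definitions can be checked mechanically. The sketch below (a hypothetical helper `classify`; the graph is stored as a dict of neighbor sets) labels a vertex sequence according to the terms above. As a simplifying assumption it only considers sequences containing at least one edge.

```python
def classify(adj, seq):
    """Label a vertex sequence using the walk/trail/path definitions.

    adj maps every vertex to the set of its neighbors (simple, undirected graph).
    Returns None if seq is not a walk.
    """
    steps = list(zip(seq, seq[1:]))
    if not steps or any(v not in adj[u] for u, v in steps):
        return None  # consecutive vertices must be adjacent
    labels = {"walk"}
    closed = seq[0] == seq[-1]
    labels.add("closed" if closed else "open")
    used = [frozenset(s) for s in steps]
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    if len(used) == len(set(used)):  # no edge occurs more than once
        labels.add("trail")
        if closed:
            labels.add("circuit")
        if len(used) == n_edges:
            labels.add("Eulerian")
        # A path repeats no vertex; a cycle repeats only its endpoint.
        distinct = seq[:-1] if closed else seq
        if len(distinct) == len(set(distinct)):
            labels.add("cycle" if closed else "path")
            if len(distinct) == len(adj):
                labels.add("Hamiltonian")
    return labels


# Square 0-1-2-3-0 with the diagonal 0-2.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(classify(adj, [0, 1, 2, 3]))        # open Hamiltonian path
print(classify(adj, [0, 1, 2, 0]))        # cycle (also a circuit)
print(classify(adj, [0, 1, 2, 3, 0, 2]))  # open Eulerian trail
print(classify(adj, [1, 3]))              # None: 1 and 3 are not adjacent
```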

6 A subgraph of a graph G is a graph whose vertices and edges are subsets of those of G. A subgraph of G is a spanning subgraph, or factor, if it contains all the vertices of G. A component of a graph is a maximal connected subgraph. k-cliques are complete subgraphs of order (size) k. Cliques are maximal complete subgraphs. A tree is an acyclic connected graph; a tree with N vertices has N-1 edges.
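Components can be found by breadth-first search from every not-yet-visited vertex. A minimal sketch, assuming a dict-of-neighbor-sets representation (the helper names are illustrative); the tree test uses the fact that a connected graph is a tree exactly when it has N-1 edges:

```python
from collections import deque


def components(adj):
    """Connected components (maximal connected subgraphs) found by breadth-first search."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def is_tree(adj):
    """Connected and acyclic; for a connected graph this is equivalent to M = N - 1."""
    m = sum(len(nb) for nb in adj.values()) // 2
    return len(components(adj)) == 1 and m == len(adj) - 1


g = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
print(components(g))                               # [{0, 1, 2}, {3, 4}]
print(is_tree({0: {1}, 1: {0, 2}, 2: {1}}))        # True: a path is a tree
print(is_tree({0: {1, 2}, 1: {0, 2}, 2: {0, 1}}))  # False: a triangle has a cycle
```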

7 The length l of a walk is the number of edges that it uses. The distance d(i, j) between two (not necessarily distinct) vertices i and j is the length of a shortest path between them. The eccentricity ε(i) of a vertex i is its maximum distance from any other vertex: $\varepsilon(i) = \max_j d(i,j)$. The diameter D of a graph is its maximum eccentricity: $D = \max_i \varepsilon(i)$. The characteristic path length (sometimes also called diameter) is defined as the average distance over all pairs of distinct vertices: $L = \frac{1}{N(N-1)} \sum_{i \neq j} d(i,j)$. The radius R of a graph is its minimum eccentricity: $R = \min_i \varepsilon(i)$.
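In an unweighted graph, breadth-first search from each vertex gives all distances, from which ε, D, R and L follow directly. A sketch assuming a connected graph stored as a dict of neighbor sets and taking L as the average distance over all ordered pairs of distinct vertices (helper names are illustrative):

```python
from collections import deque


def bfs_distances(adj, source):
    """d(source, j) for every vertex j reachable from source (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_stats(adj):
    """Eccentricities, diameter D, radius R and characteristic path length L
    of a connected graph with at least two vertices."""
    n = len(adj)
    ecc, total = {}, 0
    for i in adj:
        d = bfs_distances(adj, i)
        if len(d) != n:
            raise ValueError("graph is not connected")
        ecc[i] = max(d.values())
        total += sum(d.values())  # d(i, i) = 0 contributes nothing
    L = total / (n * (n - 1))     # average over ordered pairs i != j
    return ecc, max(ecc.values()), min(ecc.values()), L


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
ecc, D, R, L = distance_stats(path)
print(ecc, D, R, L)  # {0: 3, 1: 2, 2: 2, 3: 3} 3 2 1.666...
```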

8 Extensions. If a weight or cost is assigned to each edge, we get a weighted graph; the weights are then taken into account when lengths are calculated. In a hypergraph, more than two vertices can be connected by a hyperedge. If the edges are directed, we have a directed graph, or digraph, in which in-neighbors and out-neighbors, as well as in-degrees and out-degrees, can be distinguished.

9 Random graphs. Graph theory was founded by Euler in the 18th century. Early work concentrated on small graphs with a high degree of regularity. Random-graph theory was introduced by Erdős and Rényi in the late 1950s. As complex networks often appear to be random, random-graph theory is a useful tool in the study of large complex networks.

10 The Erdős-Rényi model. Pál Erdős (1913-1996).
• Original model: connect N nodes by M randomly placed edges.
• Alternative model: connect every pair of the N nodes with probability p.
The two models (or ensembles) become equivalent in the thermodynamic limit (N → ∞), with $M = pN(N-1)/2$. [Figure: an example realization with p = 1/6.] The average degree of a node is $\langle k \rangle = p(N-1) \approx pN$.
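Both ensembles are easy to sample directly. A minimal sketch using only Python's standard library (`er_gnp` and `er_gnm` are illustrative names); it compares the empirical average degree 2M/N with p(N-1):

```python
import itertools
import random


def er_gnp(n, p, rng):
    """G(N, p): each of the N(N-1)/2 pairs is linked independently with probability p."""
    return [e for e in itertools.combinations(range(n), 2) if rng.random() < p]


def er_gnm(n, m, rng):
    """G(N, M): M distinct edges drawn uniformly from all N(N-1)/2 pairs."""
    return rng.sample(list(itertools.combinations(range(n), 2)), m)


rng = random.Random(1)
n, p = 1000, 0.006
edges_p = er_gnp(n, p, rng)
edges_m = er_gnm(n, round(p * n * (n - 1) / 2), rng)  # matching M = pN(N-1)/2
print(2 * len(edges_p) / n, 2 * len(edges_m) / n, p * (n - 1))
```

In G(N, M) the edge count is fixed, while in G(N, p) it fluctuates around pN(N-1)/2; for large N the relative fluctuation vanishes, which is why the two ensembles become equivalent.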

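11 The Erdős-Rényi model. Degree distribution (binomial, approaching a Poisson distribution for large N): $P(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k} \approx e^{-\langle k \rangle} \frac{\langle k \rangle^k}{k!}$. The characteristic path length can be estimated from $\langle k \rangle^L \approx N$, resulting in $L \approx \frac{\ln N}{\ln \langle k \rangle}$.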
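A quick numerical check of both estimates, assuming N = 10^4 and ⟨k⟩ = 5 (values chosen for illustration only): the exact binomial degree distribution is compared with its Poisson limit, and the path-length estimate is evaluated.

```python
import math


def binomial_pk(k, n, p):
    """Exact degree distribution in G(N, p): Binomial(N - 1, p)."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)


def poisson_pk(k, mean):
    """Poisson approximation with <k> = p(N - 1)."""
    return math.exp(-mean) * mean**k / math.factorial(k)


n, mean = 10_000, 5.0
p = mean / (n - 1)
worst = max(abs(binomial_pk(k, n, p) - poisson_pk(k, mean)) for k in range(30))
L_est = math.log(n) / math.log(mean)  # from <k>^L ~ N
print(worst)  # tiny: the Poisson form is accurate for large N
print(L_est)  # about 5.7 for N = 10^4, <k> = 5
```

The logarithmic growth of L with N is the origin of the "small-world" distances in random graphs.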

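12 The greatest discovery of Erdős and Rényi was that many network properties appear suddenly as p is increased. As an example, let us consider the occurrence of an arbitrary subgraph consisting of n vertices and m edges. There are of order $N^n$ ways to choose its vertices, and each of its m edges is present with probability p, so their expected number can be estimated as $\langle X \rangle \propto N^n p^m$ (up to an N-independent combinatorial factor). Thus the critical probability of appearance, at which $\langle X \rangle$ becomes of order one, is $p_c(N) \propto N^{-n/m}$.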
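The exponent −n/m can be tabulated for familiar subgraph families: trees (m = n−1), cycles (m = n) and complete subgraphs (m = n(n−1)/2). A small sketch using exact fractions (`threshold_exponent` is an illustrative name):

```python
from fractions import Fraction


def threshold_exponent(n, m):
    """Exponent z in p_c(N) ~ N^z for a subgraph with n vertices and m edges:
    <X> ~ N^n p^m is of order one when p ~ N^(-n/m)."""
    return Fraction(-n, m)


n = 4
print(threshold_exponent(n, n - 1))             # -4/3: trees of 4 vertices
print(threshold_exponent(n, n))                 # -1: cycles of length 4
print(threshold_exponent(n, n * (n - 1) // 2))  # -2/3: complete subgraph K4
```

Sparser subgraphs (trees) appear at much smaller p than denser ones (cliques), and cycles of any length appear at p ~ 1/N.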

13 A giant (percolating) component also appears suddenly. This can easily be understood with the help of a branching process:
1. Let us start to grow a component from a seed vertex by randomly selecting its neighbors from the remaining N-1 vertices with probability p.
2. Let us repeat this process with the newly selected vertices as seeds, over and over again.
3. The branching process stops when no new neighbor is selected.
Each seed selects about pN new neighbors on average. If $p < p_c = 1/N$, then the expected number of new neighbors is smaller than the number of seeds, and the branching process quickly comes to a halt. If, on the other hand, $p > p_c = 1/N$, then the component can easily grow to infinity. The giant component has a tree-like structure.
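The three steps can be simulated directly. In this sketch (names and parameters are illustrative) we use a simplification of our own: since all not-yet-reached vertices are interchangeable, only their number is tracked, not their identities. Runs below and above p_c = 1/N are compared.

```python
import random


def grow_component(n, p, rng):
    """Size of the component grown from one seed vertex of G(N, p)."""
    remaining = n - 1  # vertices not yet reached; they are interchangeable
    active = 1         # selected vertices still waiting to act as seeds
    size = 1
    while active > 0 and remaining > 0:
        # One seed selects each remaining vertex independently with probability p.
        new = sum(1 for _ in range(remaining) if rng.random() < p)
        remaining -= new
        size += new
        active += new - 1  # the seed is used up, the new vertices become seeds
    return size


rng = random.Random(42)
n = 500
results = {c: [grow_component(n, c / n, rng) for _ in range(20)] for c in (0.5, 3.0)}
for c, sizes in results.items():
    print(c, max(sizes), sum(s > n // 2 for s in sizes))
```

With pN = 0.5 every run stops after a handful of vertices, while with pN = 3 most runs reach a component containing the large majority of the 500 vertices.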

14 Are complex networks really random? No! One big difference is that nodes are often clustered, i.e., neighbors of a node tend to be connected to each other. Clustering coefficient (the probability that two neighbors of a node are connected): $C_i = \frac{2E_i}{k_i(k_i-1)}$, where $E_i$ is the number of edges among the $k_i$ neighbors of node i, and $C = \langle C_i \rangle$. Small worlds: networks are clustered [$C \gg C_{\text{rand}} = p$] but have a small characteristic path length L.
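A direct implementation of $C_i$ and its average, assuming a dict-of-neighbor-sets graph. Setting $C_i = 0$ for vertices with fewer than two neighbors is a convention we chose here (other conventions exclude such vertices from the average).

```python
from itertools import combinations


def clustering(adj, i):
    """C_i = 2 E_i / (k_i (k_i - 1)), E_i = number of edges among the neighbors of i."""
    k = len(adj[i])
    if k < 2:
        return 0.0  # undefined for k_i < 2; set to 0 here (a convention)
    e = sum(1 for u, v in combinations(adj[i], 2) if v in adj[u])
    return 2 * e / (k * (k - 1))


def average_clustering(adj):
    return sum(clustering(adj, i) for i in adj) / len(adj)


# Triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print([clustering(adj, i) for i in adj])  # [1.0, 1.0, 0.333..., 0.0]
print(average_clustering(adj))            # 7/12, about 0.583
```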

15 Watts-Strogatz model [Watts and Strogatz, Nature 393, 440 (1998)]
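The slide gives only the reference. The construction usually associated with it starts from a ring lattice in which each vertex is linked to its k nearest neighbors and rewires each edge with probability p. The sketch below follows that recipe; the parameter names, the rule for avoiding loops and multiple edges, and the requirement n > k are our assumptions.

```python
import random
from itertools import combinations


def watts_strogatz(n, k, p, rng):
    """Ring lattice of n vertices, each linked to its k nearest neighbors
    (k even, n > k), with every lattice edge rewired with probability p."""
    adj = {i: set() for i in range(n)}
    lattice = [(i, (i + d) % n) for i in range(n) for d in range(1, k // 2 + 1)]
    for i, j in lattice:
        adj[i].add(j)
        adj[j].add(i)
    for i, j in lattice:
        if j in adj[i] and rng.random() < p:
            # Move the far end to a random vertex, avoiding loops and multi-edges.
            choices = [v for v in range(n) if v != i and v not in adj[i]]
            if choices:
                new = rng.choice(choices)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(new)
                adj[new].add(i)
    return adj


def average_clustering(adj):
    total = 0.0
    for nbrs in adj.values():
        kk = len(nbrs)
        if kk >= 2:
            links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            total += 2 * links / (kk * (kk - 1))
    return total / len(adj)


rng = random.Random(0)
for p in (0.0, 0.01, 0.1, 1.0):
    g = watts_strogatz(200, 4, p, rng)
    print(p, round(average_clustering(g), 3))
```

For k = 4 the unrewired ring has C = 0.5 exactly; C stays high for small p and drops toward the random-graph value as p approaches 1.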

16 Watts-Strogatz model n nodes per block: if Optimal n:

17 World Wide Web. Nodes: WWW documents; Links: URL links. 800 million documents (S. Lawrence, 1999). ROBOT: collects all URLs found in a document and follows them recursively. R. Albert, H. Jeong, A.-L. Barabási, Nature 401, 130 (1999).

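18 What can we expect for ER and WS networks? With $\langle k \rangle \approx 6$ and $N_{\text{WWW}} \approx 10^9$: $P(k=500) \sim 10^{-99}$, i.e., $N(k=500) \sim 10^{-90}$. The results: a scale-free network with $\gamma_{\text{out}} = 2.45$ and $\gamma_{\text{in}} = 2.1$. For the WWW power law, $P(k=500) \sim 10^{-6}$ and $N(k=500) \sim 10^{3}$.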

19 INTERNET BACKBONE (Faloutsos, Faloutsos and Faloutsos, 1999). Nodes: computers, routers; Links: physical lines.

20 ACTOR CONNECTIVITIES. Nodes: actors; Links: cast jointly. N = 212,250 actors, $\langle k \rangle = 28.78$, $P(k) \sim k^{-\gamma}$ with $\gamma = 2.3$.

21 SCIENCE CITATION INDEX ($\gamma = 3$). Nodes: papers; Links: citations (S. Redner, 1998). $P(k) \sim k^{-\gamma}$. 1736 PRL papers (1988).

22 SCIENCE COAUTHORSHIP. Nodes: scientists (authors); Links: joint publications (Newman, 2000; Barabási et al., 2001). [Figure legend: M: mathematics, NS: neuroscience.]

23 Online communities. Nodes: online users; Links: email contacts. Kiel University log files, 112 days, N = 59,912 nodes (Ebel, Mielsch, Bornholdt, PRE 2002).

24 Food Web. Nodes: trophic species; Links: trophic interactions. R.J. Williams, N.D. Martinez, Nature (2000); R. Sole (cond-mat/0011195).

25 Sex-web. Nodes: people (females; males); Links: sexual relationships. Liljeros et al., Nature 2001: 4781 Swedes aged 18-74; 59% response rate.

26 Most real-world networks share the same internal structure: they are scale-free networks. Why? What does it mean?

27 SCALE-FREE NETWORKS: origins.
(1) The number of nodes (N) is NOT fixed: networks continuously expand by the addition of new nodes. Examples: WWW: addition of new documents; citations: publication of new papers.
(2) The attachment is NOT uniform: a node is linked with higher probability to a node that already has a large number of links. Examples: WWW: new documents link to well-known sites (CNN, Yahoo, New York Times, etc.); citations: well-cited papers are more likely to be cited again.

28 Scale-free model (A.-L. Barabási, R. Albert, Science 286, 509 (1999)).
(1) GROWTH: at every time step we add a new node with m edges (connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT: the probability Π that a new node will be connected to node i depends on the degree $k_i$ of that node: $\Pi(k_i) = \frac{k_i}{\sum_j k_j}$.
Result: $P(k) \sim k^{-3}$.
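A compact way to realize Π(k_i) is to keep a list in which every node appears once per edge end, i.e. $k_i$ times; a uniform draw from that list is then preferential. The sketch below assumes a complete graph on m+1 nodes as the seed and redraws duplicate targets to avoid multiple edges (both are our choices, and the redraw slightly perturbs Π):

```python
import random
from collections import Counter


def barabasi_albert(n, m, rng):
    """Grow a network to n nodes; each new node brings m edges whose other ends
    are chosen with probability Pi(k_i) = k_i / sum_j k_j."""
    # Seed graph (an assumption): a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Node i appears k_i times in `ends`, so a uniform draw from it is preferential.
    ends = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:  # redraw duplicates: no multiple edges
            targets.add(rng.choice(ends))
        for t in targets:
            edges.append((new, t))
            ends.extend((new, t))
    return edges


rng = random.Random(7)
n, m = 5000, 2
edges = barabasi_albert(n, m, rng)
deg = Counter(v for e in edges for v in e)
print(len(edges), max(deg.values()))  # M = m(m+1)/2 + m(n-m-1); a few large hubs
```

Replacing `rng.choice(ends)` by a uniform choice among existing nodes gives growth without preferential attachment, which produces no hubs of this size.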

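29 Mean-field theory: $\frac{\partial k_i}{\partial t} = m\,\Pi(k_i) = m\frac{k_i}{\sum_j k_j} = \frac{k_i}{2t}$ (since $\sum_j k_j \approx 2mt$), with the initial condition $k_i(t_i) = m$, which gives $k_i(t) = m\left(t/t_i\right)^{1/2}$ and hence $P(k) \approx 2m^2/k^3$. A.-L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999).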

30 Growth without preferential attachment

31 Preferential attachment, measured in the citation network and the Internet: for a given Δt, $\Delta k \propto \Pi(k)$ (Jeong, Neda, A.-L. Barabási, cond-mat/0104131).

32 The exponent γ is not universal. Extended model: with probability p, add internal links; with probability q, delete links; with probability 1-p-q, add a new node.

33 More models.

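34 Presence of a giant (percolating) component. Branching process: the probability that an edge leads to a vertex with degree k is $\frac{kP(k)}{\langle k \rangle}$. Such a vertex offers k-1 further edges, so the condition that the branching process prevails is $\sum_k (k-1)\frac{kP(k)}{\langle k \rangle} > 1$, i.e., $\frac{\langle k^2 \rangle}{\langle k \rangle} > 2$.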
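The criterion ⟨k²⟩/⟨k⟩ > 2 (often called the Molloy-Reed criterion) only needs the first two moments of P(k). A sketch, assuming P(k) is given as a dict from degree to probability (names are illustrative); for a Poisson distribution ⟨k²⟩/⟨k⟩ = ⟨k⟩ + 1, which recovers the ER threshold ⟨k⟩ = 1, i.e. p = 1/N:

```python
import math


def giant_component_exists(pk):
    """Branching-process criterion <k^2>/<k> > 2; pk maps k -> P(k)."""
    k1 = sum(k * p for k, p in pk.items())
    k2 = sum(k * k * p for k, p in pk.items())
    return k2 / k1 > 2


def poisson(mean, kmax=100):
    return {k: math.exp(-mean) * mean**k / math.factorial(k) for k in range(kmax + 1)}


print(giant_component_exists(poisson(0.8)))  # False: <k^2>/<k> = 1.8
print(giant_component_exists(poisson(1.2)))  # True: 2.2 > 2
print(giant_component_exists({3: 1.0}))      # True: random 3-regular graph
```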

35 Yeast protein network. Nodes: proteins; Links: physical interactions (binding). P. Uetz et al., Nature 403, 623-627 (2000).

36 Protein interaction networks of C. elegans (Li et al., Science 2004) and Drosophila (M. Giot et al., Science 2003).

37 Origin of the scale-free topology of PPI networks: gene duplication. A protein with k interaction partners gains a new link whenever one of its partners is duplicated, so proteins with more interactions are more likely to obtain new links: Π(k) ~ k (preferential attachment). Wagner 2001; Vazquez et al. 2003; Sole et al. 2001; Rzhetsky & Gomez 2001; Qian et al. 2001; Bhan et al. 2002.

38 Metabolic network. Nodes: chemicals (substrates); Links: biochemical reactions. The metabolic networks of organisms from all three domains of life (Archaea, Bacteria, Eukaryotes) are scale-free! H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.-L. Barabási, Nature 407, 651 (2000).

39 Characterizing the links. Metabolism: Flux Balance Analysis (Palsson) gives the metabolic flux of each reaction. In steady state $S\,v = 0$, where S is the stoichiometric matrix and v is the flux vector; maximize $c \cdot v$, where c is the unit vector in the direction of growth (biomass production). Edwards, J. S. & Palsson, B. O., PNAS 97, 5528 (2000); Edwards, J. S., Ibarra, R. U. & Palsson, B. O., Nat Biotechnol 19, 125 (2001); Ibarra, R. U., Edwards, J. S. & Palsson, B. O., Nature 420, 186 (2002).

40 Global flux organization in the E. coli metabolic network. E. Almaas, B. Kovács, T. Vicsek, Z. N. Oltvai, A.-L. Barabási, Nature 2004; Goh et al., PRL 2002. SUCC: succinate uptake; GLU: glutamate uptake. Central metabolism: Emmerling et al., J Bacteriol 184, 152 (2002).

41 Inhomogeneity in the local flux distribution: $\sim k^{-0.27}$. Mass flows along linear pathways.

42 Robustness. Complex systems maintain their basic functions even under errors and failures (cell: mutations; Internet: router breakdowns). [Figure: node failure.]

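43 Robustness of scale-free networks. [Figure: relative size S of the giant component versus the removed fraction f of nodes (both between 0 and 1), with critical fractions $f_c$ for attacks and failures.] Albert, Jeong, Barabási, Nature 406, 378 (2000).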

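44 Cohen, Erez, ben-Avraham, Havlin, PRL 85, 4626 (2000). After random removal of a fraction f of the vertices, each edge of a surviving vertex is kept with probability 1-f. The new degree distribution: $P'(k') = \sum_{k \ge k'} P(k) \binom{k}{k'} (1-f)^{k'} f^{k-k'}$, with $\langle k' \rangle = (1-f)\langle k \rangle$ and $\langle k'^2 \rangle = (1-f)^2 \langle k^2 \rangle + f(1-f)\langle k \rangle$. Percolation: $\frac{\langle k'^2 \rangle}{\langle k' \rangle} = 2$. Critical fraction: $f_c = 1 - \frac{1}{\kappa_0 - 1}$, where $\kappa_0 = \frac{\langle k^2 \rangle}{\langle k \rangle}$. Since $\langle k^2 \rangle$ diverges for γ ≤ 3 as N → ∞, $f_c \to 1$: there is no critical percolation threshold for γ ≤ 3.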
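A numerical illustration of $f_c = 1 - 1/(\kappa_0 - 1)$. Since a true power law with γ ≤ 3 has a divergent second moment, we assume a power law truncated at a cutoff kmax as a finite-size stand-in (function names and parameters are illustrative):

```python
def critical_fraction(pk):
    """f_c = 1 - 1/(kappa0 - 1), kappa0 = <k^2>/<k>, for random node removal.
    A value <= 0 means there is no giant component even before removal."""
    k1 = sum(k * p for k, p in pk.items())
    k2 = sum(k * k * p for k, p in pk.items())
    return 1 - 1 / (k2 / k1 - 1)


def truncated_power_law(gamma, kmin, kmax):
    """P(k) proportional to k^-gamma on kmin..kmax."""
    w = {k: k ** -gamma for k in range(kmin, kmax + 1)}
    z = sum(w.values())
    return {k: x / z for k, x in w.items()}


print(critical_fraction({4: 1.0}))  # 4-regular: kappa0 = 4, f_c = 2/3
fcs = [critical_fraction(truncated_power_law(2.5, 2, kmax)) for kmax in (10**2, 10**3, 10**4)]
print(fcs)  # creeps toward 1 as the cutoff grows (gamma = 2.5 <= 3)
```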

45 Achilles' heel of complex networks. [Figure: the Internet under failure and attack.] R. Albert, H. Jeong, A.-L. Barabási, Nature 406, 378 (2000).

46 Yeast protein network: lethality and topological position. Highly connected proteins are more essential (lethal)... H. Jeong, S.P. Mason, A.-L. Barabási, Z.N. Oltvai, Nature 411, 41-42 (2001).

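47 Disease spreading in the susceptible-infected-susceptible (SIS) epidemic model. Rate of becoming infected by an infected neighbor: ν. Rate of recovery: δ. Effective spreading rate: λ = ν/δ. Mean-field approximation for "exponential" networks, where $k_i \approx \langle k \rangle$ for every node (time measured in units of 1/δ, ρ denoting the density of infected nodes): $\frac{d\rho}{dt} = -\rho + \lambda \langle k \rangle \rho (1-\rho)$. Steady-state solution: $\rho = 1 - \frac{1}{\lambda \langle k \rangle}$ (or ρ = 0). Epidemic threshold: $\lambda_c = \frac{1}{\langle k \rangle}$. Pastor-Satorras and Vespignani, PRE 65, 036104 (2002).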

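48 SIS in complex networks. Mean-field approximation for the density $\rho_k$ of infected nodes with degree k: $\frac{d\rho_k}{dt} = -\rho_k + \lambda k (1-\rho_k)\Theta$. Steady-state solution: $\rho_k = \frac{\lambda k \Theta}{1 + \lambda k \Theta}$. The probability that an edge leads to a vertex with degree k is $\frac{kP(k)}{\langle k \rangle}$. The probability that a neighbor is infected: $\Theta = \frac{1}{\langle k \rangle}\sum_k k P(k) \rho_k$.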

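49 SIS in complex networks. Substituting the steady state into Θ gives the self-consistency equation $\Theta = \frac{1}{\langle k \rangle}\sum_k k P(k) \frac{\lambda k \Theta}{1 + \lambda k \Theta}$. This has a nontrivial solution (Θ > 0) when $\lambda \frac{\langle k^2 \rangle}{\langle k \rangle} > 1$, from which we get that the epidemic threshold is $\lambda_c = \frac{\langle k \rangle}{\langle k^2 \rangle}$. For γ ≤ 3, $\langle k^2 \rangle$ diverges as N → ∞, so $\lambda_c \to 0$. Uniform immunization with probability g, which merely rescales λ to λ(1-g), therefore does not help in scale-free networks if γ ≤ 3.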
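The self-consistency equation for Θ can be solved by fixed-point iteration, and the threshold follows from the first two moments of P(k). A sketch under the same mean-field assumptions; the truncated power law is our finite-size stand-in for a scale-free distribution, and all names are illustrative:

```python
def moments(pk):
    k1 = sum(k * p for k, p in pk.items())
    k2 = sum(k * k * p for k, p in pk.items())
    return k1, k2


def sis_threshold(pk):
    """Mean-field epidemic threshold lambda_c = <k> / <k^2>."""
    k1, k2 = moments(pk)
    return k1 / k2


def steady_theta(pk, lam, iters=5000):
    """Iterate Theta = (1/<k>) sum_k k P(k) lam k Theta / (1 + lam k Theta)."""
    k1, _ = moments(pk)
    theta = 1.0  # start from the fully infected state
    for _ in range(iters):
        theta = sum(k * p * lam * k * theta / (1 + lam * k * theta)
                    for k, p in pk.items()) / k1
    return theta


def truncated_power_law(gamma, kmin, kmax):
    w = {k: k ** -gamma for k in range(kmin, kmax + 1)}
    z = sum(w.values())
    return {k: x / z for k, x in w.items()}


regular = {4: 1.0}                 # every node has degree 4
print(sis_threshold(regular))      # 0.25 = 1/<k>
print(steady_theta(regular, 0.5))  # about 0.5: endemic state above threshold
print(steady_theta(regular, 0.1))  # about 0: the infection dies out
for kmax in (10**2, 10**3, 10**4):  # gamma = 2.5: the threshold keeps shrinking
    print(kmax, sis_threshold(truncated_power_law(2.5, 2, kmax)))
```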

50 Non-uniform immunization of complex networks. Thus, the epidemic threshold is reintroduced: if …, i.e., when …, then …

51 Motifs: subgraphs that occur with a significantly higher density in the real network than in its randomized versions. Randomized networks: an ensemble of maximally random networks preserving the degree distribution of the original network. Function is often carried out by subnetworks rather than by single components. R. Milo et al., Science 298, 824-827 (2002).
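The slide does not say how the randomized ensemble is generated. One common choice, assumed here, is repeated edge switching: two directed edges a→b and c→d are replaced by a→d and c→b, which keeps every in- and out-degree. The sketch (an illustrative `randomize_directed`) rejects switches that would create self-loops or duplicate edges.

```python
import random
from collections import Counter


def randomize_directed(edges, n_swaps, rng):
    """Degree-preserving randomization by edge switching: a->b, c->d become
    a->d, c->b unless that would create a self-loop or a duplicate edge."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue  # reject this switch
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


original = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 0), (1, 3), (3, 0)]
shuffled = randomize_directed(original, 1000, random.Random(3))
# Every node keeps its out-degree and its in-degree.
print(Counter(u for u, _ in shuffled) == Counter(u for u, _ in original))  # True
print(Counter(v for _, v in shuffled) == Counter(v for _, v in original))  # True
```

Counting a subgraph in many such randomized copies and comparing with the count in the real network gives the significance used to call it a motif.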

52 Three-node connected subgraphs

53 Network motifs

54 Why do we have motifs? Hypothesis: they are dynamically desirable "building blocks". For example, the feed-forward (FF) motif acts as a noise filter.
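The feed-forward loop consists of the edges X→Y, Y→Z and X→Z. A minimal counter for a directed graph without self-loops (an illustrative helper). Note that it counts all occurrences of the pattern, not only induced subgraphs, which is a simplification compared with full motif analysis.

```python
def count_feed_forward_loops(edges):
    """Count occurrences of X->Y, Y->Z, X->Z in a directed graph without self-loops.
    Occurrences are not required to be induced subgraphs."""
    out = {}
    for u, v in edges:
        out.setdefault(u, set()).add(v)
    count = 0
    for x, targets in out.items():
        for y in targets:                    # X -> Y
            for z in out.get(y, ()):         # Y -> Z
                if z != x and z in targets:  # X -> Z closes the loop
                    count += 1
    return count


print(count_feed_forward_loops([(0, 1), (1, 2), (0, 2)]))  # 1
print(count_feed_forward_loops([(0, 1), (1, 2), (2, 0)]))  # 0: a 3-cycle is not an FFL
```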

55 Communities: “densely connected subgraphs”

56 Traditional method: hierarchical clustering (agglomerative method). All edges are removed and then added back one by one in decreasing order of their "strengths". Communities are defined as the components that form, and the process yields a dendrogram. The strength of the relationship between any pair of vertices can, e.g., be defined as $W_{ij} = \sum_{l=0}^{\infty} \alpha^l (A^l)_{ij}$, where α is a weighting parameter (smaller than the inverse of the largest eigenvalue of A, so that the sum converges) and the matrix $A^l$ contains the number of walks of length l between the vertex pairs.
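Powers of the adjacency matrix count walks, so a walk-based strength can be accumulated term by term. A plain-Python sketch of a truncated sum $\sum_{l=0}^{l_{max}} \alpha^l A^l$, where α and the cutoff lmax are illustrative parameters of ours:

```python
def mat_mult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def walk_strength(a, alpha, lmax):
    """Truncated W = sum_{l=0}^{lmax} alpha^l A^l; (A^l)_ij counts walks of length l."""
    n = len(a)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # A^0 = I
    w = [[0.0] * n for _ in range(n)]
    for l in range(lmax + 1):
        for i in range(n):
            for j in range(n):
                w[i][j] += alpha**l * power[i][j]
        power = mat_mult(power, a)
    return w


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(mat_mult(triangle, triangle))  # [[2, 1, 1], [1, 2, 1], [1, 1, 2]]: walks of length 2
W = walk_strength(triangle, 0.3, 50)  # alpha < 1/2, the inverse largest eigenvalue
print(W[0][1])
```

For the triangle the infinite sum has the closed form α/((1+α)(1−2α)) off the diagonal, which the truncated sum reproduces to high accuracy.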

57 Girvan-Newman method (divisive method). It also results in a dendrogram, obtained by cutting the edges one by one. In each step the edge with the highest "betweenness centrality" (BC) is removed, and the BC values are recalculated. The BC of an edge is the number of shortest paths between all pairs of vertices that use this edge (if a pair is connected by several shortest paths, each counts with an equal fractional weight). Girvan and Newman, PNAS 99, 7821 (2002).
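Edge betweenness can be computed with one breadth-first search per source vertex, accumulating dependencies backwards (the approach known from Brandes' algorithm, adapted here to edges). A sketch for unweighted, undirected graphs stored as dicts of neighbor sets; `edge_betweenness` and `girvan_newman_step` are illustrative names:

```python
from collections import deque


def edge_betweenness(adj):
    """Each vertex pair contributes 1, split equally among its shortest paths."""
    bc = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: distances, numbers of shortest paths sigma, predecessors.
        dist, sigma, preds = {s: 0}, {s: 1}, {s: []}
        order, queue = [], deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for u in preds[w]:
                c = sigma[u] / sigma[w] * (1 + delta[w])
                bc[frozenset((u, w))] += c
                delta[u] += c
    return {e: b / 2 for e, b in bc.items()}  # each pair was seen from both ends


def girvan_newman_step(adj):
    """Remove (in place) one edge with the highest betweenness and return it."""
    bc = edge_betweenness(adj)
    u, v = tuple(max(bc, key=bc.get))
    adj[u].discard(v)
    adj[v].discard(u)
    return u, v


# Two triangles joined by the bridge 2-3: the bridge is cut first.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(girvan_newman_step(g))
```

Repeating `girvan_newman_step` until no edges remain, and recording when components split, yields the dendrogram.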

58 Modularity. When should one stop the agglomeration/division? At the maximal modularity (Newman and Girvan, PRE 69, 026113 (2004)): $Q = \sum_g \left(e_{gg} - a_g^2\right)$, where $e_{gg}$ is the fraction of edges inside group g and $a_g$ is the fraction of edge ends belonging to group g. Q is the fraction of edges within the groups minus the fraction expected in the randomized network.
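A direct implementation of $Q = \sum_g (e_{gg} - a_g^2)$ for a simple undirected graph, assuming the partition is given as a dict from vertex to group label (names are illustrative):

```python
def modularity(adj, group):
    """Q = sum_g (e_gg - a_g^2); group maps each vertex to its group label."""
    m = sum(len(nb) for nb in adj.values()) / 2
    inside, ends = {}, {}
    for u, nbrs in adj.items():
        g = group[u]
        ends[g] = ends.get(g, 0) + len(nbrs)  # edge ends belonging to group g
        inside[g] = inside.get(g, 0) + sum(1 for v in nbrs if group[v] == g)
    # Each internal edge was counted from both of its ends, hence inside[g] / 2.
    return sum(inside[g] / 2 / m - (ends[g] / (2 * m)) ** 2 for g in ends)


# Two triangles joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(modularity(adj, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}))  # 5/14
print(modularity(adj, {v: "all" for v in adj}))                           # 0.0
```

Putting everything in one group always gives Q = 0, so a positive Q signals more internal edges than expected at random.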

59 Potts model Minimization of the Hamiltonian: Reichardt and Bornholdt, PRL 93, 218701 (2004)

60 Clique percolation method (CPM) Most real networks are characterized by overlapping and nested communities. Divisive/agglomerative methods fail to identify the communities when overlaps are significant. Derényi, Palla, and Vicsek, Phys. Rev. Lett. 94, 160202 (2005) Palla, Derényi, Farkas, and Vicsek, Nature 435, 814-818 (2005)

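61 Advantages of this method: local, allows overlaps, density (not distance) based, produces no cut-nodes, … k-cliques are complete subgraphs of size k [figure: examples for k = 2, 3, 4, 5]. Two k-cliques are adjacent if they share k-1 vertices, and we define a community as a k-clique percolation cluster, i.e., the union of all k-cliques that can be reached from each other through a series of adjacent k-cliques. [Figure: an example of overlapping k-clique communities for k = 4.]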
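For small graphs, k-clique communities can be found by brute force: enumerate all k-cliques, merge cliques that share a (k−1)-vertex subset with union-find, and take the union of the vertices in each merged group. This exhaustive enumeration is only a sketch for illustration; it scales as $N^k$, and the function name is ours.

```python
from itertools import combinations


def k_clique_communities(adj, k):
    """Brute-force clique percolation: k-cliques sharing k-1 vertices are adjacent;
    a community is the union of a connected set of k-cliques."""
    nodes = sorted(adj)
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Cliques that share a (k-1)-subset ("face") belong to the same cluster.
    by_face = {}
    for idx, c in enumerate(cliques):
        for face in combinations(sorted(c), k - 1):
            if face in by_face:
                parent[find(idx)] = find(by_face[face])
            else:
                by_face[face] = idx
    comms = {}
    for idx, c in enumerate(cliques):
        comms.setdefault(find(idx), set()).update(c)
    return list(comms.values())


# Triangles {0,1,2} and {1,2,3} share an edge; triangle {3,4,5} shares only vertex 3.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(k_clique_communities(adj, 3))  # two communities overlapping in vertex 3
```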

62 Studied systems:
• Co-authorship network: Los Alamos cond-mat archive, 30,739 nodes and 136,065 links.
• Word association network: South Florida Free Association norms list, 10,617 nodes and 63,788 links.
• Protein-protein interaction network: DIP core list of the yeast S. cerevisiae, 2,609 nodes and 6,355 links.
Links are usually weighted ($w_{ij}$). For each value of k (typically k = 3, 4, 5), a threshold weight can be introduced. (Note that there is a critical threshold at which a giant cluster appears. Optimally, the threshold weight should be chosen close to this critical value.)

63

64 Web of communities for the protein interaction network of yeast: links represent overlaps between the communities.

65 Community statistics: community size distribution, community degree distribution, overlap size distribution, membership number distribution.

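66 Clique percolation in an ER graph. Branching process: when a new k-clique is reached, it has k-1 further (k-1)-vertex subsets through which the process can continue, and each of these can be completed to a new k-clique by any of the roughly N other vertices with probability $p^{k-1}$. The process prevails if $(k-1)N p^{k-1} > 1$, giving the threshold $p_c(k) = \frac{1}{[(k-1)N]^{1/(k-1)}}$. For k = 2 this recovers $p_c = 1/N$ of the ordinary giant component.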
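Evaluating $p_c(k) = [(k-1)N]^{-1/(k-1)}$ for a few clique sizes shows how much denser the graph must be for larger cliques to percolate. A short sketch (N = 10^4 is an illustrative choice):

```python
def cpm_threshold(k, n):
    """k-clique percolation threshold in G(N, p): p_c(k) = [(k-1) N]^(-1/(k-1))."""
    return ((k - 1) * n) ** (-1 / (k - 1))


n = 10_000
for k in (2, 3, 4, 5):
    print(k, cpm_threshold(k, n))  # k = 2 gives the ordinary 1/N threshold
```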

67 Dedicated web page for the CPM (software, papers, data): http://www.cfinder.org/
Some review papers: Albert and Barabási, Rev. Mod. Phys. 74, 47 (2002); Dorogovtsev and Mendes, Adv. Phys. 51, 1079 (2002).
Useful web page with papers, data, and ppt presentations: http://www.nd.edu/~networks/ (where many of the slides of this course have been "borrowed" from).

