The Architecture of Complexity: Structure and Modularity in Cellular Networks. Albert-László Barabási, University of Notre Dame, www.nd.edu/~networks
Erdős-Rényi model (1960): democratic, random. Pál Erdős. Connect each pair of nodes with probability p. Example: p = 1/6, N = 10, <k> ~ 1.5. The degree distribution is Poisson.
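The construction on this slide can be sketched in a few lines. This is a minimal illustration, not the speaker's code: the function names `erdos_renyi` and `mean_degree` are hypothetical, and the check uses the standard relation <k> = p(N - 1), which gives 1.5 for the slide's p = 1/6 and N = 10.

```python
import random

def erdos_renyi(n, p, seed=None):
    """Return the edge list of a random graph on nodes 0..n-1 in which
    every pair of nodes is connected independently with probability p."""
    rng = random.Random(seed)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                edges.append((i, j))
    return edges

def mean_degree(n, edges):
    """Average number of links per node (each link has two ends)."""
    return 2 * len(edges) / n

n, p = 10, 1 / 6
expected_k = p * (n - 1)  # 1.5 for the slide's parameters

# Average the measured mean degree over many independent realizations.
trials = 2000
avg_k = sum(mean_degree(n, erdos_renyi(n, p, seed=s)) for s in range(trials)) / trials
print(expected_k, avg_k)
```

A single realization with N = 10 fluctuates strongly, which is why the sketch averages over many graphs.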
World Wide Web: over 3 billion documents. ROBOT: collects all URLs found in a document and follows them recursively. Nodes: WWW documents. Links: URL links. Expected: exponential network. Found: scale-free network, P(k) ~ k^(-γ). R. Albert, H. Jeong, A.-L. Barabási, Nature (1999).
Topology of the protein network. Nodes: proteins. Links: physical interactions (binding). [Figure: degree distribution P(k) of the protein network.] H. Jeong, S.P. Mason, A.-L. Barabási, Z.N. Oltvai, Nature 411, (2001).
Scale-free model (BA model). (1) GROWTH: networks continuously expand by the addition of new nodes (WWW: addition of new documents). Add a new node with m links. (2) PREFERENTIAL ATTACHMENT: new nodes prefer to link to highly connected nodes (WWW: linking to well-known sites). The probability that a new node connects to a node with k links is proportional to k. Result: P(k) ~ k^(-3). Barabási & Albert, Science 286, 509 (1999).
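The two ingredients, growth and preferential attachment, can be sketched as follows. This is an illustrative implementation under stated assumptions: the seed graph (a complete graph on m + 1 nodes) and the name `barabasi_albert` are choices made here, not details given on the slide.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Grow a network to n nodes. Each new node brings m links and attaches
    them to existing nodes with probability proportional to their degree."""
    rng = random.Random(seed)
    # Assumption: start from a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Every node appears in `stubs` once per link end, so a uniform draw
    # from `stubs` selects a node with probability proportional to its degree.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):          # GROWTH
        targets = set()
        while len(targets) < m:          # PREFERENTIAL ATTACHMENT
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

edges = barabasi_albert(n=5000, m=2, seed=1)
degree = Counter(v for e in edges for v in e)
print("largest degree:", max(degree.values()))
print("smallest degree:", min(degree.values()))
```

Most nodes keep a degree close to m while a few early nodes become hubs, which is the qualitative signature of the power-law P(k) quoted above.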
Can latecomers make it? Fitness model. SF model: k(t) ~ t^(1/2) (first-mover advantage). Real systems: nodes compete for links (fitness). Fitness model: a node with fitness η grows as k(η,t) ~ t^(β(η)), where β(η) = η/C. G. Bianconi and A.-L. Barabási, Europhysics Letters 54, 436 (2001).
Bose-Einstein condensation in evolving networks. Mapping: network to Bose gas. Phases: fit-gets-rich and Bose-Einstein condensation. G. Bianconi and A.-L. Barabási, Physical Review Letters 2001; cond-mat/
Robustness. Complex systems maintain their basic functions even under errors and failures (cell mutations; Internet router breakdowns). [Figure: S versus the fraction of removed nodes f, from 0 to 1, under node failure, with the threshold f_c marked.]
Robustness of scale-free networks. Failures: for γ ≤ 3, f_c = 1 (R. Cohen et al., PRL, 2000). [Figure: S versus f, from 0 to 1, for failures and for attacks, with the threshold f_c marked.] Albert, Jeong, Barabási, Nature (2000).
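The contrast between failures and attacks can be illustrated numerically. The sketch below is an assumption-laden toy, not the procedure of the cited papers: it grows a small preferential-attachment network, removes a fraction f of nodes either at random (failure) or in decreasing order of their initial degree (attack), and reports the largest connected cluster as a fraction of the original N. The names `ba_edges` and `largest_cluster_fraction` are hypothetical.

```python
import random
from collections import defaultdict

def ba_edges(n, m, rng):
    """Preferential-attachment network (assumed seed: complete graph on m + 1 nodes)."""
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

def largest_cluster_fraction(n, edges, removed):
    """Largest connected cluster among surviving nodes, divided by n."""
    adj = defaultdict(list)
    for u, v in edges:
        if u not in removed and v not in removed:
            adj[u].append(v)
            adj[v].append(u)
    seen, best = set(), 0
    for start in range(n):
        if start in removed or start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                      # depth-first search of one cluster
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / n

rng = random.Random(0)
n, m, f = 2000, 2, 0.1
edges = ba_edges(n, m, rng)
degree = defaultdict(int)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

n_remove = int(f * n)
failed = set(rng.sample(range(n), n_remove))                      # random failures
attacked = set(sorted(range(n), key=lambda v: degree[v],
                      reverse=True)[:n_remove])                   # hubs first
s_failure = largest_cluster_fraction(n, edges, failed)
s_attack = largest_cluster_fraction(n, edges, attacked)
print("S after failures:", s_failure)
print("S after attack:  ", s_attack)
```

Removing the same number of nodes does far more damage when the hubs are targeted, which is the asymmetry the slide summarizes.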
Modularity. Real networks are fragmented into groups or modules. Society: Granovetter, M. S. (1973); Girvan, M., & Newman, M. E. J. (2001); Watts, D. J., Dodds, P. S., & Newman, M. E. J. (2002). WWW: Flake, G. W., Lawrence, S., & Giles, C. L. (2000). Biology: Hartwell, L. H., Hopfield, J. J., Leibler, S., & Murray, A. W. (1999). Internet: Vázquez, Pastor-Satorras, Vespignani (2001). Traditional view of modularity [figure]. Ravasz, Somera, Mongru, Oltvai, A.-L. B., Science 297, 1551 (2002).
Modular vs. scale-free topology. (a) Scale-free. (b) Modular.
Hierarchical networks. Clustering coefficient: a measure of the interconnectivity of a vertex's neighborhood, C = 2T / (k(k-1)), where T is the number of links between the neighbors and k is the number of neighbors (connectivity).
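As a small worked example of these definitions, the sketch below computes k, T and the standard clustering coefficient C = 2T / (k(k-1)) for every node of a toy graph. The adjacency-dictionary representation and the function name `clustering_coefficients` are illustrative assumptions.

```python
from itertools import combinations

def clustering_coefficients(adj):
    """adj maps each node to the set of its neighbors.
    Returns a dict node -> (k, T, C)."""
    result = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        # T: number of links between the k neighbors of v
        T = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # C is undefined for k < 2; report 0.0 by convention here.
        C = 2 * T / (k * (k - 1)) if k > 1 else 0.0
        result[v] = (k, T, C)
    return result

# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
stats = clustering_coefficients(adj)
for node, (k, T, C) in sorted(stats.items()):
    print(node, k, T, round(C, 3))
```

Node 0 has three neighbors but only one link among them, so its clustering coefficient is lower than that of nodes 1 and 2, whose two neighbors are directly connected.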
Real networks: Hollywood; language; Internet (AS), Vázquez et al., '01; WWW, Eckmann & Moses, '02.
Hierarchy in biological systems: metabolic networks; protein networks.
Spatial distributions: population density and router density. Yook, Jeong and A.-L. B., PNAS 2002.
Spatial distribution of routers: a fractal set. Box counting: N(ℓ) is the number of boxes of size ℓ that contain routers. N(ℓ) ~ ℓ^(-D_f), D_f = 1.5.
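Box counting can be sketched as below. This is a generic illustration on synthetic point sets, not the router data: points on a line should give a dimension near 1 and a filled square near 2, with a fractal set such as the router map falling in between. The function names and the choice of box sizes are assumptions.

```python
import math

def box_count(points, size):
    """Number of boxes of side `size` that contain at least one point."""
    return len({(math.floor(x / size), math.floor(y / size)) for x, y in points})

def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(size) against log(1/size)."""
    xs = [math.log(1 / s) for s in sizes]
    ys = [math.log(box_count(points, s)) for s in sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

sizes = [2 ** -k for k in range(2, 6)]  # 1/4, 1/8, 1/16, 1/32

# Synthetic test sets inside the unit square.
line = [(i / 1000, i / 1000) for i in range(1000)]
square = [(i / 100, j / 100) for i in range(100) for j in range(100)]

d_line = box_counting_dimension(line, sizes)
d_square = box_counting_dimension(square, sizes)
print(d_line, d_square)
```

On real coordinates the fit should be restricted to the range of box sizes over which log N versus log(1/size) is actually linear.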
Preferential attachment. Compare maps taken at different times (Δt = 6 months). Measure Δk(k), the increase in the number of links for a node with k links. Preferential attachment: Δk(k) ~ k.
Modeling the effect of spatial distribution. Model: place nodes on a two-dimensional space, forming a fractal of dimension D_f. Preferential attachment with distance dependence: Π(k, d) ~ k^α / d^σ. Question: could the distance dependence kill the power law? Numerical results: when σ < 2, P(k) is a power law; when σ > 2, P(k) is exponential. For the Internet σ = 1 (!), thus the Internet could be scale-free.
Phase diagram. N(ℓ) ~ ℓ^(-D_f), D_f = 1.5; Δk(k) ~ k^α, α = 1; P(d) ~ d^(-σ), σ = 1.
Subgraphs. Subgraph: a connected graph consisting of a subset of the nodes and links of a network. Subgraph properties: n, the number of nodes; m, the number of links. Examples shown: (n=3, m=3), (n=3, m=2), (n=4, m=4), (n=4, m=5).
Subgraph density in biological networks (number of subgraphs / number of nodes). [Table: rows (n,m) = (3,2), (3,3), (4,3), (4,6), (5,4), (5,10), (6,5), (6,15), (7,6), (7,21); columns: transcription, metabolic and protein networks of E. coli and S. cerevisiae. The numeric entries are not recoverable from the transcript.]
Motifs. Motifs: subgraphs that have a significantly higher density in the real network than in the randomized version of the studied network. Randomized networks: an ensemble of maximally random networks preserving the degree distribution of the original network. R. Milo et al., Science 298, 824 (2002).
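One common way to build a randomized network with the same degrees is repeated double-edge swapping. The sketch below is an undirected, simplified illustration of that technique and is not claimed to be the exact procedure of Milo et al.; the function name `randomize_preserving_degrees` and the toy graph are assumptions.

```python
import random
from collections import Counter

def randomize_preserving_degrees(edges, n_swaps, seed=None):
    """Shuffle an undirected simple graph by double-edge swaps.
    Replacing (a-b, c-d) with (a-d, c-b) leaves every node's degree unchanged."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:           # pick one of the two possible rewirings
            c, d = d, c
        # Reject swaps that would create a self-loop or a duplicate link.
        if a == d or c == b:
            continue
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

def degrees(edges):
    return Counter(v for e in edges for v in e)

# Toy graph: a ring of 20 nodes plus three chords.
original = [(i, (i + 1) % 20) for i in range(20)] + [(0, 10), (5, 15), (3, 12)]
shuffled = randomize_preserving_degrees(original, n_swaps=100, seed=0)
print(degrees(original) == degrees(shuffled))
```

A motif analysis would then compare subgraph counts in the original network with their distribution over many such shuffled copies.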
R. Milo et al., Science 298, 824 (2002).
Why do we have motifs? Hypothesis: desirable dynamical and signal-processing features. Evolutionary selection of dynamically desirable "building blocks". The feed-forward (FF) motif is a noise filter. R. Milo et al., Science 298, 824 (2002).
Do motifs have biological relevance? High degree of evolutionary conservation of the motifs within the protein interaction network (Wuchty et al., Nat. Genet. 2003). Convergent evolution in the transcriptional network of diverse species toward the same motif types (Conant and Wagner, Nat. Genet. 2003).
What determines the number of subgraphs in biological networks? [Table: rows (n,m) = (3,2), (3,3), (4,3), (4,6), (5,4), (5,10), (6,5), (6,15), (7,6), (7,21); columns: transcription, metabolic and protein networks of E. coli and S. cerevisiae. The numeric entries are not recoverable from the transcript.]
Global network properties. A.-L. B. and Z.N. Oltvai, Nat. Rev. Gen. (2004).
Calculating the number of subgraphs: I. Quantities involved: the number of subgraphs with n nodes; the probability of having m-(n-1) extra links; the average number of subgraphs passing by a node with degree k; the average number of (n,m) subgraphs in the network.
P(k) ~ k^(-γ), C(k) ~ k^(-α), n <
Phase boundary: Type I: Type II:
How do the subgraphs relate to each other? S. cerevisiae transcription-regulatory network. Subgraphs do not exist in isolation: they aggregate into subgraph clusters.
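From global to local. T(k): the number of direct links between the node's k neighbors or, equivalently, the number of triangles a node with k links participates in. Example: k = 5, T(k) = 2. From the definition of the clustering coefficient, T(k) = C(k) k(k-1)/2, which is approximately C(k) k^2/2 in the limit of large k. P(T): the probability that T triangles pass by a node.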
Generalization to arbitrary subgraphs. n: number of nodes; m: number of links; t = m - n + 1. Example: a triangle has n = 3 and m = 3, so t = 1.
Motif clusters: S. cerevisiae
Motif clusters. The probability that a link connected to a node of degree k participates in a triangle is given by q(k) = 1 - [1 - C(k)]^(k-1), which is the probability that the neighbor at the other end of the link is connected to at least one of the k - 1 remaining neighbors. If C(k) = C_0 k^(-1), for large k we obtain q ~ 1 - exp(-C_0), independent of the node degree. Calculating the size of the largest triangle cluster is equivalent to determining the largest cluster of connected nodes after each link is removed with probability 1 - q.
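The large-k limit quoted above is easy to check numerically. The sketch below evaluates q(k) = 1 - [1 - C(k)]^(k-1) with C(k) = C_0/k for increasing k and compares it with 1 - exp(-C_0); the value C_0 = 0.5 is an arbitrary illustrative choice, not a number from the talk.

```python
import math

def q(k, c0):
    """Probability that a link at a degree-k node lies in a triangle,
    assuming the hierarchical scaling C(k) = c0 / k."""
    return 1 - (1 - c0 / k) ** (k - 1)

c0 = 0.5                               # illustrative value only
limit = 1 - math.exp(-c0)              # predicted degree-independent limit
values = {k: q(k, c0) for k in (10, 100, 1000, 10 ** 6)}

for k, val in values.items():
    print(k, val, abs(val - limit))
```

The gap to the limit shrinks roughly as 1/k, so q is effectively constant for all but the lowest-degree nodes.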
The emergence of subgraph clusters. Lower-bound percolation boundary: Phase boundary (Type I and II):
S_(6,m): fraction of nodes in the largest cluster of (6,m) subgraphs.
Conclusions. A sharp distinction between the local and global structure is not justified (they are equivalent). From only two parameters characterizing the network's global topology, we are able to derive the precise subgraph densities seen in biological networks. Measuring the distribution of two motifs allows us to determine the exponents characterizing the network's large-scale features. Subgraphs do not exist in isolation; they must aggregate into subgraph clusters.
Zoltán N. Oltvai, Northwestern Med. School; Hawoong Jeong, KAIST, Korea; Réka Albert, Penn State; Erzsébet Ravasz, Notre Dame; S.H. Yook, Notre Dame; Eivind Almaas, Notre Dame; Baldvin Kovács, Eötvös Univ. Budapest; Tamás Vicsek, Eötvös Univ. Budapest; A. Vázquez, Notre Dame; R. Dobrin, Northwestern; D. Sergi, U. de Genève; J.-P. Eckmann, U. de Genève; Stefan Wuchty, Notre Dame.