Statistical physics of complex networks

Sergei Maslov, Brookhaven National Laboratory

Short history: complex systems before & after networks
Statistical physics of complex systems was active in the 1980s and 1990s (following the chaos boom of the 1970s):
- Fractals (Mandelbrot and many others)
- Self-Organized Criticality (Per Bak and co-authors): sandpiles, granular systems
- Complex = multiple time and length scales (e.g. avalanches)
- The cult of power laws
- Cellular automata (mostly in real space and time)
Examples: earthquakes, disordered moving interfaces, (co-)evolution of species, agent-based modeling ("ants").
By the end of the 1990s the community broke up and specialized: biology, economics and finance, the Internet, social sciences.

Networks in complex systems
A large number of components interact with each other. All components and/or interactions are different from each other (unlike in traditional physics, where $10^{23}$ electrons are all the same!). Paradigms:
- $10^4$ types of proteins in an organism
- $10^6$ routers in the Internet
- $10^9$ web pages in the WWW
- $10^{11}$ neurons in a human brain
The simplest property, who interacts with whom, can be visualized as a network. Complex networks are just a backbone for complex dynamical processes.

Why study the topology of complex networks?
Lots of easily available data: that's where the state-of-the-art information is (at least in biology). Large networks may contain information about the basic design principles and/or evolutionary history of the complex system. This is similar to paleontology: learning about an animal from its backbone.

Inside single cells

A small part of a metabolic network: the citric acid cycle

Metabolic pathway chart by ExPASy

Protein binding networks
Baker’s yeast S. cerevisiae (only nuclear proteins shown); nematode worm C. elegans

Transcription regulatory networks
Single-celled eukaryote: S. cerevisiae; bacterium: E. coli

Figure: three layers of cellular networks. Genome: protein-gene interactions. Proteome: protein-protein interactions. Metabolism: biochemical reactions. (Slide after Reka Albert.)

Between cells in a multi-cellular organism

Sea urchin embryonic development (endomesoderm up to 30 hours) by Davidson’s lab

C. elegans neurons

Between organisms

Freshwater food web by Neo Martinez and Richard Williams

Sexual contacts: M. E. J. Newman, The structure and function of complex networks, SIAM Review 45, (2003).

Social

High school dating: Data drawn from Peter S. Bearman, James Moody, and Katherine Stovel visualized by Mark Newman

Network of actors co-starring in movies

Networks of scientists’ co-authorship of papers

Webpages connected by hyperlinks on the AT&T website circa 1996 visualized by Mark Newman
Citation networks are similar to the WWW but time-ordered

Technological

Internet as measured by Hal Burch and Bill Cheswick's Internet Mapping Project.

transportation networks: airlines

transportation networks: railway maps
Tokyo rail map

Lecture 1: General introduction to networks
- Node degrees, their distribution, and correlations
- Simple models: preferential attachment and the Simon model
- Growth model for protein families
- Percolation transition on networks
- Clustering coefficient
Lectures 2-3: Biomolecular (mostly protein) networks
- Regulatory and signaling networks: how many regulators? Bureaucratic collapse; network motifs in directed (e.g. regulatory) networks
- Protein binding networks: broad degree distributions and possible explanations (evolutionary: duplication-divergence; biophysical: stickiness; functional)
- Beyond degree distributions, how is it all wired together? Correlations in degrees; randomization of networks; Law of Mass Action and propagation of perturbations
Lecture 4: Technological and information networks
- Diffusion and modules in the Internet, WWW, and scientific citations
- Predicting opinions of customers on products (e.g. movies) using knowledge networks

Degree (or connectivity) of a node: the number of its neighbors
Figure: nodes with degree K = 2 and K = 4

Directed networks have in- and out-degrees
Figure: a node with in-degree $K_{in} = 2$ and out-degree $K_{out} = 5$
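A minimal Python sketch of both definitions, assuming the network is given as a list of edges (node pairs) without duplicates; the function names are illustrative, not taken from any library.

```python
from collections import Counter

def undirected_degrees(edges):
    """Degree K of every node: the number of edges touching it."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def directed_degrees(edges):
    """In-degree K_in and out-degree K_out for (source, target) edges."""
    k_in, k_out = Counter(), Counter()
    for src, dst in edges:
        k_out[src] += 1
        k_in[dst] += 1
    return dict(k_in), dict(k_out)

print(undirected_degrees([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]))
print(directed_degrees([("a", "b"), ("a", "c"), ("d", "a")]))
```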

Degree distributions in random and real networks

Degree distribution in a random network
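Poisson distribution. Randomly throw E edges among N nodes (Solomonoff and Rapoport, Bull. Math. Biophysics (1951); Erdős and Rényi (1960)). The degree distribution is binomial, which for large N becomes Poisson:
$P(K) = e^{-\langle K\rangle}\langle K\rangle^{K}/K!$, with $\langle K\rangle = 2E/N$
There are no hubs: N(K) decays fast.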
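The construction can be checked numerically. This Python sketch rests on our own assumptions (each edge joins a uniformly random pair of distinct nodes, and multiple edges are not filtered out, which matters little when E is much smaller than N²); it compares the empirical degree histogram with the Poisson law of the same mean.

```python
import math
import random
from collections import Counter

def random_graph_degrees(n_nodes, n_edges, seed=None):
    """Degrees after throwing n_edges edges between random distinct node pairs."""
    rng = random.Random(seed)
    degree = [0] * n_nodes
    for _ in range(n_edges):
        u, v = rng.sample(range(n_nodes), 2)
        degree[u] += 1
        degree[v] += 1
    return degree

def poisson_pmf(k, mean):
    return math.exp(-mean) * mean ** k / math.factorial(k)

N, E = 10_000, 20_000
deg = random_graph_degrees(N, E, seed=1)
mean_k = 2 * E / N  # <K> = 2E/N = 4
hist = Counter(deg)
for k in range(10):
    # empirical fraction of nodes with degree k vs. Poisson prediction
    print(k, hist[k] / N, round(poisson_pmf(k, mean_k), 4))
```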

Degree distribution in real protein binding network
Histogram N(K) is broad: most nodes have low degree (~1), a few nodes have high degree (~100). It can be approximately fitted with the functional form $N(K) \sim K^{-\gamma}$ with $\gamma \approx 2.5$.
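The slide does not say how the exponent was fitted. One common alternative to reading a slope off a log-log histogram is the maximum-likelihood (Hill-type) estimator $\hat\gamma = 1 + n / \sum_i \ln(K_i/K_0)$, sketched below in Python. The shift $K_0 = K_{min} - 1/2$ for integer data is an approximation, and the test data are synthetic, drawn with a known $\gamma = 2.5$.

```python
import math
import random

def powerlaw_exponent_mle(samples, k_min, discrete=True):
    """Maximum-likelihood estimate of gamma in P(K) ~ K**(-gamma) for K >= k_min."""
    tail = [k for k in samples if k >= k_min]
    k0 = k_min - 0.5 if discrete else k_min  # usual approximation for integer data
    return 1.0 + len(tail) / sum(math.log(k / k0) for k in tail)

def sample_continuous_powerlaw(n, gamma, x_min, seed=None):
    """Inverse-transform sampling from p(x) ~ x**(-gamma), x >= x_min."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]

xs = sample_continuous_powerlaw(50_000, gamma=2.5, x_min=1.0, seed=0)
print(powerlaw_exponent_mle(xs, k_min=1.0, discrete=False))  # close to 2.5
```

For real degree data, the choice of $K_{min}$ strongly affects the estimate and should be reported alongside it.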

Many real world networks have broad degree distributions
exponent  film actors 2.3 telephone call graph 2.1 networks 1.5/2.0 sexual contacts 3.2 WWW 2.3/2.7 internet 2.5 peer-to-peer metabolic network 2.2 protein interactions 2.4

Basic BA model: a very simple algorithm to implement
- Start with an initial set of $m_0$ fully connected nodes, e.g. $m_0 = 3$ (nodes 1, 2, 3).
- Add new vertices one by one, each with exactly m edges.
- Each new edge connects to an existing vertex with probability proportional to the number of edges that vertex already has (preferential attachment).
- This is easiest if you keep track of edge endpoints in one large array and select an element of this array at random: the probability of selecting any one vertex is proportional to the number of times it appears in the array, which is its degree.

generating BA graphs – cont’d
To start, each of the vertices 1, 2, 3 has the same number of edges (2), so the probability of choosing any vertex is 1/3. We add a new vertex 4 with m edges, here m = 2: draw 2 random elements from the array, and suppose they are 2 and 3. Now the probabilities of selecting 1, 2, 3, or 4 are 1/5, 3/10, 3/10, 1/5. Add another vertex (5), draw the vertices it connects to from the array, and so on.
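The endpoint-array procedure translates almost line by line into Python. In this sketch, which is ours and not the original course code, nodes are labelled from 1 to match the slides, and repeated draws of the same target are redrawn so that no multiple edges appear; that detail is an assumption the slides leave open.

```python
import random
from collections import Counter
from fractions import Fraction

def barabasi_albert(n, m=2, m0=3, seed=None):
    """Grow a BA graph with nodes 1..n using the edge-endpoint array trick."""
    if m > m0:
        raise ValueError("need m <= m0 so that m distinct targets exist")
    rng = random.Random(seed)
    # fully connected core of m0 nodes
    edges = [(i, j) for i in range(1, m0 + 1) for j in range(i + 1, m0 + 1)]
    endpoints = [v for edge in edges for v in edge]  # node v appears deg(v) times
    for new in range(m0 + 1, n + 1):
        targets = set()
        while len(targets) < m:  # redraw duplicates to avoid multiple edges
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges, endpoints

def attachment_probabilities(endpoints):
    """Probability of drawing each node from the endpoint array, k_i / sum(k)."""
    counts = Counter(endpoints)
    return {v: Fraction(c, len(endpoints)) for v, c in sorted(counts.items())}

# The worked example: triangle 1-2-3, then node 4 attached to nodes 2 and 3.
triangle = [1, 2, 2, 3, 1, 3]
print(attachment_probabilities(triangle + [4, 2, 4, 3]))
# {1: Fraction(1, 5), 2: Fraction(3, 10), 3: Fraction(3, 10), 4: Fraction(1, 5)}
```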

The tale of linear vs exponential growth
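Linear growth: the Barabasi-Albert model (γ = 3) is a version of Simon's word-usage model. With $n_k$ the number of nodes of degree k,
$\frac{dn_k}{dt} = \frac{(k-1)\,n_{k-1} - k\,n_k}{t + \lambda t}, \qquad \gamma = 2 + \lambda,$
where λ is the number of new nodes per preferential-attachment event (λ = 1 for the BA model).
Exponential growth: protein duplication-deletion model. With $n_k$ the number of protein families of size k,
$\frac{dn_k}{dt} = \lambda_{dup}(k-1)\,n_{k-1} - (\lambda_{dup}+\lambda_{del})\,k\,n_k + \lambda_{del}(k+1)\,n_{k+1}, \qquad \gamma = 2 + \frac{\delta}{\lambda_{dup}-\lambda_{del}}$
The number of families $N_F = \sum_k n_k$ also grows exponentially: $\frac{dN_F}{dt} = \delta N_G = \delta \sum_k k\,n_k$, where δ is the rate at which new families appear per gene.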
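A small simulation of Simon's word-usage model makes the linear-growth case concrete. In the version sketched here (our parameterization), each step adds a brand-new word with probability α and otherwise reuses the word of a uniformly chosen earlier occurrence, which picks words in proportion to their counts. Writing λ = α/(1−α) for the number of new words per reuse event gives the exponent γ = 2 + λ = 1 + 1/(1−α); α = 1/2 gives λ = 1 and the BA value γ = 3.

```python
import random
from collections import Counter

def simon_model(steps, alpha, seed=None):
    """Simon's model: returns Counter mapping word id -> number of occurrences k."""
    rng = random.Random(seed)
    occurrences = [0]  # one entry per occurrence (word id); start with one word
    n_words = 1
    for _ in range(steps):
        if rng.random() < alpha:
            occurrences.append(n_words)  # new word enters with k = 1
            n_words += 1
        else:
            # uniform choice of an occurrence = preferential choice of a word
            occurrences.append(rng.choice(occurrences))
    return Counter(occurrences)

def simon_exponent(alpha):
    """Predicted gamma = 1 + 1/(1 - alpha), i.e. 2 + lam with lam = alpha/(1 - alpha)."""
    return 1.0 + 1.0 / (1.0 - alpha)

counts = simon_model(100_000, alpha=0.5, seed=2)
print(len(counts), simon_exponent(0.5))  # about 50000 words, gamma = 3.0
```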

Preferential attachment with fitness
Bianconi-Barabasi (2001): the attractiveness of node i to new edges is $\Pi_i = f_i k_i / \sum_r f_r k_r$, where the fitness $f_i$ is drawn from a distribution ρ(f).
For uniform ρ(f): $P_k \sim k^{-(1+C^*)}/\ln k$, where $C^* = 1.255$.
Generally C depends on ρ(f).
Some ρ(f) result in "Bose-Einstein condensation", in which super-hubs emerge.
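A Python sketch of the fitness rule. For simplicity we grow a tree (one edge per new node) and draw fitness uniformly from [0, 1), the uniform ρ(f) case; both choices are assumptions made for illustration, and the function names are hypothetical.

```python
import random

def fitness_attachment_probabilities(fitness, degree):
    """Bianconi-Barabasi rule: Pi_i = f_i k_i / sum_r f_r k_r."""
    weights = [f * k for f, k in zip(fitness, degree)]
    total = sum(weights)
    return [w / total for w in weights]

def grow_with_fitness(n, seed=None):
    """Grow a tree of n nodes; each new node links to one node chosen with Pi_i."""
    rng = random.Random(seed)
    fitness = [rng.random(), rng.random()]
    degree = [1, 1]  # start from a single edge between nodes 0 and 1
    for _ in range(n - 2):
        weights = [f * k for f, k in zip(fitness, degree)]
        target = rng.choices(range(len(degree)), weights=weights)[0]
        degree[target] += 1
        fitness.append(rng.random())
        degree.append(1)
    return fitness, degree

print(fitness_attachment_probabilities([1.0, 2.0], [3, 1]))  # [0.6, 0.4]
```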

Percolation transition in networks

Why should we care? It is the most important property of a network: it quantifies how broken up the network is.
- Below the percolation threshold: many small components.
- At the percolation threshold: a scale-free distribution of component sizes, $P(S) \sim S^{-2.5}$.
- Above the percolation threshold: a giant connected component and a few small ones.
It determines the propagation of perturbations that affect neighbors with probability p (e.g. infections).

Naïve (and wrong) argument
An average node has $\langle K\rangle$ first neighbors, $\langle K\rangle\langle K-1\rangle$ second neighbors, $\langle K\rangle\langle K-1\rangle\langle K-1\rangle$ third neighbors, and so on. We neglect overlap between, e.g., second and first neighbors: in random networks this is a small effect, ~1/N. If $\langle K-1\rangle > 1$, a single node is connected to a finite fraction of all nodes in the network.

Where is it wrong? The probability to arrive at a node with K neighbors (by following a random edge) is proportional to K! All averages have to be modified: $\langle F(K)\rangle \to \langle F(K)\,K\rangle/\langle K\rangle$.
The right answer: a perturbation spreads if $\langle K(K-1)\rangle/\langle K\rangle > 1$.
In directed networks the condition is $\langle K_{in}K_{out}\rangle/\langle K_{in}\rangle > 1$.
Correlations between the degrees of neighbors and an abnormally large number of triangles (clustering) would affect the answer.
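The difference between the naive and the correct ratio is easy to see on a single hub-and-spoke degree sequence. This Python sketch (function names are ours) computes both from a list of node degrees; the star is only an illustration of the edge-following bias, not a network in which anything percolates.

```python
def naive_ratio(degrees):
    """The (wrong) estimate <K-1>, ignoring that edges lead preferentially to hubs."""
    return sum(k - 1 for k in degrees) / len(degrees)

def branching_ratio(degrees):
    """<K(K-1)>/<K>: mean number of further neighbours reached via a random edge."""
    return sum(k * (k - 1) for k in degrees) / sum(degrees)

def directed_branching_ratio(k_in, k_out):
    """<K_in K_out>/<K_in> for a directed network (lists aligned by node)."""
    return sum(a * b for a, b in zip(k_in, k_out)) / sum(k_in)

star = [4, 1, 1, 1, 1]  # one hub with four leaves
print(naive_ratio(star), branching_ratio(star))  # 0.6 1.5
```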

How many clusters?
- If $\langle K(K-1)\rangle/\langle K\rangle < 1$, there are only small clusters.
- If $\langle K(K-1)\rangle/\langle K\rangle = 1$, cluster sizes S have a scale-free distribution, $P(S) \sim S^{-2.5}$.
- If $\langle K(K-1)\rangle/\langle K\rangle > 1$, there is one "giant" cluster and a few small ones.
A perturbation that affects neighbors with probability p propagates if $p\,\langle K(K-1)\rangle/\langle K\rangle > 1$.
For scale-free networks with $P(K) \sim K^{-\gamma}$ and $\gamma < 3$, $\langle K^2\rangle = \infty$, so a perturbation always spreads in a large enough network.
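The last point can be checked numerically: the critical transmission probability is $p_c = \langle K\rangle/\langle K(K-1)\rangle$. In this Python sketch a hard degree cutoff $k_{max}$ stands in for finite network size (an assumption; in real networks the cutoff grows with N in a model-dependent way). For γ = 2.5, $p_c$ keeps shrinking as the cutoff grows.

```python
def critical_probability(gamma, k_max, k_min=1):
    """p_c = <K>/<K(K-1)> for P(K) ~ K**(-gamma) on k_min <= K <= k_max (k_max >= 2)."""
    ks = range(k_min, k_max + 1)
    weights = [k ** -gamma for k in ks]  # unnormalized P(K); normalization cancels
    mean_k = sum(w * k for w, k in zip(weights, ks))
    mean_kk1 = sum(w * k * (k - 1) for w, k in zip(weights, ks))
    return mean_k / mean_kk1

for k_max in (10, 100, 1000, 10000):
    print(k_max, round(critical_probability(2.5, k_max), 4))
```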

Diameter and mean cluster size are determined by <k(k-1)>/<k>
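Mean diameter L: $1 + \langle k\rangle + \langle k\rangle\frac{\langle k(k-1)\rangle}{\langle k\rangle} + \dots + \langle k\rangle\left(\frac{\langle k(k-1)\rangle}{\langle k\rangle}\right)^{L-1} \approx N \;\Rightarrow\; L \approx \frac{\log(N/\langle k\rangle)}{\log(\langle k(k-1)\rangle/\langle k\rangle)} + 1$
Mean cluster size below $p_c$: $\langle S\rangle = 1 + \frac{\langle k\rangle}{1 - \langle k(k-1)\rangle/\langle k\rangle}$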
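Both estimates depend only on ⟨k⟩ and z = ⟨k(k−1)⟩/⟨k⟩. As a sanity check we use Erdős-Rényi (Poisson) networks, for which ⟨k(k−1)⟩ = ⟨k⟩² and hence z = ⟨k⟩; the mean cluster size then reduces to the familiar 1/(1−⟨k⟩). The function names in this Python sketch are hypothetical.

```python
import math

def diameter_estimate(n_nodes, mean_k, z):
    """L ~ log(N/<k>)/log(z) + 1, with z = <k(k-1)>/<k> > 1."""
    return math.log(n_nodes / mean_k) / math.log(z) + 1

def mean_cluster_size(mean_k, z):
    """<S> = 1 + <k>/(1 - z), valid only below the percolation threshold (z < 1)."""
    if z >= 1:
        raise ValueError("formula holds only below the threshold, z < 1")
    return 1 + mean_k / (1 - z)

print(diameter_estimate(10**6, 10, 10))  # about 6 for N = 10^6, <k> = 10
print(mean_cluster_size(0.5, 0.5))       # 2.0 = 1/(1 - <k>) for <k> = 0.5
```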

Amplification ratios
A(dir): 1.08 (E. coli), 0.58 (yeast)
A(undir): … (E. coli), 13.4 (yeast)
A(PPI): ? (E. coli), … (yeast)

Clustering coefficient C
C=3 N/knk k(k-1)/2 Could be defined for individual nodes or as a function of k: C(k)=3 N(k)/nk k(k-1)/2 C=1 could not be realized if k is heterogeneous Needs to be compared to its value in randomized networks with the same degree sequence

End lecture 1

Lecture 2

Protein networks

Places to learn molecular biology
Molecular Biology of the Cell, Fourth Edition. Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, Peter Walter. Garland Science
DNA from the beginning
Online Biology Book
Kimball’s Biology Pages
Gene expression. Human Genome Project. Microarrays. From Prof. Michael Hallett (McGill) online lectures

Protein networks
Nodes: proteins
Edges: interactions between proteins
- Metabolic (enzymes sharing common metabolites are connected)
- Physical (binding interactions)
- Regulatory and signaling (transcriptional regulation, protein modifications)
- Co-expression networks from microarray data (connect genes with similar expression (abundance) patterns under many conditions)
- Genetic interactions, e.g. synthetic lethal protein pairs (removal of either protein alone does not kill the cell, but removal of both does)
- Etc., etc.

Sources of data on protein networks
Genome-wide experiments:
- Binding: yeast two-hybrid (Y2H) and mass-spectrometry (MS) high-throughput techniques
- Transcriptional regulation: ChIP-on-chip, or ChIP-then-SAGE
- Expression and disruption networks: microarrays
- Lethality of genes (including synthetic lethals): gene knockouts in yeast, RNAi in worm and fly
Many small- or intermediate-scale experiments.
All stored in public databases: BIOGRID, DIP, BIND, YPD (no longer public), SGD, Flybase, Ecocyc, etc.

Pathway → network paradigm shift

Inhibition of apoptosis; MAPK signaling
Images from ResNet3.0 by Ariadne Genomics

Transcription factors bind DNA

Activators and repressors
Depending on the position of the binding site (operator) with respect to the RNA-polymerase binding site (promoter), transcription factors can either activate or repress the production of mRNA from a given gene (transcription), and thus affect the abundance of its protein product.

Single-celled eukaryote S. cerevisiae: 3:1 ratio; bacterium E. coli: 3:2 ratio

Sea urchin embryonic development (endomesoderm up to 30 hours) by Davidson’s lab

How many transcriptional regulators are out there?

Fraction of transcriptional regulators in bacteria
from Stover et al., Nature (2000)

Figure from Erik van Nimwegen, TIG 2003

Complexity of regulation grows with complexity of organism
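$N_R\langle K_{out}\rangle = N\langle K_{in}\rangle$ = number of edges, so $N_R/N = \langle K_{in}\rangle/\langle K_{out}\rangle$.
$N_R/N$ increases with N, so (if $\langle K_{out}\rangle$ does not grow as fast) $\langle K_{in}\rangle$ grows with N.
In bacteria $N_R \sim N^2$ (Stover et al. 2000); in eukaryotes $N_R \sim N^{1.3}$ (van Nimwegen, 2002).
Networks in more complex organisms are more interconnected than in simpler ones.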

Complexity is manifested in Kin distribution
E. coli vs H. sapiens

Table from Erik van Nimwegen, TIG 2003

Toolbox model: $N_{TF} = AN^2 \Rightarrow dN_{TF} = 2AN\,dN \Rightarrow dN/dN_{TF} = 1/(2AN)$
In small genomes there are ~100 genes per TF; in large ones only 4! A toolbox (e.g. the metabolic network) grows linearly with N. To handle a new condition ($N_{TF} \to N_{TF}+1$), one needs fewer and fewer new tools. S. Maslov, S. Krishna, K. Sneppen, in preparation

How is it all connected? (beyond degree distribution)

What is unusual about topology of a given network?
Count the number of occurrences of a certain topological pattern and compare it with a randomized network.
What patterns to look for?
- The number of edges connecting nodes with given degrees (degree-degree correlations)
- Motifs: small subgraphs of 3-4 nodes (in undirected networks, clustering, i.e. triangles)
Overrepresentation: Nature needs them for some function.
Underrepresentation: they are detrimental and Nature avoids them.

How to construct a proper random network?

Randomization of a network
Figure: a given complex network and its randomized version

Stub reconnection algorithm
Break every edge into two halves ("stubs") and randomly reconnect the stubs. Watch for multiple edges! For example, in the AS-level Internet the two largest hubs would end up connected by 50 edges (sic!). The method cannot easily be adapted to conserve other low-level topological properties of the network.
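A Python sketch of stub reconnection (also known as the configuration model), written by us for illustration. It preserves every degree exactly but, as the slide warns, can produce self-loops and multiple edges, which it counts rather than removes.

```python
import random
from collections import Counter

def stub_reconnection(degrees, seed=None):
    """Pair stubs at random; returns (edges, n_self_loops, n_extra_multi_edges)."""
    rng = random.Random(seed)
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    if len(stubs) % 2:
        raise ValueError("sum of degrees must be even")
    rng.shuffle(stubs)
    edges = [tuple(sorted(stubs[i:i + 2])) for i in range(0, len(stubs), 2)]
    self_loops = sum(1 for u, v in edges if u == v)
    multi = sum(c - 1 for c in Counter(edges).values() if c > 1)
    return edges, self_loops, multi

# Two hubs of degree 100 among 200 leaves: on average about 25 parallel hub-hub edges.
edges, loops, multi = stub_reconnection([100, 100] + [1] * 200, seed=0)
print(Counter(edges)[(0, 1)], loops, multi)
```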

Local rewiring algorithm
R. Kannan, P. Tetali, and S. Vempala, Random Structures and Algorithms (1999); SM, K. Sneppen, Science (2002)
Randomly select two edges and rewire them (A-B and C-D become A-D and C-B); repeat many times.
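A Python sketch of the local rewiring step. Each swap leaves all four node degrees unchanged; rejecting swaps that would create a self-loop or duplicate an existing edge is our choice here, matching the usual practice for simple graphs.

```python
import random

def local_rewiring(edges, n_swaps, seed=None):
    """Degree-preserving randomization by repeated swaps A-B, C-D -> A-D, C-B."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:  # random orientation makes both possible swaps reachable
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue  # shared node: would give a self-loop or no change
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # would create a multiple edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges
```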

Metropolis rewiring algorithm
“energy” E “energy” E+E SM, K. Sneppen: cond-mat preprint (2002), Physica A (2004) Randomly select two edges Calculate change E in “energy function” E=(Nactual-Ndesired)2/Ndesired Rewire with probability p=exp(-E/T)

Degree-degree correlations

Central vs peripheral network architecture
(anti-hierarchical) central (hierarchical) random A. Trusina, P. Minnhagen, SM, K. Sneppen, Phys. Rev. Lett. 92, 17870, (2004)

What is the case for the protein interaction network?
SM, K. Sneppen, Science 296, 910 (2002)

Correlation profile
Count $N(k_0,k_1)$, the number of links between nodes with connectivities $k_0$ and $k_1$.
Compare it to $N_r(k_0,k_1)$, the same property in a random network.
Qualitative features are very noise-tolerant with respect to both false positives and false negatives.

Correlation profile of the protein interaction network
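$R(k_0,k_1) = N(k_0,k_1)/N_r(k_0,k_1)$
$Z(k_0,k_1) = (N(k_0,k_1) - N_r(k_0,k_1))/\Delta N_r(k_0,k_1)$, where $N_r$ is averaged over an ensemble of randomized networks and $\Delta N_r$ is its standard deviation
A similar profile is seen in the yeast regulatory network.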
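A Python sketch of a correlation profile for an undirected network. It uses exact degrees as classes (real data would need binning), a degree-preserving swap routine of our own for the random ensemble, and normalizes Z by the standard deviation σ_r across randomized networks: R = N/⟨N_r⟩, Z = (N − ⟨N_r⟩)/σ_r. Ensemble size and swaps per edge are illustrative defaults, not recommended values.

```python
import random
import statistics
from collections import Counter

def degree_pair_counts(edges, degree):
    """N(k0, k1): number of edges joining nodes of degree k0 and k1 (k0 <= k1)."""
    return Counter(tuple(sorted((degree[u], degree[v]))) for u, v in edges)

def randomize(edges, n_swaps, rng):
    """Degree-preserving swaps that avoid self-loops and multiple edges."""
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        if len({a, b, c, d}) < 4 or frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def correlation_profile(edges, n_random=100, swaps_per_edge=10, seed=None):
    """{(k0, k1): (N, mean N_r, R, Z)}; Z is NaN when the ensemble has no spread."""
    rng = random.Random(seed)
    degree = Counter(v for e in edges for v in e)  # unchanged by the swaps
    real = degree_pair_counts(edges, degree)
    samples = [degree_pair_counts(randomize(edges, swaps_per_edge * len(edges), rng), degree)
               for _ in range(n_random)]
    profile = {}
    for key in set(real).union(*samples):
        values = [s[key] for s in samples]
        mean, sd, n = statistics.fmean(values), statistics.pstdev(values), real[key]
        ratio = n / mean if mean > 0 else float("inf")
        z = (n - mean) / sd if sd > 0 else float("nan")
        profile[key] = (n, mean, ratio, z)
    return profile
```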

Hubs may act within a module, or connect modules
Party hubs: simultaneous interactions, which tend to be within the same module.
Date hubs: sequential interactions, which connect different modules.
Han et al., Nature 430, 88 (2004)

Correlation profile of the yeast regulatory network
$R(k_{out},k_{in}) = N(k_{out},k_{in})/N_r(k_{out},k_{in})$
$Z(k_{out},k_{in}) = (N(k_{out},k_{in}) - N_r(k_{out},k_{in}))/\Delta N_r(k_{out},k_{in})$

Some scale-free networks may appear similar
In both networks the degree distribution is scale-free, $P(k) \sim k^{-\gamma}$, with γ ≈ …

But: correlation profiles give them unique identities
Panels: protein interactions; Internet

Small network motifs (Uri Alon and his group)

All 3 node motifs

Motifs can overlap in the network
Figure panels: motif to be found; graph; motif matches in the target graph

Detection of important network motifs
Technique:
- Construct many random graphs with the same number of nodes and the same degree distribution.
- Count the number of motifs in those graphs.
- Calculate the Z score, which measures how unlikely it is that the same or a larger number of motifs as in the real-world network could have occurred in a random one.
Software available:
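A Python sketch of this pipeline for one 3-node motif, the feed-forward loop (X→Y, Y→Z, X→Z). It is a simplified stand-in, not the published software: the random ensemble comes from swapping targets of directed edges (A→B, C→D become A→D, C→B), which preserves every in- and out-degree, and the Z score is (x − μ)/σ over the ensemble.

```python
import random
import statistics
from collections import defaultdict

def count_ffl(edges):
    """Feed-forward loops X->Y, Y->Z, X->Z in a directed graph without self-loops."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_nb[src].add(dst)
        in_nb[dst].add(src)
    # for each edge X->Z, count intermediates Y with X->Y and Y->Z
    return sum(len((out_nb[x] & in_nb[z]) - {x, z}) for x, z in set(edges))

def directed_swaps(edges, n_swaps, rng):
    """Randomize while keeping in/out-degrees; skip swaps creating loops or duplicates."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def motif_z_score(edges, count=count_ffl, n_random=100, swaps_per_edge=10, seed=None):
    """Returns (real count, ensemble mean, ensemble std, Z)."""
    rng = random.Random(seed)
    real = count(edges)
    rand = [count(directed_swaps(edges, swaps_per_edge * len(edges), rng))
            for _ in range(n_random)]
    mean, sd = statistics.fmean(rand), statistics.stdev(rand)
    z = (real - mean) / sd if sd > 0 else float("nan")
    return real, mean, sd, z
```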

What the Z score means: $z_x = (x - \mu_x)/\sigma_x$
x: number of times the motif appears in the real network
$\mu_x$: mean number of times the motif appeared in the random graphs
$\sigma_x$: standard deviation of the number of times the motif appeared in the random graphs
The probability of observing a Z score of 2 is …
In the context of motifs:
- Z > 0: the motif occurs more often than in random graphs
- Z < 0: the motif occurs less often than in random graphs
- |Z| > 1.65: only a 5% chance of random occurrence
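Turning a Z score into a probability requires a model of the random counts. This Python sketch assumes they are approximately normal (often only roughly true for motif counts) and computes the one-sided tail probability with `math.erfc`; under that assumption Z = 1.65 corresponds to about 5%.

```python
import math

def z_score(x, mean, sd):
    """z = (x - mu) / sigma."""
    return (x - mean) / sd

def one_sided_p(z):
    """P(Z' >= z) for a standard normal Z' (normal approximation)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(z_score(120, 100, 10))       # 2.0
print(round(one_sided_p(1.65), 3))  # 0.049
print(round(one_sided_p(2.0), 3))   # 0.023
```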

Examples of network motifs (3 nodes)
Feed-forward loop (X → Y, Y → Z, X → Z): found in many transcriptional regulatory networks. It comes in coherent and incoherent variants.

Possible functional role of a coherent feed-forward loop
Noise filtering: short pulses in the input do not turn Z on. To function, the loop needs a time delay (about 0.5 hours for bacterial transcription).
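A Boolean, discrete-time caricature of the filtering effect, written in Python. It assumes AND logic at Z (one common variant; the slide does not specify the input function), lets Y become active only after `delay` consecutive steps of X, and resets Y as soon as X switches off. The step length is hypothetical; with 6-minute steps, `delay=5` would mimic the 0.5-hour delay.

```python
def coherent_ffl_and(x_signal, delay):
    """Z(t) for a coherent FFL with AND logic: Z is on only when X and Y are both on."""
    y_timer = 0
    z_signal = []
    for x in x_signal:
        y_timer = y_timer + 1 if x else 0  # Y builds up while X is on, resets otherwise
        y_active = y_timer >= delay
        z_signal.append(bool(x) and y_active)
    return z_signal

short_pulse = [0] * 5 + [1] * 3 + [0] * 10
long_pulse = [0] * 5 + [1] * 12 + [0] * 5
print(any(coherent_ffl_and(short_pulse, delay=5)))  # False: pulse filtered out
print(any(coherent_ffl_and(long_pulse, delay=5)))   # True: sustained input turns Z on
```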

All 4 node subgraphs (computational expense increases with the size of the graph!)

Higher-order motifs: 4-node motifs contain some 3-node motifs
One needs to be careful when calculating overrepresentation. Alon and co-authors use our Metropolis algorithm to generate networks with a given number of low-level motifs.

Table 1 from R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network Motifs: Simple Building Blocks of Complex Networks", Science 298 (2002)

Examples of network motifs (4 nodes)
Parallel paths are overrepresented in neural networks and food webs.

Finding classes of graphs based on their motif "profiles"

THE END
