Complex networks in nature. PHYSBIO 2007. Imre Derényi, Dept. of Biological Physics, Eötvös University, Budapest.


Complex systems are often made of many non-identical elements connected by diverse interactions. Such systems can be represented as networks (graphs).

Outline
- Lectures 1-3: Graph theoretical basics, examples of real networks, basic models (Erdős-Rényi, small world, scale-free graphs) and their properties, examples.
- Lecture 4: Dynamics on networks: error and attack tolerance, disease spreading, metabolic networks.
- Lecture 5: Network motifs and communities.

Graph theory basics
A graph, usually denoted as G(V,E), consists of a set of vertices (or nodes) V together with a set of edges (or links) E. Every edge connects its two endvertices. The order of a graph (denoted by N) is the number of its vertices. A graph is a simple graph if it has no multiple edges or loops. If not stated otherwise, a graph is assumed to be simple.

Two vertices are adjacent (or neighbors of each other) if there is an edge connecting them. Every graph can be represented by its adjacency matrix A, which is an N × N symmetric binary matrix with elements $A_{ij} = A_{ji} = 1$ if vertex i is adjacent to vertex j and $A_{ij} = A_{ji} = 0$ otherwise. The degree $k_i$ of vertex i is the number of its neighbors (or edges): $k_i = \sum_j A_{ij}$. The sum of the degrees of all the vertices is twice the number M of the edges of the graph: $\sum_i k_i = 2M$.
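The adjacency-matrix definitions above can be checked on a small example. The following is a minimal sketch; the 4-vertex edge list is a made-up illustration, not taken from the lecture.

```python
# Hypothetical 4-vertex example graph, given as an edge list.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
N = 4

# Build the symmetric binary adjacency matrix A.
A = [[0] * N for _ in range(N)]
for i, j in edges:
    A[i][j] = 1
    A[j][i] = 1

# The degree k_i is the i-th row sum of A.
degrees = [sum(row) for row in A]
M = len(edges)

print(degrees)               # [2, 2, 3, 1]
print(sum(degrees), 2 * M)   # 8 8
```

The second printed line illustrates that the degree sum equals twice the number of edges, because every edge contributes one unit to the degree of each of its two endvertices.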

A sequence of adjacent vertices is a walk. A walk is closed if its first and last vertices are the same, and open if they are different. A walk in which no edge occurs more than once is known as a trail. A closed trail is called a tour or circuit. A walk in which no vertex occurs more than once is known as a path. A cycle can be defined as a closed path. Two vertices are reachable from each other if there exists a path between them. A graph is connected if any of its vertices can be reached from any other. A path or cycle is Hamiltonian if it uses all vertices exactly once. A trail or circuit is Eulerian if it uses all edges exactly once.

A component of a graph is defined as a maximal connected subgraph. A subgraph of a graph G is a graph whose vertices and edges are subsets of those of G. A subgraph of G is a spanning subgraph, or factor, if it contains all the vertices of G. k-cliques are complete subgraphs of order (size) k. Cliques are maximal complete subgraphs. A tree is an acyclic connected graph. It has N-1 edges.

The length l of a walk is the number of edges that it uses. The distance d(i, j) between two (not necessarily distinct) vertices i and j is the length of a shortest path between them. The eccentricity ε(i) of a vertex i is its maximum distance from any other vertex: $\varepsilon(i) = \max_j d(i,j)$. The diameter D of a graph is its maximum eccentricity: $D = \max_i \varepsilon(i)$. The characteristic path length (sometimes also called diameter) is defined as the average distance over all pairs of vertices: $L = \frac{1}{N(N-1)} \sum_{i \neq j} d(i,j)$. The radius R of a graph is its minimum eccentricity: $R = \min_i \varepsilon(i)$.
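For an unweighted graph these distance-based quantities can be computed with one breadth-first search per vertex. The sketch below assumes a connected graph stored as an adjacency dictionary; the function names and the 4-vertex path graph are illustrative only, and the characteristic path length is taken as the average over ordered pairs of distinct vertices.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def path_statistics(adj):
    """Return (eccentricities, diameter, radius, characteristic path length)."""
    n = len(adj)
    ecc = {}
    total = 0
    for i in adj:
        dist = bfs_distances(adj, i)
        if len(dist) != n:
            raise ValueError("graph must be connected")
        ecc[i] = max(dist.values())
        total += sum(dist.values())
    # Average over the N(N-1) ordered pairs of distinct vertices.
    char_length = total / (n * (n - 1))
    return ecc, max(ecc.values()), min(ecc.values()), char_length

# Hypothetical example: the path graph 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ecc, D, R, L = path_statistics(adj)
print(ecc, D, R, L)
```

For the path graph the end vertices have eccentricity 3 and the inner ones 2, so the diameter is 3 and the radius is 2.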

Extensions
If a weight or cost is assigned to each edge, then we get a weighted graph. In the calculation of lengths the weights are taken into account. In a hypergraph more than two vertices can be connected by hyperedges. If the edges are directed, then we have a directed graph or digraph. In-neighbors and out-neighbors, and in-degrees and out-degrees can be distinguished.

Random graphs
Graph theory was invented by Euler in the 18th century. The early work concentrated on small graphs with a high degree of regularity. Random-graph theory was introduced by Erdős and Rényi in the late 1950s. As complex networks often appear to be random, random-graph theory appears to be a useful tool in the study of large complex networks.

The Erdős-Rényi model
[Figure: Pál Erdős; example graph for p = 1/6]
- Original model: connect N nodes by M edges randomly.
- Alternative model: connect every pair of the N nodes with probability p.
The two models (or ensembles) become equivalent in the thermodynamic limit. The average degree of a node is $\langle k \rangle = p(N-1) \approx pN$.
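Both ensembles are easy to sample directly. The following is a minimal sketch using only the standard library; the function names `gnp` and `gnm`, the seed, and the values N = 400 and p = 0.02 are arbitrary choices for illustration. The number of edges in the fixed-M ensemble is set to the expected edge count of the fixed-p ensemble, M = pN(N-1)/2, so that the two can be compared.

```python
import random
from itertools import combinations

def gnp(n, p, rng):
    """Alternative model: keep each of the n(n-1)/2 possible edges with probability p."""
    return [(i, j) for i, j in combinations(range(n), 2) if rng.random() < p]

def gnm(n, m, rng):
    """Original model: choose m distinct edges uniformly at random."""
    return rng.sample(list(combinations(range(n), 2)), m)

rng = random.Random(0)
N, p = 400, 0.02
M = round(p * N * (N - 1) / 2)   # expected number of edges for probability p

edges_p = gnp(N, p, rng)
edges_m = gnm(N, M, rng)

# Average degree = 2 * (number of edges) / N; compare with p(N-1).
mean_k_p = 2 * len(edges_p) / N
mean_k_m = 2 * len(edges_m) / N
print(mean_k_p, mean_k_m, p * (N - 1))
```

In the fixed-M ensemble the average degree is fixed exactly, while in the fixed-p ensemble it fluctuates around p(N-1); the relative size of these fluctuations shrinks as N grows.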

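The Erdős-Rényi model
Degree distribution: binomial, $P(k) = \binom{N-1}{k} p^k (1-p)^{N-1-k}$, which for large N approaches the Poisson distribution $P(k) \approx e^{-\langle k \rangle} \langle k \rangle^k / k!$.
The characteristic path length can be estimated from $\langle k \rangle^L \approx N$, resulting in $L \approx \ln N / \ln \langle k \rangle$.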

The greatest discovery of Erdős and Rényi was that many network properties appear suddenly as p is increased. As an example let us consider the occurrence of an arbitrary subgraph consisting of n vertices and m edges. The number of such subgraphs can be estimated, up to a constant factor, as $E \sim N^n p^m$. Thus the critical probability of appearance is $p_c \sim N^{-n/m}$.

A giant (percolating) component also appears suddenly. This can easily be understood with the help of a branching process:
1. Let us start to grow a component from a seed vertex by randomly selecting its neighbors from the remaining N-1 vertices with probability p.
2. Let us repeat this process with the newly selected vertices as seeds, over and over again.
3. The branching process stops when no new neighbor is selected.
If $p < p_c = 1/N$, then the expected number of new neighbors is smaller than the number of seeds, and the branching process quickly comes to a halt. If, on the other hand, $p > p_c = 1/N$, then the component can easily grow to infinity. The giant component has a tree-like structure.
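The sudden appearance of the giant component can be observed numerically. The sketch below is an illustration under stated assumptions rather than the lecture's own code: it samples a random graph with connection probability p = c/N, merges components with a union-find structure (a substitute for growing the component by the branching process), and reports the fraction of vertices in the largest component. The values N = 1000 and c = 0.5, 2.0 are arbitrary choices on either side of the threshold.

```python
import random
from itertools import combinations

def largest_component_fraction(n, p, rng):
    """Fraction of vertices in the largest component of a sampled G(n, p)."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            ri, rj = find(i), find(j)
            if ri != rj:
                # Attach the smaller tree below the larger one.
                if size[ri] < size[rj]:
                    ri, rj = rj, ri
                parent[rj] = ri
                size[ri] += size[rj]
    return max(size[find(v)] for v in range(n)) / n

rng = random.Random(1)
N = 1000
below = largest_component_fraction(N, 0.5 / N, rng)   # p < 1/N
above = largest_component_fraction(N, 2.0 / N, rng)   # p > 1/N
print(below, above)
```

Below the threshold the largest component contains only a small fraction of the vertices, while above it a finite fraction of all vertices belongs to a single component.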

Are complex networks really random?
No! One big difference is that nodes are often clustered, i.e., neighbors of a node tend to be connected to each other.
Clustering coefficient: the probability that the neighbors of a node are connected, $C_i = \frac{2 n_i}{k_i (k_i - 1)}$, where $n_i$ is the number of edges among the $k_i$ neighbors of vertex i.
Small worlds: networks are clustered ($C \gg C_{rand} = p$) but have a small characteristic path length L.
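The clustering coefficient of a vertex can be computed by counting the edges among its neighbors. This is a minimal sketch; the function name, the example graph (a triangle with one pendant vertex) and the convention of returning 0 for vertices with fewer than two neighbors are assumptions made for illustration.

```python
from itertools import combinations

def local_clustering(adj, i):
    """Fraction of pairs of neighbors of i that are themselves adjacent."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined case is reported as 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

# Hypothetical example: triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
C = {i: local_clustering(adj, i) for i in adj}
C_mean = sum(C.values()) / len(C)
print(C, C_mean)
```

Vertices 0 and 1 have fully connected neighborhoods, vertex 2 has one connected pair out of three, and the network average is the mean of the local values.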

Watts-Strogatz model [Watts and Strogatz, Nature 393, 440 (1998)]

Watts-Strogatz model
Estimate with n nodes per block: if [...]. Optimal n: [...]

World Wide Web
Nodes: WWW documents. Links: URL links.
800 million documents (S. Lawrence, 1999).
ROBOT: collects all URLs found in a document and follows them recursively.
R. Albert, H. Jeong, A.-L. Barabási, Nature (1999)

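What can we expect for ER and WS networks? With $\langle k \rangle \sim 6$ and $N_{WWW} \sim 10^9$, a Poisson-like degree distribution makes P(k=500), and therefore the expected number N(k=500) of such documents, vanishingly small.
The results: a scale-free network with power-law degree distributions, $\gamma_{out} = 2.45$ and $\gamma_{in} = 2.1$, for which $N(k=500) \sim 10^3$.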

INTERNET BACKBONE
Nodes: computers, routers. Links: physical lines.
(Faloutsos, Faloutsos and Faloutsos, 1999)

ACTOR CONNECTIVITIES
Nodes: actors. Links: cast jointly.
N = 212,250 actors; $P(k) \sim k^{-\gamma}$ with γ = 2.3.

SCIENCE CITATION INDEX
Nodes: papers. Links: citations.
$P(k) \sim k^{-\gamma}$ with γ = 3; 1736 PRL papers (1988).
(S. Redner, 1998)

SCIENCE COAUTHORSHIP
Nodes: scientists (authors). Links: joint publications.
M: math, NS: neuroscience.
(Newman, 2000; Barabasi et al., 2001)

Online communities
Nodes: online users. Links: contacts.
Kiel University log files, 112 days, N = 59,912 nodes.
Ebel, Mielsch, Bornholdt, PRE

Food Web Nodes: trophic species Links: trophic interactions R.J. Williams, N.D. Martinez, Nature (2000) R. Sole (cond-mat/ )

Sex-web
Nodes: people (females; males). Links: sexual relationships.
Survey of Swedes aged 18-74; 59% response rate.
Liljeros et al., Nature

Most real-world networks have the same internal structure: they are scale-free networks. Why? What does it mean?

SCALE-FREE NETWORKS: origins
(1) The number of nodes (N) is NOT fixed. Networks continuously expand by the addition of new nodes.
Examples:
- WWW: addition of new documents
- Citation: publication of new papers
(2) The attachment is NOT uniform. A node is linked with higher probability to a node that already has a large number of links.
Examples:
- WWW: new documents link to well-known sites (CNN, YAHOO, New York Times, etc.)
- Citation: well-cited papers are more likely to be cited again

Scale-free model (BA model)
(1) GROWTH: At every timestep we add a new node with m edges (connected to the nodes already present in the system).
(2) PREFERENTIAL ATTACHMENT: The probability Π that a new node will be connected to node i depends on the degree $k_i$ of that node: $\Pi(k_i) = k_i / \sum_j k_j$.
Result: $P(k) \sim k^{-3}$.
A.-L. Barabási, R. Albert, Science 286, 509 (1999)
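The two ingredients, growth and preferential attachment, can be sketched in a few lines. The code below is an illustration, not the authors' implementation. It assumes a complete graph on m + 1 nodes as the seed, and it draws targets from a list in which every node appears once per edge end, so that a uniform draw picks node i with probability proportional to its degree. Drawing until m distinct targets are found avoids multiple edges and slightly modifies the exact probabilities.

```python
import random

def barabasi_albert(n, m, rng):
    """Grow a graph to n nodes; each new node attaches m edges preferentially."""
    # Assumed seed graph: complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Every node is listed once per edge end, so a uniform draw from `stubs`
    # selects node i with probability k_i / sum_j k_j.
    stubs = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:          # redraw until m distinct targets
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

rng = random.Random(2)
N, m = 2000, 2
edges = barabasi_albert(N, m, rng)

degree = [0] * N
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
print(len(edges), min(degree), max(degree))
```

Every node ends up with at least m links, while the oldest nodes collect far more, which is the qualitative signature of the broad degree distribution.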

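Mean Field Theory
$\frac{\partial k_i}{\partial t} = m\,\Pi(k_i) = \frac{k_i}{2t}$, with the initial condition $k_i(t_i) = m$, where $t_i$ is the time at which node i was added. The solution is $k_i(t) = m\,(t/t_i)^{1/2}$.
A.-L. Barabási, R. Albert and H. Jeong, Physica A 272, 173 (1999)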
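The step from the mean-field growth of the degrees to the $k^{-3}$ degree distribution can be sketched as follows. This is a reconstruction along the lines of the cited Physica A paper, assuming that the total degree at time t is approximately $2mt$ and that nodes are added at uniformly distributed times; $m_0$, the number of initial nodes, is a symbol introduced here.

```latex
\begin{align}
\frac{\partial k_i}{\partial t} &= m\,\frac{k_i}{\sum_j k_j} = \frac{k_i}{2t},
  \qquad \text{using } \sum_j k_j \approx 2mt, \\
k_i(t) &= m \left( \frac{t}{t_i} \right)^{1/2},
  \qquad \text{from } k_i(t_i) = m, \\
P\bigl(k_i(t) < k\bigr) &= P\!\left( t_i > \frac{m^2 t}{k^2} \right)
  = 1 - \frac{m^2 t}{k^2 (t + m_0)} \approx 1 - \frac{m^2}{k^2}, \\
P(k) &= \frac{\partial}{\partial k} P\bigl(k_i(t) < k\bigr) \approx \frac{2 m^2}{k^3}.
\end{align}
```

The exponent 3 is independent of m, which only sets the prefactor.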

Growth without preferential attachment

Preferential attachment (measured in real networks)
Panels: citation network, Internet.
For a given Δt, $\Delta k \propto \Pi(k)$.
(Jeong, Neda, A.-L. B, cond-mat/ )

The exponent γ is not universal
Extended model:
- prob. p: internal links
- prob. q: link deletion
- prob. 1-p-q: add node

More models

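Presence of a giant (percolating) component
Branching process: the probability that an edge leads to a vertex with degree k is $q_k = k P(k) / \langle k \rangle$. The condition that the branching process prevails is that a vertex reached along an edge has, on average, more than one further edge: $\sum_k (k-1)\, q_k > 1$, i.e. $\langle k^2 \rangle / \langle k \rangle > 2$.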

Yeast protein network (protein interaction map)
Nodes: proteins. Links: physical interactions (binding).
P. Uetz, et al. Nature 403, (2000).

C. elegans: Li et al., Science 2004. Drosophila M.: Giot et al., Science 2003.

Origin of the scale-free topology of PPI networks: gene duplication Proteins with more interactions are more likely to obtain new links: Π(k) ~ k (preferential attachment) Wagner 2001; Vazquez et al. 2003; Sole et al. 2001; Rzhetsky & Gomez 2001; Qian et al. 2001; Bhan et al

Metabolic network
Nodes: chemicals (substrates). Links: biochemical reactions.
The metabolic networks of organisms from all three domains of life (Archaea, Bacteria, Eukaryotes) are scale-free!
H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi, Nature, (2000)

Characterizing the links
Metabolism: Flux Balance Analysis (Palsson). A metabolic flux is determined for each reaction from the steady-state condition $S v = 0$, where S is the stoichiometric matrix and v is the flux vector. Maximize $c \cdot v$, where c is the unit vector in the direction of growth (biomass production).
Edwards, J. S. & Palsson, B. O., PNAS 97, 5528 (2000).
Edwards, J. S., Ibarra, R. U. & Palsson, B. O., Nat Biotechnol 19, 125 (2001).
Ibarra, R. U., Edwards, J. S. & Palsson, B. O., Nature 420, 186 (2002).

Global flux organization in the E. coli metabolic network E. Almaas, B. Kovács, T. Vicsek, Z. N. Oltvai, A.-L. B. Nature, 2004; Goh et al, PRL SUCC: Succinate uptake GLU : Glutamate uptake Central Metabolism, Emmerling et. al, J Bacteriol 184, 152 (2002)

Inhomogeneity in the local flux distribution ~ k Mass flows along linear pathways

Robustness
Complex systems maintain their basic functions even under errors and failures (cell: mutations; Internet: router breakdowns).
[Figure: node failure]

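Robustness of scale-free networks
[Figure: S versus the fraction f of removed nodes (both axes from 0 to 1), with curves for attacks and failures; $f_c$ marks the critical fraction]
Albert, Jeong, Barabasi, Nature (2000)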

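Cohen, Erez, ben-Avraham, Havlin, PRL 85, 4626 (2000)
After random removal of a fraction f of the vertices, the new degree distribution is $P'(k') = \sum_{k \ge k'} P(k) \binom{k}{k'} (1-f)^{k'} f^{k-k'}$.
Percolation: $\langle k'^2 \rangle / \langle k' \rangle = 2$.
Critical fraction: $f_c = 1 - \frac{1}{\kappa_0 - 1}$, with $\kappa_0 = \langle k^2 \rangle / \langle k \rangle$ computed from the original degree distribution.
Absence of a critical percolation threshold for γ ≤ 3: $\langle k^2 \rangle$ diverges, so $f_c \to 1$.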
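The critical fraction depends on the degree sequence only through its first two moments, so it can be evaluated directly. The sketch below assumes the standard form of the Cohen et al. result, $f_c = 1 - 1/(\kappa_0 - 1)$ with $\kappa_0 = \langle k^2 \rangle / \langle k \rangle$; the function name and the two degree sequences are made up for illustration.

```python
def critical_fraction(degrees):
    """Fraction of randomly removed vertices needed to destroy the giant component."""
    n = len(degrees)
    k1 = sum(degrees) / n                  # <k>
    k2 = sum(k * k for k in degrees) / n   # <k^2>
    kappa0 = k2 / k1
    if kappa0 <= 2:
        return 0.0   # no giant component even before any removal
    return 1 - 1 / (kappa0 - 1)

# Hypothetical degree sequences: a regular network and one with a few hubs.
regular = [3] * 100
with_hubs = [1] * 90 + [10] * 10

print(critical_fraction(regular))     # 0.5
print(critical_fraction(with_hubs))   # about 0.789
```

Although the second sequence has a smaller average degree, its larger second moment pushes the critical fraction upward, which illustrates why broad degree distributions are robust against random failures.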

Achilles’ heel of complex networks
[Figure: Internet under random failure and under attack]
R. Albert, H. Jeong, A.L. Barabasi, Nature (2000)

Yeast protein network: lethality and topological position
Highly connected proteins are more essential (lethal)...
H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature 411, (2001)

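Disease spreading in the susceptible-infected-susceptible (SIS) epidemic model
Rate of becoming infected by an infected neighbor: ν. Rate of recovery: δ. Effective spreading rate: λ = ν/δ, with time measured in units of 1/δ.
Mean-field approximation for “exponential” networks, where $k \approx \langle k \rangle$: $\frac{d\rho}{dt} = -\rho + \lambda \langle k \rangle \rho (1 - \rho)$, with ρ the density of infected vertices.
Steady state solution: $\rho = 1 - \frac{1}{\lambda \langle k \rangle}$ for $\lambda > \lambda_c$, and ρ = 0 otherwise.
Epidemic threshold: $\lambda_c = 1 / \langle k \rangle$.
Pastor-Satorras and Vespignani, PRE 65, (2002)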

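SIS in complex networks
Mean-field approximation for the density $\rho_k$ of infected vertices of degree k, with λ the infection rate measured in units of the recovery rate: $\frac{d\rho_k}{dt} = -\rho_k + \lambda k (1 - \rho_k) \Theta$.
Steady state solution: $\rho_k = \frac{\lambda k \Theta}{1 + \lambda k \Theta}$.
The probability that an edge leads to a vertex with degree k is $q_k = k P(k) / \langle k \rangle$.
The probability that a neighbor is infected: $\Theta = \sum_k q_k \rho_k$.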

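SIS in complex networks
Substituting the steady state density $\rho_k = \lambda k \Theta / (1 + \lambda k \Theta)$ into the probability Θ that a neighbor is infected gives the self-consistency condition $\Theta = \frac{1}{\langle k \rangle} \sum_k k P(k) \frac{\lambda k \Theta}{1 + \lambda k \Theta}$. This has a nontrivial solution when the slope of the right-hand side at Θ = 0 exceeds 1: $\lambda \langle k^2 \rangle / \langle k \rangle > 1$, from which we get that the epidemic threshold is $\lambda_c = \langle k \rangle / \langle k^2 \rangle$.
Uniform immunization with probability g does not help in scale-free networks if γ ≤ 3: it only rescales λ to λ(1 - g), while $\langle k^2 \rangle$ diverges and $\lambda_c \to 0$.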
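The threshold $\lambda_c = \langle k \rangle / \langle k^2 \rangle$ and the effect of uniform immunization can be evaluated for any finite degree sequence. The sketch below is illustrative only: the function names and degree sequences are made up, and the critical immunization probability is obtained by assuming that uniform immunization rescales λ to λ(1 - g), so that the epidemic dies out once λ(1 - g) drops below $\lambda_c$.

```python
def epidemic_threshold(degrees):
    """lambda_c = <k> / <k^2> for a given degree sequence."""
    n = len(degrees)
    k1 = sum(degrees) / n
    k2 = sum(k * k for k in degrees) / n
    return k1 / k2

def critical_immunization(degrees, lam):
    """Smallest uniform immunization probability g with lam * (1 - g) <= lambda_c."""
    lam_c = epidemic_threshold(degrees)
    return max(0.0, 1 - lam_c / lam)

# Hypothetical degree sequences: a regular network and one with a few hubs.
regular = [4] * 100
with_hubs = [1] * 90 + [10] * 10

print(epidemic_threshold(regular))          # 0.25, equal to 1/<k>
print(epidemic_threshold(with_hubs))        # about 0.174
print(critical_immunization(regular, 0.5))  # 0.5
```

For the regular sequence the general expression reduces to the homogeneous result 1/⟨k⟩. As the second moment grows, the threshold shrinks and the required immunization probability approaches 1.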

Non-uniform immunization of complex networks
If [...], i.e. when [...], then [...]. Thus, the epidemic threshold is reintroduced.

Motifs
Function is often carried out by subnetworks, rather than by single components.
Motifs: subgraphs that have a significantly higher density in the real network than in the randomized version of the studied network.
Randomized networks: ensemble of maximally random networks preserving the degree distribution of the original network.
R. Milo et al., Science 298, (2002)

Three-node connected subgraphs

Network motifs

Why do we have motifs?
Hypothesis: they are dynamically desirable “building blocks”. The feed-forward (FF) motif is a noise filter.

Communities: “densely connected subgraphs”

Traditional method: hierarchical clustering (agglomerative method)
All edges are removed, and then added back one by one in decreasing order of their “strengths”. Communities are defined as the forming components. The result is a dendrogram.
The strength of the relationship between any pair of vertices can, e.g., be defined as $W = \sum_{l=0}^{\infty} (\alpha A)^l = (I - \alpha A)^{-1}$, where α is a weighting parameter small enough for the sum to converge. The matrix $A^l$ contains the number of walks with length l between the vertex pairs.

Girvan-Newman method (divisive method)
It also results in a dendrogram, by cutting the edges one by one. In each step the edge with the highest “betweenness centrality” (BC) is removed. The BC of an edge is the number of shortest paths between all pairs of vertices that use this edge.
Girvan and Newman, PNAS 99, 7821 (2002)
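The edge betweenness used in each step can be computed with one breadth-first search per vertex, followed by a backward accumulation (a Brandes-style scheme, used here as one possible way to do the counting). This is a sketch for undirected, unweighted graphs. It assumes that when several shortest paths connect a pair of vertices, each of them contributes an equal fraction; the example graph, two triangles joined by a bridge, is made up.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph."""
    bc = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}   # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest vertices, sharing path counts over edges.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                c = sigma[u] / sigma[w] * (1 + delta[w])
                bc[frozenset((u, w))] += c
                delta[u] += c
    # Each unordered pair of vertices was counted once from each end.
    return {e: b / 2 for e, b in bc.items()}

# Hypothetical example: triangles 0-1-2 and 3-4-5 joined by the bridge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
bc = edge_betweenness(adj)
top_edge = max(bc, key=bc.get)
print(sorted(top_edge), bc[top_edge])   # [2, 3] 9.0
```

The bridge carries all 3 × 3 shortest paths between the two triangles, so it would be the first edge removed, splitting the graph into the two triangles.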

Modularity
When should one stop with the agglomeration/division? At the maximal modularity: $Q = \sum_g \left( e_{gg} - a_g^2 \right)$, where $e_{gg}$ is the fraction of edges inside group g and $a_g$ is the fraction of edge ends being in group g. Q is the fraction of edges in the groups compared to that in the randomized network.
Newman and Girvan, PRE 69, (2004)
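Modularity can be evaluated for any candidate partition by counting edges and edge ends per group. The sketch below assumes the Newman-Girvan form $Q = \sum_g (e_{gg} - a_g^2)$, with $e_{gg}$ the fraction of edges inside group g and $a_g$ the fraction of edge ends in group g; the function name and the example graph (two triangles joined by one edge) are made up.

```python
def modularity(edges, group):
    """Q = sum over groups of (fraction of edges inside) - (fraction of edge ends)^2."""
    M = len(edges)
    inside = {}   # number of edges with both ends in the group
    ends = {}     # number of edge ends in the group
    for u, v in edges:
        gu, gv = group[u], group[v]
        ends[gu] = ends.get(gu, 0) + 1
        ends[gv] = ends.get(gv, 0) + 1
        if gu == gv:
            inside[gu] = inside.get(gu, 0) + 1
    return sum(inside.get(g, 0) / M - (ends[g] / (2 * M)) ** 2 for g in ends)

# Hypothetical example: triangles 0-1-2 and 3-4-5 joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {v: "all" for v in range(6)}

Q_two = modularity(edges, two_groups)
Q_one = modularity(edges, one_group)
print(Q_two, Q_one)
```

Splitting the graph into its two triangles gives a positive modularity, while putting all vertices into a single group gives zero, so the two-group division is the one to keep.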

Potts model Minimization of the Hamiltonian: Reichardt and Bornholdt, PRL 93, (2004)

Clique percolation method (CPM) Most real networks are characterized by overlapping and nested communities. Divisive/agglomerative methods fail to identify the communities when overlaps are significant. Derényi, Palla, and Vicsek, Phys. Rev. Lett. 94, (2005) Palla, Derényi, Farkas, and Vicsek, Nature 435, (2005)

Clique percolation method (CPM)
k-cliques are complete subgraphs of size k (illustrated for k = 2, 3, 4, 5). We define a community as a k-clique percolation cluster.
[Figure: an example of overlapping k-clique communities for k = 4]
Advantages of this method: local, allows overlaps, density (not distance) based, produces no cut-nodes, …
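A brute-force version of the method fits in a few lines and shows how overlaps arise. The sketch below is not the authors' software. It assumes the adjacency rule of the cited CPM papers, namely that two k-cliques are adjacent when they share k - 1 vertices, and it enumerates all k-subsets of vertices, which is only feasible for small graphs. The example graph is made up.

```python
from itertools import combinations

def k_clique_communities(adj, k):
    """Unions of k-cliques that can be reached through adjacent k-cliques."""
    nodes = sorted(adj)
    # Brute-force enumeration of all k-cliques (small graphs only).
    cliques = [frozenset(c) for c in combinations(nodes, k)
               if all(v in adj[u] for u, v in combinations(c, 2))]

    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge k-cliques that share k - 1 vertices.
    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)

    communities = {}
    for idx, clique in enumerate(cliques):
        communities.setdefault(find(idx), set()).update(clique)
    return sorted(communities.values(), key=sorted)

# Hypothetical example: triangles 0-1-2 and 1-2-3 share an edge,
# triangle 3-4-5 touches them only at vertex 3.
edge_list = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {v: set() for v in range(6)}
for u, v in edge_list:
    adj[u].add(v)
    adj[v].add(u)

communities = k_clique_communities(adj, 3)
print(communities)   # [{0, 1, 2, 3}, {3, 4, 5}]
```

Vertex 3 belongs to both communities, an overlap that divisive and agglomerative methods cannot represent because they assign every vertex to exactly one group.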

Studied systems:
- Co-authorship network: Los Alamos cond-mat archive, 30,739 nodes and 136,065 links
- Word association network: South Florida Free Association norms list, 10,617 nodes and 63,788 links
- Protein-protein interaction network: DIP core list of the yeast S. cerevisiae, 2,609 nodes and 6,355 links
Links are usually weighted ($w_{ij}$). For each value of k (typically k = 3, 4, 5) a threshold weight can be introduced. (Note that there is a critical threshold at which a giant cluster appears. Optimally the threshold weight should be chosen close to this critical value.)

Web of communities for the protein interaction network of yeast links represent overlaps between the communities

Community statistics
[Figure panels: community size distribution; community degree distribution; overlap size distribution; membership number distribution]

Clique percolation in an ER graph Branching process:

Dedicated web page for the CPM (software, papers, data): Some review papers: Albert and Barabasi, Rev. Mod. Phys. 74, 47 (2002). Dorogovtsev and Mendes, Adv. Phys. 51, 1079 (2002). Useful web page with papers, data, and ppt presentations: (Where many of the slides of this course have been “borrowed” from.)