2 Copyright notice: Many of the images in this PowerPoint presentation are from other people. The copyright belongs to the original authors. Thanks!
3 Biological Networks: Biological systems are made of many non-identical elements that interact with each other in diverse ways. Biological networks serve as a framework for the study of biological systems.
4 Why Study Networks? It is increasingly recognized that complex systems cannot be described in a reductionist view. Understanding the behavior of such systems starts with understanding the topology of the corresponding network. Topological information is fundamental in constructing realistic models for the function of the network. We have seen the complexity and volume of the data sets we are dealing with. Due to their complexity, it is impossible to analyze or manage their properties and underlying principles.
13 Gene Regulation: Proteins are encoded by the DNA of the organism. Proteins regulate the expression of other proteins by interacting with the DNA. [Figure: proteins and an inducer (external signal) acting on the promoter region (ACCGTTGCAT) upstream of the coding region of the DNA.]
19 Co-expression Network Revealed from Yeast Cell Cycle Data: Clusters: 1. Protein fate; 2. Amino acid synthesis; 3. Galactose metabolism; 4. Protein glycosylation and transport, cell wall organization; 5. Amino acid metabolism; 6. Mating; 7. Glucogenesis; 8. Unknown; 9. Cell cycle regulation; 10. Stress response; 11. Cell differentiation; 12. Protein synthesis; 13. Cell wall; 14. Energy transport; 15. Ribosomal biogenesis. Other labeled groups: Y’-cluster, histone, ribosomal proteins, mitochondrion, protein degradation. Data: yeast cell cycle microarray data (Spellman et al., 1998).
20 Signal transduction networks (BD BioScience): Elements inside the same module are often involved in the same biological process. This separation can originate from spatial localization or from chemical specificity. Insulation allows the cell to carry out many diverse reactions without cross-talk that would harm the cell. Connectivity allows one function to influence another. Functional modules reflect a critical level of biological organization. A modular system can reuse existing, well-tested modules.
21 Properties of Biological Networks: scale-free, small world, hierarchical, modular, robust, motifs.
22 Scale-Free Network: The degree k of a node is the number of its adjacent nodes. The degree distribution P(k) is the frequency of nodes with degree k. In a scale-free network, P(k) follows a power law, P(k) ~ k^-γ, which differs from random networks.
23 Erdös-Rényi model (1960): Connect each pair of nodes with probability p. Example: p = 1/6 and N = 10 give an average degree ⟨k⟩ = p(N − 1) = 1.5. Pál Erdös. The degree distribution is Poisson: the network is "democratic" and random.
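A minimal Python sketch of the definition of P(k) above, assuming an undirected simple graph given as an edge list. The function name `degree_distribution` is hypothetical, and isolated nodes are ignored because an edge list cannot represent them.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: P(k)}: the fraction of nodes that have exactly k neighbors."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)  # only nodes that appear in at least one edge
    counts = Counter(degree.values())
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical toy graph: hub 0 linked to four nodes, plus one extra edge 1-2.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
print(degree_distribution(edges))  # {1: 0.4, 2: 0.4, 4: 0.2}
```

For a scale-free network, plotting P(k) from such a table on log-log axes would show an approximately straight line.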
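The slide's numbers can be checked with a small simulation. This sketch (hypothetical helper names, not from the original deck) builds G(N, p) by keeping each of the N(N − 1)/2 possible edges independently with probability p, then compares the simulated mean degree with the expected p(N − 1).

```python
import random

def erdos_renyi(n, p, seed=None):
    """G(N, p): keep each of the N(N-1)/2 possible edges independently with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def mean_degree(n, edges):
    return 2 * len(edges) / n  # every edge adds 1 to the degree of two nodes

n, p = 10, 1 / 6
expected = p * (n - 1)  # <k> = p(N - 1) = 1.5 for the slide's example
# Average over many independent samples of the random graph.
simulated = sum(mean_degree(n, erdos_renyi(n, p, seed=s)) for s in range(2000)) / 2000
print(expected, simulated)
```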
24 SCALE-FREE NETWORKS: (1) The number of nodes (N) is NOT fixed: networks continuously expand by the addition of new nodes. Examples: WWW: addition of new documents; citation: publication of new papers. (2) The attachment is NOT uniform: a node is linked with higher probability to a node that already has a large number of links. Examples: WWW: new documents link to well-known sites (CNN, YAHOO, New York Times, etc.); citation: well-cited papers are more likely to be cited again.
25 Scale-free model: (1) GROWTH: At every timestep we add a new node with m edges (connected to the nodes already present in the system). (2) PREFERENTIAL ATTACHMENT: The probability Π that a new node will be connected to node i depends on the connectivity k_i of that node, Π(k_i) = k_i / Σ_j k_j. The resulting degree distribution is P(k) ~ k^-3. A.-L. Barabási & R. Albert, Science, 1999
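A minimal sketch of the two rules above (growth plus preferential attachment). The starting graph (a complete graph on m + 1 nodes) and all names are assumptions for illustration. Keeping a list in which each node appears once per incident edge is a common way to draw node i with probability k_i / Σ_j k_j, not necessarily the procedure used by Barabási & Albert.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Growth + preferential attachment: each new node adds m edges to existing nodes."""
    rng = random.Random(seed)
    # Assumed seed graph: a complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Node i appears k_i times in `targets`, so a uniform draw picks i with
    # probability k_i / sum_j k_j (preferential attachment).
    targets = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:  # m distinct existing nodes, no multi-edges
            chosen.add(rng.choice(targets))
        for old in chosen:
            edges.append((new, old))
            targets.extend((new, old))
    return edges

edges = barabasi_albert(2000, 2, seed=42)
degree = Counter(v for edge in edges for v in edge)
print("max degree:", max(degree.values()), "min degree:", min(degree.values()))
```

Unlike the Erdös-Rényi graph, a few hubs end up with degrees far above the average of about 2m.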
26 Metabolic network (archaea, bacteria, eukaryotes): The metabolic networks of organisms from all three domains of life are scale-free! H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.L. Barabasi, Nature, 2000
27 Topology of the protein network H. Jeong, S.P. Mason, A.-L. Barabasi & Z.N. Oltvai, Nature, 2001
30 Local clustering: My friends will likely know each other! Networks are clustered [large C].
31 Clustering Coefficient: The density of the network surrounding node i, characterized by the number of triangles through i; it is related to network modularity. With k_i the number of neighbors of i and n_i the number of edges between node i's neighbors, C_i = 2n_i / (k_i(k_i − 1)). Example: the center node has 8 (grey) neighbors and there are 4 edges between the neighbors, so C = 4 / ((8 × 7) / 2) = 4/28 = 1/7.
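The formula C_i = 2n_i / (k_i(k_i − 1)) can be sketched in a few lines of Python, assuming an undirected graph stored as adjacency sets (the helper names are hypothetical). The usage reproduces the slide's example of a center node with 8 neighbors and 4 edges among them.

```python
from itertools import combinations

def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, i):
    """C_i = 2 n_i / (k_i (k_i - 1)): fraction of neighbor pairs of i that are linked."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # C_i is undefined for k < 2; returning 0 is one common convention
    n_i = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2 * n_i / (k * (k - 1))

# The slide's example: center node 0 with 8 neighbors and 4 edges among them.
edges = [(0, j) for j in range(1, 9)] + [(1, 2), (3, 4), (5, 6), (7, 8)]
print(local_clustering(build_adj(edges), 0))  # 1/7, about 0.1429
```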
34 Small-world Network: Every node can be reached from every other by a small number of hops or steps. Small-world networks combine a high clustering coefficient with a low mean shortest-path length; random graphs don’t necessarily have high clustering coefficients. Social networks, the Internet, and biological networks all exhibit small-world characteristics.
35 Modularity in Cellular Networks. Hypothesis: Biological functions are carried out by discrete functional modules. Hartwell, L.-H., Hopfield, J. J., Leibler, S., & Murray, A. W., Nature, 1999. Traditional view of modularity:
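The "small number of hops" can be measured as the mean shortest-path length, computed here with breadth-first search. This is a minimal sketch assuming an unweighted, undirected graph as adjacency sets; the 10-node ring and its two shortcut edges are hypothetical, chosen only to show how a few long-range links shrink path lengths.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from `source` to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_shortest_path(adj):
    """Average shortest-path length over all ordered pairs of distinct reachable nodes."""
    total = pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

n = 10
ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
shortcut = {i: set(nbrs) for i, nbrs in ring.items()}
for u, v in [(0, 5), (2, 7)]:  # two hypothetical long-range shortcuts
    shortcut[u].add(v)
    shortcut[v].add(u)
print(mean_shortest_path(ring), mean_shortest_path(shortcut))  # 25/9 for the ring, then smaller
```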
37 How do we know that metabolic networks are modular? The clustering coefficient is the same across the metabolic networks of different species, independent of the number of substrates, whereas for a corresponding randomized scale-free network C(N) ~ N^-0.75 (from simulation; no analytical result). Legend: bacteria; archaea (extreme-environment single-cell organisms); eukaryotes (plants, animals, fungi, protists); scale-free network of the same size.
38 Real Networks Have a Hierarchical Topology. What does it mean? Many highly connected small clusters combine into a few larger but less connected clusters, which combine into even larger and even less connected clusters. The degree of clustering follows C(k) ~ k^-1.
39 Properties of hierarchical networks: 1. Scale-free. 2. Clustering coefficient independent of N. 3. Clustering coefficient scales with node degree k.
40 Hierarchy in biological systems: metabolic networks, protein networks
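Checking property 3 requires C(k), the clustering coefficient averaged over all nodes of degree k. A minimal sketch, assuming adjacency sets; the toy graph (a hub whose six neighbors form three linked pairs) is hypothetical and only shows that low-degree nodes can be fully clustered while the hub is not. For a hierarchical network one would plot C(k) on log-log axes and look for a slope near −1.

```python
from collections import defaultdict
from itertools import combinations

def clustering_by_degree(adj):
    """Average local clustering C(k) over all nodes of degree k (nodes with k < 2 skipped)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        sums[k] += 2 * links / (k * (k - 1))
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}

# Hypothetical toy graph: hub 0 with neighbors 1..6, plus links 1-2, 3-4, 5-6.
hub_edges = [(0, j) for j in range(1, 7)] + [(1, 2), (3, 4), (5, 6)]
adj = {}
for u, v in hub_edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(clustering_by_degree(adj))  # {2: 1.0, 6: 0.2}
```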
41 Can we identify the modules? Topological overlap: J(i,j) is the number of nodes that both i and j link to, plus 1 if there is a direct (i,j) link.
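A minimal sketch of the topological overlap, assuming J(i,j) as defined above is normalized by min(k_i, k_j), which is how we understand the measure in Ravasz et al. (2002); other normalizations exist. The two-triangle graph is a hypothetical example: pairs inside a module score high, the bridge scores lower, and pairs in different modules score zero.

```python
def topological_overlap(adj, i, j):
    """O_T(i, j) = J(i, j) / min(k_i, k_j), J = shared neighbors (+1 if i and j are linked)."""
    j_ij = len(adj[i] & adj[j]) + (1 if j in adj[i] else 0)
    k_min = min(len(adj[i]), len(adj[j]))
    return j_ij / k_min if k_min > 0 else 0.0

# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by the bridge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(topological_overlap(adj, 0, 1))  # same module: 1.0
print(topological_overlap(adj, 2, 3))  # bridge: 1/3
print(topological_overlap(adj, 0, 4))  # different modules: 0.0
```

Clustering nodes by this overlap (for example, hierarchically) is one way to recover modules such as those in the E. coli metabolism.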
42 Modules in the E. coli metabolism E. Ravasz et al., Science, 2002
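43 Robustness: Complex systems maintain their basic functions even under errors and failures (cell mutations; Internet router breakdowns). [Figure: relative size S of the largest cluster vs. the fraction of removed nodes f under node failure, with the critical fraction fc marked on an f axis from 0 to 1.]
44 Robustness of scale-free networks: Failures (topological error tolerance): for γ ≤ 3, fc = 1 (R. Cohen et al., PRL, 2000). Attacks: R. Albert et al., Nature, 2000. [Figure: S vs. f for failures and attacks, with critical fraction fc.]
45 Attack Tolerance: [Figure: maximum cluster size and path length vs. fraction of removed nodes.] The maximum cluster size changes as nodes are removed. A random network is vulnerable to random failures, whereas a power-law network is not; the power-law network is, however, vulnerable to targeted attacks on its hubs. Random failure vs. targeted attack.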
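The failure vs. attack comparison can be sketched by removing a fraction f of nodes, either in random order or highest-degree first, and measuring the relative size S of the largest cluster. The helper names are hypothetical, and the "network" is an extreme hub-dominated toy (a star with one hub and 99 leaves), not a real scale-free graph.

```python
import random

def largest_cluster(adj, removed):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

def relative_cluster_size(adj, f, attack, seed=None):
    """S after removing a fraction f of nodes: randomly (failure) or by degree (attack)."""
    nodes = list(adj)
    if attack:
        # Targeted attack: remove the most connected nodes first (degrees of the intact graph).
        order = sorted(nodes, key=lambda u: len(adj[u]), reverse=True)
    else:
        # Random failure: remove nodes in a random order.
        order = nodes[:]
        random.Random(seed).shuffle(order)
    removed = set(order[: int(f * len(nodes))])
    return largest_cluster(adj, removed) / len(nodes)

# Hypothetical hub-dominated toy network: a star with hub 0 and 99 leaves.
star = {0: set(range(1, 100))}
star.update({leaf: {0} for leaf in range(1, 100)})

failure = sum(relative_cluster_size(star, 0.05, attack=False, seed=s) for s in range(200)) / 200
attack = relative_cluster_size(star, 0.05, attack=True)
print(f"random failure: S ~ {failure:.2f}, targeted attack: S = {attack:.2f}")
```

Random failures rarely hit the hub, so S stays large; a targeted attack removes the hub first and shatters the network.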
46 Yeast protein network: lethality and topological position. Highly connected proteins are more essential (lethal)... H. Jeong, S.P. Mason, A.-L. Barabasi & Z.N. Oltvai, Nature, 2001
48 Network motifs: Comparable to electronic circuit types (e.g., logic gates). The notion of a motif, widely used for sequence analysis, is generalizable to the level of networks. Network motifs are defined as recurring patterns of interconnections found within networks at frequencies much higher than those found in randomized networks.
49 Random vs. designed/evolved features: Large networks may contain information about the design principles and/or evolution of the complex system. Which features are there for a reason? Design principles (e.g., feed-forward loops); constraints (e.g., all nodes on the Internet must be connected to each other); evolution and growth dynamics (e.g., network growth is mainly due to gene duplication). All proteins are probably interconnected through at least one part of the DNA – protein – metabolite network.
50 Network motifs: Uri Alon et al.: “Network Motifs: Simple Building Blocks of Complex Networks”, Science, 2002. Different networks were found to have different motif abundances. The motifs reflect the underlying processes that generate each type of network.
51 Motifs in the network: [Figure: a motif to be found and its matches in the target graph.]
52 Detecting network motifs: There are three main tasks in detecting network motifs: (1) generating an ensemble of proper random networks; (2) counting the subgraphs in the real network and in the random networks; (3) searching for subgraphs that appear disproportionately often in one list vs. the other.
53 All 3-node connected subgraphs: There are 13 different isomorphic types of 3-node connected subgraphs. There are many isomorphic types of subgraphs with a given number of nodes: 199 4-node subgraphs, 9,364 5-node subgraphs, etc. In order to detect network motifs, one needs to count the number of appearances of all types of n-node subgraphs in the network as well as in an ensemble of randomized networks.
54 Motifs detected: Two significant motifs appear numerous times in non-homologous gene systems that perform diverse biological functions.
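The three tasks can be sketched for a single motif, the feed-forward loop (X→Y, Y→Z, X→Z). Assumptions: the random ensemble is generated by degree-preserving directed edge swaps, a common choice though not necessarily the one used in the original studies, and significance is summarized as a Z-score. All names and the toy network are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev

def count_ffl(edges):
    """Count ordered triples (X, Y, Z) with X->Y, Y->Z and X->Z (feed-forward loops)."""
    es = set(edges)
    out = {}
    for u, v in es:
        out.setdefault(u, set()).add(v)
    return sum(1 for x, y in es for z in out.get(y, ()) if z != x and (x, z) in es)

def degree_sequences(edges):
    """Out- and in-degree of every node; edge swaps must leave both unchanged."""
    return Counter(u for u, _ in edges), Counter(v for _, v in edges)

def randomize(edges, n_swaps, rng):
    """Task (1): degree-preserving edge swaps (a,b),(c,d) -> (a,d),(c,b)."""
    es = list(edges)
    present = set(es)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(es)), rng.randrange(len(es))
        (a, b), (c, d) = es[i], es[j]
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        es[i], es[j] = (a, d), (c, b)
    return es

def motif_zscore(edges, count_fn, n_random=100, seed=0):
    """Tasks (2)-(3): real count, random-ensemble mean, and Z = (real - mean) / std."""
    rng = random.Random(seed)
    real = count_fn(edges)
    rand = [count_fn(randomize(edges, 10 * len(edges), rng)) for _ in range(n_random)]
    mu, sd = mean(rand), pstdev(rand)
    z = (real - mu) / sd if sd > 0 else None  # Z is undefined when the ensemble has no spread
    return real, mu, z

# Hypothetical toy network: five disjoint feed-forward loops (15 nodes, 15 edges).
toy = [e for i in range(0, 15, 3) for e in ((i, i + 1), (i + 1, i + 2), (i, i + 2))]
assert degree_sequences(randomize(toy, 100, random.Random(1))) == degree_sequences(toy)
real, mu, z = motif_zscore(toy, count_ffl)
print(real, mu, z)
```

Repeating this for every subgraph type, rather than one hand-picked motif, gives the full comparison described in task (3).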
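The counts of isomorphic types can be reproduced by brute force for small n. This sketch (hypothetical helper names) enumerates every directed graph without self-loops on n labeled nodes, keeps the weakly connected ones, and groups them by a canonical form: the lexicographically smallest sorted edge list over all node relabelings. This is only feasible for very small n, which is exactly why large-scale counting needs smarter methods.

```python
from itertools import permutations, product

def canonical(n, edges):
    """Smallest sorted edge tuple over all relabelings: equal for isomorphic graphs."""
    return min(tuple(sorted((p[u], p[v]) for u, v in edges)) for p in permutations(range(n)))

def weakly_connected(n, edges):
    """True if the graph is connected when edge directions are ignored."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u] - seen:
            seen.add(v)
            stack.append(v)
    return len(seen) == n

def count_connected_subgraph_types(n):
    slots = [(u, v) for u in range(n) for v in range(n) if u != v]
    classes = set()
    for mask in product((0, 1), repeat=len(slots)):  # every possible directed edge set
        edges = [e for e, bit in zip(slots, mask) if bit]
        if weakly_connected(n, edges):
            classes.add(canonical(n, edges))
    return len(classes)

print(count_connected_subgraph_types(3))  # 13
print(count_connected_subgraph_types(4))  # 199
```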
55 Motifs II: S. Wuchty, Z. Oltvai & A.-L. Barabasi, Nature Genetics, 2003
56 Probabilistic algorithm for subgraph sampling. The problem: the complexity of exhaustive subgraph enumeration scales with the number of subgraphs, which is exponential in subgraph size, so enumeration is infeasible for large networks with hubs. Solution: an efficient sampling algorithm.
57 Probabilistic algorithm for subgraph sampling. Instead of examining absolute subgraph counts, we define the subgraph concentration C_i = N_i / Σ_k N_k, where N_i is the number of appearances of subgraph type i. Sampling algorithm:
58 Different probabilities of sampling different subgraphs
59 Weight of each sample corrects for its sampling probability, W = 1/P: a subgraph sampled with P = 0.14 gets W ≈ 7, and one sampled with P = 0.33 gets W ≈ 3.
60 Rapid convergence to real concentration (Kashtan et al., Bioinformatics, 2004)
61 Runtime almost independent of network size (Kashtan et al., Bioinformatics, 2004)
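The weighting idea can be sketched independently of how subgraphs are actually sampled: each sample adds W = 1/P to the score of its subgraph type, and normalized scores estimate the concentrations. Computing P for a real edge-sampling procedure is not shown. The population, the biased sampler, and all names below are hypothetical, chosen so the true concentration of type "A" is 0.1 while the raw sample fraction is strongly biased.

```python
import random
from collections import defaultdict

def estimate_concentrations(samples):
    """Weighted estimate C_i ~ S_i / sum_k S_k, where each sample adds W = 1/P to S_i."""
    score = defaultdict(float)
    for subgraph_type, p in samples:
        score[subgraph_type] += 1.0 / p
    total = sum(score.values())
    return {t: s / total for t, s in score.items()}

# Hypothetical population: 10 subgraphs of type "A" and 90 of type "B" (true C_A = 0.1),
# drawn by a biased sampler that picks each A five times as often as each B.
rng = random.Random(1)
population = [("A", 5.0)] * 10 + [("B", 1.0)] * 90
total_bias = sum(bias for _, bias in population)
draws = rng.choices(population, weights=[bias for _, bias in population], k=20000)

raw_fraction_a = sum(1 for t, _ in draws if t == "A") / len(draws)  # biased, about 0.36
samples = [(t, bias / total_bias) for t, bias in draws]            # P of that exact draw
weighted = estimate_concentrations(samples)                        # corrected, about 0.1
print(raw_fraction_a, weighted["A"])
```

Because over-sampled subgraphs receive proportionally smaller weights, the estimate converges to the true concentration even though the sampler is biased.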