1
Network Statistics
Gesine Reinert
2
Yeast protein interactions
3
Summary statistics
Vertex degree distribution (the degree of a vertex is the number of vertices connected to it by an edge)
Clustering coefficient: the average proportion of pairs of neighbours of a vertex that are themselves neighbours
Shortest distance between two vertices; also the average shortest distance, the maximal distance, and the average of the inverse distance (efficiency)
Betweenness of a vertex: the number of shortest paths that pass through the vertex (similarly for an edge)
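A minimal sketch of three of these summary statistics, assuming the network is stored as a dictionary that maps each vertex to the set of its neighbours. The function names and the toy graph are illustrative only, and the convention that a vertex with fewer than two neighbours contributes 0 to the clustering coefficient is an assumption. Betweenness is not covered.

```python
from collections import deque

def degrees(adj):
    """Degree of each vertex: the number of its neighbours."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def local_clustering(adj, v):
    """Proportion of pairs of neighbours of v that are themselves linked."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for vertices with fewer than 2 neighbours
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return links / (k * (k - 1) / 2)

def clustering_coefficient(adj):
    """Average of the local clustering proportions over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

def bfs_distances(adj, source):
    """Shortest distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_shortest_distance(adj):
    """Mean distance over ordered pairs of distinct, mutually reachable vertices."""
    total, pairs = 0, 0
    for v in adj:
        for w, d in bfs_distances(adj, v).items():
            if w != v:
                total += d
                pairs += 1
    return total / pairs if pairs else float("inf")

# Toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degrees(toy))
print(clustering_coefficient(toy))
print(average_shortest_distance(toy))
```

Breadth-first search from every vertex is adequate for small graphs; for networks of the sizes discussed in this talk, sampling source vertices would be a natural shortcut.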
4
Some examples for real networks (in averages)
Network                  Size       Vertex degree  Shortest path  Shortest path in fitted random graph  Clustering  Clustering in random graph
Film actors              225,226    61             3.65           2.99                                  0.79        0.00027
MEDLINE coauthorship     1,520,251  18.1           4.6            4.91                                  0.43        1.8 x 10^-4
E. coli substrate graph  282        7.35           2.9            3.04                                  0.32        0.026
C. elegans               282        14             2.65           2.25                                  0.28        0.05
5
Underlying model assumptions
Network consisting of vertices and edges
Randomness in edges
Here: assume edges undirected, no self-loops, no multiple edges
6
Main model 1: Random Graph
Bernoulli random graph (Erdős and Rényi 1959, 1960): L vertices, any two connected by an edge with probability p, independently of all other pairs
The graph need not be connected
Phase transition: around edge probability p(L) = (log L)/L the random graph becomes connected
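The model can be simulated directly. The sketch below draws Bernoulli random graphs and estimates how often they are connected for edge probabilities below and above (log L)/L. The function names, the seed, and the multipliers 0.3 and 3 are arbitrary illustrative choices, not part of the talk.

```python
import math
import random

def bernoulli_random_graph(L, p, rng):
    """L vertices; each of the L(L-1)/2 pairs is linked independently with probability p."""
    adj = {v: set() for v in range(L)}
    for u in range(L):
        for v in range(u + 1, L):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def is_connected(adj):
    """Depth-first search from an arbitrary vertex; connected if it reaches all vertices."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

def connected_fraction(L, p, trials, seed=0):
    """Fraction of simulated graphs that are connected."""
    rng = random.Random(seed)
    hits = sum(is_connected(bernoulli_random_graph(L, p, rng)) for _ in range(trials))
    return hits / trials

L = 100
threshold = math.log(L) / L
for factor in (0.3, 3.0):
    print(factor, connected_fraction(L, factor * threshold, trials=10))
```

Well below the threshold, isolated vertices are common and the graph is almost never connected; well above it, isolated vertices become very unlikely.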
7
Main model 2: Watts-Strogatz Small World (1998)
L vertices, each connected to its m nearest neighbours; in addition, random links (shortcuts), each present with probability p
(Originally, rewiring edges rather than adding them was proposed, but then the resulting network need not be connected)
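A minimal sketch of the variant with added shortcuts. It assumes the vertices sit on a ring and that m counts the nearest neighbours on each side, so every vertex has 2m hard-wired links; the slide does not spell this out, so treat it as one possible reading. The function name and seed are illustrative.

```python
import random

def small_world(L, m, p, seed=0):
    """Ring of L vertices with m hard-wired neighbours on each side, plus random shortcuts."""
    if L < 2 * m + 1:
        raise ValueError("need L >= 2m + 1 to avoid self-loops and multiple edges")
    rng = random.Random(seed)
    adj = {v: set() for v in range(L)}
    # Hard-wired links to the m nearest neighbours on each side of the ring.
    for v in range(L):
        for step in range(1, m + 1):
            w = (v + step) % L
            adj[v].add(w)
            adj[w].add(v)
    # Shortcuts: each pair that is not hard-wired is linked independently with probability p.
    for u in range(L):
        for v in range(u + 1, L):
            if v not in adj[u] and rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

graph = small_world(L=20, m=2, p=0.05)
print(sorted(len(nbrs) for nbrs in graph.values()))
```

Because shortcuts are only added and the ring is never rewired, the network stays connected for every value of p.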
8
Main model 3: Scale-free network
Network growth models: start with one vertex; each new vertex attaches to existing vertices by preferential attachment, tending to choose a vertex according to its degree (Barabási and Albert 1999, Price 1965)
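A minimal sketch of such a growth model, under simplifying assumptions that are not taken from the talk: each new vertex attaches to exactly one existing vertex, chosen with probability proportional to its current degree, and the second vertex attaches to the first by necessity. The function name and seed are illustrative.

```python
import random

def preferential_attachment(n, seed=0):
    """Grow a network of n vertices, one new vertex and one new edge at a time."""
    rng = random.Random(seed)
    adj = {0: set()}
    # Each vertex appears in this list once per incident edge, so a uniform
    # draw from the list selects a vertex with probability proportional to its degree.
    endpoints = []
    for new in range(1, n):
        target = rng.choice(endpoints) if endpoints else 0
        adj[new] = {target}
        adj[target].add(new)
        endpoints.extend([new, target])
    return adj

graph = preferential_attachment(1000)
degree_counts = {}
for nbrs in graph.values():
    degree_counts[len(nbrs)] = degree_counts.get(len(nbrs), 0) + 1
print(sorted(degree_counts.items()))
```

With one edge per new vertex the result is a tree; attaching each newcomer to several existing vertices would be the obvious extension.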
9
Watts-Strogatz’ Small World
Amenable to mathematical analysis
More realistic than random graphs
Shortest path length
Motif counts
Vertex degrees
Predicting links
Generalization: hard-wired links only present with a certain probability
10
Shortest path length
Put ρ = 2(L - 2m - 1)p, where p is the probability of a shortcut
Approximation: a continuous model gives an expected shortest path length of approximately (1/ρ) {(1/2) log(Lρ) - 0.2886} (the distribution is also available; Barbour and Reinert)
In the discrete case, the distribution may be concentrated on one or two points
11
Example: 6 degrees of separation?
If the number of vertices is L = 200,000,000 and we observe l = 6, then we can estimate ρ as approximately 1.54
For L = 60,000,000 this gives an expected shortest path length of approximately 5.81
For L = 100,000 it gives approximately 3.73
For L = 6,500,000,000 it gives approximately 7.33
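The approximation for the expected shortest path length can be evaluated with a few lines of code. This sketch only implements the two expressions as stated on the shortest-path slide, assuming the natural logarithm; the function names are illustrative. Its output need not agree with the quoted figures to the last digit, since those may rest on a more refined calculation.

```python
import math

def shortcut_rate(L, m, p):
    """rho = 2 (L - 2m - 1) p, with p the probability of a shortcut."""
    return 2 * (L - 2 * m - 1) * p

def approx_expected_path_length(L, rho):
    """(1/rho) * (0.5 * log(L * rho) - 0.2886), natural logarithm assumed."""
    return (0.5 * math.log(L * rho) - 0.2886) / rho

rho = 1.54  # estimate quoted in the example
for L in (100_000, 60_000_000, 6_500_000_000):
    print(L, round(approx_expected_path_length(L, rho), 2))
```

For fixed ρ the approximation grows only logarithmically in L, which is why population sizes that differ by several orders of magnitude give path lengths within a few steps of each other.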
12
Motif counts
Triangles: relate to clustering coefficient
Cycles: biologically relevant
Distributions: approximately compound Poisson
Can get joint distribution for cycle counts of different lengths (also using compound Poisson); dependence!
Goal: assess statistical significance of counts
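As a rough illustration of the goal, the sketch below counts triangles and compares the count with simulated Bernoulli random graphs of matching size and edge density. This is a plain Monte Carlo stand-in, not the compound Poisson approximation mentioned above, and the null model, function names, and number of trials are all assumptions made for the example.

```python
import random

def count_triangles(adj):
    """Count triangles u < v < w; vertex labels must be sortable."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total

def random_graph(n, p, rng):
    """Bernoulli random graph on vertices 0..n-1 with edge probability p."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def triangle_p_value(adj, trials=200, seed=0):
    """Observed triangle count and a Monte Carlo p-value under the assumed null model."""
    rng = random.Random(seed)
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    p = edges / (n * (n - 1) / 2)  # edge density matched to the observed graph
    observed = count_triangles(adj)
    at_least = sum(count_triangles(random_graph(n, p, rng)) >= observed
                   for _ in range(trials))
    # Adding 1 to numerator and denominator keeps the estimate away from zero.
    return observed, (at_least + 1) / (trials + 1)

# Toy network: five disjoint triangles on 15 vertices.
toy = {v: {3 * (v // 3) + k for k in range(3)} - {v} for v in range(15)}
print(triangle_p_value(toy))
```

A small p-value here says only that the network has more triangles than a Bernoulli random graph of the same density would typically produce.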
13
Vertex degrees
Random graph superimposed on hard-wired networks
Poisson approximation for number of vertices with degree at least k, say
Normal approximation for joint distribution of some vertex degrees
Goal: assess scale-free appearance
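To illustrate the Poisson approximation in the simplest setting, the sketch below treats a pure Bernoulli random graph with no hard-wired part, which is a simplification of the model on this slide. Each degree is then Binomial(L - 1, p), and the number of vertices with degree at least k is approximated by a Poisson variable with the matching mean. The function names are illustrative.

```python
import math

def degree_tail(L, p, k):
    """P(Binomial(L - 1, p) >= k): chance that a given vertex has degree at least k."""
    below = sum(math.comb(L - 1, j) * p**j * (1 - p)**(L - 1 - j) for j in range(k))
    return 1.0 - below

def poisson_high_degree(L, p, k):
    """Poisson mean for the number of vertices with degree >= k, and P(no such vertex)."""
    lam = L * degree_tail(L, p, k)
    return lam, math.exp(-lam)

lam, prob_none = poisson_high_degree(L=1000, p=0.005, k=15)
print(lam, prob_none)
```

The approximation is meant for the rare-event regime, where k is large enough that only a handful of vertices are expected to reach it.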
14
Predicting links
Use Bayesian analysis and biochemical properties to predict which proteins might interact
Use H. pylori interactions to construct a prior for E. coli interactions
Assess whether the network has small-world structure; if so, use a parametric model
15
Statistical significance
Clustering coefficient, vertex degrees and shortest path length are not independent
Long-term goal: joint distribution of summary statistics to assess whether networks are similar or not
16
People
Research students:
Kaisheng Lin (motif counts, metabolic networks; vertex degrees)
Pao-Yang Chen (protein interaction networks)
KimHuat Lim (epidemics on networks)
Collaborators:
Andrew Barbour (shortest path length)
Charlotte Deane (protein interaction networks)
Susan Holmes (bottlenecks)