Network Statistics Gesine Reinert. Yeast protein interactions.


1 Network Statistics Gesine Reinert

2 Yeast protein interactions

3 Summary statistics
  Vertex degree distribution (the degree of a vertex is the number of vertices connected to it by an edge)
  Clustering coefficient: the average proportion of pairs of neighbours of a vertex that are themselves neighbours
  Shortest distance between two vertices; also the average shortest distance, the maximal distance, and the average of the inverse distance (efficiency)
  Betweenness of a vertex: the number of shortest paths that go through the vertex (similarly for an edge)
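The first three statistics above can be computed directly from an adjacency structure. The following is a minimal sketch, assuming an undirected simple graph stored as a dict that maps each vertex to the set of its neighbours; the function names and the toy graph are illustrative and not from the slides. Betweenness is left out to keep the example short.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degrees(adj):
    """Degree of every vertex."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def clustering_coefficient(adj):
    """Average, over all vertices, of the fraction of neighbour pairs that are linked.

    Assumed convention: vertices with fewer than two neighbours contribute 0.
    """
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nb = sorted(nbrs)
        links = sum(1 for i, a in enumerate(nb) for b in nb[i + 1:] if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


def average_shortest_distance(adj):
    """Mean distance over pairs of distinct vertices that can reach each other.

    Needs at least one edge, otherwise there are no such pairs.
    """
    total, pairs = 0, 0
    for s in adj:
        d = bfs_distances(adj, s)
        total += sum(d.values())
        pairs += len(d) - 1  # exclude the source itself
    return total / pairs


# Toy graph (hypothetical): triangle 0-1-2 with a pendant vertex 3 attached to 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degrees(toy))
print(clustering_coefficient(toy))
print(average_shortest_distance(toy))
```

Running a breadth-first search from every vertex is adequate for small graphs; networks of the sizes quoted in this talk would need sampling or a dedicated library.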

4 Some examples for real networks (in averages)
  Network                   Size       Vertex degree  Shortest path  Shortest path (fitted random graph)  Clustering  Clustering (random graph)
  Film actors               225,226    61             3.65           2.99                                 0.79        0.00027
  MEDLINE coauthorship      1,520,251  18.1           4.6            4.91                                 0.43        1.8 x 10^-4
  E. coli substrate graph   282        7.35           2.9            3.04                                 0.32        0.026
  C. elegans                282        14             2.65           2.25                                 0.28        0.05

5 Underlying model assumptions
  Network consisting of vertices and edges
  Randomness in edges
  Here: assume edges are undirected, with no self-loops and no multiple edges

6 Main model 1: Random Graph
  Bernoulli random graph (Erdős and Rényi 1959, 1960): L vertices, any two connected by an edge with probability p, independently of all other pairs
  The graph need not be connected
  Phase transition: around edge probability p(L) = (log L)/L the random graph becomes connected with high probability
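A small simulation can illustrate the model and the connectivity threshold. This is a sketch under stated assumptions: the function names, the graph size L = 200, the 20 repetitions and the seed are arbitrary choices for illustration, and log is taken to be the natural logarithm.

```python
import math
import random


def bernoulli_random_graph(L, p, rng):
    """L vertices; each of the L(L-1)/2 possible edges is present independently with probability p."""
    adj = {v: set() for v in range(L)}
    for u in range(L):
        for v in range(u + 1, L):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def is_connected(adj):
    """True if every vertex can be reached from an arbitrary start vertex."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


# Fraction of connected graphs below, at, and above the threshold (log L)/L.
rng = random.Random(0)  # fixed seed, an arbitrary choice
L = 200
threshold = math.log(L) / L
for factor in (0.5, 1.0, 2.0):
    runs = 20
    hits = sum(
        is_connected(bernoulli_random_graph(L, factor * threshold, rng))
        for _ in range(runs)
    )
    print(f"p = {factor} * (log L)/L: {hits / runs:.2f} of graphs connected")
```

For moderate L the transition is gradual rather than abrupt, so the printed fractions only indicate the trend.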

7 Main model 2: Watts-Strogatz Small World (1998)
  L vertices, each connected to its m nearest neighbours; in addition, random links (shortcuts) are added, each with probability p
  (Originally, rewiring edges instead of adding edges was proposed, but then the resulting network need not be connected)
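The shortcut-adding variant described on this slide might be generated as follows. This is a sketch with two assumptions that the slide does not spell out: the vertices sit on a ring and each is joined to its m nearest neighbours on either side, and a shortcut is considered independently for every pair that is not already joined. The function name is illustrative.

```python
import random


def small_world(L, m, p, rng):
    """Ring lattice plus random shortcuts (edges are added, never rewired).

    Assumptions: m nearest neighbours on EACH side of a vertex, and one
    independent shortcut trial with probability p per non-adjacent pair.
    """
    if L < 2 * m + 1:
        raise ValueError("need L >= 2m + 1 to avoid self-loops and multiple edges")
    adj = {v: set() for v in range(L)}
    # Hard-wired links: join every vertex to the next m vertices around the ring.
    for v in range(L):
        for step in range(1, m + 1):
            w = (v + step) % L
            adj[v].add(w)
            adj[w].add(v)
    # Shortcuts: each remaining pair is linked independently with probability p.
    for u in range(L):
        for v in range(u + 1, L):
            if v not in adj[u] and rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


g = small_world(20, 2, 0.05, random.Random(1))
print(sorted(len(nbrs) for nbrs in g.values()))
```

Because the ring links are never removed, the graph stays connected for every p, which is the advantage over rewiring noted on the slide.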

8 Main model 3: Scale-free network
  Network growth models: start with one vertex; each new vertex attaches to existing vertices by preferential attachment, that is, it tends to choose an existing vertex according to that vertex's degree (Barabási and Albert 1999, Price 1965)
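One simple instance of such a growth model is sketched below. It assumes that every new vertex attaches to exactly one existing vertex, chosen with probability proportional to its current degree; the slide does not fix the number of links per new vertex, so this is only one possible reading, and the function name is illustrative.

```python
import random


def preferential_attachment_tree(L, rng):
    """Grow a graph on L vertices, one new vertex and one new edge at a time.

    Assumption: the attachment probability is proportional to current degree.
    The second vertex always attaches to the first, which has degree 0 then.
    """
    adj = {0: set()}
    endpoints = []  # every vertex appears once per incident edge
    for new in range(1, L):
        # A uniform draw from the endpoint list is a degree-proportional draw.
        target = rng.choice(endpoints) if endpoints else 0
        adj[new] = {target}
        adj[target].add(new)
        endpoints.extend((new, target))
    return adj


tree = preferential_attachment_tree(1000, random.Random(2))
print(max(len(nbrs) for nbrs in tree.values()))  # largest degree in this run
```

The endpoint list avoids recomputing degree weights at every step, at the cost of storing two entries per edge.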

9 Watts-Strogatz’ Small World
  Amenable to mathematical analysis
  More realistic than random graphs
  Shortest path length
  Motif counts
  Vertex degrees
  Predicting links
  Generalization: hard-wired links only present with a certain probability

10 Shortest path length
  Put ρ = 2(L - 2m - 1)p, where p is the probability of a shortcut
  Approximation: a continuous model gives an expected shortest path length of approximately (1/ρ) {(1/2) log(Lρ) - 0.2886} (+ distribution, Barbour + R.)
  In the discrete case, the distribution may be concentrated on one or two points
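The two expressions on this slide can be transcribed directly into code. This is a sketch only: it assumes that log denotes the natural logarithm, the function names are illustrative, and it covers the continuous approximation as stated, not any discrete correction.

```python
import math


def shortcut_intensity(L, m, p):
    """rho = 2 (L - 2m - 1) p, with p the probability of a shortcut."""
    return 2 * (L - 2 * m - 1) * p


def approx_expected_shortest_path(L, rho, constant=0.2886):
    """Continuous-model approximation (1/rho) * (0.5 * log(L * rho) - constant).

    Assumption: natural logarithm. The constant is the value quoted on the slide.
    """
    return (0.5 * math.log(L * rho) - constant) / rho


rho = shortcut_intensity(L=1000, m=2, p=0.001)  # hypothetical parameter values
print(rho, approx_expected_shortest_path(1000, rho))
```

Since the approximation grows like (log L)/(2ρ), multiplying the number of vertices by ten adds only about 1.15/ρ to the expected path length.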

11 Example: 6 degrees of separation?
  If the number of vertices is L = 200,000,000 and we observe l = 6, then we can estimate ρ as approximately 1.54
  For L = 60,000,000 this gives an expected shortest path length of approximately 5.81
  For L = 100,000 it gives approximately 3.73
  For L = 6,500,000,000 it gives approximately 7.33

12 Motif counts
  Triangles: relate to the clustering coefficient
  Cycles: biologically relevant
  Distributions: approximately compound Poisson
  Can get the joint distribution for cycle counts of different lengths (also using compound Poisson); dependence!
  Goal: assess statistical significance of counts

13 Vertex degrees
  Random graph superimposed on hard-wired networks
  Poisson approximation for the number of vertices with degree at least k, say
  Normal approximation for the joint distribution of some vertex degrees
  Goal: assess scale-free appearance

14 Predicting links
  Use Bayesian analysis and biochemical properties to predict which proteins might interact
  Use H. pylori interactions to construct a prior for E. coli interactions
  Assess whether there is small-world structure; if so, use a parametric model

15 Statistical significance
  Clustering coefficient, vertex degrees and shortest path length are not independent
  Long-term goal: joint distribution of summary statistics to assess whether networks are similar or not

16 People
  Research students:
    Kaisheng Lin (motif counts, metabolic networks; vertex degrees)
    Pao-Yang Chen (protein interaction networks)
    KimHuat Lim (epidemics on networks)
  Collaborators:
    Andrew Barbour (shortest path length)
    Charlotte Deane (protein interaction networks)
    Susan Holmes (bottlenecks)

