Network Statistics Gesine Reinert. Yeast protein interactions.

Yeast protein interactions

Summary statistics
- Vertex degree distribution (the degree of a vertex is the number of vertices connected to it by an edge)
- Clustering coefficient: the proportion of pairs of neighbours of a vertex that are themselves connected, averaged over vertices
- Shortest distance between two vertices; also the average shortest distance, the maximal distance, and the average inverse distance (efficiency)
- Betweenness of a vertex: the number of shortest paths between pairs of other vertices that go through the given vertex (similarly for an edge)
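
A minimal sketch of these statistics in plain Python, assuming the graph is stored as a dict mapping each vertex to the set of its neighbours (a representation chosen here, not taken from the slides). Vertices with fewer than two neighbours are given local clustering 0, and betweenness follows the common convention of Brandes' algorithm, which splits credit equally when a pair has several shortest paths; both are conventions rather than the only possible choices.

```python
from collections import deque
from itertools import combinations


def degree_distribution(adj):
    """Map each degree k to the number of vertices with degree k."""
    counts = {}
    for v in adj:
        k = len(adj[v])
        counts[k] = counts.get(k, 0) + 1
    return counts


def clustering_coefficient(adj):
    """Average over vertices of the fraction of neighbour pairs that are adjacent.

    Vertices with fewer than two neighbours contribute 0 (one common convention).
    """
    total = 0.0
    for v in adj:
        nbrs = adj[v]
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def distance_summaries(adj):
    """Average shortest distance and maximal distance (over connected pairs), plus efficiency."""
    n = len(adj)
    dists = [d for s in adj for t, d in bfs_distances(adj, s).items() if t != s]
    avg = sum(dists) / len(dists)
    efficiency = sum(1 / d for d in dists) / (n * (n - 1))  # unreachable pairs add 0
    return avg, max(dists), efficiency


def betweenness(adj):
    """Unnormalised vertex betweenness via Brandes' algorithm (unweighted graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies, farthest vertices first
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted from both ends
```

For a triangle 0-1-2 with an extra vertex 3 attached to vertex 2, only vertex 2 gets nonzero betweenness, since it lies on the shortest paths from 3 to 0 and from 3 to 1.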

Some examples of real networks (averages)

Network | Size | Vertex degree | Shortest path | Shortest path in fitted random graph | Clustering | Clustering in random graph
Film actors | 225,226 | 61 | 3.65 | 2.99 | 0.79 | 0.00027
MEDLINE coauthorship | 1,520,251 | 18.1 | 4.6 | 4.91 | 0.43 | 1.8 × 10^-4
E. coli substrate graph | 282 | 7.35 | 2.9 | 3.04 | 0.32 | 0.026
C. elegans | 282 | 14 | 2.65 | 2.25 | 0.28 | 0.05
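
The "fitted random graph" columns compare each network with a Bernoulli random graph of the same size and average degree. The sketch below assumes the standard rough approximations ℓ ≈ ln L / ln k for the average shortest path and C ≈ k/(L - 1) for the clustering coefficient; the slide does not say how its columns were computed, so these approximations need not reproduce every entry.

```python
import math


def random_graph_baseline(size, mean_degree):
    """Rough path-length and clustering values for a Bernoulli random graph.

    Assumed approximations (not stated on the slide):
      average shortest path  ~ ln(L) / ln(k)
      clustering coefficient ~ p = k / (L - 1)
    """
    path = math.log(size) / math.log(mean_degree)
    clustering = mean_degree / (size - 1)
    return path, clustering


# Sizes and mean degrees copied from the table above.
networks = {
    "Film actors": (225_226, 61),
    "MEDLINE coauthorship": (1_520_251, 18.1),
    "E. coli substrate graph": (282, 7.35),
    "C. elegans": (282, 14),
}
for name, (size, k) in networks.items():
    path, clustering = random_graph_baseline(size, k)
    print(f"{name}: path ~ {path:.2f}, clustering ~ {clustering:.2g}")
```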

Underlying model assumptions
- Network consisting of vertices and edges
- Randomness in edges
- Here: assume edges are undirected, with no self-loops and no multiple edges
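
One way to encode these assumptions is a graph type that refuses self-loops and ignores repeated edges. `SimpleGraph` is a hypothetical helper for illustration only; it stores each undirected edge in both endpoints' neighbour sets.

```python
class SimpleGraph:
    """Undirected graph with no self-loops and no multiple edges."""

    def __init__(self, vertices):
        self.adj = {v: set() for v in vertices}

    def add_edge(self, u, v):
        """Add edge {u, v}; return False if it was already present."""
        if u == v:
            raise ValueError("self-loops are not allowed")
        if v in self.adj[u]:
            return False  # edge already present: no multiple edges
        self.adj[u].add(v)
        self.adj[v].add(u)  # undirected: store both directions
        return True

    def num_edges(self):
        # every edge appears in two neighbour sets
        return sum(len(nbrs) for nbrs in self.adj.values()) // 2
```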

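Main model 1: Random graph
Bernoulli random graph (Erdős and Rényi 1959, 1960): L vertices; any two are connected by an edge with probability p, independently of all other pairs. The graph need not be connected. Phase transition: p(L) = (log L)/L is the threshold for connectivity; for p(L) = c (log L)/L the random graph is connected with probability tending to 1 as L grows if c > 1, and disconnected with probability tending to 1 if c < 1.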
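
A small simulation sketch of the Bernoulli random graph and its connectivity threshold, assuming the standard form of the result: with p(L) = c (log L)/L, the probability of being connected tends to 1 for c > 1 and to 0 for c < 1. The function names and the Monte Carlo setup are illustrative choices; for moderate L the transition is gradual rather than sharp.

```python
import math
import random
from collections import deque


def bernoulli_random_graph(L, p, rng):
    """G(L, p): each of the L(L-1)/2 pairs is an edge independently with probability p."""
    adj = {v: set() for v in range(L)}
    for u in range(L):
        for v in range(u + 1, L):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def is_connected(adj):
    """Breadth-first search from one vertex; connected iff every vertex is reached."""
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)


def fraction_connected(L, c, trials=100, seed=1):
    """Monte Carlo estimate of P(connected) at p = c * log(L) / L."""
    rng = random.Random(seed)
    p = min(1.0, c * math.log(L) / L)
    hits = sum(is_connected(bernoulli_random_graph(L, p, rng)) for _ in range(trials))
    return hits / trials
```

Comparing, say, `fraction_connected(50, 0.5)` with `fraction_connected(50, 2.0)` shows the change in behaviour on either side of the threshold.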

Main model 2: Watts-Strogatz small world (1998)
L vertices on a ring, each connected to its m nearest neighbours; in addition, random shortcuts, each present with probability p. (Originally, rewiring edges instead of adding edges was proposed, but then the resulting network need not be connected.)
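
A sketch of the "add shortcuts" version described above. It assumes that "m nearest neighbours" means m on each side of the ring, which matches the count L - 2m - 1 of possible shortcut partners per vertex in the definition of ρ used for shortest path lengths. The optional `q` is a hypothetical parameter that keeps each hard-wired link only with probability q, one way to model the generalization in which hard-wired links are only present with a certain probability; q = 1 gives the basic model.

```python
import random


def small_world(L, m, p, q=1.0, seed=None):
    """Ring lattice plus random shortcuts.

    Assumptions: vertices 0..L-1 sit on a ring and each is hard-wired to the
    m nearest neighbours on each side (2m in total, needs L > 2m); every other
    pair gets a shortcut independently with probability p. Each hard-wired
    link is kept independently with probability q.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(L)}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for u in range(L):
        for v in range(u + 1, L):
            ring_dist = min(v - u, L - (v - u))  # distance around the ring
            if ring_dist <= m:
                if rng.random() < q:  # hard-wired link
                    link(u, v)
            elif rng.random() < p:  # shortcut
                link(u, v)
    return adj
```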

Main model 3: Scale-free network
Network growth model: start with one vertex; each new vertex attaches to existing vertices by preferential attachment, tending to choose vertices of high degree (in the Barabási-Albert model, with probability proportional to degree) (Barabási and Albert 1999; Price 1965).
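
A minimal sketch of preferential attachment, under the simplifying assumption (not stated on the slide) that each new vertex sends exactly one edge, so the network grows as a tree. Sampling uniformly from a list in which every vertex appears once per incident edge picks a vertex with probability proportional to its degree.

```python
import random


def preferential_attachment_tree(n, seed=None):
    """Grow a network from one vertex; each newcomer sends one edge.

    The target is chosen with probability proportional to its current degree.
    """
    rng = random.Random(seed)
    adj = {0: set()}
    endpoints = []  # vertex v appears deg(v) times, so uniform picks are degree-biased
    for new in range(1, n):
        # the second vertex can only attach to the initial vertex
        target = 0 if not endpoints else rng.choice(endpoints)
        adj[new] = {target}
        adj[target].add(new)
        endpoints.extend([new, target])
    return adj
```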

Watts-Strogatz small world
- Amenable to mathematical analysis
- More realistic than random graphs
- Shortest path length
- Motif counts
- Vertex degrees
- Predicting links
- Generalization: hard-wired links only present with a certain probability

Shortest path length
- Put ρ = 2(L - 2m - 1)p, where p is the probability of a shortcut.
- Approximation: a continuous model gives an expected shortest path length of approximately (1/ρ)(½ log(Lρ) - 0.2886); the distribution is also available (Barbour and Reinert).
- In the discrete case, the distribution may be concentrated on one or two points.
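
A direct transcription of the two formulas above into code, assuming "log" means the natural logarithm. It is only the leading-order approximation from the continuous model, not the authors' code, and it need not reproduce every rounded figure quoted in the slides.

```python
import math


def shortcut_rate(L, m, p):
    """rho = 2 (L - 2m - 1) p."""
    return 2 * (L - 2 * m - 1) * p


def approx_expected_distance(L, rho):
    """Continuous-model approximation (1/rho) * (log(L * rho) / 2 - 0.2886).

    Assumes natural logarithms; meaningful only when L * rho is large.
    """
    return (0.5 * math.log(L * rho) - 0.2886) / rho
```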

Example: six degrees of separation?
If the number of vertices is L = 200,000,000 and we observe ℓ = 6, then we can estimate ρ as approximately 1.54. This gives an expected shortest path length of approximately:
- 5.81 for L = 60,000,000
- 3.73 for L = 100,000
- 7.33 for L = 6,500,000,000

Motif counts
- Triangles: related to the clustering coefficient
- Cycles: biologically relevant
- Distributions: approximately compound Poisson
- The joint distribution of cycle counts of different lengths can also be obtained (again via compound Poisson approximation); note the dependence between counts
- Goal: assess the statistical significance of counts
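
Triangle and 4-cycle counts can be computed from neighbour sets, as in this sketch (graph stored as a dict of neighbour sets, an assumption). The clustering coefficient here is the "transitivity" version, 3 × triangles / connected triples, which differs slightly from the average-over-vertices definition; the Bernoulli random-graph mean C(L, 3)p³ is one simple baseline for a triangle count. The compound Poisson approximation itself is not implemented.

```python
from itertools import combinations
from math import comb


def triangle_count(adj):
    """Each triangle is found once from each of its vertices, hence the division by 3."""
    found = sum(1 for v in adj for a, b in combinations(adj[v], 2) if b in adj[a])
    return found // 3


def four_cycle_count(adj):
    """A pair {u, v} with c common neighbours closes comb(c, 2) 4-cycles.

    Each 4-cycle has two diagonal pairs, so the sum counts it twice.
    """
    total = sum(comb(len(adj[u] & adj[v]), 2) for u, v in combinations(adj, 2))
    return total // 2


def global_clustering(adj):
    """Transitivity: 3 * triangles / connected triples (paths of length two)."""
    triples = sum(comb(len(nbrs), 2) for nbrs in adj.values())
    return 3 * triangle_count(adj) / triples if triples else 0.0


def expected_triangles_bernoulli(L, p):
    """Mean triangle count in a Bernoulli random graph: comb(L, 3) * p**3."""
    return comb(L, 3) * p ** 3
```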

Vertex degrees
- Random graph superimposed on hard-wired networks
- Poisson approximation for the number of vertices with degree at least k, say
- Normal approximation for the joint distribution of some vertex degrees
- Goal: assess scale-free appearance
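
A sketch of the Poisson approximation, assuming the hard-wired network is a ring lattice in which each vertex has 2m hard-wired neighbours and is joined to each of the other L - 2m - 1 vertices by a shortcut with probability p, so that its degree is 2m + Binomial(L - 2m - 1, p). The number W of vertices with degree at least k is then approximated by a Poisson variable with mean L·P(degree ≥ k); this is only sensible when such vertices are rare. The normal approximation for joint degrees is not shown.

```python
from math import comb, exp, factorial


def degree_tail(L, m, p, k):
    """P(deg >= k) for degree 2m + Binomial(L - 2m - 1, p) (assumed model)."""
    n = L - 2 * m - 1  # possible shortcut partners
    need = max(0, k - 2 * m)  # shortcuts still required to reach degree k
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(need, n + 1))


def poisson_approx_high_degree(L, m, p, k, w):
    """Return (lam, P(W = w)) with W ~ Poisson(lam), lam = L * P(deg >= k)."""
    lam = L * degree_tail(L, m, p, k)
    return lam, exp(-lam) * lam**w / factorial(w)
```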

Predicting links
- Use Bayesian analysis and biochemical properties to predict which proteins might interact
- Use H. pylori interactions to construct a prior for E. coli interactions
- Assess whether the network has small-world structure; if so, use a parametric model
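
The slides do not specify the Bayesian model. As a generic illustration only, Bayes' rule in odds form combines a prior probability that two proteins interact (for example, one informed by the corresponding H. pylori interactions) with a likelihood ratio summarizing biochemical evidence. Both inputs are hypothetical here, and this is not the authors' method.

```python
def posterior_interaction_prob(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio.

    prior: hypothetical P(interaction) before the biochemical evidence.
    likelihood_ratio: hypothetical P(evidence | interact) / P(evidence | no interaction).
    """
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1 + post_odds)
```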

Statistical significance
- Clustering coefficient, vertex degrees and shortest path length are not independent
- Long-term goal: the joint distribution of summary statistics, to assess whether networks are similar or not
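
To illustrate why summary statistics should not be assessed separately, this sketch simulates Bernoulli random graphs and estimates the correlation between the number of edges (which fixes the average degree) and the number of triangles (the numerator of the clustering coefficient). The parameters L = 30, p = 0.1 and the seed are arbitrary illustrative choices; a clearly positive correlation is expected because every triangle needs three edges.

```python
import random
from itertools import combinations


def edges_and_triangles(L, p, rng):
    """Sample a Bernoulli random graph; return (edge count, triangle count)."""
    adj = {v: set() for v in range(L)}
    edges = 0
    for u, v in combinations(range(L), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
            edges += 1
    triangles = sum(1 for u, v, w in combinations(range(L), 3)
                    if v in adj[u] and w in adj[u] and w in adj[v])
    return edges, triangles


def sample_correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_joint(L=30, p=0.1, runs=200, seed=7):
    """Monte Carlo estimate of corr(edges, triangles) under G(L, p)."""
    rng = random.Random(seed)
    samples = [edges_and_triangles(L, p, rng) for _ in range(runs)]
    edges, triangles = zip(*samples)
    return sample_correlation(edges, triangles)
```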

People
- Research students: Kaisheng Lin (motif counts, metabolic networks; vertex degrees); Pao-Yang Chen (protein interaction networks); KimHuat Lim (epidemics on networks)
- Collaborators: Andrew Barbour (shortest path length); Charlotte Deane (protein interaction networks); Susan Holmes (bottlenecks)
