Biological Networks, Lectures 6-7: February 02, 2010. Graph Algorithms Review; Global Network Properties; Local Network Properties



Graph Algorithms Review
Readings: Chapter 2 of “Analysis of Biological Networks” by Junker and Schreiber
You are responsible for knowing the following three algorithms:
For unweighted graphs:
– Breadth-First Search (BFS)
For weighted graphs:
– Dijkstra’s algorithm
– Floyd-Warshall algorithm

Graph Algorithms Review: for unweighted graphs
Breadth-First Search (BFS)
– Input: unweighted graph G(V,E), start node s
– Output: shortest paths and distances from s to all other nodes of G; connected components of G
– Running time: linear, $O(|V| + |E|)$

Graph Algorithms Review
– Order of exploration of G with BFS:
1. Start from the start node s.
2. Explore the neighbors of s.
3. Explore the neighbors of the neighbors of s, from the first explored neighbor to the last one.
4. …
– Example (figure): BFS starting from node S.
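The exploration order above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the adjacency-dictionary representation, the function names, and the toy graph are all assumptions made here (the graph is not the one from the lecture figure).

```python
from collections import deque

def bfs_distances(adj, s):
    """Return {node: distance from s} for every node reachable from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()            # nodes leave the queue in order of distance
        for w in adj[u]:
            if w not in dist:          # the first visit gives the shortest distance
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def connected_components(adj):
    """Run BFS from every not-yet-visited node; each run finds one component."""
    seen, components = set(), []
    for v in adj:
        if v not in seen:
            comp = set(bfs_distances(adj, v))
            seen |= comp
            components.append(comp)
    return components

# Hypothetical toy graph with two components.
adj = {
    "s": ["a", "b"], "a": ["s", "c"], "b": ["s", "c"],
    "c": ["a", "b", "d"], "d": ["c"],
    "x": ["y"], "y": ["x"],
}
print(bfs_distances(adj, "s"))
print(len(connected_components(adj)))
```

Each node enters the queue once and each adjacency list is scanned once, which is where the linear running time comes from.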

Graph Algorithms Review: for weighted graphs
Dijkstra’s algorithm
– Input: weighted graph G(V,E) with non-negative edge weights, start node s
– Output: shortest paths and distances from s to all other nodes of G
– Running time: $O(|V| \log |V| + |E|)$ (with a Fibonacci-heap priority queue)
Floyd-Warshall algorithm
– Input: weighted graph G(V,E)
– Output: matrix of distances and shortest paths between all pairs of nodes of G
– Running time: $O(|V|^3)$
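Both weighted-graph algorithms can be sketched as follows. This is an illustrative sketch under stated assumptions: the graph is a dictionary mapping each node to a list of (neighbor, weight) pairs, weights are non-negative, and the toy graph `wadj` is hypothetical. The Dijkstra version uses Python's binary heap (`heapq`), so it runs in $O((|V| + |E|) \log |V|)$ rather than the Fibonacci-heap bound.

```python
import heapq
from math import inf

def dijkstra(adj, s):
    """Single-source shortest distances; adj[u] is a list of (v, weight), weight >= 0."""
    dist = {v: inf for v in adj}
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                   # stale heap entry, u was already settled
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def floyd_warshall(adj):
    """All-pairs shortest distances as a nested dictionary dist[u][v]."""
    nodes = list(adj)
    dist = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    for u in nodes:
        for v, w in adj[u]:
            dist[u][v] = min(dist[u][v], w)
    for k in nodes:                    # allow k as an intermediate node
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical undirected weighted graph: a-b (4), a-c (1), c-b (2), b-d (5).
wadj = {
    "a": [("b", 4), ("c", 1)],
    "b": [("a", 4), ("c", 2), ("d", 5)],
    "c": [("a", 1), ("b", 2)],
    "d": [("b", 5)],
}
print(dijkstra(wadj, "a"))
print(floyd_warshall(wadj)["c"]["d"])
```

The row of the Floyd-Warshall matrix for a node should agree with a Dijkstra run from that node, which is a convenient sanity check on small graphs.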

Network Comparisons: Properties of Large Networks
Large network comparison is computationally hard due to the NP-completeness of the underlying subgraph isomorphism problem. Thus, network comparisons rely on easily computable heuristics (approximate solutions), called “network properties.”
Network properties can roughly be divided into two categories:
1. Global network properties: give an overall view of the network, but might not be detailed enough to capture complex topological characteristics of large networks.
2. Local network properties: more detailed network descriptors, which usually encompass a larger number of constraints, thus reducing the degrees of freedom in which the networks being compared can vary.

1. Global Network Properties
Readings: Chapter 3 of “Analysis of Biological Networks” by Junker and Schreiber
Global network properties:
1) Degree distribution
2) Average clustering coefficient
3) Clustering spectrum
4) Average diameter
5) Spectrum of shortest path lengths
6) Centralities

1. Global Network Properties: 1) Degree Distribution
Definitions:
– The degree of a node is the number of edges incident to the node.
– The average degree of a network is the average of the degrees over all nodes in the network. However, it might not be representative, since the distribution of degrees might be skewed.

1. Global Network Properties: 1) Degree Distribution
Degree distribution:
– Let P(k) be the fraction of nodes of degree k in the network. The degree distribution is the distribution of P(k) over all k.
– P(k) can be understood as the probability that a node has degree k.
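As a small worked example of P(k), the sketch below computes the degree distribution of a star graph. The graph and the function names are illustrative assumptions; the star is chosen because its average degree describes none of its nodes well.

```python
from collections import Counter

def degree_distribution(adj):
    """Return {k: fraction of nodes with degree k}, sorted by k."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}

def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

# Hypothetical star graph: hub "h" linked to four leaves.
star = {
    "h": {"a", "b", "c", "d"},
    "a": {"h"}, "b": {"h"}, "c": {"h"}, "d": {"h"},
}
print(degree_distribution(star))   # four nodes of degree 1, one of degree 4
print(average_degree(star))        # 1.6, a degree that no node actually has
```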

1. Global Network Properties: 1) Degree Distribution
Example (log-log plot):
– Here $P(k) \sim k^{-\gamma}$, where often $2 \le \gamma < 3$. This is a power-law, heavy-tailed distribution.
– Networks with power-law degree distributions are called scale-free networks. In them, most of the nodes are of low degree, but there is a small number of highly linked nodes (nodes of high degree) called “hubs.”

1. Global Network Properties: 1) Degree Distribution
Another example, in which the average degree is meaningful: here P(k) is a Poisson distribution.

1. Global Network Properties: 1) Degree Distribution
However, the degree distribution (and global properties in general) are weak predictors of network structure.
Illustration: $G_1$ and $G_2$ are of the same size (i.e., $|G_1| = |G_2|$: they have the same number of nodes and edges) and they have the same degree distribution, but $G_1$ and $G_2$ have very different topologies (i.e., graph structure).

Examples: [figure: example graphs]
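One concrete pair of graphs with this property can be checked in code. The pair below (a 6-node cycle versus two disjoint triangles) is chosen here for illustration and is not taken from the lecture figure; the helper names are likewise assumptions.

```python
from collections import Counter

def cycle(nodes):
    """Adjacency sets of a simple cycle through `nodes`, in the given order."""
    adj = {v: set() for v in nodes}
    for i, v in enumerate(nodes):
        w = nodes[(i + 1) % len(nodes)]
        adj[v].add(w)
        adj[w].add(v)
    return adj

def degree_counts(adj):
    return Counter(len(nbrs) for nbrs in adj.values())

def num_edges(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2

def num_components(adj):
    """Count connected components with an iterative depth-first search."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count

g1 = cycle([0, 1, 2, 3, 4, 5])                   # one 6-cycle
g2 = {**cycle([0, 1, 2]), **cycle([3, 4, 5])}    # two disjoint triangles

print(degree_counts(g1) == degree_counts(g2))    # same degree distribution
print(num_edges(g1), num_edges(g2))              # same number of edges
print(num_components(g1), num_components(g2))    # different topology
```

Both graphs have six nodes of degree 2 and six edges, yet one is connected and triangle-free while the other is disconnected and consists only of triangles.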

1. Global Network Properties: 2) Average Clustering Coefficient
Definition: the clustering coefficient $C_v$ of a node v is
$C_v = \frac{|E(N(v))|}{\text{max possible number of edges in } N(v)}$
where N(v) is the neighborhood of v, i.e., all nodes adjacent to v, and E(N(v)) is the set of edges between nodes in N(v).
$C_v$ can be viewed as the probability that two neighbors of v are connected. Thus $0 \le C_v \le 1$.
By definition, for a vertex v of degree 0 or 1, $C_v = 0$.

1. Global Network Properties: 2) Average Clustering Coefficient
Example:
– $|N(v)| = 4$, since there are 4 nodes in N(v), i.e., N(v) = {1, 2, 3, 4}.
– $|E(N(v))| = 3$, since there are 3 edges between nodes in N(v).
– The maximum possible number of edges between nodes in N(v) is $\binom{4}{2} = 6$.
– Therefore $C_v = 3/6 = 1/2$.

1. Global Network Properties: 2) Average Clustering Coefficient
Definition: the average clustering coefficient of a network is the average of $C_v$ over all the nodes $v \in V$, i.e., $C = \frac{1}{|V|} \sum_{v \in V} C_v$.
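The definitions of $C_v$ and of the average clustering coefficient translate directly into code. In the sketch below, the toy graph is a hypothetical one in which node "v" has four neighbors with three edges among them; which three edges are present is an assumption made here. Nodes of degree 0 or 1 are assigned a coefficient of 0, which is also an assumed convention.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of neighbors of v that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0                      # assumed convention for degree 0 or 1
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)    # k choose 2 possible edges

def average_clustering(adj):
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

# Hypothetical graph: v is adjacent to 1, 2, 3, 4; edges 1-2, 2-3, 3-4 among them.
adj = {
    "v": {1, 2, 3, 4},
    1: {"v", 2},
    2: {"v", 1, 3},
    3: {"v", 2, 4},
    4: {"v", 3},
}
print(clustering_coefficient(adj, "v"))   # 3 of 6 possible edges
print(average_clustering(adj))
```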

1. Global Network Properties: 3) Clustering Spectrum
Definition: the clustering spectrum, C(k), is the distribution of the average clustering coefficients of all nodes of degree k in the network, over all k.
Example (figure):

2) and 3) Clustering Coefficient and Spectrum
Graph G (figure).
$C_v$: clustering coefficient of node v
– $C_A = 1/1 = 1$
– $C_B = 1/3 \approx 0.33$
– $C_C = 0$
– $C_D = 2/10 = 0.2$
– …
C: average clustering coefficient of the whole network = avg {$C_v$ over all nodes v of G}
C(k): average clustering coefficient of all nodes of degree k
E.g., $C(2) = (C_A + C_C)/2 = (1 + 0)/2 = 0.5$
=> Clustering spectrum, e.g. (not for G): (figure)
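The clustering spectrum C(k) groups the per-node coefficients by degree and averages each group. The following sketch assumes an adjacency-set representation and a hypothetical toy graph (not the graph G from the slide); nodes of degree 0 or 1 are given a coefficient of 0 as an assumed convention.

```python
from collections import defaultdict
from itertools import combinations

def clustering_coefficient(adj, v):
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0                      # assumed convention for degree 0 or 1
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def clustering_spectrum(adj):
    """Return {k: average clustering coefficient over all nodes of degree k}."""
    by_degree = defaultdict(list)
    for v in adj:
        by_degree[len(adj[v])].append(clustering_coefficient(adj, v))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}

# Hypothetical graph: v is adjacent to 1, 2, 3, 4; edges 1-2, 2-3, 3-4 among them.
adj = {
    "v": {1, 2, 3, 4},
    1: {"v", 2},
    2: {"v", 1, 3},
    3: {"v", 2, 4},
    4: {"v", 3},
}
print(clustering_spectrum(adj))
```

In this toy graph the two degree-2 nodes both have coefficient 1, the two degree-3 nodes both have 2/3, and the single degree-4 node has 1/2, so the spectrum decreases with degree.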

1. Global Network Properties: 4) Average Diameter
Definition: the distance between two nodes is the smallest number of links that have to be traversed to get from one node to the other.
Definition: the shortest path is the path that achieves that distance.
Definition: the average network diameter is the average of shortest path lengths over all pairs of nodes in a network.

1. Global Network Properties: 5) Spectrum of Shortest Path Lengths
Definition: let S(d) be the fraction of node pairs that are at distance d. The spectrum of shortest path lengths is the distribution of S(d) over d.
Example (figure):

4) and 5) Average Diameter and Spectrum of Shortest Path Lengths
Graph G with nodes u and v (figure).
Distance between a pair of nodes u and v:
$D_{u,v} = \min\{\text{length of all paths between } u \text{ and } v\} = \min\{3, 4, 3, 2\} = 2 = \mathrm{dist}(u, v)$
Average diameter of the whole network:
$D = \mathrm{avg}\{D_{u,v} \text{ for all pairs of nodes } \{u, v\} \text{ in } G\}$
Spectrum of the shortest path lengths, e.g. (not for G): (figure)
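Both quantities can be computed from one BFS per node in an unweighted graph. The sketch below is illustrative: the function names and the 4-node path graph are assumptions, and pairs of nodes in different components are simply skipped, which is one possible convention rather than one stated in the lecture.

```python
from collections import Counter, deque
from itertools import combinations

def bfs_distances(adj, s):
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def pairwise_distances(adj):
    """Distances for all unordered pairs {u, v} that are connected by a path."""
    from_node = {s: bfs_distances(adj, s) for s in adj}
    return {
        (u, v): from_node[u][v]
        for u, v in combinations(adj, 2)
        if v in from_node[u]            # assumption: skip disconnected pairs
    }

def average_diameter(adj):
    d = pairwise_distances(adj)
    return sum(d.values()) / len(d)

def path_length_spectrum(adj):
    """Return {d: fraction of node pairs at distance d}."""
    d = pairwise_distances(adj)
    counts = Counter(d.values())
    return {k: c / len(d) for k, c in sorted(counts.items())}

# Hypothetical path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(average_diameter(path))       # (1+1+1+2+2+3) / 6
print(path_length_spectrum(path))
```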

1. Global Network Properties: 6) Node Centralities
(Readings: Chapter 3 of “Analysis of Biological Networks” by Junker and Schreiber)
Definitions:
– Centrality quantifies the topological importance of a node (or edge) in a network. There are many different types of centralities:
1. Degree centrality, $C_d(v)$: nodes with a large number of neighbors (i.e., edges) have high centrality. Therefore, $C_d(v) = \deg(v)$.
Example of a use of degree centrality: in protein-protein interaction (PPI) networks, nodes with high degree centrality are considered to be “biologically important.” We will learn later in the course what this means.

1. Global Network Properties: 6) Node Centralities (continued)
2. Closeness centrality, $C_c(v)$: nodes with short paths to all other nodes in the network have high closeness centrality. A standard definition consistent with this description is
$C_c(v) = \frac{1}{\sum_{u \in V} \mathrm{dist}(v, u)}$

1. Global Network Properties: 6) Node Centralities (continued)
3. Betweenness centrality, $C_b(v)$: nodes (or edges) that occur in many of the shortest paths have high betweenness centrality. In its standard form,
$C_b(v) = \sum_{s \ne v \ne t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$
The summation runs over pairs of nodes s and t other than v; each term is a fraction with $\sigma_{st}(v)$ on the top and $\sigma_{st}$ on the bottom, where:
– $\sigma_{st}$ = the number of shortest paths from s to t (they may or may not pass through node v)
– $\sigma_{st}(v)$ = the number of shortest paths from s to t that pass through v

1. Global Network Properties: 6) Node Centralities (continued)
4. Eccentricity centrality, $C_e(v)$: the eccentricity of a node v is defined as
$\mathrm{ecc}(v) = \max_{u \in V} \mathrm{dist}(v, u)$
So it is the maximum shortest path length from node v to all other nodes u in V.
Eccentricity centrality of a node v: $C_e(v) = 1/\mathrm{ecc}(v)$. Thus, central nodes have higher $C_e$, since they have lower ecc.
There exist many other definitions of node centralities.
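The four centralities can be computed by hand-sized code on small, connected, unweighted graphs. The sketch below makes several assumptions that are not stated in the slides: closeness is taken as the reciprocal of the sum of distances, betweenness sums $\sigma_{st}(v)/\sigma_{st}$ over unordered pairs {s, t} not containing v (summing over ordered pairs would double every value), the graph has at least two nodes, and the 4-node path graph is hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs(adj, s):
    """Return (dist, sigma): distances from s and numbers of shortest paths from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:               # w discovered for the first time
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:      # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma

def centralities(adj):
    """Degree, closeness, betweenness, eccentricity centrality (connected graph)."""
    info = {s: bfs(adj, s) for s in adj}
    result = {}
    for v in adj:
        dist_v, sigma_v = info[v]
        betweenness = 0.0
        for s, t in combinations(adj, 2):
            if v in (s, t):
                continue
            dist_s, sigma_s = info[s]
            # v lies on a shortest s-t path iff the distances add up exactly
            if dist_s[v] + dist_v[t] == dist_s[t]:
                betweenness += sigma_s[v] * sigma_v[t] / sigma_s[t]
        result[v] = {
            "degree": len(adj[v]),
            "closeness": 1 / sum(dist_v.values()),
            "betweenness": betweenness,
            "eccentricity": 1 / max(dist_v.values()),
        }
    return result

# Hypothetical path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
cent = centralities(path)
print(cent["b"])
print(cent["a"])
```

On the path graph, the inner nodes b and c score higher than the end nodes a and d on all four measures, which matches the intuition that they are more central.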

1. Global Network Properties: 6) Node Centralities
Example: nodes A to J ranked by each centrality

              Degree     Closeness   Betweenness
From highest  D          F, G        H
              F, G       D, H        F, G
to            A, B       A, B        I
              C, E, H    C, E        D
lowest        I          I           A, B
              J          J           C, E, J

1. Global Network Properties: 6) Node Centralities
You need to know how to compute these centralities (and all other network properties) by hand on small networks. For large real-world networks, you could use software, e.g., CentiBiN.

Network Properties
2. Local Network Properties (Chapter 5 of the course textbook “Analysis of Biological Networks” by Junker and Schreiber)
1) Network motifs
2) Graphlets
Two network comparison measures based on them:
2.1) Relative Graphlet Frequency Distance between two networks
2.2) Graphlet Degree Distribution Agreement between two networks

2. Local Network Properties: 1) Network Motifs
Definition: a network motif is a small, over-represented partial subgraph of a real network. Here, over-represented means over-represented when compared to networks coming from a random graph model.
Problem: what is expected at random, i.e., which network “null model” should be used to identify motifs?

2. Local Network Properties: 1) Network Motifs
Example of a random graph model: Erdős-Rényi (ER) random graphs
– Definition: a graph on n nodes (for some positive integer n) in which edges are added between pairs of nodes uniformly at random, each with the same probability p.
ER graphs usually have a small number of dense (in terms of number of edges) subgraphs => there will be no regions in the network that have a large density of edges. Why?
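The ER definition can be sketched as a short generator. Because every pair of nodes is linked independently with the same probability p, every subset of nodes has the same expected edge density p, which is one way to see why dense regions are unlikely. The function names and the `seed` parameter below are assumptions made for this sketch.

```python
import random
from itertools import combinations

def erdos_renyi(n, p, seed=None):
    """Sample a graph on nodes 0..n-1; each pair is linked independently with prob. p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def density(adj, nodes=None):
    """Fraction of possible edges present among `nodes` (default: all nodes)."""
    nodes = list(adj) if nodes is None else list(nodes)
    possible = len(nodes) * (len(nodes) - 1) / 2
    present = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return present / possible

g = erdos_renyi(200, 0.05, seed=1)
print(density(g))                   # close to p for the whole graph
print(density(g, range(50)))        # and for an arbitrary subset of nodes
```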

2. Local Network Properties: 1) Network Motifs
Example: if motifs are identified by comparing the data with ER model networks, every dense subgraph would come up as a motif, because dense subgraphs do not exist in our ER model networks.

2. Local Network Properties: 1) Network Motifs
Motifs:
– May provide insight into both the structure and function of the whole network.
– Can potentially define universal classes of networks.
=> Networks of similar type share the same motifs (e.g., all networks that transmit information, but in different domains); see examples in next class.
=> Motifs could reflect the evolutionary processes that generated these network classes.
Issue: the network null model used to define motifs.
Another issue: partial versus induced subgraphs. Motifs are partial subgraphs!

2. Local Network Properties: 1) Network Motifs
Example: feed-forward loop
Shen-Orr, Milo, Mangan, and Alon, “Network motifs in the transcriptional regulation network of Escherichia coli,” Nature Genetics, 2002

2. Local Network Properties: 2) Graphlets
Definition: graphlets are small, connected, induced, non-isomorphic subgraphs of a large network.
They do not need to be over-represented => no issues with the null model.

2. Local Network Properties: 2) Graphlets
Graphlet frequencies: count the occurrences of all small (2- to 5-node) graphlets in a network.
Thus, we can compare these frequencies between two networks. This is the Relative Graphlet Frequency Distance (RGF-distance), a measure of structural similarity between two networks.
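Counting graphlets can be illustrated for the smallest non-trivial case, the two connected 3-node graphlets (the 3-node path and the triangle). The brute-force sketch below checks every triple of nodes, so it is only meant for small networks; the function name and the toy graph are assumptions, and it does not compute the RGF-distance itself.

```python
from itertools import combinations

def count_3node_graphlets(adj):
    """Count induced 3-node paths and triangles by checking every node triple."""
    paths = triangles = 0
    for a, b, c in combinations(adj, 3):
        edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        if edges == 3:
            triangles += 1
        elif edges == 2:
            paths += 1      # two edges on three nodes always share a node
    return {"path": paths, "triangle": triangles}

# Hypothetical graph: triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(count_3node_graphlets(adj))
```

Because the count is over induced subgraphs, the triple {a, b, c} contributes one triangle and no paths, even though a triangle contains three 2-edge paths as partial subgraphs.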

2. Local Network Properties: 2) Graphlets
Graphlet Degree Distribution Agreement (GDD-agreement):
– Generalization of the degree distribution to a spectrum of graphlet degree distributions.
– The degree distribution measures the number of nodes touching k edges, for each value of k.
– An edge is the only 2-node graphlet (the graphlet denoted by $G_0$ in the examples below).
– There is nothing special about an edge. Why not count how many triangles, squares, ... a node touches?
– “GDD signature” of a node: how many times a node touches each of the graphlets at a given orbit (see examples in next class).
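The step from "how many edges does a node touch" to "how many triangles does a node touch" can be sketched as follows. This covers only the edge and the triangle, not the full set of graphlets and orbits used in GDD-agreement; the function names and the toy graph are assumptions made here.

```python
from collections import Counter
from itertools import combinations

def edge_and_triangle_counts(adj):
    """For each node, return (number of edges touched, number of triangles touched)."""
    counts = {}
    for v, nbrs in adj.items():
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        counts[v] = (len(nbrs), triangles)
    return counts

def triangle_degree_distribution(adj):
    """Number of nodes touching k triangles, for each k (analogue of the degree distribution)."""
    return Counter(t for _, t in edge_and_triangle_counts(adj).values())

# Hypothetical graph: triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(edge_and_triangle_counts(adj))
print(triangle_degree_distribution(adj))
```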