Modular Organization of Protein Interaction Network Feng Luo, Ph.D. Department of Computer Science Clemson University.



Outline Background. Network module definition. Algorithm for identifying modules in network.

Biological Networks. Biological systems: made of many non-identical elements connected by diverse interactions. Biological networks: a framework for the study of biological systems.

Protein Interaction Network. Nodes: proteins. Links: physical interactions (Jeong et al., 2001).

Metabolic Network. Nodes: chemicals (substrates). Links: chemical reactions (Ravasz et al., 2002).

Biological Systems are Modular. There is increasing evidence that the cell system is composed of modules. A "module" in a biological system is a discrete unit whose function is separable from those of other modules. Modules defined by functional criteria reflect the critical level of biological organization (Hartwell et al.). A modular system can reuse existing, well-tested modules. Functional modules will be reflected in the topological structure of biological networks. Identifying functional modules and their relationships from biological networks will help in understanding the organization, evolution, and interaction of the cellular systems they represent.

Biological Modules in Biological Networks. Figure: modules labeled 1, 2, and 3.

Background: Identify Modules from Biological Networks. Most efforts have focused on detecting highly connected clusters. Peripheral proteins are ignored. Modules with other topologies are not identified. Modules are isolated, and no interrelationships between them are revealed.

Background: Identify Modules from Biological Networks (continued). Traditional clustering algorithms have been applied to protein interaction networks (PINs) to find biological modules. They require transforming the PIN into a weighted network, either by weighting each interaction by the number of experiments that support it (Pereira-Leal et al.) or by weighting with shortest path length (Rives et al. and Arnau et al.). Drawbacks: the weights are artificial, and hierarchical agglomerative clustering (HAC) suffers from the "tie in proximity" problem.

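Background: Identify Modules from Biological Networks (continued). Radicchi et al. (PNAS, 2004) proposed two new definitions of a module in a network. For a sub-graph $V \subset G$, the degree of a vertex $i \in V$ in an undirected graph is $k_i = \sum_j A_{ij}$, where $A_{ij}$ is equal to 1 if i and j are directly connected and zero otherwise. The degree splits into an internal part $k_i^{in}(V) = \sum_{j \in V} A_{ij}$ and an external part $k_i^{out}(V) = \sum_{j \notin V} A_{ij}$. Strong definition of module: $k_i^{in}(V) > k_i^{out}(V)$ for every $i \in V$. Weak definition of module: $\sum_{i \in V} k_i^{in}(V) > \sum_{i \in V} k_i^{out}(V)$.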
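The strong and weak definitions can be checked directly from an adjacency map. The sketch below is only an illustration: the function names (`build_adjacency`, `in_out_degrees`, `is_strong_module`, `is_weak_module`) and the toy graph are assumptions introduced here, not part of the original slides, and self-loops are assumed to be absent.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def in_out_degrees(adj, members):
    """Return a list of (k_in, k_out) pairs, one per vertex of the sub-graph."""
    members = set(members)
    pairs = []
    for v in members:
        k_in = len(adj[v] & members)   # neighbors inside the sub-graph
        k_out = len(adj[v]) - k_in     # neighbors outside the sub-graph
        pairs.append((k_in, k_out))
    return pairs


def is_strong_module(adj, members):
    """Strong definition: every vertex has more internal than external links."""
    return all(k_in > k_out for k_in, k_out in in_out_degrees(adj, members))


def is_weak_module(adj, members):
    """Weak definition: summed internal degree exceeds summed external degree."""
    pairs = in_out_degrees(adj, members)
    return sum(k_in for k_in, _ in pairs) > sum(k_out for _, k_out in pairs)


# Hypothetical toy graph: a triangle a-b-c with a tail c-d-e.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
toy_adj = build_adjacency(toy_edges)

triangle_is_strong = is_strong_module(toy_adj, {"a", "b", "c"})
pair_is_weak = is_weak_module(toy_adj, {"c", "d"})
```

In this toy graph the triangle passes the strong test, while the pair {c, d} fails even the weak test because its summed internal degree (2) does not exceed its summed external degree (3).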

Background: Identify Modules from Biological Networks (continued). The two module definitions do not exactly follow the intuitive concept of a module.

Summary of our work. A new formal definition of network modules. A new agglomerative algorithm for assembling modules. Application to a yeast protein interaction dataset.

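Degree of Subgraph. Given a graph G, let S be a subgraph of G ($S \subset G$). The adjacency matrix of sub-graph S and its neighbors N can be written as $A = (A_{ij})$, with $A_{ij} = 1$ if vertices i and j are connected and 0 otherwise. Indegree of S: $\mathrm{Ind}(S) = \sum_{i<j} A_{ij}\,\delta^{in}_{ij}$, where $\delta^{in}_{ij}$ is 1 if both vertex i and vertex j are in sub-graph S and 0 otherwise. Outdegree of S: $\mathrm{Outd}(S) = \sum_{i<j} A_{ij}\,\delta^{out}_{ij}$, where $\delta^{out}_{ij}$ is 1 if only one of vertex i and vertex j belongs to sub-graph S and 0 otherwise.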

Degree of Subgraph: Example. Ind(1) = 16, Outd(1) = 5. Ind(2) = 7, Outd(2) = 4. Ind(3) = 8, Outd(3) = 5.

Modularity. The modularity M of a sub-graph S in a given graph G is defined as the ratio of its indegree, ind(S), to its outdegree, outd(S): $M = \frac{\mathrm{ind}(S)}{\mathrm{outd}(S)}$.

New Network Module Definition. A subgraph $S \subset G$ is a module if M > 1. Ind(1) = 16, Outd(1) = 5, M = 3.2. Ind(2) = 7, Outd(2) = 4, M = 1.75. Ind(3) = 8, Outd(3) = 5, M = 1.6.

Comparison to Radicchi's Module Definitions. This sample network is a strong module, but it is not a module by the new definition, which is based on the indegree versus outdegree criterion.
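The indegree, outdegree, and modularity ratio can be sketched as a single pass over the edge list. The function names and both toy graphs below are assumptions made for illustration; each undirected edge is assumed to appear once in the list. The second graph is a triangle in which every vertex has two internal links and one external link, so it satisfies the strong definition of Radicchi et al. yet has M = 1 and is not a module under the M > 1 criterion.

```python
def subgraph_degrees(edges, members):
    """Count edges inside the sub-graph (ind) and edges leaving it (outd)."""
    members = set(members)
    ind = outd = 0
    for u, v in edges:
        inside = (u in members) + (v in members)  # endpoints inside the sub-graph
        if inside == 2:
            ind += 1
        elif inside == 1:
            outd += 1
    return ind, outd


def modularity(edges, members):
    """Ratio M = ind / outd; infinite when an edge-bearing sub-graph is isolated."""
    ind, outd = subgraph_degrees(edges, members)
    if outd == 0:
        return float("inf") if ind > 0 else 0.0
    return ind / outd


def is_module(edges, members):
    """A sub-graph is a module when M > 1."""
    return modularity(edges, members) > 1


# Hypothetical graph 1: a four-vertex clique with a tail d-e-f.
clique_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
                ("c", "d"), ("d", "e"), ("e", "f")]
clique_m = modularity(clique_edges, {"a", "b", "c", "d"})   # 6 / 1

# Hypothetical graph 2: a triangle x-y-z, each vertex with one outside link.
tri_edges = [("x", "y"), ("y", "z"), ("x", "z"),
             ("x", "p"), ("y", "q"), ("z", "r")]
tri_m = modularity(tri_edges, {"x", "y", "z"})               # 3 / 3
```

Applied to the figures quoted on the slides, the same ratio gives 16 / 5 = 3.2, 7 / 4 = 1.75, and 8 / 5 = 1.6.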

Agglomerative Algorithm for Identifying Network Modules. Figure: flow chart of the agglomerative algorithm.

The Order of Merging. Edge betweenness (Girvan-Newman, 2002) of an edge is defined as the number of shortest paths between all pairs of vertices that run through that edge. Edges between modules have higher betweenness values. Figure label: Betweenness = 20.

The Order of Merging (continued). Gradually deleting the edge with the highest betweenness generates an ordering of the edges. Edges between modules are deleted earlier. Edges inside modules are deleted later. The deletion order is then reversed and used as the merging order.
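The deletion and merging orders can be sketched in plain Python. This is a minimal illustration under stated assumptions: betweenness is recomputed after every deletion (as in the Girvan-Newman procedure), shortest paths are unweighted, ties are broken alphabetically, and the function names and the toy graph (two triangles joined by a bridge) are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Shortest-path edge betweenness for an unweighted, undirected graph."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds = {s: 0}, {s: 1.0}, {s: []}
        order, queue = [], deque([s])
        while queue:                          # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:    # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):             # accumulate path dependencies
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every vertex pair was counted from both endpoints, so halve the totals.
    return {edge: value / 2.0 for edge, value in bet.items()}


def deletion_order(edges):
    """Repeatedly delete the edge with the highest betweenness."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    removed = []
    while any(adj.values()):
        bet = edge_betweenness(adj)
        edge = min(bet, key=lambda e: (-round(bet[e], 9), tuple(sorted(e))))
        u, v = tuple(edge)
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append(tuple(sorted(edge)))
    return removed


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the bridge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
deleted = deletion_order(toy_edges)
merging_order = list(reversed(deleted))
```

In this toy graph the bridge c-d carries all nine shortest paths between the two triangles, so it is deleted first and therefore appears last in the merging order.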

When Does Merging Occur? Between two non-modules. Between a non-module and a module. Not between two modules.
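The merging rule can be sketched as a loop over the merging order. The flow chart itself is not part of this transcript, so the procedure below is an assumed reading of the rule rather than the authors' exact algorithm: every vertex starts as its own cluster, the edges are visited in merging order, and the two clusters joined by an edge are merged unless both already satisfy M > 1. The function names, the toy graph, and the hand-written merging order are hypothetical.

```python
def is_module(edges, members):
    """M > 1 is equivalent to ind > outd, which avoids dividing by zero."""
    ind = outd = 0
    for u, v in edges:
        inside = (u in members) + (v in members)
        if inside == 2:
            ind += 1
        elif inside == 1:
            outd += 1
    return ind > outd


def assemble_modules(edges, merging_order):
    """Merge clusters along the merging order, never joining two modules."""
    cluster_of = {v: frozenset([v]) for edge in edges for v in edge}
    for u, v in merging_order:
        a, b = cluster_of[u], cluster_of[v]
        if a == b:
            continue                      # endpoints already share a cluster
        if is_module(edges, a) and is_module(edges, b):
            continue                      # two modules are never merged
        merged = a | b
        for w in merged:
            cluster_of[w] = merged
    return set(cluster_of.values())


# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the bridge c-d.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"),
             ("d", "e"), ("e", "f"), ("d", "f")]
# Hypothetical merging order: intra-triangle edges first, the bridge last.
toy_order = [("a", "b"), ("b", "c"), ("a", "c"),
             ("e", "f"), ("d", "e"), ("d", "f"), ("c", "d")]
clusters = assemble_modules(toy_edges, toy_order)
```

Here each triangle becomes a module (three internal edges against one external edge) before the bridge is reached, so the bridge does not merge them and two clusters remain.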

Testing Data Set. Yeast core protein interaction network (PIN): the yeast core PIN from the Database of Interacting Proteins (DIP) (version ScereCR ). Total: 2609 proteins; 6355 links. Large component: 2440 proteins, 6401 interactions.

86 Modules Obtained from DIP Yeast core PIN

Robustness of Modules

Validation of modules. Each protein was annotated with Gene Ontology™ (GO) terms from the Saccharomyces Genome Database (SGD) (Cherry et al. 1998; Balakrishna et al.). The co-occurrence of GO terms was quantified using the hypergeometric distribution analysis supported by the Gene Ontology Term Finder of SGD (Balakrishna et al.). The results show that each module has statistically significant co-occurrence of bioprocess GO categories.
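A hypergeometric enrichment P value of this kind can be sketched as the upper tail of the hypergeometric distribution: the probability of seeing at least k annotated genes in a module of n genes, when K of the N genes in the genome carry the GO term. This is a generic sketch, not the exact calculation performed by the GO Term Finder (any multiple-testing correction is not reproduced here), and the function name, parameter names, and example numbers are assumptions.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term."""
    total = comb(N, n)                    # all ways to choose the module
    upper = min(n, K)                     # largest possible overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical numbers: 10 genes overall, 4 carry the term,
# and all 3 genes of a module carry it.
p_small = hypergeom_pvalue(k=3, n=3, K=4, N=10)   # C(4,3) / C(10,3) = 4 / 120
```

A smaller P value means the co-occurrence of the term within the module is less likely to arise by chance.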

Validation of modules. Modules with 100% GO frequency. Of the original columns (Module #, GOID, GO term, Frequency, Genome Frequency, Probability), the Module #, GOID, genome totals, and probability values did not survive extraction, apart from the exponent E-10 in the last row.
GO term | Frequency | Genome frequency (annotated genes; total lost)
pH reduction | 14 out of 14 genes, 100% | 21
mRNA catabolism | 14 out of 14 genes, 100% | 55
pre-replicative complex formation and maintenance | 7 out of 7 genes, 100% | 13
SRP-dependent cotranslational protein-membrane targeting, signal sequence recognition | 6 out of 6 genes, 100% | 7
'de novo' pyrimidine base biosynthesis | 5 out of 5 genes, 100% | 5
retrograde transport, endosome to Golgi | 5 out of 5 genes, 100% | 10
double-strand break repair via nonhomologous end-joining | 5 out of 5 genes, 100% | 19
sulfur amino acid metabolism | 5 out of 5 genes, 100% | 31
Golgi to vacuole transport | 4 out of 4 genes, 100% | 18
regulation of carbohydrate metabolism | 4 out of 4 genes, 100% | 26

Validation of modules. Most significant GO term in the top 10 largest modules. Of the original columns (Module #, Module Size, GOID, GO term, Frequency, Genome Frequency, Probability), the Module #, Module Size, GOID, genome totals, and probability values did not survive extraction, apart from the exponent E-21 in the last row.
GO term | Frequency | Genome frequency (annotated genes; total lost)
nucleocytoplasmic transport | 62 out of 201 genes, 30.8% | 105
protein catabolism | 46 out of 111 genes, 41.4% | 175
mRNA metabolism | 58 out of 93 genes, 62.3% | 184
cytoplasm organization and biogenesis | 56 out of 76 genes, 73.6% | 250
actin cytoskeleton organization and biogenesis | 31 out of 59 genes, 52.5% | 101
transcription from RNA polymerase II promoter | 34 out of 50 genes, 68% | 270
histone acetylation | 17 out of 45 genes, 37.7% | 28
rRNA processing | 34 out of 45 genes, 75.5% | 175
Golgi vesicle transport | 36 out of 44 genes, 81.8% | 137
chromatin remodeling | 18 out of 42 genes, 42.8% | 128

Validation of modules. Comparison with the module definitions of Radicchi et al.: the agglomerative algorithm was run with each of the three definitions (ours, weak, strong). Table columns: average lowest P value (-log10) and number of modules (larger than 3); rows: Our, Weak, Strong.

Validation of modules

P values of modules obtained with our definition plotted against P values of the corresponding weak modules (the line is y = x).

Constructing the Network of Modules. The 86 MoNet modules are assembled into an interconnected network of modules. For each adjacent module pair, the edge deleted last by the G-N algorithm is selected, from all the edges that connect the two modules, to represent the link between them.
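Selecting the representative link for each adjacent module pair can be sketched as one pass over the deletion order, letting later deletions overwrite earlier ones. The function name, the module list, and the deletion order below are hypothetical values chosen for illustration; in practice the deletion order would come from the G-N edge-removal procedure.

```python
def module_network(modules, deletion_order):
    """Map each adjacent module pair to its connecting edge that was deleted last."""
    module_of = {v: idx for idx, members in enumerate(modules) for v in members}
    links = {}
    for u, v in deletion_order:           # earliest deletion first
        if u not in module_of or v not in module_of:
            continue                      # endpoint not assigned to any module
        a, b = module_of[u], module_of[v]
        if a == b:
            continue                      # edge lies inside a single module
        links[(min(a, b), max(a, b))] = (u, v)   # later deletions overwrite
    return links


# Hypothetical modules and deletion order; c-d and b-e both connect
# module 0 to module 1, and b-e is deleted later than c-d.
toy_modules = [{"a", "b", "c"}, {"d", "e", "f"}]
toy_deletions = [("c", "d"), ("a", "b"), ("b", "e"), ("d", "e"),
                 ("a", "c"), ("e", "f"), ("b", "c"), ("d", "f")]
toy_links = module_network(toy_modules, toy_deletions)
```

Because b-e is removed after c-d in this hypothetical order, b-e is kept as the single link between the two modules.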

A Section of Module Network of 30 Largest Modules

Conclusions. The work provides a framework for decomposing the protein interaction network into functional modules. The modules obtained appear to be biological functional modules, based on the clustering of Gene Ontology terms. The network of modules provides a plausible way to understand the interactions between these functional modules. With the increasing amount of protein interaction data available, our approach will help construct a more complete view of interconnected functional modules to better understand the organization of the whole cellular system.

Questions?

Limitations of Global Algorithms. Biological networks are incomplete. Each vertex can belong to only one module.

Local Optimization Algorithm

139 Modules Obtained from DIP Yeast core PIN

Example of Module Overlap

Interconnected Module Network