Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.

2 Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo

3 In the beginning there was DNA… Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides NC. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. NAR 34, D332-334

4 …then came protein interactions [Figures: Arabidopsis PPI network, E. coli PPI network, yeast PPI network]

5 Comparative Genomics to Comparative Interactomics Evolutionary conservation implies functional relevance  Sequence conservation implies functional conservation  Network conservation implies functional conservation too! What new insights might we gain from network comparisons? (Why should we care?)

6 Network comparisons allow us to: identify conserved functional modules; query for a module, à la BLAST; predict functions of a module; predict protein functions; validate protein interactions; predict protein interactions. [Slide legend: some of these tasks are only possible with network comparisons; the others are possible with existing techniques, but improved with network comparisons]

7 What is a Protein Interaction Network? Proteins are nodes Interactions are edges Edges may have weights Yeast PPI network H. Jeong et al. Lethality and centrality in protein networks. Nature 411, 41 (2001)

8 The Network Alignment Problem Given k different protein interaction networks belonging to different species, we wish to find conserved sub-networks within these networks Conserved in terms of protein sequence similarity (node similarity) and interaction similarity (network topology similarity)

9 Example Network Alignment Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006

10 General Framework For Network Alignment Algorithms Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006 Network construction Scoring function Alignment algorithm Covered in lecture on network integration

11 Two Algorithms Discussed Today NetworkBLAST Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005. Græmlin Flannick et al. Græmlin: General and robust alignment of multiple large interaction networks. Genome Res 16: 1169-1181, 2006.

12 Overview of Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005.

13 Estimation of Interaction Probabilities In the preprocessing step, edges in the network are given a reliability score using a logistic regression model based on three features: 1. Number of times an interaction was observed 2. Pearson correlation coefficient between expression profiles 3. Proteins’ small world clustering coefficient
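The reliability score described on this slide can be sketched as a plain logistic function of the three features. This is a minimal illustration only: the function name, the weights and the bias are hypothetical placeholders, not the fitted values from Sharan et al.

```python
import math

def interaction_reliability(n_observations, expr_correlation, clustering_coeff,
                            weights=(0.8, 1.5, 2.0), bias=-2.0):
    """Probability that an observed interaction is real, under a logistic model.

    The weights and bias are made-up placeholders; in practice they would be
    fitted on sets of trusted and untrusted interactions.
    """
    w_obs, w_expr, w_clust = weights
    z = (bias
         + w_obs * n_observations      # feature 1: times the interaction was observed
         + w_expr * expr_correlation   # feature 2: expression-profile correlation
         + w_clust * clustering_coeff) # feature 3: small world clustering coefficient
    return 1.0 / (1.0 + math.exp(-z))
```

With positive weights, an interaction that is observed more often, or whose proteins are more strongly co-expressed, receives a higher reliability.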

14 Network Alignment Graphs Construct a network alignment graph to represent the alignment. Nodes contain groups of sequence-similar proteins from the k organisms. Edges represent conserved interactions. An edge between two nodes is present if any of the following holds: 1. One pair of proteins directly interacts and the rest are at distance at most 2; 2. All protein pairs are at distance exactly 2; 3. At least max(2, k – 1) protein pairs directly interact. These rules try to account for interaction deletions.
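The three edge rules can be written as a small predicate. The sketch below assumes each alignment-graph node holds one protein per species, so that for a pair of nodes we have one shortest-path distance per species; the function name and input format are illustrative assumptions.

```python
def has_conserved_edge(distances):
    """Decide whether two alignment-graph nodes are joined by an edge.

    distances[i] is the shortest-path distance, in species i's PPI network,
    between the two proteins that the nodes contain for that species
    (1 means a direct interaction, float('inf') means unreachable).
    """
    k = len(distances)
    direct = sum(1 for d in distances if d == 1)
    # Rule 1: one pair interacts directly, the rest are at distance <= 2.
    rule1 = direct >= 1 and all(d <= 2 for d in distances)
    # Rule 2: every pair is at distance exactly 2.
    rule2 = all(d == 2 for d in distances)
    # Rule 3: at least max(2, k - 1) pairs interact directly.
    rule3 = direct >= max(2, k - 1)
    return rule1 or rule2 or rule3
```

For three species, `has_conserved_edge([1, 2, 2])` holds by rule 1, while `[1, 3, 3]` satisfies none of the rules.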

15 Example Network Alignment Graph [Figure: the individual PPI networks of Species X, Y and Z, with proteins a, b, c; a’, b’, c’; and a’’, b’’, c’’, alongside the network alignment graph built from them]

16 Scoring Function Sharan et al. devise a scoring scheme based on a likelihood model for the fit of a single sub-network to the given structure High scoring subgraphs correspond to structured sub-networks (cliques or pathways) Only network topology is scored, node similarity is not

17 Log Likelihood Ratio Model Measures the likelihood that a subgraph occurs if it is a conserved network vs. that if it were a randomly constructed network. The randomly constructed network preserves the degree distribution of the nodes. $\log \frac{\Pr(\text{Subgraph occurs} \mid \text{Conserved Network})}{\Pr(\text{Subgraph occurs} \mid \text{Random Network})}$

18 Likelihood Ratio Scoring of a Protein Complex in a Single Species U: a subset of vertices (proteins) in the PPI graph; O_U: collection of all observations on vertex pairs in U; O_uv: interaction between proteins u, v observed; M_s: conserved network model; M_n: random network (null) model; T_uv: proteins u, v interact; F_uv: proteins u, v do not interact; β: probability that proteins u, v interact in the conserved model; p_uv: probability that edge u, v exists in the random model. [Equation: probability of the complex being observed in the conserved network model] [Equation: probability of the subgraph being observed in the random network model]

19 Likelihood Ratio Scoring of a Protein Complex in a Single Species Hence, the log likelihood for a complex occurring in a single species is given by [Equation: log likelihood ratio L(U)]. For multiple complexes across different species, it is the sum of the log likelihoods: L(A, B, C) = L(A) + L(B) + L(C)
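The slide equations did not survive in this transcript. The block below is a reconstruction from the symbol definitions on the previous slide (U, O_U, O_uv, M_s, M_n, T_uv, F_uv, β, p_uv), assuming that observations on different protein pairs are independent; treat it as a sketch of the model rather than a verbatim copy of the slide.

```latex
\begin{align*}
\Pr(O_U \mid M_s) &= \prod_{(u,v) \in U \times U}
    \bigl[\, \beta \Pr(O_{uv} \mid T_{uv}) + (1-\beta) \Pr(O_{uv} \mid F_{uv}) \,\bigr] \\
\Pr(O_U \mid M_n) &= \prod_{(u,v) \in U \times U}
    \bigl[\, p_{uv} \Pr(O_{uv} \mid T_{uv}) + (1-p_{uv}) \Pr(O_{uv} \mid F_{uv}) \,\bigr] \\
L(U) = \log \frac{\Pr(O_U \mid M_s)}{\Pr(O_U \mid M_n)}
  &= \sum_{(u,v) \in U \times U}
     \log \frac{\beta \Pr(O_{uv} \mid T_{uv}) + (1-\beta) \Pr(O_{uv} \mid F_{uv})}
               {p_{uv} \Pr(O_{uv} \mid T_{uv}) + (1-p_{uv}) \Pr(O_{uv} \mid F_{uv})}
\end{align*}
```

Each term is positive when the pair is more likely to interact under the conserved model than under the random model (β > p_uv) and the observations favor an interaction.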

20 Example of Complex Scoring [Figure: conserved complex A in the network alignment graph, and the corresponding complexes X1 in Species X, Y1 in Species Y and Z1 in Species Z in the individual species’ PPI networks] L(A) = L(X1) + L(Y1) + L(Z1)
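The additive scoring in this example can be sketched in a few lines. The per-pair term assumed here is log[(β·Pr(O_uv|T_uv) + (1−β)·Pr(O_uv|F_uv)) / (p_uv·Pr(O_uv|T_uv) + (1−p_uv)·Pr(O_uv|F_uv))], built from the symbols defined two slides earlier; the function names, input format and the value of β are illustrative assumptions.

```python
import math

def complex_log_likelihood(pairs, beta=0.8):
    """Log likelihood ratio L(U) of one complex in one species.

    pairs holds one tuple (p_true, p_false, p_uv) per protein pair, where
    p_true = Pr(O_uv | T_uv), p_false = Pr(O_uv | F_uv), and p_uv is the
    edge probability under the random model. beta=0.8 is a placeholder.
    """
    total = 0.0
    for p_true, p_false, p_uv in pairs:
        conserved = beta * p_true + (1.0 - beta) * p_false
        random_model = p_uv * p_true + (1.0 - p_uv) * p_false
        total += math.log(conserved / random_model)
    return total

def alignment_score(complex_by_species, beta=0.8):
    """L(A) = L(X1) + L(Y1) + L(Z1): sum the per-species scores."""
    return sum(complex_log_likelihood(pairs, beta)
               for pairs in complex_by_species.values())
```

A call such as `alignment_score({"X": x1_pairs, "Y": y1_pairs, "Z": z1_pairs})` then returns the score of the conserved complex A.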

21 Alignment algorithm Problem of identifying conserved sub- networks reduces to finding high scoring subgraphs NP-complete problem Heuristic solution:  Greedy extension of high scoring seeds  (Does this sound familiar? BLAST?)  Common to both papers discussed

22 Alignment algorithm 1. Find seeds for each node v in the alignment graph a. Find high scoring paths of 4 nodes by exhaustive search b. Greedily add 3 other nodes one by one, that maximally increase the score of the seed

23 Alignment algorithm 2. Iteratively add or remove nodes to increase the overall score of the seed. Original seed nodes are preserved. The size of discovered subgraphs is limited to 15 nodes. Up to 4 of the highest scoring subgraphs discovered around each node are recorded.
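The refinement step can be sketched as a local search over single-node additions and removals. The names `refine_seed`, `candidates` and `score` are hypothetical; `score` stands in for the likelihood-ratio score of a node set, and the way candidate nodes are chosen is left to the caller.

```python
def refine_seed(seed, candidates, score, max_size=15):
    """Greedily add or remove single nodes while the score improves.

    seed: initial node set (never removed); candidates: nodes that may be
    added; score: function mapping a set of nodes to a number.
    """
    current = set(seed)
    protected = frozenset(seed)
    best = score(current)
    while True:
        moves = []
        if len(current) < max_size:  # respect the subgraph size limit
            moves += [current | {n} for n in candidates if n not in current]
        moves += [current - {n} for n in current if n not in protected]
        if not moves:
            break
        top = max(moves, key=score)
        top_score = score(top)
        if top_score <= best:        # no single move improves the score
            break
        current, best = top, top_score
    return current, best
```

Because every accepted move strictly increases the score and the node sets are finite, the loop always terminates.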

24 Alignment algorithm 3. Filter subgraphs with a high degree of overlap Iteratively find high scoring subgraph and remove all highly overlapping ones remaining
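The filtering step amounts to keeping subgraphs in decreasing score order and discarding any that overlap too much with one already kept. In this sketch the overlap measure (shared nodes divided by the size of the smaller subgraph) and the 0.8 threshold are assumptions, as the slide does not define "high degree of overlap".

```python
def filter_overlapping(subgraphs, max_overlap=0.8):
    """Keep high scoring subgraphs, dropping highly overlapping ones.

    subgraphs: list of (score, node_set) with non-empty node sets.
    Returns the kept (score, node_set) pairs, best first.
    """
    kept = []
    for score, nodes in sorted(subgraphs, key=lambda s: s[0], reverse=True):
        overlaps = (len(nodes & other) / min(len(nodes), len(other))
                    for _, other in kept)
        if all(o <= max_overlap for o in overlaps):
            kept.append((score, nodes))
    return kept
```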

25 Results Conserved network regions within yeast (orange ovals), fly (green rectangles) and worm (blue hexagons) PPI networks.

26 Results Prediction of protein function: ‘guilt by association’. If a conserved cluster or path is significantly enriched in a functional annotation, the remaining proteins in it are predicted to share that annotation. Prediction of protein interactions: predictions are based on 2 strategies: evidence that proteins with similar sequences interact, and co-occurrence of proteins in the same conserved cluster or path. Experimental verification of yeast interactions using Y2H yielded a 40-62% success rate.
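A minimal sketch of guilt-by-association prediction follows. The slide does not say which statistical test was used; a hypergeometric tail probability is assumed here as one common way to test enrichment, and the function names and the significance cutoff are illustrative.

```python
from math import comb

def enrichment_pvalue(n_total, n_annotated, n_cluster, n_hits):
    """P(X >= n_hits) when n_cluster proteins are drawn at random from
    n_total proteins, n_annotated of which carry the annotation."""
    upper = min(n_annotated, n_cluster)
    favorable = sum(comb(n_annotated, i) * comb(n_total - n_annotated, n_cluster - i)
                    for i in range(n_hits, upper + 1))
    return favorable / comb(n_total, n_cluster)

def predict_annotations(cluster, annotated, n_total, alpha=0.01):
    """Return the unannotated cluster members predicted to share the
    annotation, or an empty set if the cluster is not enriched."""
    hits = cluster & annotated
    p = enrichment_pvalue(n_total, len(annotated), len(cluster), len(hits))
    return cluster - annotated if p < alpha else set()
```

For a cluster of five proteins in which four carry a rare annotation, the fifth protein is predicted to carry it too.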

27 Overview of Græmlin Fast, scalable network alignment: scales linearly in the number of networks compared (NetworkBLAST scales exponentially). Supports efficient querying of modules. Speed-sensitivity control via a user-defined parameter (not supported in NetworkBLAST).

28 Input to the Algorithm Weighted protein interaction graphs  Weights represent probability that proteins interact  Constructed via network integration algorithm covered in a later lecture A phylogenetic tree relating the species in the desired alignment  Used for progressive alignment

29 Definition of an alignment A set of subgraphs chosen from the interaction networks of different species, together with a mapping between aligned proteins. Aligned proteins form equivalence classes: each class was derived from a common ancestral protein and can contain multiple proteins from the same species. [Figure: equivalence class containing a, a’, a’’ and b’’, showing paralogs]

30 Scoring Function Log likelihood ratio model based on  Alignment model M: modules are subject to evolutionary constraint  Random model R: modules are not subject to any constraints Scores equivalence classes and alignment edges separately

31 Log Likelihood Ratio Model (Recap) Measures the likelihood that a module occurs if it is subject to evolutionary constraint vs. that if it were a randomly constructed network. The randomly constructed network preserves the degree distribution of the nodes. $\log \frac{\Pr(\text{Module occurs} \mid \text{Alignment Model } M)}{\Pr(\text{Module occurs} \mid \text{Random Model } R)}$

32 Scoring Equivalence Classes Reconstruct most parsimonious ancestral history of an equivalence class using Dynamic Programming based on five types of evolutionary events Alignment model and random model give probabilities for each of these events, combined to give a log likelihood score

33 Scoring Alignment Edges Alignment scores should reflect both network conservation and high connectivity – difficult to strike a balance Introduction of a novel scoring approach  Edge Scoring Matrix – Indexed by labels  Algorithm assigns a label to each equivalence class, scores according to distribution function in cells referenced by labels

34 Scoring: ESM

35 Alignment Algorithm: d-Clusters for Seed Generation A d-cluster consists of d proteins close together in a network “Close” means edge weights are high, so interaction is highly likely Intuition is that high scoring alignments will have high scoring d- clusters

36 Alignment Algorithm: d-Clusters for Seed Generation Identify pairs of d-clusters that score higher than a threshold T  Score is defined by greedily matching nodes from each d- cluster to obtain a high score Uses these pairs as seeds Allows for speed-sensitivity tradeoff
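Seed generation from d-clusters can be sketched as follows. The function names and the `similarity` callback (a score for aligning one node from each cluster) are hypothetical; only the greedy matching and the threshold T come from the slide.

```python
def greedy_match_score(cluster_a, cluster_b, similarity):
    """Greedily pair nodes of two d-clusters, best-scoring pair first,
    and return the summed score of the chosen pairs."""
    pairs = sorted(((similarity(a, b), a, b)
                    for a in cluster_a for b in cluster_b),
                   key=lambda t: t[0], reverse=True)
    used_a, used_b, total = set(), set(), 0.0
    for s, a, b in pairs:
        # Each node is matched at most once; non-positive pairs are skipped.
        if s > 0 and a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            total += s
    return total

def find_seeds(clusters_a, clusters_b, similarity, threshold):
    """Return the pairs of d-clusters whose greedy score exceeds T."""
    return [(ca, cb) for ca in clusters_a for cb in clusters_b
            if greedy_match_score(ca, cb, similarity) > threshold]
```

Raising `threshold` yields fewer seeds and therefore a faster but less sensitive search, which is the speed-sensitivity tradeoff mentioned on the slide.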

37 Alignment Algorithm: Generating An Initial Alignment From The Seed Determine the highest scoring pair of nodes (one from each d-cluster) when aligned. Align these nodes and place them, as well as their neighbors, into a frontier.

38 Alignment Algorithm: Greedy Seed Extension Phase Examine all pairs of nodes in the frontier for the pair that maximally increases the score when added to the alignment. Stop when no pair can further increase the score. Remove equivalence classes if doing so further increases the score. [Figure labels: frontier, current alignment]

39 Alignment Algorithm: Multiple Alignment Progressive alignment technique using the phylogenetic tree  Successively aligns closest pair of networks  Places each aligned network at the parent node of the two aligned species  Linear scaling in number of species
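Progressive alignment along the phylogenetic tree can be sketched as a post-order traversal. The tree encoding (nested 2-tuples with species names at the leaves) and the `align_pair` callback, which stands in for the pairwise aligner, are assumptions made for illustration.

```python
def progressive_align(tree, networks, align_pair):
    """Align networks bottom-up along a binary phylogenetic tree.

    tree: a species name (leaf) or a 2-tuple of subtrees;
    networks: dict mapping species name to its network;
    align_pair: function taking two networks (or alignments) and
    returning their alignment, which is placed at the parent node.
    """
    if isinstance(tree, str):
        return networks[tree]
    left, right = tree
    return align_pair(progressive_align(left, networks, align_pair),
                      progressive_align(right, networks, align_pair))
```

A tree with k leaves triggers exactly k − 1 calls to `align_pair`, which is where the linear scaling in the number of species comes from.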

40 Performance Comparison: Speed-sensitivity / Linear Scaling

41 Performance Comparison: Multiple Alignment

42 Performance Comparison: Module Querying

43 Results Functional module identification using network alignment Functional module for transformation?

44 Results Functional annotation using network alignment Pairwise alignment Multiple alignment of 9 networks Conserved DNA replication module

45 Results Multiple alignment of 10 networks showing possible cell division module Functional annotation using network alignment

46 The Future of Network Comparison Græmlin Græmlin? Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006

47 That’s all folks! Thank you! Questions?

49 Performance Comparison: Sensitivity

50 Scoring Sequence Mutations Weighted sum of pairs scoring

