CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014 Network alignment Tamer Kahveci 11/30/2018
What is Network Alignment? Global alignment is GI-complete (as hard as graph isomorphism); local alignment is NP-complete.
Metabolic Pathways
What and Why? Metabolic Pathway Alignment: finding a mapping of the entities of the pathways. Applications: Drug Target Identification; Metabolic Reconstruction; Phylogeny Prediction. [Figure: two pathways to be aligned, one with compounds C1 to C5 and one with compounds C1, C2, C4, C5, each with reactions R1, R2 and enzymes E1, E2]
Challenges. Abstraction: once a pathway is reduced to its enzymes, where are the compounds? Is it E1 C1 E2 or E1 C2 E2? Graph Alignment: even after abstraction, the Metabolic Pathway Alignment problem is NP-complete! Existing Algorithms: Heymans et al. (2003); Clemente et al. (2005); Pinter et al. (2005); Singh et al. (2007); …. Pathway alignment is hard! Abstraction is a problem! [Figure: enzyme-only abstractions (E1 E2 E3 E4; E1 E2 E3) shown next to the full pathways with compounds (E1 C1 C2 E2 C3 C4 E3 E4; E1 C1 E2 C3 E3)]
Outline: Graph Model of Pathways; Consistency of an Alignment; Homological & Topological Similarities; Eigenvalue Problem; Similarity Score; Experimental Results
Non-Redundant Graph Model. [Figure: pathway graph with compounds Pyruv., ThPP, Lip-E, S-Ac, 2-ThP, A-CoA, Di-hy; reactions R0014, R7618, R3270, R2569; enzymes 1.2.4.1, 2.3.1.12, 1.8.1.4]
Consistency. 1- Align only the entities of the same type (compatible). 2- The overall mapping should be 1-1. [Figure: example mappings among R1, R2, C1, C2 for type compatibility and among R1, R2, R3 for the 1-1 condition]
Consistency. 3- Align two entities u_i, v_i only if there exists an aligned entity pair u_j, v_j such that u_j and v_j are on the reachability paths of u_i and v_i, respectively. [Figure: two pathways over C1 to C5, R1, R2 with the aligned entities, a backward reachability path and a forward reachability path marked]
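The three consistency conditions can be checked mechanically. The following is a minimal sketch, not the authors' implementation. It assumes pathways are stored as adjacency dictionaries, that an entity's type is the first letter of its name (C, R, E, as on the slides), and that "on the reachability paths" means the supporting pair lies forward of both entities or backward of both. The names `entity_type`, `reachable`, `reverse` and `is_consistent` are hypothetical.

```python
from collections import deque

def entity_type(name):
    # Assumed naming convention: C = compound, R = reaction, E = enzyme.
    return name[0]

def reachable(graph, start):
    """Return every node reachable from start along directed edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Return the graph with every edge direction flipped."""
    rev = {}
    for u, targets in graph.items():
        for v in targets:
            rev.setdefault(v, []).append(u)
    return rev

def is_consistent(pairs, g1, g2):
    """Check conditions 1-3 for a list of (u, v) aligned pairs."""
    # Condition 1: only entities of the same type are aligned.
    if any(entity_type(u) != entity_type(v) for u, v in pairs):
        return False
    # Condition 2: the mapping is one-to-one.
    lefts = [u for u, _ in pairs]
    rights = [v for _, v in pairs]
    if len(set(lefts)) != len(lefts) or len(set(rights)) != len(rights):
        return False
    # Condition 3 (assumed reading): each pair is supported by another
    # aligned pair that is forward of both or backward of both entities.
    r1, r2 = reverse(g1), reverse(g2)
    for u, v in pairs:
        fwd_u, fwd_v = reachable(g1, u), reachable(g2, v)
        bwd_u, bwd_v = reachable(r1, u), reachable(r2, v)
        supported = any(
            (a in fwd_u and b in fwd_v) or (a in bwd_u and b in bwd_v)
            for a, b in pairs
            if (a, b) != (u, v)
        )
        if not supported:
            return False
    return True

# Toy linear pathway C1 -> R1 -> C2 -> R2 -> C3, aligned with itself.
g = {"C1": ["R1"], "R1": ["C2"], "C2": ["R2"], "R2": ["C3"]}
print(is_consistent([("R1", "R1"), ("R2", "R2")], g, g))
```

Under this reading a lone aligned pair is rejected by condition 3, so a stricter or looser interpretation may be needed in practice.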
Problem Statement Given a pair of metabolic pathways, our aim is to find the consistent alignment (mapping) of the entities (enzymes, reactions, compounds) such that the similarity between the pathways (SimP score) is maximized.
Pairwise Similarities (Homology of Entities)
Pairwise Similarities (Homology). Enzyme Similarity (SimE): Hierarchical Enzyme Similarity, Webb EC. (2002); Information-Content Enzyme Similarity, Pinter et al. (2005). Compound Similarity (SimC): Identity Score for compounds; SIMCOMP Compound Similarity, Hattori et al. (2003).
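Pairwise Similarities: Reaction Similarity (SimR). $\mathrm{SimR}(R_1, R_2) = \underbrace{\max(\mathrm{SimE}(E_1,E_3),\, \mathrm{SimE}(E_2,E_3))}_{\text{enzymes}} + \underbrace{\max(\mathrm{SimC}(C_1,C_4),\, \mathrm{SimC}(C_2,C_4))}_{\text{input compounds}} + \underbrace{\max(\mathrm{SimC}(C_3,C_5),\, \mathrm{SimC}(C_3,C_6),\, \mathrm{SimC}(C_3,C_7))}_{\text{output compounds}}$ [Figure: R1 with enzymes E1, E2, input compounds C1, C2 and output compound C3; R2 with enzyme E3, input compound C4 and output compounds C5, C6, C7]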
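The reaction similarity on this slide is a sum of three best-match terms, which can be sketched as follows. This is an illustration only: the dictionary layout of a reaction, the function name `sim_r`, and the toy score tables are assumptions, and the terms are summed without weights, as written on the slide.

```python
def sim_r(r1, r2, sim_e, sim_c):
    """Sum of the best enzyme, input-compound and output-compound matches."""
    def best(xs, ys, sim):
        # Best pairwise score between two entity sets (0.0 if either is empty).
        return max((sim(x, y) for x in xs for y in ys), default=0.0)

    return (best(r1["enzymes"], r2["enzymes"], sim_e)
            + best(r1["inputs"], r2["inputs"], sim_c)
            + best(r1["outputs"], r2["outputs"], sim_c))

# Reactions laid out as in the slide's figure.
R1 = {"enzymes": ["E1", "E2"], "inputs": ["C1", "C2"], "outputs": ["C3"]}
R2 = {"enzymes": ["E3"], "inputs": ["C4"], "outputs": ["C5", "C6", "C7"]}

# Made-up pairwise scores, for illustration only.
E_SCORES = {("E1", "E3"): 0.2, ("E2", "E3"): 0.8}
C_SCORES = {("C1", "C4"): 0.5, ("C2", "C4"): 0.1,
            ("C3", "C5"): 0.3, ("C3", "C6"): 0.9, ("C3", "C7"): 0.0}

def toy_sim_e(a, b):
    return E_SCORES.get((a, b), 0.0)

def toy_sim_c(a, b):
    return C_SCORES.get((a, b), 0.0)

score = sim_r(R1, R2, toy_sim_e, toy_sim_c)  # 0.8 + 0.5 + 0.9
print(round(score, 6))
```

In a real setting `sim_e` and `sim_c` would be one of the enzyme and compound similarity measures listed on the previous slide.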
Topological Similarity (Topology of Pathways)
Neighborhood Graphs: Reactions, Enzymes, Compounds. [Figure: neighborhood graphs over C1, C2, C4, C6, C8, R1, E1]
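Topological Similarities. First pathway: BN(R3) = {R1, R2}, FN(R3) = {R4}. Second pathway: BN(R3) = {R1}, FN(R3) = {R4, R5}. $A_R[R_3,R_3][R_2,R_1] = \frac{1}{2 \cdot 1 + 1 \cdot 2} = \frac{1}{4}$. With |R| = 4 reactions in each pathway, the A_R matrix has size (|R| |R|) x (|R| |R|) = 16 x 16, with rows and columns indexed by reaction pairs (R1-R1, …, R2-R1, …, R4-R4, R4-R5, …); the entry in row R3-R3, column R2-R1 is 1/4. [Figure: first pathway with reactions R1, R2, R3, R4; second pathway with reactions R1, R3, R4, R5]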
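The worked entry on this slide can be reproduced with a few lines of code. This is a sketch under assumptions: the slide only shows one non-zero entry, so returning 0 when the neighbor pair is not backward-backward or forward-forward is an assumed convention, and the function name `topology_entry` is hypothetical.

```python
def topology_entry(bn1, fn1, bn2, fn2, u, v):
    """Entry A_R[(r_i, r_j)][(u, v)].

    bn1/fn1: backward/forward neighbors of r_i in the first pathway.
    bn2/fn2: backward/forward neighbors of r_j in the second pathway.
    u, v:    candidate neighbor reactions in the first and second pathway.
    """
    both_backward = u in bn1 and v in bn2
    both_forward = u in fn1 and v in fn2
    if not (both_backward or both_forward):
        return 0.0  # assumed: unrelated pairs contribute no support
    return 1.0 / (len(bn1) * len(bn2) + len(fn1) * len(fn2))

# Neighborhoods of R3 in both pathways, as given on the slide.
bn_first, fn_first = {"R1", "R2"}, {"R4"}
bn_second, fn_second = {"R1"}, {"R4", "R5"}

# Row R3-R3, column R2-R1: 1 / (2*1 + 1*2)
print(topology_entry(bn_first, fn_first, bn_second, fn_second, "R2", "R1"))
```

Filling all 16 x 16 entries would repeat this call for every pair of reaction pairs.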
Problem Formulation. Focus on the R3-R3 matching. Iteration 0: only the pairwise similarity of R3 and R3. Iteration 1: support of aligned first-degree neighbors added. Iteration 2: support of aligned second-degree neighbors added. Iteration 3: support of aligned third-degree neighbors added. [Figure: two pathways with reactions R1 to R8 around R3, with neighbors of increasing degree highlighted]
Problem Formulation. Power method iterations take the initial reaction similarity matrix (HR0 vector) to the final reaction similarity matrix (HRs vector). [Figure: example initial and final reaction similarity matrices with entries between 0.1 and 1.0]
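A power-method iteration of this kind can be sketched as below. The update rule is an assumption that matches the deck's description of alpha (0 keeps only the pairwise similarities, 1 keeps only topology): each step mixes the topology term `A * h` with the initial vector `h0`, then rescales the result to sum to 1. The function name `power_iterate` and the toy inputs are hypothetical.

```python
def power_iterate(A, h0, alpha=0.7, tol=1e-10, max_iter=1000):
    """Iterate h <- alpha * A h + (1 - alpha) * h0 until it stops changing."""
    n = len(h0)
    total = sum(h0)
    h0 = [x / total for x in h0]  # normalize the initial similarities
    h = h0[:]
    for _ in range(max_iter):
        nxt = [
            alpha * sum(A[i][j] * h[j] for j in range(n)) + (1 - alpha) * h0[i]
            for i in range(n)
        ]
        s = sum(nxt)
        if s == 0:
            return h  # degenerate input: nothing left to rescale
        nxt = [x / s for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, h)) < tol:
            return nxt
        h = nxt
    return h

# Toy 2 x 2 topology matrix and initial similarity vector (made up).
A = [[0.5, 0.5],
     [0.5, 0.5]]
print(power_iterate(A, [1.0, 3.0], alpha=0.0))  # stays at the initial vector
print(power_iterate(A, [1.0, 3.0], alpha=1.0))  # driven by topology only
```

In the pathway setting the vector would have one entry per reaction pair and `A` would be the 16 x 16 style topology matrix.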
Max Weight Bipartite Matching. Six possible orderings of the entity types, ONLY 3 ARE UNIQUE: Reactions First; Enzymes First; Compounds First. R First: Pruning. Legend: Weighted Edges; Aligned Entities; Inconsistent Edges. Consistency Assured! [Figure: bipartite graphs over reactions R1 to R3, enzymes E1 to E3 and compounds C1 to C4]
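The matching step for a single entity type can be illustrated with a brute-force search. This is a sketch only, not the algorithm used in the lecture: it enumerates all assignments, which is practical only for a handful of entities, whereas a real implementation would use a polynomial-time assignment algorithm. Representing a pruned (inconsistent) edge as `None` and the name `max_weight_matching` are assumptions.

```python
from itertools import permutations

def max_weight_matching(weights):
    """Return (best_score, pairs) for a small weighted bipartite graph.

    weights[i][j] is the similarity of left entity i and right entity j,
    or None if that edge was pruned. Requires len(left) <= len(right).
    """
    n_left, n_right = len(weights), len(weights[0])
    if n_left > n_right:
        raise ValueError("expected at most as many left as right entities")
    best_score, best_pairs = -1.0, []
    for perm in permutations(range(n_right), n_left):
        # Keep only the edges that survived pruning.
        pairs = [(i, j) for i, j in enumerate(perm) if weights[i][j] is not None]
        score = sum(weights[i][j] for i, j in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_score, best_pairs

# Made-up similarities between two reactions on each side.
weights = [[0.9, 0.5],
           [0.8, 0.1]]
score, pairs = max_weight_matching(weights)
print(pairs)  # the cross assignment wins: 0.5 + 0.8 > 0.9 + 0.1
```

After one entity type is matched, the edges of the remaining types that conflict with the chosen pairs would be set to `None` before matching them in turn.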
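Alignment Score (SimP). SimP = 1 for identical pathways. $\mathrm{SimP} = \beta \, \frac{\mathrm{Sim}(C_1) + \mathrm{Sim}(C_2) + \mathrm{Sim}(C_4)}{3} + (1 - \beta) \, \frac{\mathrm{Sim}(E_1) + \mathrm{Sim}(E_2)}{2}$ [Figure: two aligned pathways in which compounds C1, C2, C4 and enzymes E1, E2 are mapped]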
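The alignment score on this slide is a weighted mix of two averages, one over the aligned compounds and one over the aligned enzymes. A minimal sketch follows; the function name `sim_p`, the parameter name `beta` and the example scores are assumptions for illustration.

```python
def sim_p(compound_scores, enzyme_scores, beta=0.5):
    """Weighted mix of the mean compound and mean enzyme similarity."""
    mean_c = sum(compound_scores) / len(compound_scores)
    mean_e = sum(enzyme_scores) / len(enzyme_scores)
    return beta * mean_c + (1 - beta) * mean_e

# Identical pathways: every aligned pair has similarity 1, so SimP is 1.
print(sim_p([1.0, 1.0, 1.0], [1.0, 1.0]))

# Made-up scores for three aligned compounds and two aligned enzymes.
print(sim_p([0.9, 0.6, 0.3], [1.0, 0.5], beta=0.5))
```

Because both terms are averages in [0, 1] and the weights sum to 1, the score stays in [0, 1] whenever the pairwise similarities do.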
Outline: Graph Model of Pathways; Consistency of an Alignment; Homological & Topological Similarities; Eigenvalue Problem; Similarity Score; Experimental Results
Impact of Alpha. α = 0: only pairwise similarities of entities, no iterations. α = 1: only topology of the graphs. α = 0.7 is good!
Alternative Entities & Paths. Eukaryotes (e.g. H. sapiens): Mevalonate Path. Bacteria (e.g. E. coli): Non-Mevalonate Path. Kim J. et al. (2007); Kuzuyama T. et al. (2006).
Phylogeny Prediction. [Figure: our prediction compared with the NCBI taxonomy, for Archaea (Thermoprotei) and for Eukaryota (Deuterostomia)]
Effect Of Consistency Restriction
Running Time
How to allow subnetwork mapping
Different Paths, Same Function
How bad can it be? Lysine Biosynthesis: E. coli vs. A. thaliana. MetNetAligner: Cheng & Zelikovsky, Bioinformatics 2009. SubMAP: Ay & Kahveci, RECOMB 2010.
Alternative paths - 1
Alternative paths - 2 Work on this
Alternative paths - 3 Work on this
Mappings among major clades
Dynamic programming approach