Qiong Cheng, Dipendra Kaur, Robert Harrison, Alexander Zelikovsky Computer Science in Georgia State University Dec. 1 2007 RECOMB Satellite Conference.




1 Homomorphism Mapping in Metabolic Pathways
Qiong Cheng, Dipendra Kaur, Robert Harrison, Alexander Zelikovsky
Computer Science, Georgia State University
Dec. 1, 2007, RECOMB Satellite Conference on Systems Biology 2007

2 Outline
- Concept of metabolic pathway comparison
- Enzyme similarity
- Graph mappings: embeddings & homomorphisms
- Min cost homomorphism problem for trees
- Optimal DP algorithm for trees
- Min cost homomorphism problem for arbitrary graphs
- Minimum feedback vertex set (MFVS)
- Searching metabolic networks for pathway motifs
- Pathway holes
- Web tool: architecture & brief interface
- Future work

3 Metabolic pathway & pathway model
[Figure: a portion of the pentose phosphate pathway shown as a metabolic pathway and as a pathway model, with enzymes labelled by EC numbers 1.1.1.49, 1.1.1.34, 2.7.1.13, 3.1.1.31, 1.1.1.44]

4 Comparison of metabolic pathways
- Enzyme similarity
- Pathway topology similarity
Enzyme similarity and pathway topology together represent the similarity of pathway functionality.
[Figure labels: Mismatch / Substitute match]

5 Related work
- Linear topology (Forst & Schulten [1999]; Chen & Hofestaedt [2004])
- Tree topology (Pinter [2005]; running-time bound in |V_G|, |V_T| and log factors, exponents not recoverable)
- Arbitrary topology
  - Mapping linear pattern → graph (Kelly et al 2004; bound in |V_T| and |V_G| not recoverable)
  - Exhaustive search (Sharan et al 2005; Yang et al 2007; bounds in |V_T| and |V_G| not recoverable)
[Figure: example alignments of patterns on vertices A, B, C, D to texts on vertices A', B', X', D']

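6 Enzyme mapping cost
EC (Enzyme Commission) notation: Enzyme X = x1.x2.x3.x4, Enzyme Y = y1.y2.y3.y4
Measure Δ by tight reaction property:
$$\Delta[X, Y] = \begin{cases} 1 & \text{if } x_1 = y_1,\ x_2 = y_2,\ x_3 = y_3,\ x_4 \neq y_4 \\ 10 & \text{if } x_1 = y_1,\ x_2 = y_2,\ x_3 \neq y_3 \\ +\infty & \text{if } x_1 \neq y_1 \text{ or } x_2 \neq y_2 \end{cases}$$
Measure enzyme similarity score Δ by the lowest common upper class distribution (Enzyme D = d1.d2.d3.d4):
$$\Delta[X, Y] = \log_2 c(X, Y)$$
where c(X, Y) is computed from the lowest common upper class of X and Y in the EC hierarchy.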
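
The first scoring scheme can be sketched in a few lines of Python. This is only an illustration under assumptions: it reads the scheme as "cost 1 if only the fourth EC field differs, 10 if the third field differs, +∞ if the first or second differs", and it assigns cost 0 to identical EC numbers, which the slide does not state explicitly. The function name `ec_cost` is hypothetical.

```python
import math


def ec_cost(x: str, y: str) -> float:
    """Mapping cost between two EC numbers written as 'd1.d2.d3.d4'."""
    xs, ys = x.split("."), y.split(".")
    if len(xs) != 4 or len(ys) != 4:
        raise ValueError("expected EC numbers of the form d1.d2.d3.d4")
    if xs[:2] != ys[:2]:
        return math.inf   # first or second field differs
    if xs[2] != ys[2]:
        return 10.0       # first two fields agree, third differs
    if xs[3] != ys[3]:
        return 1.0        # first three fields agree, fourth differs
    return 0.0            # identical EC numbers (assumed, not on the slide)
```

For example, `ec_cost("1.1.1.49", "1.1.1.44")` compares two enzymes from the pentose phosphate example that differ only in the last field.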

7 Graph mappings: embeddings & homomorphisms
Mapping types from pattern T to text G: isomorphism, isomorphic embedding, homeomorphic embedding, homomorphism.
Homomorphism f : T → G consists of f_v : V_T → V_G and f_e : E_T → paths of G.
We allow different enzymes to be mapped to the same enzyme.
Edge-to-path cost: |f_e(e)| − 1
Homomorphism cost:
$$\mathrm{cost}(f) = \sum_{e \in E_T} \left(|f_e(e)| - 1\right) + \sum_{v \in V_T} \Delta(v, f_v(v))$$
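
As a small illustration of the cost definition, the sketch below evaluates the cost of a given mapping. The names `homomorphism_cost`, `f_v`, `f_e` and `delta` are hypothetical, |f_e(e)| is taken to be the number of edges on the image path, and the function does not check that the image paths actually exist in G.

```python
def homomorphism_cost(pattern_edges, f_v, f_e, delta):
    """Vertex mapping costs plus (path length - 1) for every pattern edge.

    f_v maps each pattern vertex to a text vertex.
    f_e maps each pattern edge (a, b) to a list of text vertices forming a path.
    """
    total = sum(delta(v, u) for v, u in f_v.items())
    for a, b in pattern_edges:
        path = f_e[(a, b)]
        if len(path) < 2 or path[0] != f_v[a] or path[-1] != f_v[b]:
            raise ValueError(f"image of edge {(a, b)} must run from f_v(a) to f_v(b)")
        hops = len(path) - 1   # number of text edges on the image path
        total += hops - 1      # a single-edge image costs nothing
    return total


# Pattern a -> b -> c mapped into a text where B reaches C via X.
delta = lambda v, u: 0 if v.upper() == u else 1
f_v = {"a": "A", "b": "B", "c": "C"}
f_e = {("a", "b"): ["A", "B"], ("b", "c"): ["B", "X", "C"]}
example_cost = homomorphism_cost([("a", "b"), ("b", "c")], f_v, f_e, delta)
```

Here the only contribution is the one extra hop through X on the image of edge (b, c).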

8 Min cost homomorphism of multi-source tree to arbitrary graph
A multi-source tree is a directed graph whose underlying undirected graph (ignoring direction) is a tree.
Given a multi-source tree T = (V_T, E_T) (pattern) and an arbitrary graph G = (V_G, E_G) (text), find a min cost homomorphism f : T → G.
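
The definition of a multi-source tree can be checked directly: ignore edge directions and test that the result is connected with exactly |V| − 1 edges. A minimal sketch, with the hypothetical name `is_multisource_tree`:

```python
def is_multisource_tree(vertices, edges):
    """True if the directed graph's underlying undirected graph is a tree."""
    vertices = list(vertices)
    if not vertices or len(edges) != len(vertices) - 1:
        return False
    adj = {v: [] for v in vertices}
    for a, b in edges:          # drop the direction of every edge
        adj[a].append(b)
        adj[b].append(a)
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:                # traversal of the undirected graph
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)   # connected with |V| - 1 edges
```

A pattern such as a → b ← c, b → d has two sources (a and c) and still passes the check.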

9 Preprocessing of text graph
Transitive closure of G is the graph G* = (V, E*), where E* = {(i, j) : there is an i-j path in G}.
[Figure: text graph G on vertices A to F and its transitive closure G*, with the edges of G* labelled 1, 2 or 3]
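
One way to build G* together with hop counts is a breadth-first search from every vertex, which matches an O(|V_G||E_G|) budget. The sketch below assumes that the hop count attached to a closure edge is the length of a shortest directed path; the name `transitive_closure_hops` is hypothetical.

```python
from collections import deque


def transitive_closure_hops(vertices, edges):
    """Map every closure edge (s, t) to the hop count of a shortest s-t path."""
    succ = {v: [] for v in vertices}
    for a, b in edges:
        succ[a].append(b)
    hops = {}
    for s in vertices:
        dist = {}
        queue = deque()
        for w in succ[s]:            # paths must use at least one edge
            if w not in dist:
                dist[w] = 1
                queue.append(w)
        while queue:
            v = queue.popleft()
            for w in succ[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        for t, d in dist.items():
            hops[(s, t)] = d         # (s, s) appears only if s lies on a cycle
    return hops
```

The keys of the returned dictionary are exactly the edges of E*.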

10 Pattern graph ordering
Construct the ordered pattern T' by DFS traversal; vertices are then processed in the opposite order.
Each edge e_i in T' is the unique edge connecting v_i with the previous vertices in the order.
[Figure: pattern T on vertices a, b, c, d; ordering c, b, d, a; ordered pattern T']
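
The ordering step can be sketched as a depth-first traversal of the underlying undirected tree that records, for each vertex, the single pattern edge linking it to an earlier vertex. The choice of root and the names `order_pattern` and `link` are assumptions made for illustration.

```python
def order_pattern(vertices, edges, root):
    """Return (order, link): a DFS order and, per non-root vertex, its linking edge."""
    nbrs = {v: [] for v in vertices}
    for a, b in edges:
        nbrs[a].append((b, (a, b)))   # keep the original direction in the edge
        nbrs[b].append((a, (a, b)))
    order, link = [], {}
    seen = {root}
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w, e in reversed(nbrs[v]):
            if w not in seen:
                seen.add(w)
                link[w] = e           # unique edge from w to the earlier vertices
                stack.append(w)
    return order, link


order, link = order_pattern("abcd", [("a", "b"), ("c", "b"), ("b", "d")], "a")
processing_order = list(reversed(order))   # leaves first, root last
```

Processing in the reversed order guarantees that every vertex is handled after all vertices that hang below it.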

11 DP table
Rows: pattern vertices a, b, c, d of T. Columns: text vertices u_1, …, u_j, …, u_|V_G| (arbitrary order).
DT[a, u_j] is the min cost of a homomorphism into G* from T's subgraph induced by a and the previous vertices in the order, with a mapped to u_j.

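12 Filling DP table
Recursive function, for i ≤ |V_T| and j ≤ |V_G|:
$$DT[i, j] = \begin{cases} \Delta(v_i, u_j) & \text{if } v_i \text{ is a leaf in } T \\ \Delta(v_i, u_j) + \sum_{l=1}^{\mathrm{adj}(v_i)} \min_{1 \le j' \le |V_G|} C[i_l, j'] & \text{otherwise} \end{cases}$$
where
$$C[i_l, j'] = DT[i_l, j'] + \left(h(j, j') - 1\right), \qquad h(j, j') = \#(\text{hops between } u_j \text{ and } u_{j'} \text{ in } G)$$
Gaps are penalized: every hop beyond the first on the image of a pattern edge adds to the cost.
[Figure: pattern edge between v_i and v_{i_l} in T mapped to a path of h(j, j') hops between u_j and u_j' in G*]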
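
The recursion can be sketched as follows. This is an illustrative implementation under assumptions, not the authors' code: `hops` is a precomputed dictionary of closure edges to hop counts, the pattern is rooted at an arbitrary vertex, edge direction is respected by looking up `hops` in the direction of the pattern edge, and all names are hypothetical.

```python
import math


def min_cost_tree_homomorphism(pattern_edges, root, text_vertices, hops, delta):
    """Minimum homomorphism cost of a multi-source tree pattern into the text."""
    nbrs = {root: []}
    for a, b in pattern_edges:
        nbrs.setdefault(a, []).append((b, True))    # edge points away from a
        nbrs.setdefault(b, []).append((a, False))   # edge points into b
    # DFS order from the root; children[v] holds (child, edge_points_to_child).
    order, children, seen, stack = [], {}, {root}, [root]
    while stack:
        v = stack.pop()
        order.append(v)
        children[v] = []
        for w, outgoing in nbrs[v]:
            if w not in seen:
                seen.add(w)
                children[v].append((w, outgoing))
                stack.append(w)
    DT = {}
    for v in reversed(order):                       # leaves first, root last
        for u in text_vertices:
            total = delta(v, u)                     # leaf case: only the vertex cost
            for c, outgoing in children[v]:
                best = math.inf
                for w in text_vertices:
                    h = hops.get((u, w) if outgoing else (w, u))
                    if h is not None:               # closure edge exists
                        best = min(best, DT[(c, w)] + (h - 1))
                total += best
            DT[(v, u)] = total
    return min(DT[(root, u)] for u in text_vertices)


# Text chain A -> B -> X -> C with its closure hop counts.
text_hops = {("A", "B"): 1, ("A", "X"): 2, ("A", "C"): 3,
             ("B", "X"): 1, ("B", "C"): 2, ("X", "C"): 1}
cost_fn = lambda v, u: 0 if v.upper() == u else 5
chain_cost = min_cost_tree_homomorphism(
    [("a", "b"), ("b", "c")], "a", ["A", "B", "X", "C"], text_hops, cost_fn)
```

In the example, the pattern a → b → c maps onto A, B, C at the price of one gap, because B reaches C only through X.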

13 Runtime analysis for mapping trees
- Transitive closure takes O(|V_G||E_G|).
- Pattern graph ordering takes O(|V_T| + |E_T|).
- Dynamic programming:
  - Calculating the min contribution of all child pairs of a node pair (v_i ∈ T, u_j ∈ G) takes t_ij = deg_T(v_i) deg_G*(u_j).
  - Filling DT takes
$$\sum_{j=1}^{|V_G|} \sum_{i=1}^{|V_T|} t_{ij} = \sum_{j=1}^{|V_G|} \deg_{G^*}(u_j) \sum_{i=1}^{|V_T|} \deg_T(v_i) = 2|E_{G^*}||E_T|$$
- The total runtime for mapping trees is O(|V_G||E_G| + |V_G*||V_T|).

14 MFVS
Minimum feedback vertex set (MFVS)
Given: an undirected graph G = (V, E) and a nonnegative weight function w on V.
Find: a minimum weight subset of V whose removal leaves an acyclic graph.
Bad news: the MFVS problem is NP-complete.
Good news: 2-approximation.
Greedy algorithm
1. Delete degree 1/0 vertices from V and set the remaining vertices to V'
2. MFVS ← ∅
3. while V' ≠ ∅ do
4.   pick the set S of maximal degree vertices
5.   MFVS ← MFVS ∪ S; V' ← V' \ S
6.   delete degree 1/0 vertices from V'
[Figure: example graphs A and B on vertices a, b, c, d, e]
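
The greedy steps can be sketched as below. The sketch assumes a simple undirected graph, ignores the weight function w, repeats the degree 1/0 deletion until no such vertex remains, and removes the picked set S from the working graph. It follows the listed steps only and makes no claim about the approximation ratio. The name `greedy_feedback_vertex_set` is hypothetical.

```python
def greedy_feedback_vertex_set(vertices, edges):
    """Greedy feedback vertex set of a simple undirected graph."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        if a != b:                       # self-loops are ignored in this sketch
            adj[a].add(b)
            adj[b].add(a)

    def remove(v):
        for w in adj.pop(v):
            adj[w].discard(v)

    def prune():
        # Repeatedly delete vertices of degree 0 or 1; they lie on no cycle.
        low = [v for v in adj if len(adj[v]) <= 1]
        while low:
            v = low.pop()
            if v not in adj:
                continue
            neighbours = list(adj[v])
            remove(v)
            low.extend(w for w in neighbours if len(adj[w]) <= 1)

    fvs = set()
    prune()
    while adj:
        top = max(len(n) for n in adj.values())
        S = [v for v in adj if len(adj[v]) == top]   # all maximal-degree vertices
        for v in S:
            fvs.add(v)
            remove(v)
        prune()
    return fvs
```

On two triangles that share one vertex, the shared vertex has the highest degree and is the only vertex selected; on a tree the result is empty.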

15 Min cost homomorphism of arbitrary graphs
Given an arbitrary graph P = (V_P, E_P) (pattern) and an arbitrary graph G = (V_G, E_G) (text), find a min cost homomorphism f : P → G.
Algorithm
1. Find a minimum feedback vertex set F(P) of P
2. Construct a multi-source tree P'
3. for every sub-mapping f'_v : F(P) → V_G do
4.   obtain the min cost homomorphism of the multi-source tree P' to the arbitrary graph G under sub-mapping f'_v
5. choose the min cost homomorphism over all sub-mappings
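
The outer loop of this algorithm, which tries every assignment of the feedback vertices to text vertices, can be sketched with `itertools.product`. The tree-mapping step is passed in as a callback because its details are outside this sketch; `solve_with_fixed` and the other names are hypothetical.

```python
from itertools import product


def min_cost_over_submappings(feedback_vertices, text_vertices, solve_with_fixed):
    """Try every sub-mapping F(P) -> V_G and keep the cheapest completed mapping.

    solve_with_fixed(fixed) is assumed to return the min cost of mapping the
    remaining multi-source tree when the images in `fixed` are forced.
    """
    best_cost, best_fixed = float("inf"), None
    # |V_G| ** |F(P)| candidate sub-mappings are enumerated.
    for images in product(text_vertices, repeat=len(feedback_vertices)):
        fixed = dict(zip(feedback_vertices, images))
        cost = solve_with_fixed(fixed)
        if cost < best_cost:
            best_cost, best_fixed = cost, fixed
    return best_cost, best_fixed


# Toy callback standing in for the tree-mapping step.
calls = []
def toy_solver(fixed):
    calls.append(dict(fixed))
    return (0 if fixed["x"] == "B" else 2) + (0 if fixed["y"] == "C" else 3)

toy_result = min_cost_over_submappings(["x", "y"], ["A", "B", "C"], toy_solver)
```

With two feedback vertices and three text vertices the callback is invoked nine times, which illustrates why a small feedback vertex set matters.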

16 Runtime analysis for mapping arbitrary graphs
- Finding the min feedback vertex set takes O(|V_P| + |E_P|).
- There are $O(|V_G|^{|F(P)|})$ possible mappings for the MFVS.
- Finding the min homomorphism mapping of a multi-source tree to an arbitrary graph takes O(|V_G||E_G| + |V_G*||V_T|).
- The total runtime is $O\left(|V_G|^{|F(P)|} \left(|V_G||E_G| + |V_{G^*}||V_T|\right)\right)$.

17 Statistical significance
Randomized P-value computation
Random degree-conserved graph generation:
- Reshuffle nodes
- Reshuffle edges
[Figure: reshuffling an edge pair on vertices a, b, c, d]
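
A common way to reshuffle edges while conserving degrees is to swap the endpoints of two randomly chosen edges, and an empirical P-value can then be taken as the fraction of random graphs that score at least as well as the real one. Both functions below are illustrative assumptions rather than the procedure used in the talk: the slide does not spell out the swap rule, and the P-value here treats lower cost as better.

```python
import random


def reshuffle_edges(edges, swaps, rng):
    """Degree-preserving randomization of a simple directed graph by edge swaps."""
    edges = list(edges)
    present = set(edges)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)      # exchange the heads of the two edges
        if a == d or c == b or new1 in present or new2 in present:
            continue                     # would create a self-loop or duplicate
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges


def empirical_p_value(observed_cost, random_costs):
    """Fraction of random costs that are at least as good (low) as the observed one."""
    hits = sum(1 for c in random_costs if c <= observed_cost)
    return hits / len(random_costs)
```

Each swap leaves the out-degree of a and c and the in-degree of b and d unchanged, so the degree sequence of the whole graph is conserved.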

18 Experiments & applications
All-against-all mappings among S. cerevisiae, B. subtilis, T. thermophilus, and E. coli + Halobacterium
- Identifying conserved pathways
  - 24 pathways that are conserved across all 4 species
  - 18 more pathways that are conserved across at least three of these species
- Resolving ambiguity
- Discovering pathway holes

19 Mappings with cycles

20 Resolving Ambiguity

21 Pathway holes
- Check if there is such an enzyme in the pattern
- Find the closest protein in the same group
- If identity is high (> 80%), we expect a good filling
- Align to the previous and next enzyme: the functions may be taken over

22 Filling pathway holes
(--.--.--.-- : potential hole)
Type: Fillings found by same EC number
  Pattern pathway (P): Gamma glutamyl cycle
  Text pathway (T): Superpathway of glycolysis pyruvate dehydrogenase TCA and glyoxylate bypass
  Mapping of interest:
    P: 2.3.2.2; --.--.--.--; 2.3.2.4
    T: 2.3.3.9; 1.1.1.37; 2.3.3.1
  Similarity score & Acc Num: 051 P49814
Type: Fillings found by group neighbors
  Pattern pathway (P): Alanine biosynthesis I
  Text pathway (T): Superpathway of lysine threonine methionine and S-adenosyl-L-methionine biosynthesis
  Mapping of interest:
    P: 2.6.1.66; --.--.--.--; --.--.--.--; --.--.--.--; --.--.--.--; --.--.--.--; --.--.--.--; --.--.--.--; 5.1.1.1
    T: 2.6.1.1; 2.7.2.4; 1.2.1.11; 4.2.1.52; 1.3.1.26; 2.3.1.117; 2.6.1.17; 3.5.1.18; 5.1.1.7
  Similarity score & Acc Num: 0.8 P39754 EC 2.6.1.16, with statistical significance 0.26
Type: Fillings found by left/right neighbor
  Pattern pathway (P): Alanine biosynthesis I
  Text pathway (T): Superpathway of lysine threonine methionine and S-adenosyl-L-methionine biosynthesis
  Mapping of interest:
    P: 2.6.1.66; --.--.--.--; --.--.--.--; --.--.--.--; --.--.--.--; --.--.--.--; --.--.--.--; --.--.--.--; 5.1.1.1
    T: 2.6.1.1; 2.7.2.4; 1.2.1.11; 4.2.1.52; 1.3.1.26; 2.3.1.117; 2.6.1.17; 3.5.1.18; 5.1.1.7
  Similarity score & Acc Num: 0.7 P10725 EC 5.1.1.1

23 Web service architecture
[Architecture diagram. Components: Pathway Database, Graph Extraction, Cached Indexing, Graph Searching, Graph Layout, Graph Visualization, Additional Value Service. Flow labels: Query, Visualized Outputs, Browsers]

24 Web Interface


26 Future work
- Approximation algorithms for the comparison of general graphs
- Mining protein interaction networks
- Discovery of critical elements or modules based on graph comparison
- Discovery of evolutionary relations of organisms by comparing pathways of different organisms at different time points
- Integration with genome databases

27 References
- R. Y. Pinter, O. Rokhlenko, E. Yeger-Lotem, M. Ziv-Ukelson. Alignment of metabolic pathways. Bioinformatics 21(16): 3401-8 (Aug 2005). LNCS 3109, Springer-Verlag.
- S. Wernicke. Combinatorial Algorithms to Cope with the Complexity of Biological Networks. Dissertation (December 2006).
- J. Ellson, E. Gansner, E. Koutsofios, S. North, G. Woodhull. Graphviz and dynagraph - static and dynamic graph drawing tools. In M. Junger and P. Mutzel, editors, Graph Drawing Software, pages 127-148. Springer-Verlag, 2003.
- Yan and J. Han. gSpan: Graph-based substructure pattern mining. In ICDM, pages 721-724, 2002.
- N. Ketkar, L. Holder, D. Cook, R. Shah, J. Coble. Subdue: Compression-based Frequent Pattern Discovery in Graph Data. Proceedings of the ACM KDD Workshop on Open-Source Data Mining, August 2005.
- K. Borgwardt, S. Bottger, H. Kriegel. VGM: visual graph mining. Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data.
- Q. Cheng, D. Kaur, R. Harrison, A. Zelikovsky. Mapping and Filling Metabolic Pathways. RECOMB Satellite Conference on Systems Biology 2007.
- Q. Cheng, R. Harrison, A. Zelikovsky. Homomorphisms of Multisource Trees into Networks with Applications to Metabolic Pathways. Proc. of IEEE 7th International Symposium on BioInformatics and BioEngineering (BIBE'07).

28 Questions? Thanks!

29 Handling cycles
- Sort the pattern such that children can communicate only through the parent.
- "Fix" images for some pattern vertices => interrupt communication through cycles.
- Feedback vertex set F(T): V_T − F(T) is acyclic.
- Runtime is increased by a factor of $O(|V_G|^{|F(T)|})$.
- t(v) = # of "reasonable" text images of v; minimize ∏ t(v), equivalently minimize ∑ log(t(v)).
- 2-approximation algorithm.

30 Software architecture of service-oriented pathway mining tool
[Architecture diagram. Components: Distributed Pathway DB (BioCyc, KEGG); Genome DB; sequence alignment tools (Swiss-Prot, BLAST, ClustalW); Services Container; Pathway Modeling; Comparison; Storage; Indexing; Rule-based mining (ambiguity pairs, potential holes, ……); Data-Control-View; Additional Value Service; Cached Indexing; Pathway Database; DB, AI, PDC, Simulation, SW. Flow labels: Query, Visualized Outputs, Browsers]

