Qiong Cheng, Dipendra Kaur, Robert Harrison, Alexander Zelikovsky Computer Science in Georgia State University Dec. 1 2007 RECOMB Satellite Conference.

Presentation transcript:

Homomorphism Mapping in Metabolic Pathways
Qiong Cheng, Dipendra Kaur, Robert Harrison, Alexander Zelikovsky. Computer Science, Georgia State University. RECOMB Satellite Conference on Systems Biology, Dec. 2007.

Outline
- Concept of metabolic pathway comparison
- Enzyme similarity
- Graph mappings: embeddings and homomorphisms
- Min cost homomorphism problem for trees: optimal DP algorithm for trees
- Min cost homomorphism problem for arbitrary graphs: minimum feedback vertex set (MFVS)
- Searching metabolic networks for pathway motifs and pathway holes
- Web tool: architecture and brief interface
- Future work

Metabolic pathway and pathway model
[Figure: a portion of the pentose phosphate pathway, shown as a metabolic pathway and as the corresponding pathway model.]

Comparison of metabolic pathways
- Enzyme similarity
- Pathway topology similarity
Enzyme similarity and pathway topology together represent the similarity of pathway functionality.
[Figure labels: mismatch, substitute match.]

Related work
- Linear topology: Forst & Schulten [1999]; Chen & Hofestaedt [2004].
- Tree topology: Pinter et al. [2005].
- Arbitrary topology: mapping a linear pattern to a graph (Kelley et al. 2004); exhaustive search (Sharan et al. 2005; Yang et al. 2007).
The slide also gives a runtime bound in |V_T| and |V_G| for each of these methods (with log|V_G| and log|V_T| factors for the tree case); the operators and exponents are not legible in this transcript.

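Enzyme mapping cost
EC (Enzyme Commission) notation: Enzyme X = x1.x2.x3.x4, Enzyme Y = y1.y2.y3.y4.
Measure Δ by tight reaction property (the first matching case applies):
- Δ[X, Y] = 1 if x1 = y1, x2 = y2 and x3 = y3
- Δ[X, Y] = 10 if x1 = y1 and x2 = y2
- Δ[X, Y] = +∞ otherwise
Measure enzyme similarity score Δ by the lowest common upper class distribution: Δ[X, Y] = log_2 c(X, Y), where c(X, Y) is taken from the lowest common upper class of X and Y in the EC hierarchy. The slide lists a separate "otherwise" case whose value is not preserved here.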
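The tight-reaction-property cost can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ec_cost` is hypothetical, the three cases are read as "first three EC digits agree", "first two agree" and "otherwise", and the zero cost for identical EC numbers is an assumption that does not appear on the slide.

```python
import math

def ec_cost(x: str, y: str) -> float:
    """Mapping cost between two EC numbers written as 'x1.x2.x3.x4'."""
    xs, ys = x.split("."), y.split(".")
    if xs == ys:
        return 0.0       # assumption: identical enzymes cost nothing
    if xs[:3] == ys[:3]:
        return 1.0       # differ only in the fourth digit
    if xs[:2] == ys[:2]:
        return 10.0      # differ from the third digit on
    return math.inf      # different class or subclass: mapping forbidden

# Example EC numbers chosen only to exercise each case.
costs = [
    ec_cost("1.1.1.49", "1.1.1.44"),
    ec_cost("1.1.1.49", "1.1.2.3"),
    ec_cost("1.1.1.49", "2.7.1.1"),
]
```

Using +∞ for unrelated enzymes means such a pair can never appear in a finite-cost mapping, which is convenient when the cost is fed into a minimization.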

Graph mappings: embeddings and homomorphisms
Mapping types from a pattern T to a text G: isomorphism, isomorphic embedding, homeomorphic embedding, homomorphism.
A homomorphism f : T → G consists of
- f_v : V_T → V_G (vertices to vertices)
- f_e : E_T → paths of G (edges to paths)
Edge-to-path cost: λ(|f_e(e)| − 1), where λ is the penalty for gaps.
We allow different enzymes to be mapped to the same enzyme.
Homomorphism cost = Σ_{v ∈ V_T} Δ(v, f_v(v)) + Σ_{e ∈ E_T} λ(|f_e(e)| − 1)
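As a small worked example of the homomorphism cost, the sketch below sums the vertex mapping costs and the edge-to-path gap costs for one explicit mapping. The function name, the representation of a path as a list of text vertices, the gap-penalty symbol `lam` and the toy `delta` are all assumptions made for illustration.

```python
def homomorphism_cost(f_v, f_e, delta, lam=1.0):
    """Cost of a mapping: vertex costs plus gap penalties on edge images.

    f_v maps pattern vertices to text vertices.
    f_e maps pattern edges to text paths given as vertex lists.
    """
    vertex_cost = sum(delta(v, image) for v, image in f_v.items())
    # A path with k vertices has k - 1 edges, so |f_e(e)| - 1 = len(path) - 2.
    edge_cost = sum(lam * (len(path) - 2) for path in f_e.values())
    return vertex_cost + edge_cost

# Pattern edge a -> b is mapped onto the text path A -> X -> B (one gap).
f_v = {"a": "A", "b": "B"}
f_e = {("a", "b"): ["A", "X", "B"]}
delta = lambda v, u: 0.0 if v.upper() == u else 1.0  # toy enzyme cost
cost = homomorphism_cost(f_v, f_e, delta)
```

A pattern edge mapped onto a single text edge contributes no gap cost, so an isomorphic embedding with matching enzymes has total cost zero under this toy `delta`.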

Min cost homomorphism of a multi-source tree to an arbitrary graph
A multi-source tree is a directed graph whose underlying undirected graph (obtained by ignoring edge directions) is a tree.
Given a multi-source tree T = (V_T, E_T) (the pattern) and an arbitrary graph G = (V_G, E_G) (the text), find a min cost homomorphism f : T → G.

Preprocessing of text graph
The transitive closure of G is the graph G* = (V, E*), where E* = {(i, j) : there is an i-j path in G}.
[Figure: text graph G on vertices A to F and its transitive closure G*.]
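One way to compute the closure, sketched here as an assumption rather than the authors' implementation, is a breadth-first search from every vertex. Recording the BFS distance alongside each closure edge also yields the hop count needed later for gap penalties. The function name and the dictionary-based graph format are illustrative.

```python
from collections import deque

def transitive_closure(graph):
    """Return {(u, v): hops} for every v reachable from u (u != v).

    graph maps each vertex to an iterable of its direct successors;
    every vertex is expected to appear as a key.
    """
    hops = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph.get(u, ()):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, d in dist.items():
            if target != source:
                hops[(source, target)] = d
    return hops

# Toy text graph: A -> B -> C.
closure = transitive_closure({"A": ["B"], "B": ["C"], "C": []})
```

Each BFS costs O(|V_G| + |E_G|), so the whole preprocessing is O(|V_G|(|V_G| + |E_G|)).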

Pattern graph ordering
Construct the ordered pattern T′:
- Run a DFS traversal of the pattern T.
- Process the vertices in the opposite order of the traversal.
- Each edge e_i in T′ is the unique edge connecting v_i with the previous vertices in the order.
[Figure: pattern T on vertices a, b, c, d with ordering c, b, d, a, giving the ordered pattern T′.]
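The ordering step can be sketched as an iterative DFS whose visit order is then reversed, so every vertex is processed after all of its descendants. The tree shape below (edges a-b, a-d, b-c) is an assumption chosen for illustration; the function name and the adjacency format are hypothetical as well.

```python
def order_pattern(adjacency, root):
    """Return (vertex, parent) pairs in processing order, leaves first.

    adjacency is the undirected adjacency dict of the pattern tree.
    """
    order, parent, stack = [], {root: None}, [root]
    while stack:
        v = stack.pop()
        order.append(v)                 # DFS preorder
        for w in adjacency[v]:
            if w not in parent:
                parent[w] = v
                stack.append(w)
    # Reverse the traversal: children always precede their parent.
    return [(v, parent[v]) for v in reversed(order)]

tree = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b"], "d": ["a"]}
ordering = order_pattern(tree, "a")
```

The parent stored with each vertex identifies the single tree edge that attaches it to the rest of the pattern.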

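DP table
Rows are the pattern vertices (a, b, c, d) in processing order; columns are the text vertices u_1, …, u_j, …, u_{|V_G|} in arbitrary order.
DT[a, u_j] is the min cost of a homomorphism into G* of the subgraph of T induced by a and the previous vertices in the order, with a mapped to u_j.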

Filling DP table
λ is the penalty for gaps. For each pattern vertex v_i and text vertex u_j, the recursive function is
DT[i, j] = Δ(v_i, u_j)   if v_i is a leaf in T
DT[i, j] = Δ(v_i, u_j) + Σ_{l=1}^{adj(v_i)} min_{j′ = 1..|V_G|} C[i_l, j′]   otherwise
where v_{i_l} is the l-th child of v_i and
h(j, j_l) = number of hops between u_j and u_{j_l} in G
C[i_l, j_l] = DT[i_l, j_l] + λ(h(j, j_l) − 1)
[Figure: pattern vertex v_i with child v_{i_l} mapped to text vertices u_j and u_{j′} in G*, h(j, j′) hops apart.]
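The recurrence can be sketched as follows for the simplified case of a rooted pattern tree whose edges all point from parent to child; handling the edge directions of a general multi-source tree is left out. The function name, the `hops` dictionary (closure edge to hop count), the gap-penalty symbol `lam` and the toy `delta` are assumptions for illustration.

```python
import math

def min_cost_homomorphism(children, root, text_vertices, hops, delta, lam=1.0):
    """Fill DT bottom-up and return (best cost, DT)."""
    DT = {}

    def fill(v):
        for c in children.get(v, ()):
            fill(c)                      # children first, as in the ordering
        for u in text_vertices:
            total = delta(v, u)
            for c in children.get(v, ()):
                # Best image w of child c reachable from u, with gap penalty.
                total += min(
                    (DT[c, w] + lam * (hops[u, w] - 1)
                     for w in text_vertices if (u, w) in hops),
                    default=math.inf,
                )
            DT[v, u] = total

    fill(root)
    return min(DT[root, u] for u in text_vertices), DT

# Pattern a -> b -> c, text A -> B -> X -> C (X is an extra enzyme).
children = {"a": ["b"], "b": ["c"]}
text = ["A", "B", "X", "C"]
hops = {("A", "B"): 1, ("A", "X"): 2, ("A", "C"): 3,
        ("B", "X"): 1, ("B", "C"): 2, ("X", "C"): 1}
delta = lambda v, u: 0.0 if v.upper() == u else math.inf
best, table = min_cost_homomorphism(children, "a", text, hops, delta)
```

In this toy instance the only finite mapping sends the edge b → c onto the text path B → X → C, which skips one vertex and so pays one gap penalty.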

Runtime analysis for mapping trees
- Transitive closure takes O(|V_G||E_G|).
- Pattern graph ordering takes O(|V_T| + |E_T|).
- Dynamic programming: calculating the min contribution of all child pairs of a node pair (v_i ∈ T, u_j ∈ G) takes t_ij = deg_T(v_i) deg_{G*}(u_j), so filling DT takes Σ_{j=1}^{|V_G|} Σ_{i=1}^{|V_T|} t_ij = Σ_{j=1}^{|V_G|} deg_{G*}(u_j) · Σ_{i=1}^{|V_T|} deg_T(v_i) = 2|E_{G*}||E_T|.
The total runtime for mapping trees is O(|V_G||E_G| + |V_{G*}||V_T|).

MFVS
Minimum feedback vertex set (MFVS).
Given: an undirected graph G = (V, E) and a nonnegative weight function w on V.
Find: a minimum weight subset of V whose removal leaves an acyclic graph.
Bad news: the MFVS problem is NP-complete.
Good news: a 2-approximation exists.
Greedy algorithm
1. Delete degree 1/0 vertices from V and set the remaining vertices to V′
2. MFVS ← ∅
3. while V′ ≠ ∅ do
4.   pick the set S of maximal degree vertices
5.   MFVS ← MFVS ∪ S and remove S from V′
6.   delete degree 1/0 vertices from V′
[Figure: example graph on vertices a to e, panels A and B.]
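The greedy heuristic can be sketched as below for an unweighted simple undirected graph. Two points are assumptions of this sketch: the chosen vertices are removed from the working graph after being added to the set (otherwise the loop would not terminate), and low-degree pruning is repeated until no vertex of degree 0 or 1 remains. The function names and the adjacency format are hypothetical.

```python
def remove_vertex(adj, v):
    """Delete v and all of its incident edges (no self-loops expected)."""
    for w in adj.pop(v):
        adj[w].discard(v)

def prune_low_degree(adj):
    """Repeatedly delete vertices of degree 0 or 1; they lie on no cycle."""
    while True:
        low = [v for v in adj if len(adj[v]) <= 1]
        if not low:
            return
        for v in low:
            remove_vertex(adj, v)

def greedy_feedback_vertex_set(adjacency):
    adj = {v: set(ns) for v, ns in adjacency.items()}  # work on a copy
    fvs = set()
    prune_low_degree(adj)
    while adj:
        top = max(len(ns) for ns in adj.values())
        for v in [v for v in adj if len(adj[v]) == top]:
            fvs.add(v)
            remove_vertex(adj, v)
        prune_low_degree(adj)
    return fvs

# Two triangles a-b-c and a-d-e sharing the vertex a.
graph = {"a": ["b", "c", "d", "e"], "b": ["a", "c"], "c": ["a", "b"],
         "d": ["a", "e"], "e": ["a", "d"]}
fvs = greedy_feedback_vertex_set(graph)
```

Taking every maximal-degree vertex at once, as the listed steps do, can select more vertices than necessary (all three vertices of a lone triangle, for example), so this is a heuristic rather than an exact or 2-approximate method.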

Min cost homomorphism of arbitrary graphs
Given an arbitrary graph P = (V_P, E_P) (the pattern) and an arbitrary graph G = (V_G, E_G) (the text), find a min cost homomorphism f : P → G.
Algorithm
1. Find a minimum feedback vertex set F(P) of P
2. Construct a multi-source tree P′ from P using F(P)
3. for every sub-mapping f′_v : F(P) → V_G do
4.   obtain the min cost homomorphism of the multi-source tree P′ to the arbitrary graph G under sub-mapping f′_v
5. choose the min cost homomorphism over all sub-mappings
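The outer loop over sub-mappings can be sketched with `itertools.product`, which enumerates all |V_G|^|F(P)| assignments of text vertices to the feedback vertices. The tree solver is passed in as a callback; `solve_tree` and the toy scoring function below are hypothetical stand-ins, not the DP itself.

```python
from itertools import product

def best_over_feedback_mappings(feedback_vertices, text_vertices, solve_tree):
    """Try every image assignment for the feedback set and keep the cheapest.

    solve_tree(fixed) is expected to return the min cost of mapping the
    multi-source tree when the feedback vertices are pinned as in `fixed`.
    """
    best_cost, best_fixed, tried = float("inf"), None, 0
    for images in product(text_vertices, repeat=len(feedback_vertices)):
        fixed = dict(zip(feedback_vertices, images))
        cost = solve_tree(fixed)
        tried += 1
        if cost < best_cost:
            best_cost, best_fixed = cost, fixed
    return best_cost, best_fixed, tried

# Toy solver: one unit of cost per feedback vertex mapped to a wrong label.
toy_solver = lambda fixed: sum(0 if v.upper() == u else 1
                               for v, u in fixed.items())
result = best_over_feedback_mappings(["a", "b"], ["A", "B", "C"], toy_solver)
```

The exponential factor comes only from the size of the feedback set, which is why a small feedback vertex set matters for the overall runtime.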

Runtime analysis for mapping arbitrary graphs
- Finding the min feedback vertex set takes O(|V_P| + |E_P|).
- There are O(|V_G|^{|F(P)|}) possible mappings for the MFVS.
- Finding the min homomorphism mapping of a multi-source tree to an arbitrary graph takes O(|V_G||E_G| + |V_{G*}||V_T|).
The total runtime is O(|V_G|^{|F(P)|} (|V_G||E_G| + |V_{G*}||V_T|)).

Statistical significance
Randomized P-value computation. Random degree-conserved graph generation:
- Reshuffle nodes
- Reshuffle edges
[Figure: example graph on vertices a, b, c, d before and after reshuffling.]
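A common way to reshuffle edges while conserving degrees is the pairwise edge swap: two directed edges (a, b) and (c, d) are rewired to (a, d) and (c, b). The sketch below uses that technique together with an empirical p-value; both the swap scheme and the add-one p-value estimate are assumptions of this example, and the function names are hypothetical.

```python
import random

def reshuffle_edges(edges, swaps, seed=0):
    """Randomize a directed edge list while keeping in- and out-degrees."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Skip swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def empirical_p_value(observed_cost, random_costs):
    """Share of random graphs mapping at least as cheaply (add-one estimate)."""
    hits = sum(1 for c in random_costs if c <= observed_cost)
    return (hits + 1) / (len(random_costs) + 1)

original = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
shuffled = reshuffle_edges(original, swaps=50)
p = empirical_p_value(1.0, [2.0, 3.0, 0.5])
```

In practice the pattern would be mapped into many reshuffled text graphs and the resulting costs passed to `empirical_p_value`.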

Experiments & applications
All-against-all mappings among S. cerevisiae, B. subtilis, T. thermophilus, and E. coli (+ Halobacterium).
- Identifying conserved pathways: 24 pathways are conserved across all 4 species; 18 more pathways are conserved across at least three of these species
- Resolving ambiguity
- Discovering pathway holes

Mappings with cycles

Resolving Ambiguity

Pathway holes
- Check if there is such an enzyme in the pattern
- Find the closest protein in the same group
- If the identity is high (> 80%), we expect a good filling
- Align to the previous and next enzyme: the functions may be taken over

Filling pathway holes
Columns: type; example pattern pathway (P) and text pathway (T); mapping of interest (with the potential hole marked); similarity score and accession number. The EC numbers of the mappings of interest are missing from this transcript.

Type: fillings found by same EC number
  Pattern pathway (P): Gamma glutamyl cycle
  Text pathway (T): Superpathway of glycolysis, pyruvate dehydrogenase, TCA and glyoxylate bypass
  Accession number: P49814

Type: fillings found by group neighbors
  Pattern pathway (P): Alanine biosynthesis I
  Text pathway (T): Superpathway of lysine, threonine, methionine and S-adenosyl-L-methionine biosynthesis
  Accession number: P39754, with statistical significance 0.26

Type: fillings found by left/right neighbor
  Pattern pathway (P): Alanine biosynthesis I
  Text pathway (T): Superpathway of lysine, threonine, methionine and S-adenosyl-L-methionine biosynthesis
  Accession number: P10725

Web service architecture
[Figure: architecture diagram. Components: pathway database; graph extraction; cached indexing; graph searching; graph layout; graph visualization; additional value service; browsers that send queries and receive visualized outputs.]

Web Interface

Future work
- Approximation algorithm to handle the comparison of general graphs
- Mining protein interaction networks
- Discovery of critical elements or modules based on graph comparison
- Discovery of evolutionary relations between organisms by comparing pathways of different organisms at different time points
- Integration with genome databases

References
- R. Y. Pinter, O. Rokhlenko, E. Yeger-Lotem, M. Ziv-Ukelson: Alignment of metabolic pathways. Bioinformatics 21(16), Aug. 2005.
- S. Wernicke: Combinatorial Algorithms to Cope with the Complexity of Biological Networks. Dissertation, December 2006.
- J. Ellson, E. Gansner, E. Koutsofios, S. North, G. Woodhull: Graphviz and dynagraph - static and dynamic graph drawing tools. In M. Junger and P. Mutzel, editors, Graph Drawing Software. Springer-Verlag, 2003.
- X. Yan, J. Han: gSpan: Graph-based substructure pattern mining. In ICDM.
- N. Ketkar, L. Holder, D. Cook, R. Shah, J. Coble: Subdue: Compression-based Frequent Pattern Discovery in Graph Data. Proceedings of the ACM KDD Workshop on Open-Source Data Mining, August.
- K. Borgwardt, S. Bottger, H. Kriegel: VGM: visual graph mining. Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data.
- Q. Cheng, D. Kaur, R. Harrison, A. Zelikovsky: Mapping and Filling Metabolic Pathways. RECOMB Satellite Conference on Systems Biology 2007.
- Q. Cheng, R. Harrison, A. Zelikovsky: Homomorphisms of Multisource Trees into Networks with Applications to Metabolic Pathways. Proc. of IEEE 7th International Symposium on BioInformatics and BioEngineering (BIBE'07).

Questions? Thanks!

Handling cycles
- Sort the pattern so that children can communicate only through their parent.
- "Fix" images for some pattern vertices, which interrupts communication through cycles.
- Feedback vertex set F(T): V_T − F(T) is acyclic.
- Runtime is increased by a factor of O(|V_G|^{|F(T)|}).
- t(v) = number of "reasonable" text images of v. Minimizing ∏ t(v) corresponds to minimizing Σ log t(v), for which there is a 2-approximation algorithm.

Software architecture of service-oriented pathway mining tool
[Figure: architecture diagram. Components: distributed pathway DBs (BioCyc, KEGG); pathway database; genome DB; sequence alignment tools (Swiss-Prot, BLAST, ClustalW); services container (pathway modeling, comparison, storage, indexing, rule-based mining, ambiguity pairs, potential holes, ...); data-control-view; additional value service; cached indexing; browsers that send queries and receive visualized outputs; DB, AI, PDC and simulation SW.]