Ferhat Ay, Tamer Kahveci & Valerie de Crecy-Lagard. 4/17/2015. Ferhat Ay, www.cise.ufl.edu/~fay

Presentation transcript:

Ferhat Ay, Tamer Kahveci & Valerie de Crecy-Lagard

Metabolic Pathways

What and Why?
Metabolic pathway alignment: finding a mapping between the entities of the pathways.
[Figure: two example pathways built from compounds C1-C5, reactions R1-R2, and enzymes E1-E2]
Applications
○ Drug Target Identification
○ Metabolic Reconstruction
○ Phylogeny Prediction

Challenges
Pathway alignment is hard!
- Graph alignment: even after abstraction, the metabolic pathway alignment problem is NP-complete.
- Existing algorithms: Heymans et al. (2003), Clemente et al. (2005), Pinter et al. (2005), Singh et al. (2007), ...
Abstraction is a problem!
- Where are the compounds?
- E1 → C1 → E2 or E1 → C2 → E2?
[Figure: a pathway with enzymes E1-E4 and compounds C1-C4, next to its abstraction, which keeps only the enzymes E1, E2, E3, E4]

Outline
- Graph Model of Pathways
- Consistency of an Alignment
- Homological & Topological Similarities
- Eigenvalue Problem
- Similarity Score
- Experimental Results

Non-Redundant Graph Model
[Figure: example pathway graph; node labels include the compounds Pyruv, Lip-E, ThPP, S-Ac, 2-ThP, A-CoA, Di-hy and the reactions R0014, R7618, R3270]

Consistency
1- Align only entities of the same type (compatible entities).
2- The overall mapping should be one-to-one.
[Figure: example mappings between reactions R1, R2, R3 and compounds C1, C2 illustrating both rules]

Consistency
3- Align two entities u_i, v_i only if there exists an aligned entity pair u_j, v_j such that u_j and v_j are on the reachability paths of u_i and v_i, respectively.
[Figure: two pathways with compounds C1-C5 and reactions R1-R2; legend: aligned entities, backward reachability path, forward reachability path]
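The three consistency rules can be sketched as a small check. This is only an illustration under stated assumptions, not the authors' implementation: pathways are given as adjacency dictionaries, entity types as plain dictionaries, and the names `reachable`, `reverse`, and `is_consistent` are hypothetical. Rule 3 is read here as "some other aligned pair lies on the forward paths of both entities, or on the backward paths of both", which is one interpretation of the slide.

```python
from collections import deque

def reachable(graph, start):
    """Return every node reachable from start by following directed edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reverse(graph):
    """Return the graph with every edge direction flipped."""
    rev = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev

def is_consistent(g1, g2, kind1, kind2, pairs):
    # Rule 1: aligned entities must have the same type.
    if any(kind1[u] != kind2[v] for u, v in pairs):
        return False
    # Rule 2: the mapping must be one-to-one.
    if len({u for u, _ in pairs}) != len(pairs) or len({v for _, v in pairs}) != len(pairs):
        return False
    # Rule 3 (assumed reading): each pair needs a supporting aligned pair
    # in the same direction (forward or backward) in both pathways.
    r1, r2 = reverse(g1), reverse(g2)
    for u, v in pairs:
        fwd1, fwd2 = reachable(g1, u), reachable(g2, v)
        bwd1, bwd2 = reachable(r1, u), reachable(r2, v)
        supported = any(
            (uj in fwd1 and vj in fwd2) or (uj in bwd1 and vj in bwd2)
            for uj, vj in pairs
            if (uj, vj) != (u, v)
        )
        if not supported:
            return False
    return True

# Toy pathway (hypothetical): C1 -> R1 -> C2 -> R2 -> C3, used on both sides.
toy = {"C1": ["R1"], "R1": ["C2"], "C2": ["R2"], "R2": ["C3"]}
kinds = {n: ("reaction" if n.startswith("R") else "compound")
         for n in ["C1", "C2", "C3", "R1", "R2"]}

in_order = is_consistent(toy, toy, kinds, kinds, [("R1", "R1"), ("R2", "R2")])
crossed = is_consistent(toy, toy, kinds, kinds, [("R1", "R2"), ("R2", "R1")])
mixed_types = is_consistent(toy, toy, kinds, kinds, [("R1", "C1"), ("R2", "R2")])
```

In this toy case the in-order mapping passes, while the crossed mapping fails rule 3 and the reaction-to-compound mapping fails rule 1.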

Problem Statement
Given a pair of metabolic pathways, our aim is to find a consistent alignment (mapping) of their entities (enzymes, reactions, compounds) that maximizes the similarity between the pathways (the SimP score).

Pairwise Similarities (Homology of Entities)

Pairwise Similarities (Homology)
- Enzyme Similarity (SimE)
○ Hierarchical Enzyme Similarity - Webb EC. (2002)
○ Information-Content Enzyme Similarity - Pinter et al. (2005)
- Compound Similarity (SimC)
○ Identity Score for compounds
○ SIMCOMP Compound Similarity - Hattori et al. (2003)

Pairwise Similarities
- Reaction Similarity (SimR)
[Figure: reaction R1 with enzymes E1, E2, input compounds C1, C2, and output compound C3; reaction R2 with enzyme E3, input compound C4, and output compounds C5, C6, C7]
SimR (R1,R2) =
  max ( SimE (E1,E3), SimE (E2,E3) )   [enzymes]
+ max ( SimC (C1,C4), SimC (C2,C4) )   [input compounds]
+ max ( SimC (C3,C5), SimC (C3,C6), SimC (C3,C7) )   [output compounds]
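The reaction similarity on this slide adds up the best enzyme match, the best input-compound match, and the best output-compound match. A minimal sketch follows, assuming reactions are stored as dictionaries of entity lists and that SimE and SimC are supplied as functions. The function name `sim_r` and all similarity values are made up for illustration, and the sum is left unweighted as on the slide.

```python
def sim_r(r1, r2, sim_e, sim_c):
    """Sum of the best cross-pair score for enzymes, inputs, and outputs."""
    def best(xs, ys, sim):
        # Best score over all cross pairs; 0.0 when either side is empty.
        return max((sim(x, y) for x in xs for y in ys), default=0.0)

    return (best(r1["enzymes"], r2["enzymes"], sim_e)
            + best(r1["inputs"], r2["inputs"], sim_c)
            + best(r1["outputs"], r2["outputs"], sim_c))

# Reactions laid out as in the slide's figure.
R1 = {"enzymes": ["E1", "E2"], "inputs": ["C1", "C2"], "outputs": ["C3"]}
R2 = {"enzymes": ["E3"], "inputs": ["C4"], "outputs": ["C5", "C6", "C7"]}

# Hypothetical similarity tables; pairs that are not listed score 0.
ENZYME_SIM = {("E1", "E3"): 0.75, ("E2", "E3"): 0.5}
COMPOUND_SIM = {("C1", "C4"): 0.5, ("C2", "C4"): 0.25, ("C3", "C6"): 1.0}

score = sim_r(R1, R2,
              lambda a, b: ENZYME_SIM.get((a, b), 0.0),
              lambda a, b: COMPOUND_SIM.get((a, b), 0.0))
```

With these toy tables the three maxima are 0.75, 0.5, and 1.0, so `score` is 2.25.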

Topological Similarity (Topology of Pathways)

Neighborhood Graphs
[Figure: a pathway with enzymes E1-E3, reactions R1-R4, and compounds C1-C9, shown with its three neighborhood graphs: Enzymes (E1, E2, E3), Reactions (R1, R2, R3, R4), and Compounds (C1-C9)]

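Topological Similarities
[Figure: reaction neighborhood graphs of two pathways; the first contains R1, R2, R3, R4 and the second R1, R3, R4, R5, so |R| = 4 in both]
First pathway: BN (R3) = {R1, R2}, FN (R3) = {R4}
Second pathway: BN (R3) = {R1}, FN (R3) = {R4, R5}
Here BN and FN are the backward and forward neighbors of a reaction.
The A_R matrix has size $(|R| \cdot |\bar{R}|) \times (|R| \cdot |\bar{R}|) = 16 \times 16$, where $\bar{R}$ is the reaction set of the second pathway. Its rows and columns are indexed by reaction pairs (R1-R1, ..., R2-R1, ..., R4-R4, R4-R5, ...).
Example entry, for row R3-R3 and column R2-R1, with denominator $|BN(R3)| \cdot |BN(\bar{R}3)| + |FN(R3)| \cdot |FN(\bar{R}3)|$:
$$A_R[R3,R3][R2,R1] = \frac{1}{2 \cdot 1 + 1 \cdot 2} = \frac{1}{4}$$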
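The worked entry above can be reproduced with a few lines of code. This is a sketch under assumptions: the edge lists contain only the edges implied by the BN and FN sets stated on the slide, the rule for when an entry is nonzero (both column reactions are backward neighbors, or both are forward neighbors, of the row pair) is inferred from the single example, and the names `neighbors` and `a_entry` are hypothetical.

```python
def neighbors(edges):
    """Build backward (BN) and forward (FN) neighbor sets from directed edges."""
    bn, fn = {}, {}
    for src, dst in edges:
        fn.setdefault(src, set()).add(dst)
        bn.setdefault(dst, set()).add(src)
    return bn, fn

def a_entry(p1, p2, row, col):
    """Entry of the topology matrix for row pair (i, j) and column pair (u, v)."""
    (bn1, fn1), (bn2, fn2) = p1, p2
    (i, j), (u, v) = row, col
    b1, b2 = bn1.get(i, set()), bn2.get(j, set())
    f1, f2 = fn1.get(i, set()), fn2.get(j, set())
    denom = len(b1) * len(b2) + len(f1) * len(f2)
    # Assumed rule: (u, v) supports (i, j) only if both are backward
    # neighbors or both are forward neighbors of i and j.
    linked = (u in b1 and v in b2) or (u in f1 and v in f2)
    return 1.0 / denom if linked and denom else 0.0

# Only the edges around R3 that the slide states explicitly.
pathway1 = neighbors([("R1", "R3"), ("R2", "R3"), ("R3", "R4")])
pathway2 = neighbors([("R1", "R3"), ("R3", "R4"), ("R3", "R5")])

entry = a_entry(pathway1, pathway2, ("R3", "R3"), ("R2", "R1"))
```

Here `entry` evaluates to 1 / (2*1 + 1*2) = 0.25, matching the slide, while a column pair such as (R2, R4), which mixes a backward with a forward neighbor, gets 0.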

Problem Formulation
Focus on the R3 - R3 matching.
Iteration 0: Only pairwise similarity of R3 and R3
Iteration 1: Support of aligned first degree neighbors added
Iteration 2: Support of aligned second degree neighbors added
Iteration 3: Support of aligned third degree neighbors added
[Figure: reaction graphs of two pathways, the first with reactions R1-R8 and the second with R1, R2, R3, R5, R7, R8]

Problem Formulation
Initial Reaction Similarity Matrix → $H_R^0$ Vector → Power Method Iterations → $H_R^s$ Vector → Final Reaction Similarity Matrix
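The slides do not spell out the update rule behind the power method iterations. A common form that agrees with the "Impact of Alpha" slide (alpha = 0 keeps only the pairwise similarities, alpha = 1 keeps only the topology) is h_next = alpha * A * h + (1 - alpha) * h0, followed by normalization. The sketch below assumes that form; `power_iterate` is a hypothetical name and the 2 x 2 matrix is a toy stand-in for the topology matrix.

```python
def power_iterate(A, h0, alpha=0.7, tol=1e-9, max_iter=1000):
    """Iterate h <- alpha * A h + (1 - alpha) * h0 until the vector stabilizes."""
    def normalize(v):
        total = sum(v)
        return [x / total for x in v] if total else list(v)

    h0 = normalize(h0)
    h = h0
    for _ in range(max_iter):
        mixed = [alpha * sum(a * x for a, x in zip(row, h)) + (1 - alpha) * b
                 for row, b in zip(A, h0)]
        nxt = normalize(mixed)
        if max(abs(p - q) for p, q in zip(nxt, h)) < tol:
            return nxt
        h = nxt
    return h

# Toy topology matrix (assumption): each entity pair supports the other one.
A_TOY = [[0.0, 1.0],
         [1.0, 0.0]]
H0_TOY = [3.0, 1.0]  # hypothetical initial pairwise similarities

pairwise_only = power_iterate(A_TOY, H0_TOY, alpha=0.0)
blended = power_iterate(A_TOY, H0_TOY, alpha=0.7)
```

With alpha = 0 the result is just the normalized initial vector; with alpha = 0.7 the topological support pulls the two scores closer together while preserving their order.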

Max Weight Bipartite Matching
- Six possible orderings, of which only 3 are unique:
○ Reactions First
○ Enzymes First
○ Compounds First
- R First Pruning
Consistency assured!
[Figure: bipartite graphs between the reactions, compounds, and enzymes of two pathways; legend: weighted edges, aligned entities, inconsistent edges]
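The matching step itself can be illustrated on a tiny example. The sketch below is a brute-force search over all assignments, which is only practical for a handful of entities; a real implementation would use a polynomial-time assignment algorithm. It assumes non-negative edge weights, treats missing edges as pruned (not allowed), and uses hypothetical names and weights throughout.

```python
from itertools import permutations

def max_weight_matching(left, right, weight):
    """Exhaustive max-weight bipartite matching; absent edges are disallowed."""
    if len(left) > len(right):
        # Keep the smaller side on the left, then flip the pairs back.
        flipped = {(v, u): w for (u, v), w in weight.items()}
        score, pairs = max_weight_matching(right, left, flipped)
        return score, [(u, v) for v, u in pairs]
    best_score, best_pairs = 0.0, []
    for perm in permutations(right, len(left)):
        pairs = [(u, v) for u, v in zip(left, perm) if (u, v) in weight]
        score = sum(weight[p] for p in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_score, best_pairs

# Hypothetical reaction similarity scores between two pathways.
WEIGHTS = {
    ("R1", "S1"): 1.0, ("R1", "S2"): 0.5,
    ("R2", "S1"): 0.75, ("R2", "S2"): 0.5,
    ("R3", "S3"): 0.5,
}
total, mapping = max_weight_matching(["R1", "R2", "R3"], ["S1", "S2", "S3"], WEIGHTS)
```

For these weights the best mapping pairs R1 with S1, R2 with S2, and R3 with S3, for a total weight of 2.0. Removing edges from `WEIGHTS` plays the role of pruning inconsistent edges before matching the next entity type.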

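Alignment Score (SimP)
0 ≤ SimP ≤ 1, with SimP = 1 for identical pathways.
$$\mathrm{SimP} = \beta \, \frac{\mathrm{Sim}(C_1) + \mathrm{Sim}(C_2) + \mathrm{Sim}(C_4)}{3} + (1 - \beta) \, \frac{\mathrm{Sim}(E_1) + \mathrm{Sim}(E_2)}{2}$$
[Figure: two aligned pathways with compounds C1-C5, reactions R1-R2, and enzymes E1-E2; three compound pairs and two enzyme pairs are aligned]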
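The score above is a weighted average of the mean similarity of the aligned compounds and the mean similarity of the aligned enzymes. A minimal sketch, assuming the per-pair similarity scores are already available as lists and writing the weight as `beta` (the name `sim_p` and the sample scores are hypothetical):

```python
def sim_p(compound_scores, enzyme_scores, beta):
    """Weighted average of mean compound and mean enzyme similarity."""
    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return beta * mean(compound_scores) + (1 - beta) * mean(enzyme_scores)

# Identical pathways: every aligned pair has similarity 1.
identical = sim_p([1.0, 1.0, 1.0], [1.0, 1.0], beta=0.5)

# Hypothetical partial match: three aligned compounds, two aligned enzymes.
partial = sim_p([0.5, 1.0, 0.0], [0.5, 0.5], beta=0.5)
```

The first call returns 1.0, as required for identical pathways, and the second returns 0.5. Because each term is a mean of values in [0, 1] and the weights sum to 1, the score stays within [0, 1].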

Outline
- Graph Model of Pathways
- Consistency of an Alignment
- Homological & Topological Similarities
- Eigenvalue Problem
- Similarity Score
- Experimental Results

Impact of Alpha
- α = 0: only pairwise similarities of entities, no iterations
- α = 1: only topology of the graphs
- α = 0.7 is good!

Alternative Entities & Paths
Eukaryotes (e.g. H. sapiens) → Mevalonate Path
Bacteria (e.g. E. coli) → Non-Mevalonate Path
Kim J. et al. (2007); Kuzuyama T. et al. (2006)

Phylogeny Prediction
[Figure: NCBI Taxonomy compared with our prediction; labeled groups include Thermoprotei, Eukaryota, Archaea, and Deuterostomia]

Effect Of Consistency Restriction

Running Time

For source code and more information:


Error Tolerance

Phylogenetic Reconstruction

Effect Of Consistency Restriction

Z-Score Calculation

Challenges
Abstraction is a problem!
- Where are the compounds?
- E1 → C1 → E2 or E1 → C2 → E2?
The alignment problem is NP-complete!
[Figure: Pathway 1 (E1, C1, C2, E2, C3, C4, E3, E4) and Pathway 2 (E1, C1, E2, C3, E3) shown with no abstraction, next to their abstracted forms (E1, E2, E3, E4 and E1, E2, E3)]