CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014


CIS 4930/6930 – Recent Advances in Bioinformatics, Spring 2014. Network alignment. Tamer Kahveci. 11/30/2018

What is Network Alignment? Global alignment is GI-complete; local alignment is NP-complete.

Metabolic Pathways

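What and Why? Metabolic pathway alignment means finding a mapping between the entities of two pathways. [Figure: two example pathways, one with compounds C1 to C5, reactions R1 and R2, and enzymes E1 and E2; the other with compounds C1, C2, C4, and C5, reactions R1 and R2, and enzymes E1 and E2.] Applications: drug target identification, metabolic reconstruction, and phylogeny prediction.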

Challenges. Abstraction is a problem: the usual route is abstraction followed by graph alignment, but once a pathway is abstracted to an enzyme-only graph, where are the compounds? Is it E1 → C1 → E2 or E1 → C2 → E2? Pathway alignment is hard: even after abstraction, the metabolic pathway alignment problem is NP-complete. Existing algorithms include Heymans et al. (2003), Clemente et al. (2005), Pinter et al. (2005), Singh et al. (2007), …. [Figure: enzyme-only graphs (E1, E2, E3, E4 and E1, E2, E3) next to the full graphs with compounds (E1, C1, C2, E2, C3, C4, E3, E4 and E1, C1, E2, C3, E3).]

Outline: graph model of pathways; consistency of an alignment; homological and topological similarities; eigenvalue problem; similarity score; experimental results.

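Non-Redundant Graph Model. [Figure: an example pathway graph with compounds (Pyruv., Lip-E, ThPP, S-Ac, 2-ThP, A-CoA, Di-hy), reactions (R0014, R7618, R3270, R2569), and enzymes (EC 1.2.4.1, 2.3.1.12, 1.8.1.4) as separate node types.]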

Consistency. 1) Align only entities of the same type (compatible entities); for example, reaction R1 cannot be aligned with compound C1. 2) The overall mapping must be one-to-one; for example, R1 cannot be mapped to both R2 and R3.
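Rules 1 and 2 can be checked mechanically. The following is a minimal Python sketch that assumes each pathway's entity types are given as a dictionary from entity name to type, and that an alignment is a list of (pathway 1 entity, pathway 2 entity) pairs; `consistency_violations` and the type labels are hypothetical names, not part of the original method.

```python
from typing import Dict, List, Tuple

# Hypothetical type labels for the three kinds of entities.
ENZYME, REACTION, COMPOUND = "enzyme", "reaction", "compound"


def consistency_violations(mapping: List[Tuple[str, str]],
                           types1: Dict[str, str],
                           types2: Dict[str, str]) -> List[str]:
    """List the violations of rule 1 (same type) and rule 2 (one-to-one)."""
    problems: List[str] = []
    used1, used2 = set(), set()
    for u, v in mapping:
        # Rule 1: only entities of the same type may be aligned.
        if types1[u] != types2[v]:
            problems.append(f"type mismatch: {u} ({types1[u]}) vs {v} ({types2[v]})")
        # Rule 2: no entity may appear in more than one aligned pair.
        if u in used1:
            problems.append(f"{u} is mapped more than once")
        if v in used2:
            problems.append(f"{v} is mapped more than once")
        used1.add(u)
        used2.add(v)
    return problems


types1 = {"R1": REACTION, "C1": COMPOUND}
types2 = {"R1": REACTION, "R2": REACTION, "R3": REACTION, "C1": COMPOUND}
print(consistency_violations([("R1", "C1")], types1, types2))
print(consistency_violations([("R1", "R2"), ("R1", "R3")], types1, types2))
```

Returning a list of messages rather than a single boolean makes it easy to see which rule an alignment breaks.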

Consistency (continued). 3) Align two entities u_i and v_i only if there exists an aligned entity pair (u_j, v_j) such that u_j and v_j lie on the reachability paths of u_i and v_i, respectively. [Figure: aligned entities with their backward and forward reachability paths in the two example pathways (C1 to C5, R1, R2 and C1, C2, C4, C5, R1, R2).]
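Rule 3 needs reachability queries. The Python sketch below reads "on the reachability paths" as: the aligned pair (u_j, v_j) is reachable forward from both u_i and v_i, or backward from both. That reading, the adjacency-list input format, and the function names are assumptions for illustration. The rule cannot apply to the very first aligned pair, since no aligned pair exists yet.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, List[str]]  # adjacency list: node -> successor nodes


def reachable(graph: Graph, start: str) -> Set[str]:
    """Nodes reachable from start along directed edges (breadth-first search)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def reverse(graph: Graph) -> Graph:
    """Same graph with every edge reversed, for backward reachability."""
    rev: Graph = {}
    for u, succs in graph.items():
        for v in succs:
            rev.setdefault(v, []).append(u)
    return rev


def satisfies_rule_3(u: str, v: str, aligned: Iterable[Tuple[str, str]],
                     g1: Graph, g2: Graph) -> bool:
    """True if some aligned (uj, vj) is forward of both u and v, or backward of both."""
    fwd1, fwd2 = reachable(g1, u), reachable(g2, v)
    bwd1, bwd2 = reachable(reverse(g1), u), reachable(reverse(g2), v)
    return any((uj in fwd1 and vj in fwd2) or (uj in bwd1 and vj in bwd2)
               for uj, vj in aligned)


# Toy pathways: C1 -> R1 -> C2 -> R2 -> C3 and C1 -> R1 -> C2 -> R2 -> C4.
g1 = {"C1": ["R1"], "R1": ["C2"], "C2": ["R2"], "R2": ["C3"]}
g2 = {"C1": ["R1"], "R1": ["C2"], "C2": ["R2"], "R2": ["C4"]}
aligned = [("R1", "R1")]
print(satisfies_rule_3("R2", "R2", aligned, g1, g2))  # R1 lies backward of both
print(satisfies_rule_3("C1", "C4", aligned, g1, g2))  # R1 is forward of C1 but backward of C4
```

The second call shows the kind of "crossing" pair the rule is meant to reject.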

Problem Statement. Given a pair of metabolic pathways, our aim is to find a consistent alignment (mapping) of their entities (enzymes, reactions, compounds) that maximizes the similarity between the pathways (the SimP score).

Pairwise Similarities (Homology of Entities)

Pairwise Similarities (Homology). Enzyme similarity (SimE): hierarchical enzyme similarity (Webb E.C., 2002) and information-content enzyme similarity (Pinter et al., 2005). Compound similarity (SimC): identity score for compounds and SIMCOMP compound similarity (Hattori et al., 2003).
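One simple way to realize a hierarchical enzyme similarity is to compare EC numbers level by level. The Python sketch below scores two EC numbers by the fraction of leading EC levels they share; this scoring rule and the function name are assumptions for illustration, not necessarily the exact hierarchical score used in this work.

```python
def hierarchical_ec_similarity(ec1: str, ec2: str) -> float:
    """Fraction of the four EC levels shared, counted from the left.

    Comparison stops at the first differing level; an unspecified level
    written as '-' is treated as a mismatch.
    """
    shared = 0
    for a, b in zip(ec1.split("."), ec2.split(".")):
        if a != b or a == "-":
            break
        shared += 1
    return shared / 4


# EC numbers taken from the example pathway graph.
for other in ("1.2.4.1", "1.8.1.4", "2.3.1.12"):
    print(other, hierarchical_ec_similarity("1.2.4.1", other))
```

Because the EC hierarchy runs from class to serial number, stopping at the first mismatch keeps enzymes that differ only in their most specific level close together.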

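Pairwise Similarities. Reaction similarity (SimR) adds the best enzyme match, the best input-compound match, and the best output-compound match. In the example, R1 has enzymes E1 and E2, input compounds C1 and C2, and output compound C3; R2 has enzyme E3, input compound C4, and output compounds C5, C6, and C7:

$$\mathrm{SimR}(R_1,R_2) = \underbrace{\max\{\mathrm{SimE}(E_1,E_3),\ \mathrm{SimE}(E_2,E_3)\}}_{\text{enzymes}} + \underbrace{\max\{\mathrm{SimC}(C_1,C_4),\ \mathrm{SimC}(C_2,C_4)\}}_{\text{input compounds}} + \underbrace{\max\{\mathrm{SimC}(C_3,C_5),\ \mathrm{SimC}(C_3,C_6),\ \mathrm{SimC}(C_3,C_7)\}}_{\text{output compounds}}$$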
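The same pattern extends to reactions with any number of enzymes and compounds: take the maximum similarity over all cross pairs in each group and add the three maxima. The Python sketch below does this; the dictionary layout of a reaction, the lookup tables and their values are hypothetical, and scoring an empty group as 0 is an assumption.

```python
from typing import Callable, Collection, Dict, Set

Sim = Callable[[str, str], float]
Reaction = Dict[str, Set[str]]  # keys: "enzymes", "inputs", "outputs"


def best_pair(xs: Collection[str], ys: Collection[str], sim: Sim) -> float:
    """Largest similarity over all cross pairs; 0.0 if either group is empty."""
    return max((sim(x, y) for x in xs for y in ys), default=0.0)


def sim_r(r1: Reaction, r2: Reaction, sim_e: Sim, sim_c: Sim) -> float:
    """SimR = best enzyme match + best input-compound match + best output-compound match."""
    return (best_pair(r1["enzymes"], r2["enzymes"], sim_e)
            + best_pair(r1["inputs"], r2["inputs"], sim_c)
            + best_pair(r1["outputs"], r2["outputs"], sim_c))


# Hypothetical similarity values for the example reactions R1 and R2.
SIM_E = {("E1", "E3"): 0.5, ("E2", "E3"): 0.75}
SIM_C = {("C1", "C4"): 1.0, ("C2", "C4"): 0.25,
         ("C3", "C5"): 0.0, ("C3", "C6"): 0.5, ("C3", "C7"): 0.25}
r1 = {"enzymes": {"E1", "E2"}, "inputs": {"C1", "C2"}, "outputs": {"C3"}}
r2 = {"enzymes": {"E3"}, "inputs": {"C4"}, "outputs": {"C5", "C6", "C7"}}
print(sim_r(r1, r2, lambda a, b: SIM_E.get((a, b), 0.0),
            lambda a, b: SIM_C.get((a, b), 0.0)))  # 0.75 + 1.0 + 0.5
```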

Topological Similarity (Topology of Pathways)

Neighborhood Graphs. [Figure: neighborhood graphs for reactions, enzymes, and compounds; example nodes include reaction R1, enzyme E1, and compounds C1, C2, C4, C6, and C8.]

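Topological Similarities. In the first pathway, R1 and R2 feed R3, which feeds R4, so BN(R3) = {R1, R2} and FN(R3) = {R4}. In the second pathway (primes mark its reactions), R1' feeds R3', which feeds R4' and R5', so BN(R3') = {R1'} and FN(R3') = {R4', R5'}. Each pathway has |R| = 4 reactions, so the matrix A_R, whose rows and columns are indexed by reaction pairs (R1-R1', …, R2-R1', …, R3-R3', …, R4-R4', R4-R5'), has size (|R|·|R|) × (|R|·|R|) = 16 × 16. For example, R2 and R1' are backward neighbors of R3 and R3', respectively, so

$$A_R[(R_3,R_3'),(R_2,R_1')] = \frac{1}{|\mathrm{BN}(R_3)|\,|\mathrm{BN}(R_3')| + |\mathrm{FN}(R_3)|\,|\mathrm{FN}(R_3')|} = \frac{1}{2\cdot 1 + 1\cdot 2} = \frac{1}{4}$$

The denominator counts the neighbor pairs that lie in the same direction (both backward or both forward), so the support of R3-R3' is split evenly among them.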
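The construction of A_R can be sketched in Python. Assuming each pathway is given as a list of reaction-to-reaction edges, the code computes BN and FN for every reaction and fills the row of each reaction pair (u, v) with 1/(|BN(u)||BN(v)| + |FN(u)||FN(v)|) at every same-direction neighbor pair, which reproduces the 1/4 entry of the example. Function names are hypothetical, and the dense list-of-lists matrix only suits toy inputs.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (upstream reaction, downstream reaction)


def neighbors(edges: List[Edge], node: str) -> Tuple[Set[str], Set[str]]:
    """Backward neighbors BN(node) and forward neighbors FN(node)."""
    bn = {u for u, v in edges if v == node}
    fn = {v for u, v in edges if u == node}
    return bn, fn


def build_a_r(nodes1: List[str], edges1: List[Edge],
              nodes2: List[str], edges2: List[Edge]):
    """Dense A_R over all reaction pairs; row (u, v) spreads its support evenly."""
    pairs = list(product(nodes1, nodes2))
    index: Dict[Tuple[str, str], int] = {p: k for k, p in enumerate(pairs)}
    a = [[0.0] * len(pairs) for _ in pairs]
    for u, v in pairs:
        bn_u, fn_u = neighbors(edges1, u)
        bn_v, fn_v = neighbors(edges2, v)
        denom = len(bn_u) * len(bn_v) + len(fn_u) * len(fn_v)
        if denom == 0:
            continue  # no same-direction neighbor pairs: the row stays zero
        # Only neighbor pairs in the same direction (both backward or both forward) count.
        for col in list(product(bn_u, bn_v)) + list(product(fn_u, fn_v)):
            a[index[(u, v)]][index[col]] = 1.0 / denom
    return index, a


# Pathway 1: R1, R2 -> R3 -> R4.  Pathway 2 (primes): R1' -> R3' -> R4', R5'.
nodes1, edges1 = ["R1", "R2", "R3", "R4"], [("R1", "R3"), ("R2", "R3"), ("R3", "R4")]
nodes2 = ["R1'", "R3'", "R4'", "R5'"]
edges2 = [("R1'", "R3'"), ("R3'", "R4'"), ("R3'", "R5'")]
index, a_r = build_a_r(nodes1, edges1, nodes2, edges2)
print(len(index), a_r[index[("R3", "R3'")]][index[("R2", "R1'")]])
```

Every nonzero row sums to 1, which is what lets the iteration that follows behave like a random-walk propagation of support.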

Problem Formulation. Focus on matching R3 in the first pathway with R3 in the second. Iteration 0: only the pairwise similarity of the two R3 reactions is used. Iteration 1: support from aligned first-degree neighbors is added. Iteration 2: support from aligned second-degree neighbors is added. Iteration 3: support from aligned third-degree neighbors is added. [Figure: two reaction graphs (R1 to R8) with neighborhoods of increasing degree around the R3-R3 match.]

Problem Formulation. The initial reaction similarity matrix is written as the vector H_R^0; power method iterations turn it into the vector H_R^s, which gives the final reaction similarity matrix. [Figure: example initial and final reaction similarity matrices with their vector forms.]
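The iteration can be sketched as a power-method-style fixed-point update in which the topological support A·H is blended with the initial pairwise similarities H^0 through a weight α, in the style of IsoRank. The update form H ← α·A·H + (1 − α)·H^0, the tolerance, and the function names in this Python sketch are assumptions, not the confirmed formulation of this work.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(a: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def iterate_similarity(a: Matrix, h0: Vector, alpha: float = 0.7,
                       tol: float = 1e-9, max_iter: int = 1000) -> Vector:
    """Repeat H <- alpha * A * H + (1 - alpha) * H0 until successive H agree within tol."""
    h = list(h0)
    for _ in range(max_iter):
        ah = mat_vec(a, h)
        new = [alpha * x + (1 - alpha) * y for x, y in zip(ah, h0)]
        if max(abs(p - q) for p, q in zip(new, h)) < tol:
            return new
        h = new
    return h  # may not have converged, e.g. for alpha = 1


a = [[0.0, 1.0], [1.0, 0.0]]
h0 = [1.0, 0.0]
print(iterate_similarity(a, h0, alpha=0.0))  # alpha = 0: H0 is returned unchanged
print(iterate_similarity(a, h0, alpha=0.5))  # converges to about [2/3, 1/3]
```

With a row-stochastic A and α < 1 the update is a contraction, so the loop converges; α = 0 reduces to the pairwise similarities alone and α = 1 to topology alone.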

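Max-Weight Bipartite Matching. The three entity types can be aligned in six possible orderings, but only three are unique: reactions first, enzymes first, or compounds first. With reactions first, a maximum-weight bipartite matching over the weighted edges gives the aligned reactions; edges of the remaining entities that are inconsistent with the aligned entities are then pruned before those entities are matched, so consistency is assured. [Figure: weighted bipartite graphs over compounds C1 to C4, enzymes E1 to E3, and reactions R1 to R3, showing aligned entities and pruned inconsistent edges.]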
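Each matching step is a maximum-weight bipartite matching. The Python sketch below solves it by exhaustive search, which only suits toy sizes, and models pruning as a boolean mask of allowed edges; how the mask is derived from previously aligned entities is left as an input, and the function name is hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Tuple


def max_weight_matching(sim: List[List[float]],
                        allowed: Optional[List[List[bool]]] = None
                        ) -> Tuple[List[Tuple[int, int]], float]:
    """Exhaustive max-weight bipartite matching for small n x m inputs (n <= m).

    allowed[i][j] == False marks an edge pruned as inconsistent; a row whose
    chosen edge is pruned stays unmatched. Runs in O(m!/(m-n)!) time.
    """
    n, m = len(sim), len(sim[0])
    if n > m:
        raise ValueError("expects at most as many rows as columns")
    best_pairs: List[Tuple[int, int]] = []
    best_score = float("-inf")
    for cols in permutations(range(m), n):
        pairs = [(i, j) for i, j in enumerate(cols)
                 if allowed is None or allowed[i][j]]
        score = sum(sim[i][j] for i, j in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    return best_pairs, best_score


sim = [[0.9, 0.1], [0.8, 0.7]]
print(max_weight_matching(sim))        # pairs (0, 0) and (1, 1)
mask = [[False, True], [True, True]]   # edge (0, 0) pruned as inconsistent
print(max_weight_matching(sim, mask))  # pairs (0, 1) and (1, 0)
```

For realistic sizes, `scipy.optimize.linear_sum_assignment(sim, maximize=True)` solves the same assignment problem in polynomial time; pruned edges would then need a prohibitive weight.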

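Alignment Score (SimP). SimP = 1 for identical pathways. In the example, compounds C1, C2, and C4 and enzymes E1 and E2 are aligned, and

$$\mathrm{SimP} = \beta\,\frac{\mathrm{Sim}(C_1) + \mathrm{Sim}(C_2) + \mathrm{Sim}(C_4)}{3} + (1 - \beta)\,\frac{\mathrm{Sim}(E_1) + \mathrm{Sim}(E_2)}{2}$$

where Sim(X) is the similarity of the aligned pair containing entity X and β balances the compound and enzyme terms. [Figure: the two example pathways (C1 to C5, R1, R2, E1, E2 and C1, C2, C4, C5, R1, R2, E1, E2).]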
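The SimP score can be computed as a β-weighted combination of the mean similarity of the aligned compound pairs and the mean similarity of the aligned enzyme pairs, which equals 1 when every aligned pair scores 1. This Python sketch assumes the denominators (3 and 2 in the example) are the numbers of aligned compound and enzyme pairs; the default β = 0.5 is a placeholder, not a value from the talk.

```python
from statistics import mean
from typing import Sequence


def sim_p(compound_sims: Sequence[float], enzyme_sims: Sequence[float],
          beta: float = 0.5) -> float:
    """beta * mean aligned-compound similarity + (1 - beta) * mean aligned-enzyme similarity."""
    c = mean(compound_sims) if compound_sims else 0.0  # empty group scores 0 (assumption)
    e = mean(enzyme_sims) if enzyme_sims else 0.0
    return beta * c + (1 - beta) * e


print(sim_p([1.0, 1.0, 1.0], [1.0, 1.0]))            # identical pathways
print(sim_p([1.0, 0.5, 0.0], [1.0, 0.5], beta=0.5))  # 0.5 * 0.5 + 0.5 * 0.75
```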

Impact of α. α = 0: only the pairwise similarities of entities are used (no iterations). α = 1: only the topology of the graphs is used. α = 0.7 works well.

Alternative Entities & Paths. Eukaryotes (e.g., H. sapiens) use the mevalonate pathway, whereas bacteria (e.g., E. coli) use the non-mevalonate pathway (Kim J. et al., 2007; Kuzuyama T. et al., 2006).

Phylogeny Prediction. [Figure: our predicted phylogenies compared with the NCBI taxonomy for Archaea (Thermoprotei) and Eukaryota (Deuterostomia).]

Effect of Consistency Restriction

Running Time

How can we allow subnetwork mappings?

Different Paths, Same Function

How bad can it be? Lysine biosynthesis in E. coli and A. thaliana. MetNetAligner: Cheng & Zelikovsky, Bioinformatics 2009. SubMAP: Ay & Kahveci, RECOMB 2010.

Alternative paths 1

Alternative paths 2. Work on this.

Alternative paths 3. Work on this.

Mappings among major clades

Dynamic programming approach