Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.

In the beginning there was DNA… Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides NC. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. NAR 34, D

…then came protein interactions [Figures: Arabidopsis PPI network; E. coli PPI network; yeast PPI network]

Comparative Genomics to Comparative Interactomics Evolutionary conservation implies functional relevance  Sequence conservation implies functional conservation  Network conservation implies functional conservation too! What new insights might we gain from network comparisons? (Why should we care?)

Network comparisons allow us to: Identify conserved functional modules; Query for a module, à la BLAST; Predict functions of a module; Predict protein functions; Validate protein interactions; Predict protein interactions. [Slide legend: some of these are only possible with network comparisons; the others are possible with existing techniques, but improved with network comparisons]

What is a Protein Interaction Network? Proteins are nodes Interactions are edges Edges may have weights Yeast PPI network H. Jeong et al. Lethality and centrality in protein networks. Nature 411, 41 (2001)
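The node/edge/weight picture on this slide can be sketched as a small data structure. This is only an illustration: the class name `PPINetwork`, its methods, and the protein names and weights are hypothetical and do not come from the talk or from the cited papers.

```python
from collections import defaultdict


class PPINetwork:
    """Undirected protein interaction graph with optional edge weights."""

    def __init__(self):
        # adjacency map: protein -> {neighbour protein: edge weight}
        self.adj = defaultdict(dict)

    def add_interaction(self, u, v, weight=1.0):
        # Interactions are symmetric, so store the edge in both directions.
        self.adj[u][v] = weight
        self.adj[v][u] = weight

    def degree(self, u):
        # Number of distinct interaction partners of protein u.
        return len(self.adj[u])

    def weight(self, u, v):
        # 0.0 means "no interaction recorded" in this sketch.
        return self.adj[u].get(v, 0.0)


# Hypothetical three-protein example.
net = PPINetwork()
net.add_interaction("A", "B", 0.9)
net.add_interaction("B", "C", 0.4)
```

Here the weight is read as a confidence that the interaction is real, which matches how edge weights are used later in the talk.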

The Network Alignment Problem Given k different protein interaction networks belonging to different species, we wish to find conserved sub-networks within these networks Conserved in terms of protein sequence similarity (node similarity) and interaction similarity (network topology similarity)

Example Network Alignment Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp , 2006

General Framework For Network Alignment Algorithms Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp , 2006 Network construction Scoring function Alignment algorithm Covered in lecture on network integration

Two Algorithms Discussed Today NetworkBLAST Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6): , Græmlin Flannick et al. Græmlin: General and robust alignment of multiple large interaction networks. Genome Res 16: , 2006.

Overview of NetworkBLAST Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6): , 2005.

Estimation of Interaction Probabilities In the preprocessing step, edges in the network are given a reliability score using a logistic regression model based on three features: 1. Number of times an interaction was observed 2. Pearson correlation coefficient between expression profiles 3. Proteins’ small world clustering coefficient
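A logistic regression over the three features above could look roughly like this. The coefficient values are made up for illustration (the slide gives none), and the function name `interaction_probability` is hypothetical; in practice the coefficients would be fitted on known true and false interactions.

```python
import math


def interaction_probability(n_observations, expr_correlation, clustering_coeff,
                            weights=(-2.0, 0.8, 1.5, 1.0)):
    """Logistic model of edge reliability.

    weights = (intercept, w_observations, w_correlation, w_clustering);
    the default values are placeholders, not fitted parameters.
    """
    b0, b1, b2, b3 = weights
    z = b0 + b1 * n_observations + b2 * expr_correlation + b3 * clustering_coeff
    # The logistic function maps the linear score z to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# An interaction seen once with weak support vs. one seen five times with strong support.
p_low = interaction_probability(1, 0.1, 0.1)
p_high = interaction_probability(5, 0.9, 0.6)
```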

Network Alignment Graphs Construct a Network Alignment Graph to represent the alignment. Nodes contain groups of sequence-similar proteins from the k organisms. Edges represent conserved interactions. An edge between two nodes is present if any one of the following holds: 1. One pair of proteins directly interacts and the rest are at distance at most 2; 2. All protein pairs are at distance exactly 2; 3. At least max(2, k – 1) protein pairs directly interact. The relaxed distance conditions try to account for interaction deletions.
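The edge rule can be written as a small predicate. This sketch assumes the input is one shortest-path distance per species between the two proteins that the pair of alignment-graph nodes contributes in that species, with distance 1 meaning a direct interaction and `math.inf` meaning unreachable. The function name `conserved_edge` is hypothetical.

```python
import math


def conserved_edge(distances):
    """Return True if two alignment-graph nodes should be joined by an edge.

    distances[i] is the shortest-path distance between the two proteins
    in species i (1 = direct interaction).
    """
    k = len(distances)
    direct = sum(1 for d in distances if d == 1)
    # 1. One pair interacts directly, every other pair is at distance <= 2.
    cond1 = direct >= 1 and all(d <= 2 for d in distances)
    # 2. Every pair is at distance exactly 2.
    cond2 = all(d == 2 for d in distances)
    # 3. At least max(2, k - 1) pairs interact directly.
    cond3 = direct >= max(2, k - 1)
    return cond1 or cond2 or cond3


# Three-species examples.
examples = {
    "one direct, rest close": conserved_edge([1, 2, 2]),
    "all at distance two": conserved_edge([2, 2, 2]),
    "two direct, one far": conserved_edge([1, 1, 3]),
    "one direct, rest far": conserved_edge([1, 3, 3]),
    "unreachable in one species": conserved_edge([2, 2, math.inf]),
}
```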

Example Network Alignment Graph Nodes [Figure: a network alignment graph whose nodes are built from proteins a, b, c; a’, b’, c’; and a’’, b’’, c’’, shown alongside the individual PPI networks of Species X, Species Y and Species Z]

Scoring Function Sharan et al. devise a scoring scheme based on a likelihood model for the fit of a single sub-network to the given structure High scoring subgraphs correspond to structured sub-networks (cliques or pathways) Only network topology is scored, node similarity is not

Log Likelihood Ratio Model Measures the likelihood that a subgraph occurs if it is part of a conserved network versus the likelihood that it occurs in a randomly constructed network. The randomly constructed network preserves the degree distribution of the nodes. $\log \frac{\Pr(\text{Subgraph occurs} \mid \text{Conserved Network})}{\Pr(\text{Subgraph occurs} \mid \text{Random Network})}$

Likelihood Ratio Scoring of a Protein Complex in a Single Species $U$: a subset of vertices (proteins) in the PPI graph; $O_U$: collection of all observations on vertex pairs in $U$; $O_{uv}$: interaction between proteins $u, v$ observed; $M_s$: conserved network model; $M_n$: random network (null) model; $T_{uv}$: proteins $u, v$ interact; $F_{uv}$: proteins $u, v$ do not interact; $\beta$: probability that proteins $u, v$ interact in the conserved model; $p_{uv}$: probability that edge $u, v$ exists in a random model. [Two equations not preserved in the transcript: the probability of the complex being observed in a conserved network model, and the probability of the subgraph being observed in a random network model]

Likelihood Ratio Scoring of a Protein Complex in a Single Species Hence, the log likelihood for a complex occurring in a single species is given by [equation not preserved in the transcript]. For multiple complexes across different species, it is the sum of the log likelihoods: $L(A, B, C) = L(A) + L(B) + L(C)$
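The formulas on the two scoring slides did not survive in this transcript. One reconstruction that is consistent with the symbols defined above is given below. It assumes that vertex pairs in $U$ are treated independently and that each pair contributes a two-component mixture (the proteins interact or they do not). Treat it as a sketch to check against Sharan et al., not as the slide's exact notation.

```latex
\Pr(O_U \mid M_s) = \prod_{(u,v) \in U \times U}
  \left[ \beta \, \Pr(O_{uv} \mid T_{uv}) + (1 - \beta) \, \Pr(O_{uv} \mid F_{uv}) \right]

\Pr(O_U \mid M_n) = \prod_{(u,v) \in U \times U}
  \left[ p_{uv} \, \Pr(O_{uv} \mid T_{uv}) + (1 - p_{uv}) \, \Pr(O_{uv} \mid F_{uv}) \right]

L(U) = \log \frac{\Pr(O_U \mid M_s)}{\Pr(O_U \mid M_n)}
     = \sum_{(u,v) \in U \times U}
       \log \frac{\beta \, \Pr(O_{uv} \mid T_{uv}) + (1 - \beta) \, \Pr(O_{uv} \mid F_{uv})}
                 {p_{uv} \, \Pr(O_{uv} \mid T_{uv}) + (1 - p_{uv}) \, \Pr(O_{uv} \mid F_{uv})}
```

Under this form, a pair with strong interaction evidence raises the score when $\beta > p_{uv}$, that is, when the conserved model expects the edge more strongly than the degree-preserving random model does.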

Example of Complex Scoring [Figure: conserved complex A in the network alignment graph, with nodes built from proteins a, b, c; a’, b’, c’; and a’’, b’’, c’’, shown alongside the individual species’ PPI networks: complex X1 in Species X, complex Y1 in Species Y, complex Z1 in Species Z] $L(A) = L(X1) + L(Y1) + L(Z1)$

Alignment algorithm Problem of identifying conserved sub-networks reduces to finding high scoring subgraphs NP-complete problem Heuristic solution:  Greedy extension of high scoring seeds  (Does this sound familiar? BLAST?)  Common to both papers discussed

Alignment algorithm 1. Find seeds for each node v in the alignment graph a. Find high scoring paths of 4 nodes by exhaustive search b. Greedily add 3 other nodes one by one, that maximally increase the score of the seed

Alignment algorithm 2. Iteratively add or remove nodes to increase the overall score of the subgraph. Original seeds are preserved. Limit size of discovered subgraphs to 15 nodes. Record up to 4 highest scoring subgraphs discovered around each node
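The add-or-remove refinement can be sketched as a simple local search. Everything here is illustrative: `refine`, `toy_score` and the toy edge weights are hypothetical, and the toy score (edge weight inside the subgraph minus a per-node penalty) merely stands in for the log likelihood ratio score.

```python
def refine(seed, candidates, score, max_size=15):
    """Greedy local search: add or remove one node at a time while the score improves.

    Seed nodes are never removed, and the subgraph never exceeds max_size nodes.
    """
    seed = frozenset(seed)
    current = set(seed)
    while True:
        best, best_score = None, score(current)
        moves = []
        if len(current) < max_size:
            # Try adding each candidate node not yet in the subgraph.
            moves += [current | {n} for n in candidates - current]
        # Try removing each non-seed node.
        moves += [current - {n} for n in current - seed]
        for move in moves:
            s = score(move)
            if s > best_score:
                best, best_score = move, s
        if best is None:
            return current  # no single move improves the score
        current = set(best)


# Toy scoring: total edge weight inside the subgraph minus 1.0 per node.
toy_edges = {
    frozenset(("a", "b")): 2.0,
    frozenset(("b", "c")): 2.0,
    frozenset(("a", "c")): 2.0,
    frozenset(("c", "d")): 0.5,
}


def toy_score(nodes):
    return sum(w for edge, w in toy_edges.items() if edge <= nodes) - 1.0 * len(nodes)


refined = refine({"a", "b"}, {"a", "b", "c", "d"}, toy_score)
```

The loop terminates because every accepted move strictly increases the score and there are finitely many subgraphs.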

Alignment algorithm 3. Filter subgraphs with a high degree of overlap Iteratively find high scoring subgraph and remove all highly overlapping ones remaining
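The overlap filter amounts to a greedy pass over the subgraphs in decreasing score order. In this sketch the overlap measure (shared nodes divided by the size of the smaller subgraph) and the threshold value are assumptions, since the slide does not define "highly overlapping", and the function name `filter_overlapping` is hypothetical.

```python
def filter_overlapping(subgraphs, max_overlap=0.8):
    """Keep high scoring subgraphs, dropping any that overlap a kept one too much.

    subgraphs: list of (score, set_of_nodes) with non-empty node sets.
    """
    kept = []
    # Visit subgraphs from highest to lowest score.
    for score, nodes in sorted(subgraphs, key=lambda item: item[0], reverse=True):
        too_similar = any(
            len(nodes & other) / min(len(nodes), len(other)) > max_overlap
            for _, other in kept
        )
        if not too_similar:
            kept.append((score, nodes))
    return kept


found = [
    (5.0, {1, 2, 3, 4, 5}),
    (4.0, {1, 2, 3, 4, 6}),  # shares 4 of 5 nodes with the best subgraph
    (3.0, {7, 8, 9}),
]
survivors = filter_overlapping(found, max_overlap=0.5)
```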

Results Conserved network regions within yeast (orange ovals), fly (green rectangles) and worm (blue hexagons) PPI networks.

Results Prediction of protein function: ‘Guilt by association’. If a conserved cluster or path is significantly enriched in a functional annotation, the remaining proteins in it are predicted to share that annotation. Prediction of protein interactions: predictions based on 2 strategies: evidence that proteins with similar sequences interact; co-occurrence of proteins in the same conserved cluster or path. Experimental verification of yeast interactions using Y2H yielded a 40-62% success rate

Overview of Græmlin Fast, scalable network alignment  Scales linearly in number of networks compared  NetworkBLAST scales exponentially Supports efficient querying of modules Speed-sensitivity control via user defined parameter  Not supported in NetworkBLAST

Input to the Algorithm Weighted protein interaction graphs  Weights represent probability that proteins interact  Constructed via network integration algorithm covered in a later lecture A phylogenetic tree relating the species in the desired alignment  Used for progressive alignment

Definition of an alignment A set of subgraphs chosen from the interaction networks of different species, together with a mapping between aligned proteins Aligned proteins form equivalence classes  Each class was derived from a common ancestral protein  Can contain multiple proteins from the same species [Figure: equivalence class containing a, a’, a’’ and b’’, showing paralogs]

Scoring Function Log likelihood ratio model based on  Alignment model M: modules are subject to evolutionary constraint  Random model R: modules are not subject to any constraints Scores equivalence classes and alignment edges separately

Log Likelihood Ratio Model (Recap) Measures the likelihood that a module occurs if it is subject to evolutionary constraint versus the likelihood that it occurs in a randomly constructed network. The randomly constructed network preserves the degree distribution of the nodes. $\log \frac{\Pr(\text{Module occurs} \mid \text{Alignment Model } M)}{\Pr(\text{Module occurs} \mid \text{Random Model } R)}$

Scoring Equivalence Classes Reconstruct most parsimonious ancestral history of an equivalence class using Dynamic Programming based on five types of evolutionary events Alignment model and random model give probabilities for each of these events, combined to give a log likelihood score

Scoring Alignment Edges Alignment scores should reflect both network conservation and high connectivity – difficult to strike a balance Introduction of a novel scoring approach  Edge Scoring Matrix – Indexed by labels  Algorithm assigns a label to each equivalence class, scores according to distribution function in cells referenced by labels

Scoring: ESM

Alignment Algorithm: d-Clusters for Seed Generation A d-cluster consists of d proteins close together in a network “Close” means edge weights are high, so interaction is highly likely Intuition is that high scoring alignments will have high scoring d-clusters

Alignment Algorithm: d-Clusters for Seed Generation Identify pairs of d-clusters that score higher than a threshold T  Score is defined by greedily matching nodes from each d-cluster to obtain a high score Uses these pairs as seeds Allows for speed-sensitivity tradeoff
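Greedy matching of two d-clusters can be sketched as follows. The pairwise similarity table is a hypothetical stand-in for whatever node score Græmlin uses, and the names `greedy_match_score` and `is_seed` are illustrative only.

```python
def greedy_match_score(cluster_a, cluster_b, similarity):
    """Score two d-clusters by greedily pairing their most similar proteins."""
    pairs = sorted(
        ((similarity(a, b), a, b) for a in cluster_a for b in cluster_b),
        key=lambda item: item[0],
        reverse=True,
    )
    used_a, used_b, total = set(), set(), 0.0
    for s, a, b in pairs:
        # Take the best remaining pair whose proteins are both still unmatched.
        if s > 0 and a not in used_a and b not in used_b:
            used_a.add(a)
            used_b.add(b)
            total += s
    return total


def is_seed(cluster_a, cluster_b, similarity, threshold):
    # A pair of d-clusters becomes a seed when its greedy score exceeds T.
    return greedy_match_score(cluster_a, cluster_b, similarity) > threshold


# Hypothetical similarity table for two 2-clusters.
table = {("a1", "b1"): 3.0, ("a1", "b2"): 2.5, ("a2", "b1"): 2.0, ("a2", "b2"): 1.0}
sim = lambda a, b: table.get((a, b), 0.0)
seed_score = greedy_match_score(["a1", "a2"], ["b1", "b2"], sim)
```

In this toy table the greedy pairing scores 4.0 while the best possible pairing scores 4.5, which shows the speed-for-accuracy trade that a greedy match makes. Raising the threshold T yields fewer seeds and a faster but less sensitive search.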

Alignment Algorithm: Generating An Initial Alignment From The Seed Determine highest scoring pair of nodes (one from each d-cluster) when aligned Align these nodes and place these nodes as well as their neighbors, into a frontier

Alignment Algorithm: Greedy Seed Extension Phase Examine all pairs of nodes in the frontier for the pair that maximally increases the score when added to the alignment. Stop when no pair can further increase the score. Remove equivalence classes if doing so further increases the score. [Figure labels: Frontier; Current alignment]
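A minimal sketch of frontier-based extension for two networks is shown below. It leaves out the removal of equivalence classes, and the gain function `pair_score` is a hypothetical placeholder for the change in alignment score. The adjacency maps and protein names are toy data.

```python
def extend_alignment(seed_pair, adj_a, adj_b, pair_score):
    """Greedily grow an alignment from one aligned pair of proteins.

    adj_a / adj_b map each protein to its neighbours in network A / B.
    pair_score(a, b, aligned) returns the score gain of aligning a with b.
    """
    aligned = [seed_pair]
    used_a, used_b = {seed_pair[0]}, {seed_pair[1]}
    # The frontier starts with the neighbours of the seed proteins.
    frontier_a = set(adj_a[seed_pair[0]])
    frontier_b = set(adj_b[seed_pair[1]])
    while True:
        best, best_gain = None, 0.0
        for a in frontier_a - used_a:
            for b in frontier_b - used_b:
                gain = pair_score(a, b, aligned)
                if gain > best_gain:
                    best, best_gain = (a, b), gain
        if best is None:
            return aligned  # no frontier pair increases the score
        a, b = best
        aligned.append(best)
        used_a.add(a)
        used_b.add(b)
        # Newly aligned proteins bring their neighbours into the frontier.
        frontier_a |= set(adj_a[a])
        frontier_b |= set(adj_b[b])


# Two toy path networks a1-a2-a3 and b1-b2-b3.
adj_a = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"]}
adj_b = {"b1": ["b2"], "b2": ["b1", "b3"], "b3": ["b2"]}
# Hypothetical gain: reward pairs that carry the same index.
same_index = lambda a, b, aligned: 1.0 if a[1:] == b[1:] else -1.0
alignment = extend_alignment(("a1", "b1"), adj_a, adj_b, same_index)
```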

Alignment Algorithm: Multiple Alignment Progressive alignment technique using the phylogenetic tree  Successively aligns closest pair of networks  Places each aligned network at the parent node of the two aligned species  Linear scaling in number of species
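Progressive alignment along a phylogenetic tree can be sketched as a post-order traversal. The tree encoding (nested pairs of species names) and the `align_pair` callback are assumptions made for illustration. Here `align_pair` is a toy that keeps only the proteins shared by both inputs.

```python
def progressive_align(tree, networks, align_pair):
    """Align networks bottom-up along a binary phylogenetic tree.

    tree is either a species name (leaf) or a pair (left_subtree, right_subtree).
    The result of each pairwise alignment is placed at the parent node.
    """
    if isinstance(tree, str):
        return networks[tree]
    left, right = tree
    return align_pair(
        progressive_align(left, networks, align_pair),
        progressive_align(right, networks, align_pair),
    )


# Toy "networks": each is just a set of protein family names.
networks = {
    "yeast": {"f1", "f2", "f3"},
    "fly": {"f2", "f3", "f4"},
    "worm": {"f3", "f4", "f5"},
}
calls = []


def toy_align(net_x, net_y):
    calls.append((frozenset(net_x), frozenset(net_y)))
    return net_x & net_y  # keep what is conserved in both inputs


conserved = progressive_align((("yeast", "fly"), "worm"), networks, toy_align)
```

With n species the traversal performs n - 1 pairwise alignments, which is where the linear scaling in the number of species comes from.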

Performance Comparison: Speed-sensitivity / Linear Scaling

Performance Comparison: Multiple Alignment

Performance Comparison: Module Querying

Results Functional module identification using network alignment Functional module for transformation?

Results Functional annotation using network alignment Pairwise alignment Multiple alignment of 9 networks Conserved DNA replication module

Results Multiple alignment of 10 networks showing possible cell division module Functional annotation using network alignment

The Future of Network Comparison Græmlin Græmlin? Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp , 2006

That’s all folks! Thank you! Questions?

Performance Comparison: Sensitivity

Scoring Sequence Mutations Weighted sum of pairs scoring