FunCoup: reconstructing protein networks in the worm and other animals Andrey Alexeyenko, Erik Sonnhammer Stockholm Bioinformatics Center
C. elegans computed interactomes
FunCoup is a data integration framework to discover functional coupling in eukaryotic proteomes with data from model organisms. [Diagram labels: A worm, B worm, ?; Find orthologs*; Mouse, Human, Fly, Yeast; High-throughput evidence]
FunCoup: Each piece of data is evaluated. Data FROM many eukaryotes (7). Practical maximum of data sources (>60). Predicted networks FOR a number of eukaryotes (8). Organism-specific, efficient and robust Bayesian frameworks. Orthology-based information transfer and phylogenetic profiling. Networks predicted for different types of functional coupling (metabolic, signaling, etc.).
C. elegans’ benefit from the model species data integration. [Diagram labels: Li & Vidal’s set, 5535 pairs; IntAct (Oct. 2007), 4517 pairs; 6841; other C. elegans data; predicted C. elegans pairs]
Data sources in FunCoup. Species: H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. cerevisiae, A. thaliana. Types: protein-protein interactions, protein domain associations, protein-DNA interactions, mRNA expression, protein expression, miRNA targeting, sub-cellular co-localization, phylogenetic profiling.
Multilateral data transfer. [Diagram labels: Human, Ciona, Worm, Mouse, Rat, Fly, Yeast, Arabidopsis, FunCoup] Data from the same species are an important but not indispensable component of the framework. Hence, a network can be constructed for an organism with no experimental datasets at all.
InParanoid. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Maido Remm, Christian E. V. Storm and Erik L. L. Sonnhammer. Journal of Molecular Biology 314, 5, 14 December 2001, Pages [Diagram labels: Proteome A, Proteome B; reciprocally best hits ~ seed orthologs; inparalogs]
How does orthology work? Log overlap between KEGG pathways and complexes (Gavin et al., 2006)
Comparing networks Rat Human Mouse
Conclusions. FunCoup: is a flexible, exhaustive, and robust framework to infer confident functional links; enables practical web access to candidate interactions in both small and global-scale network context; is open towards better data quality and coverage.
Acknowledgements: Carsten Daub, Kristoffer Forslund, Anna Henricson, Olof Karlberg, Martin Klammer, Mats Lindskog, Kevin O’Brien, Tomas Ohlson, Sanjit Rupra, Gabriel Östlund, Sean Hooper. All previous interaction network developers.
Talk outline: Other network resources; Why FunCoup; Orthology and InParanoid; Implementation; Applications and future development
FunCoup is a naïve Bayesian network (NBN). Bayesian inference: C = genes A and B are functionally coupled; E = genes A and B are co-expressed. P(C|E) = (P(C) * P(E|C)) / P(E)
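The rule on this slide can be checked with a few lines of Python. This is only a sketch: the function name and all numbers below are hypothetical and chosen for illustration, not taken from FunCoup.

```python
def posterior(prior_c, p_e_given_c, p_e):
    """Bayes' rule from the slide: P(C|E) = P(C) * P(E|C) / P(E)."""
    if not 0 < p_e <= 1:
        raise ValueError("P(E) must be in (0, 1]")
    return prior_c * p_e_given_c / p_e


# Hypothetical numbers, for illustration only.
prior = 0.01            # P(C): a random gene pair is functionally coupled
p_e_given_c = 0.30      # P(E|C): coupled pairs that are co-expressed
p_e_given_not_c = 0.02  # P(E|not C): uncoupled pairs that are co-expressed

# Total probability of the evidence over both classes.
p_e = prior * p_e_given_c + (1 - prior) * p_e_given_not_c

post = posterior(prior, p_e_given_c, p_e)
print(f"P(C|E) = {post:.4f}")
```

With these made-up inputs, observing co-expression raises the belief in coupling well above the prior, which is the intuition the slide conveys.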
Problem: In situations with multiple inparalogs, how to deal with alternative evidence? Solution: Treat ALL inparalogs equally, and choose the BEST value.
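A minimal sketch of the "choose the BEST value" rule, under two assumptions that the slide does not state: evidence is stored as one score per unordered protein pair, and "best" means the highest score. The function and identifiers are hypothetical.

```python
from itertools import product


def best_inparalog_evidence(inparalogs_a, inparalogs_b, score):
    """Best score over all inparalog pairs, or None if no pair has data.

    score maps frozenset({protein_1, protein_2}) to a number (assumed layout).
    Every inparalog is treated equally; only the maximum is kept.
    """
    values = []
    for a, b in product(inparalogs_a, inparalogs_b):
        key = frozenset((a, b))
        if key in score:
            values.append(score[key])
    return max(values) if values else None


# Hypothetical example: two inparalogs on one side, one protein on the other.
scores = {
    frozenset(("a1", "b1")): 0.2,
    frozenset(("a2", "b1")): 0.7,
}
print(best_inparalog_evidence(["a1", "a2"], ["b1"], scores))
```

For evidence types where a smaller value is better (a distance, say), `max` would be replaced by `min`.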
Problem: Absolute probabilities of FC are intractable. The full Bayesian network is impossible. Solution: Naïve Bayesian network. Calculate a belief change instead (likelihood ratios, LR). Assume NO data dependency. [Diagram: the full network over evidence nodes A, B, C, D would need every pairwise conditional: P(B|A), P(A|B), P(A|C), P(C|A), P(A|D), P(D|A), P(B|C), P(C|B), P(B|D), P(D|B), P(D|C), P(C|D); the naïve network keeps only one LR = P(E|+) / P(E|-) per node]
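Under the independence assumption, the likelihood ratios of separate evidence sources multiply, so their logarithms add. The sketch below shows that arithmetic; the function names are hypothetical, and the conversion to a posterior is included only to show how a prior would enter if one were available.

```python
import math


def combined_log_lr(likelihood_ratios):
    """Sum of log LRs: the naive (independent-evidence) combination."""
    return sum(math.log(lr) for lr in likelihood_ratios)


def posterior_from_lr(prior, likelihood_ratios):
    """Posterior odds = prior odds * product of LRs, converted to a probability."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * math.exp(combined_log_lr(likelihood_ratios))
    return post_odds / (1 + post_odds)


# Hypothetical LRs, each P(E|+) / P(E|-), from three independent data sets.
lrs = [4.0, 1.5, 0.8]
print(combined_log_lr(lrs))          # belief change; no prior needed
print(posterior_from_lr(0.01, lrs))  # only if a prior is assumed
```

An LR above 1 supports functional coupling, an LR below 1 speaks against it, and the summed log LR can be ranked without ever fixing the absolute prior.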
Problem: How to establish optimal bridges between species? Solution: Via groups of orthologs that emerged from speciation. [Diagram labels: gene evolution, functional link]
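One way to picture such a bridge is to map each end of a link observed in a source species onto its orthologs in the target species. The sketch assumes a simple dictionary from source gene to target-species orthologs; the layout and all gene names are hypothetical, and FunCoup's actual transfer weighs evidence rather than copying links.

```python
def transfer_links(source_links, orthologs):
    """Map links from a source species to a target species via orthologs.

    source_links: iterable of (gene_a, gene_b) pairs in the source species.
    orthologs: dict mapping a source gene to its target-species orthologs
               (several entries when the target has inparalogs).
    Returns a set of unordered target pairs (frozensets).
    """
    transferred = set()
    for a, b in source_links:
        for ta in orthologs.get(a, ()):
            for tb in orthologs.get(b, ()):
                if ta != tb:  # skip self-links
                    transferred.add(frozenset((ta, tb)))
    return transferred


# Hypothetical yeast link mapped to worm genes.
yeast_links = [("y1", "y2")]
yeast_to_worm = {"y1": ["w1"], "y2": ["w2", "w3"]}
worm_links = transfer_links(yeast_links, yeast_to_worm)
print(sorted(sorted(pair) for pair in worm_links))
```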
Homologs: proteins with similar sequence and, thus, common origin. [Diagram labels: Proteome A, Proteome B]
An InParanoid cluster of orthologs. [Diagram label: Inparalogs]
Problem: Some LR are weak and arise due to non-representative sampling. Solution: Enforce confidence check (χ² test) and remove insignificant nodes. [Diagram: each evidence node carries LR = P(E|+) / P(E|-)]
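A sketch of such a confidence check, assuming (the slide does not say) that each node is tested on a 2x2 table of evidence present/absent against positive/negative training pairs. It computes the Pearson χ² statistic without continuity correction and uses the standard identity that, for one degree of freedom, the p-value equals erfc(sqrt(χ²/2)). Function names and the 0.05 threshold are illustrative choices.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 d.f.) for [[a, b], [c, d]].

    Assumed layout: rows = evidence present / absent,
    columns = positive / negative training pairs.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("degenerate table: a row or column sums to zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival, 1 d.f.
    return stat, p_value


def keep_node(a, b, c, d, alpha=0.05):
    """Keep an evidence node only if its association is significant."""
    return chi2_2x2(a, b, c, d)[1] < alpha


print(chi2_2x2(30, 10, 10, 30))
print(keep_node(10, 10, 10, 10))
```

Nodes whose counts could easily have arisen by chance are dropped, so a large but poorly supported LR does not enter the network.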
Reciprocally best hits. [Diagram labels: Proteome A, Proteome B]
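The idea of reciprocally best hits can be sketched as follows. This is a simplified illustration, not the InParanoid algorithm itself: it ignores score cut-offs, overlap criteria and the later inparalog clustering, and the input layout (a dictionary of similarity scores per query/target pair) is an assumption.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict {(query, target): similarity_score} (assumed layout).
    """
    best = {}
    for (query, target), s in scores.items():
        if query not in best or s > best[query][1]:
            best[query] = (target, s)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b AND b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between two small proteomes.
a_vs_b = {("a1", "b1"): 100, ("a1", "b2"): 50, ("a2", "b1"): 80}
b_vs_a = {("b1", "a1"): 100, ("b1", "a2"): 80, ("b2", "a1"): 50}
rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh)
```

Here a2 and b2 each point to a partner that prefers someone else, so only the a1/b1 pair qualifies as a seed ortholog.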
Problem: Definitions and notions of FC vary. Solution: Multinet. Decide which types of FC are needed (provide as positive training sets) and perform the previous steps customized for each type. [Diagram: P(E|+) / P(E|-) evaluated separately for link types drawn as A <> B, A | B and A || B]
Multinet presents several link types in parallel. Proteins of the Parkinson’s disease pathway (KEGG #05020). Legend: physical protein-protein interaction; “signaling” link; metabolic “non-signaling” link.
The limits of data integration
FunCoup’s web interface Hooper S., Bork P. Medusa: a simple tool for interaction graph analysis. Bioinformatics Dec 15;21(24): Epub 2005 Sep 27.
Reconstructing the “regulatory blueprint”* in C. intestinalis. *Imai KS, Levine M, Satoh N, Satou Y (2006) Regulatory blueprint for a chordate embryo. Science, 26: [Legend: proteins of the “Regulatory Blueprint for a Chordate Embryo” [*]; 18 links mentioned in [*] AND found by FunCoup; links found by FunCoup (about 140); the rest, 202 links from [*] that FunCoup did not find, are not shown]
[Diagram labels: orthologs, functional link, inparalogs; C. elegans, D. melanogaster, human, S. cerevisiae] Overview and comparison of ortholog databases. Alexeyenko A, Lindberg J, Pérez-Bercoff Å, Sonnhammer ELL. Drug Discovery Today: Technologies (2006) v. 3; 2,
Problem: Distribution areas informative of FC may vary. Solution: Find them individually for each data set and FC class, accounting for the joint “feature – class” distribution. [Plot axis: Pearson r, ticks 0 and 1]
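One simple way to find informative regions of a feature such as Pearson r is to bin it and estimate a likelihood ratio per bin from the positive and negative training pairs. The sketch below does that with fixed bin edges and a pseudocount; both are assumptions for illustration, and the slide does not specify how FunCoup chooses its bins.

```python
from bisect import bisect_right


def bin_index(value, edges):
    """Index of the bin a value falls into, given sorted inner bin edges."""
    return bisect_right(edges, value)


def per_bin_lr(pos_values, neg_values, edges, pseudocount=1.0):
    """LR per bin: P(bin | positive set) / P(bin | negative set).

    The pseudocount (an assumed smoothing choice) avoids division by zero
    in bins that hold no negative examples.
    """
    n_bins = len(edges) + 1
    pos = [pseudocount] * n_bins
    neg = [pseudocount] * n_bins
    for v in pos_values:
        pos[bin_index(v, edges)] += 1
    for v in neg_values:
        neg[bin_index(v, edges)] += 1
    pos_total, neg_total = sum(pos), sum(neg)
    return [(p / pos_total) / (n / neg_total) for p, n in zip(pos, neg)]


# Hypothetical Pearson r values for coupled and uncoupled gene pairs.
positives = [0.9, 0.8, 0.7, 0.1]
negatives = [0.1, 0.2, 0.3, 0.9]
lrs_by_bin = per_bin_lr(positives, negatives, edges=[0.5])
print(lrs_by_bin)
```

Running this separately for every data set and every FC class gives each combination its own informative range, instead of one global threshold.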
Validation. Jack-knife procedure: (1) take “positive” and “negative” sets; (2) split each randomly 50:50; (3) use the first parts to train the algorithm and the second parts to test the performance; (4) repeat a number of times. Analysis of variance (ANOVA): (1) introduce features A, B, C into the workflow of FunCoup (e.g., using PCA, selecting nodes of the BN by relevance, ways of using ortholog data, etc.); (2) run FunCoup with all possible combinations of absence/presence of A, B, C to produce a balanced and orthogonal ANOVA design with replicates; (3) study the effects of A, B, C or their combinations AxB, BxC, ..., AxBxC to see if they influence the performance significantly (each effect being tested as though all other effects did not exist).
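Both procedures on this slide are easy to set up in code. The sketch below generates the repeated 50:50 splits and the full presence/absence design with replicates; training, scoring and the ANOVA itself are left out, and all function names are hypothetical.

```python
import random
from itertools import product


def split_half(items, rng):
    """Shuffle a copy of items and split it 50:50."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def repeated_splits(positives, negatives, repeats, seed=0):
    """Yield (train, test) pairs; each is a (positives, negatives) tuple."""
    rng = random.Random(seed)
    for _ in range(repeats):
        pos_train, pos_test = split_half(positives, rng)
        neg_train, neg_test = split_half(negatives, rng)
        yield (pos_train, neg_train), (pos_test, neg_test)


def factorial_design(features, replicates):
    """All absence/presence combinations of the features, with replicates."""
    runs = []
    for levels in product((False, True), repeat=len(features)):
        for rep in range(replicates):
            runs.append((dict(zip(features, levels)), rep))
    return runs


splits = list(repeated_splits(range(10), range(10, 20), repeats=3))
design = factorial_design(["A", "B", "C"], replicates=2)
print(len(splits), len(design))
```

With three features and two replicates the design has 2 x 2 x 2 x 2 = 16 runs, and every combination occurs equally often, which is what makes it balanced and orthogonal.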