FunCoup: reconstructing protein networks in the worm and other animals Andrey Alexeyenko, Erik Sonnhammer Stockholm Bioinformatics Center.


1 FunCoup: reconstructing protein networks in the worm and other animals Andrey Alexeyenko, Erik Sonnhammer Stockholm Bioinformatics Center

2 C. elegans computed interactomes

3 FunCoup is a data integration framework to discover functional coupling in eukaryotic proteomes with data from model organisms.
[Diagram: are worm proteins A and B functionally coupled? Find orthologs* in mouse, human, fly, and yeast; high-throughput evidence]

4 FunCoup
- Each piece of data is evaluated
- Data FROM many eukaryotes (7)
- Practical maximum of data sources (>60)
- Predicted networks FOR a number of eukaryotes (8)
- Organism-specific, efficient, and robust Bayesian frameworks
- Orthology-based information transfer and phylogenetic profiling
- Networks predicted for different types of functional coupling (metabolic, signaling, etc.)

5 C. elegans’ benefit from the model species data integration: Li & Vidal’s set, 5535 pairs; IntAct (Oct. 2007), 4517 pairs; 6841 Other C. elegans data; 36000 predicted C. elegans pairs

6 Data sources in FunCoup
Species: H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. cerevisiae, A. thaliana
Types: protein-protein interactions, protein domain associations, protein-DNA interactions, mRNA expression, protein expression, miRNA targeting, sub-cellular co-localization, phylogenetic profiling

7 Multilateral data transfer
[Diagram: FunCoup connected to human, Ciona, worm, mouse, rat, fly, yeast, and Arabidopsis]
Data from the same species is an important but not indispensable component of the framework. Hence, a network can be constructed for an organism with no experimental datasets at all.

8 InParanoid
Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Maido Remm, Christian E. V. Storm and Erik L. L. Sonnhammer. Journal of Molecular Biology 314, 5, 14 December 2001, Pages 1041-1052.
[Diagram: Proteome A and Proteome B; reciprocally best hits ~ seed orthologs; inparalogs]

9 How does orthology work? Log overlap between KEGG pathways and complexes (Gavin et al., 2006)

10 Comparing networks Rat Human Mouse

11 Conclusions
FunCoup:
- is a flexible, exhaustive, and robust framework to infer confident functional links
- enables practical web access to candidate interactions in both small and global-scale network context
- is open towards better data quality and coverage
http://FunCoup.sbc.su.se

12 Acknowledgements: Carsten Daub, Kristoffer Forslund, Anna Henricson, Olof Karlberg, Martin Klammer, Mats Lindskog, Kevin O’Brien, Tomas Ohlson, Sanjit Rupra, Gabriel Östlund, Sean Hooper. All previous interaction network developers.

13 Talk outline
- Other network resources
- Why FunCoup
- Orthology and InParanoid
- Implementation
- Applications and future development

14 FunCoup is a naïve Bayesian network (NBN)
Bayesian inference, where C = "genes A and B are functionally coupled" and E = "genes A and B are co-expressed":
P(C|E) = P(C) * P(E|C) / P(E)
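The Bayes' rule on this slide can be illustrated with a minimal sketch. The function name and all numbers below are hypothetical and are not taken from FunCoup; P(E) is expanded by the law of total probability, which requires an assumed value for P(E | not C).

```python
def posterior(prior_c: float, p_e_given_c: float, p_e_given_not_c: float) -> float:
    """Return P(C|E) = P(C) * P(E|C) / P(E).

    P(E) is expanded as P(C) * P(E|C) + (1 - P(C)) * P(E|not C).
    """
    p_e = prior_c * p_e_given_c + (1.0 - prior_c) * p_e_given_not_c
    return prior_c * p_e_given_c / p_e


# Hypothetical inputs: 1% of gene pairs are coupled; co-expression is seen
# in 30% of coupled pairs and in 5% of uncoupled pairs.
p = posterior(prior_c=0.01, p_e_given_c=0.30, p_e_given_not_c=0.05)
print(round(p, 4))
```

With a low prior, even evidence that is six times more likely under coupling leaves the posterior small, which is one reason to combine many evidence types.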

15 Problem: In situations with multiple inparalogs, how to deal with alternative evidence?
Solution: Treat ALL inparalogs equally, and choose the BEST value.
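One way to read "choose the BEST value" is to take the maximum evidence score over every pair of inparalogs. The sketch below assumes that a higher score means stronger evidence; the function name, the score dictionary, and the gene identifiers are hypothetical.

```python
from itertools import product


def best_evidence(inparalogs_a, inparalogs_b, scores):
    """Return the highest score over all inparalog pairs, or None if no pair has data.

    scores maps frozenset({gene_x, gene_y}) to a number (assumed: higher is better).
    """
    values = []
    for a, b in product(inparalogs_a, inparalogs_b):
        key = frozenset((a, b))
        if key in scores:
            values.append(scores[key])
    return max(values) if values else None


# Hypothetical evidence for two inparalogs of gene A and one ortholog of gene B.
scores = {
    frozenset(("a1", "b1")): 0.2,
    frozenset(("a2", "b1")): 0.7,
}
best = best_evidence(["a1", "a2"], ["b1"], scores)
print(best)
```

Every inparalog contributes on equal terms here; no pair is weighted by its distance to the seed ortholog.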

16 Problem: Absolute probabilities of FC are intractable. The full Bayesian network is impossible.
Solution: Naïve Bayesian network. Calculate a belief change instead (likelihood ratios, LR). Assume NO data dependency.
[Diagram: full network with conditionals P(B|A), P(A|B), P(B|C), P(C|B), P(B|D), P(D|B), P(A|C), P(C|A), P(A|D), P(D|A), P(D|C), P(C|D); naïve network linking A and B through P(E|+) / P(E|-) terms]
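Under the independence assumption, likelihood ratios from separate evidence types multiply, so their logarithms add. The following is a minimal sketch of that idea; the function names and the probabilities are hypothetical, and the natural logarithm is an arbitrary choice.

```python
import math


def log_likelihood_ratio(p_e_given_pos: float, p_e_given_neg: float) -> float:
    """Return log(P(E|+) / P(E|-)) for one piece of evidence."""
    return math.log(p_e_given_pos / p_e_given_neg)


def combined_score(evidence) -> float:
    """Sum the log LRs over (P(E|+), P(E|-)) pairs, assuming independent evidence."""
    return sum(log_likelihood_ratio(pos, neg) for pos, neg in evidence)


# Hypothetical evidence: one supportive, one neutral, one contradicting.
evidence = [(0.30, 0.05), (0.10, 0.10), (0.02, 0.04)]
score = combined_score(evidence)
print(round(score, 4))
```

A positive total means the evidence raises belief in functional coupling relative to the prior; a negative total lowers it.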

17 Problem: How to establish optimal bridges between species?
Solution: Via groups of orthologs that emerged from speciation.
[Diagram labels: gene evolution; functional link]

18 Homologs: proteins with similar sequence and, thus, common origin
[Diagram: Proteome A and Proteome B]

19 An InParanoid cluster of orthologs
[Diagram label: inparalogs]

20 Problem: Some LR are weak and arise due to non-representative sampling.
Solution: Enforce a confidence check (χ²-test) and remove insignificant nodes.
[Diagram: A and B linked through P(E|+) / P(E|-) terms]
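A confidence check of this kind can be sketched as a Pearson chi-square test on a 2x2 table that counts positive and negative training pairs inside and outside one evidence bin. The table layout, the counts, and the 5% significance level are assumptions made for illustration; the slide does not specify them.

```python
# Critical value of the chi-square distribution for 1 degree of freedom, alpha = 0.05.
CRITICAL_DF1_ALPHA_05 = 3.841


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]].

    Assumed layout: row 1 = positive pairs, row 2 = negative pairs;
    column 1 = inside the evidence bin, column 2 = outside it.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # a degenerate table carries no information
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def is_significant(a: int, b: int, c: int, d: int) -> bool:
    """Keep the node only if the statistic exceeds the critical value."""
    return chi_square_2x2(a, b, c, d) > CRITICAL_DF1_ALPHA_05


# Hypothetical counts: a clearly enriched bin and a bin that is barely enriched.
print(chi_square_2x2(30, 70, 10, 90), is_significant(30, 70, 10, 90))
print(is_significant(11, 89, 10, 90))
```

A bin whose likelihood ratio differs from 1 only by sampling noise fails the test and would be dropped.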

21 Reciprocally best hits
[Diagram: Proteome A and Proteome B]
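The reciprocal-best-hit idea can be sketched as follows: protein a in proteome A and protein b in proteome B form a pair when b is the top-scoring hit of a and a is the top-scoring hit of b. This is a simplified illustration, not the InParanoid implementation; the score tables and protein names are hypothetical, and ties are not handled.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores maps (query, target) to a similarity score (assumed: higher is better).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores in both search directions.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 198, ("b1", "a3"): 118, ("b2", "a1"): 149, ("b2", "a2"): 95}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here a2 picks b2, but b2 prefers a1, so only (a1, b1) qualifies as a candidate seed pair.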

22 Problem: Definitions and notions of FC vary.
Solution: Multinet. Decide which types of FC are needed (provide as positive training sets) and perform the previous steps customized for each type.
[Diagram: link types between A and B (A <> B, A | B, A || B), each with P(E|+) / P(E|-)]

23 Multinet presents several link types in parallel
Proteins of the Parkinson’s disease pathway (KEGG #05020). Legend: physical protein-protein interaction; “signaling” link; metabolic “non-signaling” link.

24 The limits of data integration

25 FunCoup’s web interface Hooper S., Bork P. Medusa: a simple tool for interaction graph analysis. Bioinformatics. 2005 Dec 15;21(24):4432-3. Epub 2005 Sep 27. http://FunCoup.sbc.su.se

26 Reconstructing the “regulatory blueprint”* in C. intestinalis
*Imai KS, Levine M, Satoh N, Satou Y (2006) Regulatory blueprint for a chordate embryo. Science, 26:1183-7.
Legend: proteins of the “Regulatory Blueprint for a Chordate Embryo” [*]; 18 links mentioned in [*] AND found by FunCoup; links found by FunCoup (about 140). The remaining 202 links from [*] that FunCoup did not find are not shown.

27 [Diagram: orthologs, inparalogs, and a functional link across C. elegans, D. melanogaster, human, and S. cerevisiae]
Overview and comparison of ortholog databases. Alexeyenko A, Lindberg J, Pérez-Bercoff Å, Sonnhammer ELL. Drug Discovery Today: Technologies (2006) v. 3; 2, 137-143

28 Problem: Distribution areas informative of FC may vary.
Solution: Find them individually for each data set and FC class, accounting for the joint “feature-class” distribution.
[Plot: positive (+) and negative (-) pairs along the Pearson r axis, 0 to 1]
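One simple way to locate informative regions of a feature is to bin it and estimate a likelihood ratio per bin from the positive and negative training pairs. The sketch below is an assumption about how such a step could look, not FunCoup's actual procedure; the fixed bin edges, the pseudocount, and the sample values are hypothetical.

```python
from bisect import bisect_right


def binned_likelihood_ratios(pos, neg, edges, pseudocount=1.0):
    """Return P(bin|+) / P(bin|-) for each bin [edges[i], edges[i+1]).

    The last bin also includes its right edge. Values outside the edges are ignored.
    The pseudocount (an assumption) avoids division by zero in empty bins.
    """
    k = len(edges) - 1

    def histogram(values):
        counts = [0] * k
        for v in values:
            idx = bisect_right(edges, v) - 1
            if v == edges[-1]:
                idx = k - 1
            if 0 <= idx < k:
                counts[idx] += 1
        return counts

    pos_counts, neg_counts = histogram(pos), histogram(neg)
    pos_total = sum(pos_counts) + k * pseudocount
    neg_total = sum(neg_counts) + k * pseudocount
    return [
        ((p + pseudocount) / pos_total) / ((n + pseudocount) / neg_total)
        for p, n in zip(pos_counts, neg_counts)
    ]


# Hypothetical Pearson r values for positive and negative training pairs.
pos = [0.9, 0.8, 0.7, 0.2]
neg = [0.1, 0.2, 0.3, 0.6]
ratios = binned_likelihood_ratios(pos, neg, edges=[0.0, 0.5, 1.0])
print(ratios)
```

Running this separately for each data set and each FC class would give every combination its own set of informative bins.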

29 Validation
Jack-knife procedure:
- Take “positive” and “negative” sets
- Split each randomly as 50:50
- Use the first parts to train the algorithm, the second to test the performance
- Repeat a number of times
Analysis Of VAriance:
- Introduce features A, B, C in the workflow of FunCoup (e.g., using PCA, selecting nodes of BN by relevance, ways of using ortholog data, etc.)
- Run FunCoup with all possible combinations of absence/presence of A, B, C to produce a balanced and orthogonal ANOVA design with replicates
- Study effects of A, B, C or their combinations AxB, BxC, ..., AxBxC to see if they influence the performance significantly (each tested as if the other effects did not exist)
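The jack-knife procedure listed above can be sketched as a repeated random 50:50 split. The toy classifier (a threshold halfway between the training means), the accuracy metric, and the scores are hypothetical stand-ins for FunCoup's real training and evaluation steps.

```python
import random


def split_half(items, rng):
    """Shuffle a copy of items and split it 50:50."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def jackknife(positives, negatives, train, evaluate, repeats=10, seed=0):
    """Train on the first halves, test on the second halves, and repeat."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        pos_train, pos_test = split_half(positives, rng)
        neg_train, neg_test = split_half(negatives, rng)
        model = train(pos_train, neg_train)
        results.append(evaluate(model, pos_test, neg_test))
    return results


def train_threshold(pos_train, neg_train):
    """Toy model: threshold halfway between the two training means."""
    pos_mean = sum(pos_train) / len(pos_train)
    neg_mean = sum(neg_train) / len(neg_train)
    return (pos_mean + neg_mean) / 2.0


def accuracy(threshold, pos_test, neg_test):
    """Fraction of test items on the correct side of the threshold."""
    correct = sum(s > threshold for s in pos_test) + sum(s <= threshold for s in neg_test)
    return correct / (len(pos_test) + len(neg_test))


# Hypothetical, well-separated scores for positive and negative pairs.
positives = [0.7, 0.8, 0.9, 0.75, 0.85, 0.95]
negatives = [0.1, 0.2, 0.3, 0.15, 0.25, 0.05]
results = jackknife(positives, negatives, train_threshold, accuracy)
print(results)
```

The spread of the repeated results indicates how sensitive the performance is to the particular training half.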
