FunCoup: reconstructing protein networks in the worm and other animals
Andrey Alexeyenko, Erik Sonnhammer
Stockholm Bioinformatics Center


C. elegans computed interactomes

FunCoup is a data integration framework for discovering functional coupling in eukaryotic proteomes using data from model organisms.
[Figure: to decide whether worm proteins A and B are coupled, their orthologs* are found in mouse, human, fly, and yeast, where high-throughput evidence is available.]

FunCoup:
• Each piece of data is evaluated
• Data FROM many eukaryotes (7)
• Practical maximum of data sources (>60)
• Predicted networks FOR a number of eukaryotes (8)
• Organism-specific, efficient, and robust Bayesian frameworks
• Orthology-based information transfer and phylogenetic profiling
• Networks predicted for different types of functional coupling (metabolic, signaling, etc.)

C. elegans’ benefit from integrating model-species data
[Figure: predicted C. elegans pairs compared with Li & Vidal’s set (5535 pairs), IntAct (Oct. 2007; 4517 pairs), and other C. elegans data; a further count of 6841 also appears in the figure.]

Data sources in FunCoup
Species: H. sapiens, M. musculus, R. norvegicus, D. melanogaster, C. elegans, S. cerevisiae, A. thaliana
Types: protein-protein interactions, protein domain associations, protein-DNA interactions, mRNA expression, protein expression, miRNA targeting, sub-cellular co-localization, phylogenetic profiling

Multilateral data transfer
[Figure: FunCoup exchanging data among human, Ciona, worm, mouse, rat, fly, yeast, and Arabidopsis.]
Data from the target species itself are an important but not indispensable component of the framework. Hence, a network can be constructed even for an organism that has no experimental datasets at all.

InParanoid
[Figure: proteome A vs. proteome B. Reciprocally best hits ~ seed orthologs; inparalogs are added around them.]
Remm M, Storm CEV, Sonnhammer ELL. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology 314(5), 14 December 2001.
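The InParanoid principle can be sketched in a few lines. Reciprocal best hits between two proteomes become seed orthologs. A protein is then added as an inparalog when it scores higher against its own species' seed ortholog than that seed scores against its partner in the other species. This is a minimal sketch that assumes precomputed similarity scores stored in plain dictionaries; the protein names and scores are hypothetical. InParanoid itself works from BLAST scores and applies further cutoffs, overlap rules, and confidence values that are not modeled here.

```python
from typing import Dict, List, Optional, Tuple

# (query, target) -> similarity score; hypothetical stand-in for BLAST scores
Scores = Dict[Tuple[str, str], float]


def best_hit(query: str, targets: List[str], scores: Scores) -> Optional[str]:
    """Return the highest-scoring target for `query`, or None if it hits nothing."""
    hits = [(scores[(query, t)], t) for t in targets if (query, t) in scores]
    return max(hits)[1] if hits else None


def seed_orthologs(a: List[str], b: List[str], ab: Scores, ba: Scores) -> List[Tuple[str, str]]:
    """Reciprocally best hits between proteomes A and B (seed orthologs)."""
    pairs = []
    for x in a:
        y = best_hit(x, b, ab)
        if y is not None and best_hit(y, a, ba) == x:
            pairs.append((x, y))
    return pairs


def inparalogs(seed: str, partner_score: float, same_species: List[str], within: Scores) -> List[str]:
    """Proteins closer to the seed than the seed is to its cross-species partner."""
    return [p for p in same_species
            if p != seed and within.get((seed, p), 0.0) > partner_score]


# Hypothetical toy proteomes and scores.
proteome_a = ["a1", "a2", "a3"]
proteome_b = ["b1", "b2"]
ab = {("a1", "b1"): 300, ("a2", "b1"): 250, ("a3", "b2"): 120}
ba = {("b1", "a1"): 300, ("b1", "a2"): 250, ("b2", "a3"): 120}
within_a = {("a1", "a2"): 400}  # a1 and a2 are very similar within species A

for x, y in seed_orthologs(proteome_a, proteome_b, ab, ba):
    print(x, y, inparalogs(x, ab[(x, y)], proteome_a, within_a))
# a1 b1 ['a2']
# a3 b2 []
```

Here a2 is not a seed ortholog because b1's best hit is a1. However, a2 joins the a1/b1 cluster as an inparalog because it is closer to a1 than a1 is to b1.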

How does orthology work?
[Figure: log overlap between KEGG pathways and complexes (Gavin et al., 2006).]

Comparing networks
[Figure: rat, human, and mouse networks.]

Conclusions
FunCoup:
• is a flexible, exhaustive, and robust framework to infer confident functional links
• enables practical web access to candidate interactions in both small- and global-scale network contexts
• is open to improvements in data quality and coverage

Acknowledgements: Carsten Daub, Kristoffer Forslund, Anna Henricson, Olof Karlberg, Martin Klammer, Mats Lindskog, Kevin O’Brien, Tomas Ohlson, Sanjit Rupra, Gabriel Östlund, Sean Hooper, and all previous interaction network developers

Talk outline
• Other network resources
• Why FunCoup
• Orthology and InParanoid
• Implementation
• Applications and future development

FunCoup is a naïve Bayesian network (NBN).
Bayesian inference, with C = "genes A and B are functionally coupled" and E = "genes A and B are co-expressed":
P(C|E) = (P(C) * P(E|C)) / P(E)
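Here is a worked instance of the formula. The numbers are purely illustrative and are not FunCoup estimates. P(E) is expanded over both outcomes of C as P(E) = P(E|C)P(C) + P(E|not C)P(not C).

```python
def posterior(prior_c: float, p_e_given_c: float, p_e_given_not_c: float) -> float:
    """P(C|E) by Bayes' rule, with P(E) expanded over C and not-C."""
    p_e = p_e_given_c * prior_c + p_e_given_not_c * (1.0 - prior_c)
    return prior_c * p_e_given_c / p_e


# Hypothetical: 1% of gene pairs are coupled; co-expression is seen in 60% of
# coupled pairs and in 5% of uncoupled pairs.
print(round(posterior(0.01, 0.60, 0.05), 3))  # 0.108
```

Even strong co-expression evidence yields a modest P(C|E) here, because the result depends heavily on the prior P(C), which is rarely known.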

Problem: In situations with multiple inparalogs, how should alternative evidence be handled?
Solution: Treat ALL inparalogs equally, and choose the BEST value.
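The following is a minimal sketch of the "treat all inparalogs equally, take the best value" rule. It assumes that the ortholog groups of query genes A and B in another species are already known, that evidence is a per-pair score, and that "best" means the maximum. The gene names, the dictionary layout, and the example values are all hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Optional


def best_ortholog_evidence(
    group_a: List[str],
    group_b: List[str],
    evidence: Dict[FrozenSet[str], float],
) -> Optional[float]:
    """Best (maximum) evidence value over all pairs formed from two ortholog groups.

    Every inparalog in each group is treated equally; pairs with no measurement
    are skipped. Returns None if no pair was measured.
    """
    values = [
        evidence[frozenset((x, y))]
        for x, y in product(group_a, group_b)
        if x != y and frozenset((x, y)) in evidence
    ]
    return max(values) if values else None


# Hypothetical fly orthologs (with inparalogs) of worm genes A and B,
# and co-expression Pearson r values measured in fly.
fly_group_a = ["dmA1", "dmA2"]
fly_group_b = ["dmB1"]
fly_coexpression = {
    frozenset(("dmA1", "dmB1")): 0.42,
    frozenset(("dmA2", "dmB1")): 0.77,
}
print(best_ortholog_evidence(fly_group_a, fly_group_b, fly_coexpression))  # 0.77
```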

Problem: Absolute probabilities of FC are intractable, and the full Bayesian network is impossible.
Solution: A naïve Bayesian network. Calculate a belief change instead, as likelihood ratios, LR = P(E|+) / P(E|-) (+ = functionally coupled pairs, - = non-coupled pairs), and assume NO dependency between data.
[Figure: the full Bayesian network over nodes A, B, C, D would need every pairwise conditional, e.g. P(A|B), P(B|A), P(C|D), P(D|C); the naïve version needs only P(E|+) / P(E|-) at each node.]
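Under the naïve independence assumption, the belief change from several data sets is the product of their likelihood ratios, so log-LRs simply add up. If a prior were available, posterior odds would equal prior odds multiplied by the combined LR. The prior appears below only to illustrate that update. The data set names and LR values are hypothetical.

```python
import math
from typing import Dict


def combined_log_lr(lrs: Dict[str, float]) -> float:
    """Sum of log likelihood ratios P(E|+)/P(E|-), assuming independent evidence."""
    return sum(math.log(lr) for lr in lrs.values())


def posterior_probability(prior: float, log_lr: float) -> float:
    """Turn a prior probability and a combined log-LR into P(+|E) via odds."""
    odds = prior / (1.0 - prior) * math.exp(log_lr)
    return odds / (1.0 + odds)


# Hypothetical LRs for one gene pair from three data sets.
pair_lrs = {"coexpression": 4.0, "ppi": 10.0, "colocalization": 0.5}
score = combined_log_lr(pair_lrs)
print(round(math.exp(score), 6))  # 20.0 (= 4 * 10 * 0.5)
```

An LR below 1, such as the co-localization value here, lowers the belief in coupling. The log scale keeps such contributions additive.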

Problem: How can optimal bridges between species be established?
Solution: Via groups of orthologs that emerged from speciation.
[Figure: gene evolution and functional links.]

Homologs
[Figure: proteome A vs. proteome B.]
Homologs: proteins with similar sequences and, thus, a common origin.

An InParanoid cluster of orthologs
[Figure: orthologs and their inparalogs.]

Problem: Some LRs are weak and arise from non-representative sampling.
Solution: Enforce a confidence check (χ² test) and remove insignificant nodes.
[Figure: P(E|+) / P(E|-) at each node, checked with a χ² test.]
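One possible form of the confidence check is sketched below. It assumes that each node (data set) is summarized as a table of positive and negative training-pair counts per evidence bin. A Pearson χ² test of independence between bin and class then decides, at a 0.05 cutoff, whether the node is kept. The table layout, the test variant, and the cutoff are assumptions, not documented FunCoup settings. The χ² tail probability for integer degrees of freedom is computed with the standard library, using the recurrence of the regularized upper incomplete gamma function.

```python
import math
from typing import List, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of the chi-square distribution (integer df >= 1).

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2,
    starting from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y).
    """
    y = x / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def node_is_significant(table: List[Tuple[int, int]], alpha: float = 0.05) -> bool:
    """Pearson chi-square test of independence between evidence bin and class.

    `table` holds (positive_count, negative_count) for each evidence bin.
    """
    rows = [(p, n) for p, n in table if p + n > 0]
    n_pos = sum(p for p, _ in rows)
    n_neg = sum(n for _, n in rows)
    if len(rows) < 2 or n_pos == 0 or n_neg == 0:
        return False  # nothing testable: treat the node as uninformative
    total = n_pos + n_neg
    stat = 0.0
    for p, n in rows:
        row_total = p + n
        for observed, col_total in ((p, n_pos), (n, n_neg)):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return chi2_sf(stat, len(rows) - 1) < alpha


print(node_is_significant([(80, 20), (20, 80)]))  # True: bins separate the classes
print(node_is_significant([(50, 50), (50, 50)]))  # False: no association
```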

Reciprocally best hits
[Figure: proteome A vs. proteome B.]

Problem: Definitions and notions of FC vary.
Solution: Multinet. Decide which types of FC are needed (provide them as positive training sets) and perform the previous steps customized for each type.
[Figure: different link types between A and B, each with its own P(E|+) / P(E|-).]

Proteins of the Parkinson’s disease pathway (KEGG #05020)
[Figure legend: physical protein-protein interaction; “signaling” link; metabolic “non-signaling” link.]
Multinet presents several link types in parallel.

The limits of data integration

FunCoup’s web interface
Hooper S, Bork P. Medusa: a simple tool for interaction graph analysis. Bioinformatics 21(24), 15 December 2005 (Epub 27 September 2005).

Reconstructing the “regulatory blueprint”* in C. intestinalis
*Imai KS, Levine M, Satoh N, Satou Y (2006) Regulatory blueprint for a chordate embryo. Science, 26:
[Figure: proteins of the “Regulatory Blueprint for a Chordate Embryo” [*]. Legend: 18 links mentioned in [*] AND found by FunCoup; links found by FunCoup (about 140). The remaining 202 links from [*] that FunCoup did not find are not shown.]

[Figure: orthologs, inparalogs, and functional links across C. elegans, D. melanogaster, human, and S. cerevisiae.]
Alexeyenko A, Lindberg J, Pérez-Bercoff Å, Sonnhammer ELL. Overview and comparison of ortholog databases. Drug Discovery Today: Technologies 3(2), 2006.

Problem: The distribution areas informative of FC may vary.
Solution: Find them individually for each data set and FC class, accounting for the joint feature-class distribution.
[Figure: feature distributions along Pearson r, from 0 to 1.]
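Finding the informative areas amounts to estimating, separately for each data set and FC class, how often positive and negative training pairs fall into each range of a feature. This is a minimal sketch that assumes equal-width bins over Pearson r in [0, 1], a pseudocount to avoid division by zero, and made-up training values. The bin count, the pseudocount, and the data are hypothetical choices, not FunCoup's.

```python
from typing import List


def bin_index(r: float, n_bins: int) -> int:
    """Equal-width bin over [0, 1]; values outside are clamped to the end bins."""
    return max(0, min(int(r * n_bins), n_bins - 1))


def bin_likelihood_ratios(
    positives: List[float], negatives: List[float], n_bins: int = 4, pseudo: float = 1.0
) -> List[float]:
    """LR per bin: P(r in bin | +) / P(r in bin | -), with pseudocounts."""
    pos = [pseudo] * n_bins
    neg = [pseudo] * n_bins
    for r in positives:
        pos[bin_index(r, n_bins)] += 1
    for r in negatives:
        neg[bin_index(r, n_bins)] += 1
    pos_total, neg_total = sum(pos), sum(neg)
    return [(p / pos_total) / (q / neg_total) for p, q in zip(pos, neg)]


# Hypothetical Pearson r values for positive (coupled) and negative training pairs.
pos_r = [0.9, 0.85, 0.7, 0.8, 0.3]
neg_r = [0.1, 0.2, 0.05, 0.6, 0.3, 0.15]
print([round(lr, 2) for lr in bin_likelihood_ratios(pos_r, neg_r)])
# [0.22, 1.11, 1.11, 4.44]
```

Only the high-r bin shifts belief strongly toward coupling in this toy case. For another FC class (Multinet), the same procedure would be rerun with that class's positive set, so the informative areas can differ between classes.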

Validation
Jack-knife procedure:
• Take the “positive” and “negative” sets
• Split each randomly 50:50
• Use the first halves to train the algorithm and the second halves to test its performance
• Repeat a number of times
Analysis of variance (ANOVA):
• Introduce features A, B, C into the FunCoup workflow (e.g., using PCA, selecting BN nodes by relevance, different ways of using ortholog data, etc.)
• Run FunCoup with all possible combinations of absence/presence of A, B, C to produce a balanced, orthogonal ANOVA design with replicates
• Study the effects of A, B, C and their combinations (AxB, BxC, ..., AxBxC) to see whether they significantly influence performance, each effect being estimated as if the others did not exist
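The validation loop and the ANOVA design can be sketched as follows. The sketch assumes a repeated random 50:50 split of each reference set and a full two-level factorial design, in which every absence/presence combination of features A, B, C is replicated. The feature names and the replicate count are placeholders. Training FunCoup and fitting the ANOVA itself are not shown.

```python
import itertools
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def half_split(items: Sequence, rng: random.Random) -> Tuple[list, list]:
    """Randomly split a reference set 50:50 into (training, testing) halves."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def repeated_half_splits(
    positives: Sequence, negatives: Sequence, n_repeats: int, seed: int = 0
) -> Iterator[Tuple[Tuple[list, list], Tuple[list, list]]]:
    """Yield ((pos_train, neg_train), (pos_test, neg_test)) for each repetition."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        pos_train, pos_test = half_split(positives, rng)
        neg_train, neg_test = half_split(negatives, rng)
        yield (pos_train, neg_train), (pos_test, neg_test)


def factorial_design(factors: List[str], replicates: int) -> List[Tuple[Dict[str, bool], int]]:
    """Every absence/presence combination of the factors, each run `replicates` times.

    Each level of each factor occurs equally often, so the design is balanced
    and orthogonal.
    """
    runs = []
    for levels in itertools.product((False, True), repeat=len(factors)):
        for rep in range(replicates):
            runs.append((dict(zip(factors, levels)), rep))
    return runs


design = factorial_design(["A", "B", "C"], replicates=3)
print(len(design))  # 24 runs: 2**3 combinations x 3 replicates
```

Each run in `design` would be paired with one split from `repeated_half_splits`. The resulting performance values could then be analysed with a standard factorial ANOVA.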