Inferring strengths of protein-protein interactions from experimental data using linear programming Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics.

1 Inferring strengths of protein-protein interactions from experimental data using linear programming Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

2 Overview Background Probabilistic model Related work Biological experimental data Proposed methods For binary data For numerical data Results of computational experiments Conclusion

3 Background (1/3) Understanding protein-protein interactions is useful for understanding protein functions. Examples: transcription factors (proteins interact with a factor and regulate the gene), receptors, etc.

4 Background (2/3) Various methods have been developed for inference of protein-protein interactions. Gene fusion/Rosetta stone (Enright et al. and Marcotte et al., 1999): the number of genes to which it can be applied is limited. Molecular dynamics: long CPU time; difficult to predict precisely.

5 Background (3/3) A model based on domain-domain interactions has been proposed. It uses domains defined by databases such as InterPro or Pfam.

6 Overview Background Probabilistic model Related work Biological experimental data Proposed methods For binary data For numerical data Results of computational experiments Conclusion

7 Probabilistic model of interaction (1/2) Model (Deng et al., 2002): two proteins interact if and only if at least one pair of their domains interacts. Interactions between domains are independent events. [Figure: proteins P1 and P2 with their domains D1, D2, D3, D4]

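8 Probabilistic model of interaction (2/2) $\Pr(P_{ij}=1) = 1 - \prod_{D_{mn} \in P_{ij}} (1 - \Pr(D_{mn}=1))$, where $P_{ij}=1$: proteins $P_i$ and $P_j$ interact; $D_{mn}=1$: domains $D_m$ and $D_n$ interact; $D_{mn} \in P_{ij}$: domain pair $(D_m, D_n)$ is included in protein pair $P_i \times P_j$.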
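The model on slides 7 and 8 says that two proteins interact when at least one of their domain pairs interacts, with domain-domain interactions independent, so the probability of a protein pair interacting is one minus the probability that no domain pair interacts. A minimal sketch of that calculation follows; the function name and the toy probabilities are illustrative assumptions, not values from the presentation.

```python
from math import prod


def protein_interaction_probability(domain_pair_probs):
    """Return Pr(P_ij = 1) = 1 - prod(1 - Pr(D_mn = 1)).

    domain_pair_probs: iterable of Pr(D_mn = 1) for every domain pair
    (D_m, D_n) contained in the protein pair P_i x P_j.
    """
    return 1.0 - prod(1.0 - p for p in domain_pair_probs)


# Toy example (hypothetical numbers): two domain pairs with
# interaction probabilities 0.5 and 0.2.
example = protein_interaction_probability([0.5, 0.2])
print(example)
```

With no domain pairs the product is empty and the function returns 0, which matches the model: a protein pair with no candidate domain pair cannot be explained as interacting.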

9 Overview Background Probabilistic model Related work Association method (Sprinzak et al., 2001) EM method (Deng et al., 2002) Biological experimental data Proposed methods Results of computational experiments Conclusion

10 Related work INPUT: interacting protein pairs (positive examples) and non-interacting protein pairs (negative examples). OUTPUT: $\Pr(D_{mn}=1)$ for all domain pairs.

11 Association method (Sprinzak et al., 2001) Inference of probabilities of domain-domain interactions using ratios of frequencies: $\Pr(D_{mn}=1) = I_{mn} / N_{mn}$, where $I_{mn}$: number of interacting protein pairs that include $(D_m, D_n)$; $N_{mn}$: number of protein pairs that include $(D_m, D_n)$.
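The Association method estimates each domain-domain interaction probability as a ratio of two counts: the number of interacting protein pairs that include the domain pair, divided by the number of all protein pairs that include it. The sketch below counts both on toy data. The function name, the data layout, and the choice to treat domain pairs as unordered are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def association_scores(protein_domains, protein_pairs, interacting):
    """Ratio of interacting protein pairs to all protein pairs per domain pair."""
    n_pairs = Counter()  # protein pairs that include the domain pair
    i_pairs = Counter()  # interacting protein pairs that include it
    for pi, pj in protein_pairs:
        # Assumption: domain pairs are unordered, and each is counted
        # once per protein pair.
        domain_pairs = {
            tuple(sorted(dp))
            for dp in product(protein_domains[pi], protein_domains[pj])
        }
        for dp in domain_pairs:
            n_pairs[dp] += 1
            if (pi, pj) in interacting:
                i_pairs[dp] += 1
    return {dp: i_pairs[dp] / n for dp, n in n_pairs.items()}


# Hypothetical toy data.
protein_domains = {"P1": ["D1", "D2"], "P2": ["D3"], "P3": ["D2"]}
protein_pairs = [("P1", "P2"), ("P2", "P3"), ("P1", "P3")]
interacting = {("P1", "P2")}
scores = association_scores(protein_domains, protein_pairs, interacting)
print(scores)
```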

12 EM method (Deng et al., 2002) Consider the probability (likelihood $L$) that the experimental data $\{O_{ij} \in \{0,1\}\}$ are observed. Use the EM algorithm to (locally) maximize $L$ and estimate $\Pr(D_{mn}=1)$.
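The likelihood itself does not survive in the transcript. A sketch of its usual form is given here, assuming the formulation of Deng et al. (2002), in which an observation can be wrong with a false positive rate $fp$ and a false negative rate $fn$; these two symbols are assumptions and do not appear on the slide.

```latex
\Pr(O_{ij}=1) = \Pr(P_{ij}=1)\,(1 - fn) + \bigl(1 - \Pr(P_{ij}=1)\bigr)\,fp

L = \prod_{i,j} \Pr(O_{ij}=1)^{O_{ij}} \,\bigl(1 - \Pr(O_{ij}=1)\bigr)^{1 - O_{ij}}
```

Under this reading, $\Pr(P_{ij}=1)$ depends on the unknown $\Pr(D_{mn}=1)$ through the probabilistic model, and EM iteratively updates those unknowns to increase $L$.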

13 Overview Background Probabilistic model Related work Biological experimental data Proposed methods For binary data For numerical data Results of computational experiments Conclusion

14 Biological experimental data Related methods (Association and EM) use only binary data (interact or not). Experimental data were obtained with the yeast two-hybrid system: Ito et al. (2000, 2001), Uetz et al. (2001). For many protein pairs, different results ($O_{ij} \in \{0,1\}$) were observed. We developed new methods using raw numerical data.

15 Numerical data Ito et al. (2000, 2001): for each protein pair, experiments were performed multiple times. IST (Interaction Sequence Tag): number of observed interactions. By using a threshold, we obtain binary data.
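Converting the numerical IST counts into binary data is a simple thresholding step. The sketch below illustrates it; the threshold value of 3 and the use of "greater than or equal" are assumptions chosen for the example, since the slide does not state them.

```python
def binarize(ist_counts, threshold):
    """Map each protein pair's IST count to 1 (interact) or 0 (do not)."""
    # Assumption: a pair counts as interacting when its IST count
    # reaches the threshold.
    return {pair: int(count >= threshold) for pair, count in ist_counts.items()}


# Hypothetical IST counts for three protein pairs.
ist_counts = {("P1", "P2"): 5, ("P1", "P3"): 1, ("P2", "P3"): 3}
binary = binarize(ist_counts, threshold=3)
print(binary)
```

Thresholding discards how often an interaction was observed, which is the information the numerical-data methods try to keep.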

16 Overview Background Probabilistic model Related work Biological experimental data Proposed methods For binary data For numerical data Results of computational experiments Conclusion

17 Proposed methods It seems difficult to modify the EM method for numerical data. Linear programming. For binary data: LPBN; combined methods (LPEM, EMLP); SVM-based method. For numerical data: ASNM, LPNM.

18 Overview Background Probabilistic model Related work Biological experimental data Proposed methods For binary data For numerical data Results of computational experiments Conclusion

19 LPBN (LP-based method) (1/2) Transformation into linear inequalities: $P_i$ and $P_j$ interact
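The inequalities on this slide are not preserved in the transcript. One way the transformation can be carried out, starting from the probabilistic model $\Pr(P_{ij}=1) = 1 - \prod_{D_{mn} \in P_{ij}} (1 - \Pr(D_{mn}=1))$, is sketched below. The threshold $\Theta$ and the variable $\gamma_{mn}$ are notation assumed here for illustration.

```latex
% P_i and P_j are predicted to interact when Pr(P_ij = 1) >= Theta, 0 < Theta < 1.
\Pr(P_{ij}=1) \ge \Theta
\;\Longleftrightarrow\;
\prod_{D_{mn} \in P_{ij}} \bigl(1 - \Pr(D_{mn}=1)\bigr) \le 1 - \Theta
\;\Longleftrightarrow\;
\sum_{D_{mn} \in P_{ij}} \ln\bigl(1 - \Pr(D_{mn}=1)\bigr) \le \ln(1 - \Theta)

% Substituting gamma_mn = ln(1 - Pr(D_mn = 1)) <= 0 gives a linear inequality:
\sum_{D_{mn} \in P_{ij}} \gamma_{mn} \le \ln(1 - \Theta)
```

Taking logarithms turns the product over domain pairs into a sum, so the condition becomes linear in $\gamma_{mn}$ and can be used as an LP constraint.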

20 LPBN (LP-based method)(2/2) Linear programming for inference of protein-protein interactions
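The linear program on this slide is not preserved in the transcript. A plausible soft-constraint formulation is sketched below under stated assumptions: $\gamma_{mn} = \ln(1 - \Pr(D_{mn}=1))$, $\Theta$ is a decision threshold, and $\alpha_{ij} \ge 0$ are slack variables that absorb violations. It should be read as an illustration, not necessarily the exact program used by the authors.

```latex
\begin{aligned}
\text{minimize}\quad & \sum_{i,j} \alpha_{ij} \\
\text{subject to}\quad
& \sum_{D_{mn} \in P_{ij}} \gamma_{mn} \le \ln(1 - \Theta) + \alpha_{ij}
  && \text{for interacting pairs } (P_i, P_j), \\
& \sum_{D_{mn} \in P_{ij}} \gamma_{mn} \ge \ln(1 - \Theta) - \alpha_{ij}
  && \text{for non-interacting pairs } (P_i, P_j), \\
& \gamma_{mn} \le 0, \quad \alpha_{ij} \ge 0 .
\end{aligned}
```

After solving, the domain-domain probabilities are recovered as $\Pr(D_{mn}=1) = 1 - e^{\gamma_{mn}}$.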

21 Combination of EM and LPBN LPEM method: use the results of LPBN as initial parameter values for EM. EMLP method: add constraints to LPBN, in the form of the following inequalities, so that LP solutions are close to EM solutions.

22 Simple SVM-based method Feature vector Simple linear kernel with Interacting pairs = Positive examples Non-interacting pairs = Negative examples

23 Overview Background Probabilistic model Related work Biological experimental data Proposed methods For binary data For numerical data Results of computational experiments Conclusion

24 Strength of protein-protein interaction For each protein pair, experiments were performed multiple times. The ratio $K_{ij} / M_{ij}$ can be considered as the strength, where $K_{ij}$: number of observed interactions for a protein pair $(P_i, P_j)$; $M_{ij}$: number of experiments for $(P_i, P_j)$.
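The strength of an interaction is the fraction of experiments in which the interaction was observed. A small sketch of that ratio follows; the function name and the counts are hypothetical.

```python
def interaction_strengths(observed, trials):
    """Return K_ij / M_ij for every protein pair with at least one experiment."""
    return {
        pair: observed[pair] / trials[pair]
        for pair in trials
        if trials[pair] > 0  # skip pairs that were never tested
    }


# Hypothetical counts: K_ij (observed interactions) and M_ij (experiments).
observed = {("P1", "P2"): 3, ("P1", "P3"): 0}
trials = {("P1", "P2"): 4, ("P1", "P3"): 4}
strengths = interaction_strengths(observed, trials)
print(strengths)
```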

25 LPNM method (1/2) Minimize the gap between $\Pr(P_{ij}=1)$ and $K_{ij} / M_{ij}$ using LP.

26 LPNM method (2/2) Linear programming for inference of strengths of protein-protein interactions
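The linear program on this slide is not preserved in the transcript. One way to express "minimize the gap" as an LP is sketched below under stated assumptions: $\rho_{ij} = K_{ij} / M_{ij}$ is the observed strength, $\gamma_{mn} = \ln(1 - \Pr(D_{mn}=1))$, and $\alpha_{ij} \ge 0$ bounds the absolute gap for each protein pair. The gap is measured after the logarithmic transformation, which is a choice made here to keep the problem linear and may differ from the authors' formulation.

```latex
% Exact fit: Pr(P_ij = 1) = rho_ij  <=>  sum of gamma_mn = ln(1 - rho_ij).
\begin{aligned}
\text{minimize}\quad & \sum_{i,j} \alpha_{ij} \\
\text{subject to}\quad
& -\alpha_{ij} \;\le\; \sum_{D_{mn} \in P_{ij}} \gamma_{mn} - \ln(1 - \rho_{ij}) \;\le\; \alpha_{ij}
  && \text{for all pairs } (P_i, P_j), \\
& \gamma_{mn} \le 0, \quad \alpha_{ij} \ge 0 .
\end{aligned}
```

Pairs with $\rho_{ij} = 1$ make $\ln(1 - \rho_{ij})$ undefined, so a sketch like this would need those values capped slightly below 1.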

27 ASNM The Association method, originally for binary data (Sprinzak et al., 2001), modified for numerical data.
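The ASNM formula is not preserved in the transcript. A natural reading, assumed here, is that the count of interacting protein pairs in the Association method is replaced by the sum of observed strengths, so each domain pair receives the average strength of the protein pairs that include it. The function name and data below are hypothetical.

```python
from collections import defaultdict
from itertools import product


def asnm_scores(protein_domains, strengths):
    """Average observed strength over protein pairs containing each domain pair."""
    total = defaultdict(float)  # sum of strengths per domain pair
    count = defaultdict(int)    # number of protein pairs per domain pair
    for (pi, pj), rho in strengths.items():
        # Assumption: domain pairs are unordered, counted once per protein pair.
        domain_pairs = {
            tuple(sorted(dp))
            for dp in product(protein_domains[pi], protein_domains[pj])
        }
        for dp in domain_pairs:
            total[dp] += rho
            count[dp] += 1
    return {dp: total[dp] / count[dp] for dp in count}


# Hypothetical toy data: strengths are ratios of observed interactions
# to experiments for each protein pair.
protein_domains = {"P1": ["D1", "D2"], "P2": ["D3"], "P3": ["D2"]}
strengths = {("P1", "P2"): 0.75, ("P2", "P3"): 0.25}
asnm = asnm_scores(protein_domains, strengths)
print(asnm)
```

When every strength is 0 or 1 this reduces to the binary Association ratio.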

28 Overview Background Probabilistic model Related work Biological experimental data Proposed methods For binary data For numerical data Results of computational experiments Conclusion

29 Computational experiments for binary data DIP database (Xenarios et al., 2002): 1767 protein pairs as positive examples; 2/3 of the pairs for training, 1/3 for test. Computational environment: Xeon processor 2.8 GHz; LP solver: loqo.

30 Results on training data (binary data) [Figure: results for SVM, EM, LPBN, and Association]

31 Results on test data (binary data) [Figure: results for SVM, EM, EMLP, Association, and LPEM]

32 Computational experiments for numerical data YIP database (Ito et al., 2000, 2001): IST (Interaction Sequence Tag) data for 1586 protein pairs; 4/5 for training, 1/5 for test. Computational environment: Xeon processor 2.8 GHz; LP solver: lp_solve.

33 Results on test data (numerical data) [Figure: results for ASNM, EM, LPNM, and Association]

34 Results on test data (numerical data) LPNM is the best. EM and Association methods classify $\Pr(P_{ij}=1)$ into either 0 or 1.

             LPNM    ASNM    EM     ASSOC
Ave. Error   0.0308  0.0405  0.295  0.277
CPU (sec.)   1.20    0.0077  1.62   0.0088

35 Conclusion We have defined a new problem: inferring strengths of protein-protein interactions. We have proposed LP-based methods. For binary data: LPBN, LPEM, EMLP, and an SVM-based method. For numerical data: ASNM and LPNM. LPNM outperformed the other methods.

36 Future work Improve the methods to avoid overfitting. Improve the probabilistic model to understand protein-protein interactions more accurately.

