Graph preprocessing. Common Neighborhood Similarity (CNS) measures.


1 Graph preprocessing

2 Common Neighborhood Similarity (CNS) measures

3 Jaccard similarity
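The body of this slide is not in the transcript. As a hedged illustration only, the sketch below assumes the usual graph form of Jaccard similarity, taken over the neighbor sets of two nodes; the function name and the toy graph are hypothetical.

```python
def jaccard_similarity(neighbors, u, v):
    """Jaccard similarity of two nodes' neighbor sets: |N(u) & N(v)| / |N(u) | N(v)|."""
    nu, nv = neighbors[u], neighbors[v]
    union = nu | nv
    if not union:
        # Two isolated nodes share nothing; define the similarity as 0.
        return 0.0
    return len(nu & nv) / len(union)


# Toy undirected graph (hypothetical): a and b share the single neighbor c.
toy = {
    "a": {"c", "d"},
    "b": {"c", "e"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"b"},
}
score = jaccard_similarity(toy, "a", "b")  # one shared neighbor out of three distinct ones
```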

4 Pvalue

5 Functional Similarity (FS)

6 Topological Overlap Measure (TOM)

7 Pair-wise H-Confidence
Measure of the affinity of two items in terms of the transactions in which they appear simultaneously [Xiong et al, 2006]
For an interaction network represented as an adjacency matrix:
– Unweighted networks: n1, n2 = # neighbors of p1, p2; m = # shared neighbors of p1, p2
– Weighted networks: n1, n2 = sum(weights) of edges incident on p1, p2; m = sum of min(weights) of edges to common neighbors of p1, p2
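The formula itself is missing from the transcript. Read together with the worked example on the next slide, the definitions of n1, n2 and m suggest the following form (a reconstruction, not a quotation from the slide):

```latex
\mathrm{Hconf}(p_1, p_2) = \min\left( \frac{m}{n_1},\ \frac{m}{n_2} \right)
```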

8 H-confidence Example

Unweighted Network
      p1   p2   p3   p4   p5
p1    0    0    1    0    1
p2    0    0    1    1    0
p3    1    1    0    0    1
p4    0    1    0    0    1
p5    1    0    1    1    0
Hconf(p1, p2) = min(0.5, 0.5) = 0.5

Weighted Network
      p1   p2   p3   p4   p5
p1    0    0    0.5  0    0.1
p2    0    0    1    0.2  0
p3    0.5  1    0    0    0.1
p4    0    0.2  0    0    0.5
p5    0.1  0    0.1  0.5  0
Hconf(p1, p2) = min(0.5/0.6, 0.5/1.2) ≈ 0.417
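The two example values can be checked with a short Python sketch. It assumes Hconf(p1, p2) = min(m/n1, m/n2) with n and m as defined on the previous slide; the function name `hconf` and the list-of-lists layout are illustrative choices, and the matrices are transcribed from this slide in symmetric form.

```python
def hconf(adj, i, j):
    """Pairwise h-confidence of nodes i and j from a symmetric adjacency matrix.

    n_i, n_j: sum of edge weights incident on each node (neighbor counts for 0/1 input).
    m: sum, over common neighbors, of the smaller of the two edge weights.
    """
    n_i = sum(adj[i])
    n_j = sum(adj[j])
    if n_i == 0 or n_j == 0:
        return 0.0  # an isolated node has no neighborhood to share
    m = sum(min(a, b) for a, b in zip(adj[i], adj[j]) if a > 0 and b > 0)
    return min(m / n_i, m / n_j)


unweighted = [
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
]
weighted = [
    [0, 0, 0.5, 0, 0.1],
    [0, 0, 1, 0.2, 0],
    [0.5, 1, 0, 0, 0.1],
    [0, 0.2, 0, 0, 0.5],
    [0.1, 0, 0.1, 0.5, 0],
]

h_unweighted = hconf(unweighted, 0, 1)  # p1 and p2 share only p3
h_weighted = hconf(weighted, 0, 1)      # m = min(0.5, 1) = 0.5
```

Treating an unweighted network as a 0/1 matrix lets one function cover both cases, since the minimum of two unit weights is 1 and the sums reduce to counts.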

9 Validation of Final Network
Use FunctionalFlow algorithm [Nabieva et al, 2005] on the original and transformed graph(s)
– One of the most accurate algorithms for predicting function from interaction networks
– Produces likelihood scores for each protein being annotated with one of 75 MIPS functional labels
Likelihood matrix evaluated using two metrics
– Multi-label versions of precision and recall, where m_i = # predictions made, n_i = # known annotations, k_i = # correct predictions
– Precision/accuracy of top-k predictions: useful for actual biological experimental scenarios
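The precision and recall formulas are not in the transcript. The sketch below assumes they are pooled over proteins, i.e. precision = sum(k_i) / sum(m_i) and recall = sum(k_i) / sum(n_i), and that a label counts as predicted when its likelihood reaches a threshold. The top-k metric is read here as the fraction of the k highest-scoring (protein, label) pairs that are correct. Both readings, the function names, and the dict layout are assumptions made for illustration.

```python
def multilabel_precision_recall(scores, known, threshold):
    """scores: {protein: {label: likelihood}}; known: {protein: set of true labels}."""
    total_k = total_m = total_n = 0
    for protein, label_scores in scores.items():
        predicted = {lab for lab, s in label_scores.items() if s >= threshold}
        truth = known.get(protein, set())
        total_m += len(predicted)          # m_i: predictions made
        total_n += len(truth)              # n_i: known annotations
        total_k += len(predicted & truth)  # k_i: correct predictions
    precision = total_k / total_m if total_m else 0.0
    recall = total_k / total_n if total_n else 0.0
    return precision, recall


def top_k_accuracy(scores, known, k):
    """Fraction of the k highest-scoring (protein, label) pairs that are known annotations."""
    ranked = sorted(
        ((s, p, lab) for p, ls in scores.items() for lab, s in ls.items()),
        key=lambda t: t[0],
        reverse=True,
    )[:k]
    if not ranked:
        return 0.0
    return sum(lab in known.get(p, set()) for _, p, lab in ranked) / len(ranked)


# Toy likelihood matrix (hypothetical values).
scores = {"A": {"f1": 0.9, "f2": 0.2}, "B": {"f1": 0.6, "f2": 0.7}}
known = {"A": {"f1"}, "B": {"f1"}}
precision, recall = multilabel_precision_recall(scores, known, threshold=0.5)
top2 = top_k_accuracy(scores, known, k=2)
```

Sweeping the threshold from high to low would trace out a precision-recall curve of the kind shown on the results slides.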

10 Test Protein Interaction Networks
Three yeast interaction networks with different types of weighting schemes used for experiments
– Combined
  Composed from Ito, Uetz and Gavin (2002)'s data sets
  Individual reliabilities obtained from EPR index tool of DIP
  Overall reliabilities obtained using a noisy-OR
– [Krogan et al, 2006]'s data set
  6180 interactions between 2291 annotated proteins
  Edge reliabilities derived using machine learning techniques
– DIPCore [Deane et al, 2002]
  ~5K highly reliable interactions in DIP
  No weights assigned: assumed unweighted
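The noisy-OR used for the Combined network can be sketched as follows. This assumes the standard form, in which an interaction reported by several data sets gets reliability 1 minus the product of (1 - r_i) over the individual reliabilities r_i. The slide does not spell out the exact procedure, so the function name and sample values are illustrative.

```python
from math import prod


def noisy_or(reliabilities):
    """Combine per-data-set reliabilities of one interaction: 1 - prod(1 - r_i)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# An interaction seen in two data sets with (hypothetical) reliabilities 0.5 and 0.4.
combined = noisy_or([0.5, 0.4])
```

Under this rule, an edge supported by several sources never scores lower than its best single source.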

11 Results on Combined data set
[Figure: two panels, Precision-Recall and Accuracy of top-k predictions]

12 Results on Krogan et al's data set
[Figure: two panels, Precision-Recall and Accuracy of top-k predictions]

13 Results on DIPCore
[Figure: two panels, Precision-Recall and Accuracy of top-k predictions]

14 Noise removal capabilities of H-confidence
– H-confidence and hypercliques have been shown to have noise removal capabilities [Xiong et al, 2006]
– To test this, we added 50% random edges to DIPCore and re-ran the transformation process
– The fall in performance of the transformed network is significantly smaller than that of the original network
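The noise-injection step might look like the sketch below. The slide does not say how the random edges were drawn, so uniform sampling of node pairs that are not already connected is an assumption. The function name, the seed, and the toy edge list are hypothetical.

```python
import random


def add_random_edges(edges, nodes, fraction=0.5, seed=0):
    """Return the original undirected edges plus fraction * |E| new random edges."""
    rng = random.Random(seed)
    nodes = list(nodes)
    existing = {frozenset(e) for e in edges}
    # Cap the target at the number of possible undirected edges so the loop terminates.
    max_edges = len(nodes) * (len(nodes) - 1) // 2
    target = min(len(existing) + int(fraction * len(existing)), max_edges)
    noisy = set(existing)
    while len(noisy) < target:
        u, v = rng.sample(nodes, 2)   # two distinct nodes, so no self-loops
        noisy.add(frozenset((u, v)))  # the set silently ignores duplicate edges
    return noisy


nodes = ["p1", "p2", "p3", "p4", "p5"]
edges = [("p1", "p3"), ("p1", "p5"), ("p2", "p3"), ("p2", "p4")]
noisy = add_random_edges(edges, nodes, fraction=0.5)
```

The original and noisy edge sets can then both be passed through the transformation and scored with the same function-prediction metrics.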

15 Summary of Results
– H-confidence-based transformations generally produce more accurate and more reliably weighted interaction graphs, as validated by function prediction
– Generally, the less reliable the weights assigned to the edges in the raw network, the greater the improvement in performance obtained by using an h-confidence-based graph transformation
– The better performance of the h-confidence-based graph transformation method is indeed due to the removal of spurious edges, and potentially the addition of biologically viable ones and effective weighting of the resultant set of edges

16 Conclusions and future directions

18 References (I)
[Pandey et al, 2006] G. Pandey, V. Kumar and M. Steinbach, Computational Approaches for Protein Function Prediction: A Survey, TR 06-028, Department of Computer Science and Engineering, University of Minnesota, Twin Cities
[Pandey et al, 2007] G. Pandey, M. Steinbach, R. Gupta, T. Garg and V. Kumar, Association analysis-based transformations for protein interaction networks: a function prediction case study, KDD 2007: 540-549
[Xiong et al, 2005] H. Xiong, X. He, C. Ding, Y. Zhang, V. Kumar and S. R. Holbrook, Identification of functional modules in protein complexes via hyperclique pattern discovery, Proc. Pacific Symposium on Biocomputing (PSB), 221–232, 2005
[Xiong et al, 2006a] H. Xiong, P.-N. Tan and V. Kumar, Hyperclique Pattern Discovery, Data Mining and Knowledge Discovery, 13(2):219-242, 2006
[Xiong et al, 2006b] H. Xiong, G. Pandey, M. Steinbach and V. Kumar, Enhancing Data Analysis with Noise Removal, IEEE TKDE, 18(3):304-319, 2006
[Xiong et al, 2006c] H. Xiong, M. Steinbach and V. Kumar, Privacy Leakage in Multi-relational Databases: A Semi-supervised Learning Perspective, VLDB Journal Special Issue on Privacy Preserving Data Management, 15(4):388-402, November 2006
[Xiong et al, 2004] H. Xiong, M. Steinbach, P.-N. Tan and V. Kumar, HICAP: Hierarchical Clustering with Pattern Preservation, SIAM Data Mining 2004
[Tan et al, 2005] P.-N. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Addison-Wesley, 2005
[Nabieva et al, 2005] E. Nabieva, K. Jim, A. Agarwal, B. Chazelle and M. Singh, Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps, Bioinformatics, 21(Suppl. 1):i1–i9, 2005
[Deng et al, 2003] M. Deng, F. Sun and T. Chen, Assessment of the reliability of protein–protein interactions and protein function prediction, Pac Symp Biocomputing, 140–151, 2003
[Gavin et al, 2002] A. Gavin et al., Functional organization of the yeast proteome by systematic analysis of protein complexes, Nature, 415:141-147, 2002
[Hart et al, 2006] G. T. Hart, A. K. Ramani and E. M. Marcotte, How complete are current yeast and human protein-interaction networks, Genome Biology, 7:120, 2006

19 References (II)
[Brun et al, 2003] C. Brun, F. Chevenet, D. Martin, J. Wojcik, A. Guenoche and B. Jacq, Functional classification of proteins for the prediction of cellular function from a protein-protein interaction network, Genome Biology, 5(1):R6, 2003
[Samanta et al, 2003] M. P. Samanta and S. Liang, Predicting protein functions from redundancies in large-scale protein interaction networks, Proc Natl Acad Sci U.S.A., 100(22):12579–12583, 2003
[Salwinski et al, 2004] L. Salwinski, C. S. Miller, A. J. Smith, F. K. Pettit, J. U. Bowie and D. Eisenberg, The Database of Interacting Proteins: 2004 update, NAR, 32(Database issue):D449-51, 2004, http://dip.doe-mbi.ucla.edu/
[Gavin et al, 2006] Gavin et al, Proteome survey reveals modularity of the yeast cell machinery, Nature, 440:631-636, 2006
[Deane et al, 2002] C. M. Deane, L. Salwinski, I. Xenarios and D. Eisenberg, Protein interactions: Two methods for assessment of the reliability of high-throughput observations, Mol Cell Prot, 1:349-356, 2002

