Dept. of Computer Science Rutgers Node and Graph Similarity : Theory and Applications Danai Koutra (CMU) Tina Eliassi-Rad (Rutgers) Christos Faloutsos.


1 Node and Graph Similarity: Theory and Applications. Danai Koutra (CMU), Tina Eliassi-Rad (Rutgers), Christos Faloutsos (CMU). Dept. of Computer Science, Rutgers. ICDM 2014, Monday, December 15th, 2014, Shenzhen, China. Copyright for the tutorial materials is held by the authors. The authors grant IEEE ICDM permission to distribute the materials through its website.

2 ICDM’14 Tutorial D. Koutra & T. Eliassi-Rad & C. Faloutsos Part 1b Node Similarity: Proximity

3 What to remember
Node Roles and Proximity are complementary.
Node Proximity:
– Building block for many applications!
– Many (wrong) ways to define similarities.
– Guilt-by-association techniques and effective conductance are similar (and recommended).

4 Real-world Applications

5 Movie recommendations

6 Search Engines (IR): Topical Sessions. [Figure: bipartite graph linking Queries (“popular music videos”, “music”, “yahoo”) to URLs; some queries are marked as similar.]

7 Node proximity measures:
– Information exchange [Minkov+ ’06]
– Latency/speed of information exchange [Bunke]
– Likelihood of future links [Liben-Nowell+ ’03], [Tong+]
– Propagation of a product/idea/disease [Prakash+]
– Relevance: ranking [Haveliwala], [Chakrabarti+]

8 Intuitively, a good measure of node proximity should reward:
– many paths
– short paths
– heavy (high-weight) paths

9 Roadmap
Node Roles
Node Proximity
– Graph-theoretic Approaches
– Effective Conductance
– Guilt-by-association techniques
– Summary

10 Graph-theoretic Approaches. Idea: simple metrics:
– Number of hops
– Sum of weights of hops
[Figure: one node pair is marked as more similar than another.] [Koren+ ’07]
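
The two simple metrics on this slide can be sketched as follows. This is a minimal illustration, not the tutorial's code: the function names, the dict-of-dicts graph layout, and the toy graph are assumptions, and edge weights are treated as lengths (costs).

```python
import heapq
from collections import deque

def hop_distance(adj, s, t):
    """Number of hops on a shortest s-t path (BFS); None if unreachable."""
    hops = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return hops[u]
        for v in adj.get(u, {}):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return None

def weighted_distance(adj, s, t):
    """Smallest sum of edge weights over s-t paths (Dijkstra); None if unreachable."""
    best = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return None

# Hypothetical toy graph (undirected, stored in both directions).
toy = {
    "s": {"a": 1.0, "b": 4.0},
    "a": {"s": 1.0, "t": 1.0},
    "b": {"s": 4.0, "t": 1.0},
    "t": {"a": 1.0, "b": 1.0},
}
```

Both functions look only at a single best path, so they cannot reward the presence of many paths between s and t.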

11 Graph-theoretic approaches do not always capture meaningful relationships. [Figure: three (s,t) pairs, each with dist(s,t) = 2; annotations: “linked via only one path” and “(s,t) probably unrelated”.] [Koren+ ’07]

12 Max-flow Min-cut Approach. Idea:
– assign a limited capacity to each edge
– compute the maximal number of units that can be delivered from s to t
[Figure: two s-t pairs, red and green, with edge labels 1/1 and 1/2.] Same maximal flow, although red is “closer” than green. [Koren+ ’07]
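
To make the weakness on this slide concrete, here is a small max-flow sketch using the Edmonds-Karp scheme (shortest augmenting paths). The function name and the two toy graphs are hypothetical and are not taken from the slide's figure.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Maximum s-t flow via Edmonds-Karp on a dict-of-dicts capacity map."""
    # Build residual capacities; reverse edges start at 0.
    residual = {s: {}, t: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    total = 0
    while True:
        # BFS for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        # Bottleneck capacity along the path found.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the flow and update residual capacities.
        v = t
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

# Hypothetical examples: a direct edge versus a three-hop path, unit capacities.
short_path = {"s": {"t": 1}}
long_path = {"s": {"a": 1}, "a": {"b": 1}, "b": {"t": 1}}
```

Both toy graphs have the same maximum flow, so max-flow alone does not distinguish the close pair from the distant one.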

13 SimRank. Idea: two objects are similar if they are referenced by similar objects. [Figure: structural context G(V,E) and the node-pair graph G²(V²,E²).] [Jeh, Widom ’02]

14 SimRank (structural context G(V,E), nodes a and b): the similarity of a and b is a decay factor ∈ [0,1] times the average similarity between the in-neighbors of a and the in-neighbors of b, i.e. the summed similarity of in-neighbor pairs divided by the total number of in-neighbor pairs. [Jeh, Widom ’02]
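
A minimal sketch of the iterative SimRank computation described on this slide, assuming the standard recurrence s(a,b) = C / (|I(a)| |I(b)|) · Σ s(i, j) over in-neighbor pairs, with s(a,a) = 1 and s(a,b) = 0 when either node has no in-neighbors. The function name, the decay value 0.8, and the toy graph are illustrative assumptions.

```python
def simrank(in_nbrs, decay=0.8, iterations=10):
    """Naive SimRank; in_nbrs maps every node to a list of its in-neighbors."""
    nodes = list(in_nbrs)
    sim = {a: {b: 1.0 if a == b else 0.0 for b in nodes} for a in nodes}
    for _ in range(iterations):
        new = {a: {} for a in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[a][b] = 1.0
                elif not in_nbrs[a] or not in_nbrs[b]:
                    new[a][b] = 0.0
                else:
                    # Average similarity over all in-neighbor pairs, scaled by the decay.
                    total = sum(sim[i][j] for i in in_nbrs[a] for j in in_nbrs[b])
                    new[a][b] = decay * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
        sim = new
    return sim

# Hypothetical toy graph: univ -> profA, univ -> profB, profA -> studA, profB -> studB.
toy_in = {
    "univ": [],
    "profA": ["univ"],
    "profB": ["univ"],
    "studA": ["profA"],
    "studB": ["profB"],
}
scores = simrank(toy_in)
```

In this toy graph the two professors are similar because they share a referrer, and the students inherit a decayed similarity from them. The naive loop costs O(n²) pair updates per iteration, which is what the improved variants cited in the tutorial aim to reduce.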

15 SimRank for Bipartite Graphs G(V,E): similarity of A and B = avg similarity between out-neighbors of A and out-neighbors of B; similarity of c and d = avg similarity between in-neighbors of c and in-neighbors of d. Example queries: “music”, “music videos”. [Jeh, Widom ’02; Improvements: Antonellis+ ’08 SimRank++, C. Li, Han+ ’10, Y. Zhang ’13, P. Li+ ’14 …]

16 Roadmap
Node Roles
Node Proximity
– Graph-theoretic Approaches
– Effective Conductance
– Guilt-by-association techniques
– Summary

17 Effective Conductance. Edges: resistors. Weights: conductance. Set Vs = 1 and solve a system of linear equations; (s,t)-proximity = current from s to t. [Doyle & Snell ’84]
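
The electrical analogy can be sketched in a few lines. The version below is an assumption-laden illustration: it fixes V_s = 1 and V_t = 0, relaxes every other node to the conductance-weighted average of its neighbors (Gauss-Seidel sweeps instead of a direct linear solve), and reads off the current leaving s. The function name and the toy circuits are hypothetical, and the graph is assumed connected.

```python
def effective_conductance(cond, s, t, sweeps=10000, tol=1e-12):
    """cond: symmetric dict-of-dicts of edge conductances; returns current s -> t."""
    volt = {u: 0.0 for u in cond}
    volt[s] = 1.0  # unit voltage at the source, zero at the sink
    for _ in range(sweeps):
        change = 0.0
        for u in cond:
            if u == s or u == t:
                continue
            # Kirchhoff: an interior voltage is the weighted mean of its neighbors.
            new = sum(c * volt[v] for v, c in cond[u].items()) / sum(cond[u].values())
            change = max(change, abs(new - volt[u]))
            volt[u] = new
        if change < tol:
            break
    # Total current flowing out of s equals the effective conductance for V_s - V_t = 1.
    return sum(c * (volt[s] - volt[v]) for v, c in cond[s].items())

# Hypothetical circuits with unit conductances.
series = {"s": {"a": 1.0}, "a": {"s": 1.0, "t": 1.0}, "t": {"a": 1.0}}
parallel = {
    "s": {"a": 1.0, "b": 1.0},
    "a": {"s": 1.0, "t": 1.0},
    "b": {"s": 1.0, "t": 1.0},
    "t": {"a": 1.0, "b": 1.0},
}
```

Adding a second path doubles the proximity in this example, and lengthening a path lowers it, which matches the "many, short, heavy paths" intuition.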

18 Effective Conductance favors short paths (like the graph-theoretic measures) and more paths (like max-flow). BUT: ‘pizza delivery boy’ issues (a high-degree node creates short connections between otherwise unrelated nodes). [Doyle & Snell ’84]

19 Cycle-free effective conductance ~ random walks. EC(s,t) = Effective Conductance(s,t) = deg(s) * P(s -> t) = deg(t) * P(t -> s) = expected number of successful escapes from s to t in deg(s) attempts. [Doyle & Snell ’84]
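
As a sanity check of the identity on this slide, consider a hypothetical three-node path s, a, t with unit conductances (this example is not from the slides). Here P(s -> t) is read as the probability that a walk leaving s reaches t before returning to s.

```latex
\begin{align*}
EC(s,t) &= \left(\tfrac{1}{1} + \tfrac{1}{1}\right)^{-1} = \tfrac{1}{2}
  && \text{(two unit resistors in series)} \\
P(s \to t) &= \underbrace{1}_{s \to a} \cdot \underbrace{\tfrac{1}{2}}_{a \to t} = \tfrac{1}{2}
  && \text{(from $a$, the walk steps to $t$ or back to $s$ with equal probability)} \\
\deg(s)\, P(s \to t) &= 1 \cdot \tfrac{1}{2} = \tfrac{1}{2} = EC(s,t) = \deg(t)\, P(t \to s)
\end{align*}
```

The symmetry in the last line holds here because s and t both have degree 1 and the path is symmetric; in general the two escape probabilities differ while the products with the degrees agree.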

20 Community Detection: Proximity Graph. [Figure: proximity graph around “RSA cryptosystem”.] [Koren+ ’07]

21 Community around a single node [Koren+ ’07]

22 Roadmap
Node Roles
Node Proximity
– Graph-theoretic Approaches
– Effective Conductance
– Unification: Guilt-by-association techniques
– Summary

23 Guilt-by-Association Techniques
Given: graph and a few labeled nodes
Find: class (red/green) for the remaining nodes
Assuming: network effect (homophily/heterophily)
[Figure: partially labeled red/green graph with unknown (?) nodes; example classes: Fraudster, Honest, Accomplice.]

24 Guilt-by-Association Techniques
– Random Walk with Restarts (RWR): Google
– Semi-supervised Learning (SSL)
– Belief Propagation (BP): Bayesian

25 Personalized Random Walk with Restarts (RWR): measure relevance. [Brin+ ’98; Haveliwala ’03; Tong+ ’06; Minkov, Cohen ’07]

26 Personalized RWR: x = c A D^-1 x + (1 − c) y, where x is the relevance vector, A D^-1 encodes the graph structure, 1 − c is the restart probability, and y is the starting vector.
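
A minimal power-iteration sketch of personalized RWR, assuming the form [I − c A D^-1] x = (1 − c) y that the tutorial's correspondence table gives for RWR (so 1 − c plays the role of the restart probability). The function name, parameter values, and toy path graph are hypothetical.

```python
def rwr(adj, seed, c=0.85, iterations=100):
    """Relevance of every node to `seed`; adj is a symmetric dict-of-dicts of weights."""
    nodes = list(adj)
    degree = {u: sum(adj[u].values()) for u in nodes}
    y = {u: 1.0 if u == seed else 0.0 for u in nodes}  # starting vector
    x = dict(y)
    for _ in range(iterations):
        # Restart mass goes back to the seed.
        nxt = {u: (1.0 - c) * y[u] for u in nodes}
        for u in nodes:
            for v, w in adj[u].items():
                # A walker at u continues to v with probability w / degree[u].
                nxt[v] += c * x[u] * w / degree[u]
        x = nxt
    return x

# Hypothetical path graph a - b - c - d with unit weights.
path = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
scores = rwr(path, "a", c=0.5)
```

With c = 0.5 the scores on this path decrease with distance from the seed. With a small restart probability the ranking can instead be dominated by node degree, so the choice of c matters in practice.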

27 Guilt-by-Association Techniques
– Random Walk with Restarts (RWR): Google
– Semi-supervised Learning (SSL)
– Belief Propagation (BP): Bayesian

28 Semi-Supervised Learning (SSL): graph-based; few labeled nodes; edges: similarity between nodes. Inference: exploit neighborhood information. [Figure: STEP 1 and STEP 2, with node scores such as 0.8, −0.3, −0.1, 0.6 and unknown (?) nodes.] [Zhou ’06; Ji, Han ’10] …

29 SSL Equation: [I + a (D − A)] x = y, where x holds the final labels, y the known labels, a is the homophily strength of neighbors (~ “stiffness of spring”), and D − A encodes the graph structure (D is the diagonal matrix of degrees d1, d2, d3, …).
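
A minimal sketch of solving the SSL system [I + a (D − A)] x = y, the form given in the tutorial's correspondence table. It uses Jacobi iteration, x_i ← (y_i + a Σ_j A_ij x_j) / (1 + a d_i), which converges because the matrix is strictly diagonally dominant for a > 0. The function name, the value a = 1.0, and the toy graph are assumptions.

```python
def ssl_labels(adj, known, a=1.0, iterations=200):
    """adj: symmetric dict-of-dicts of edge weights; known: node -> label score."""
    x = {u: known.get(u, 0.0) for u in adj}
    for _ in range(iterations):
        # Jacobi step: every node is updated from the previous iterate.
        x = {
            u: (known.get(u, 0.0) + a * sum(w * x[v] for v, w in adj[u].items()))
            / (1.0 + a * sum(adj[u].values()))
            for u in adj
        }
    return x

# Hypothetical path u - v - w: u is labeled +1, w is labeled -1, v is unlabeled.
path = {
    "u": {"v": 1.0},
    "v": {"u": 1.0, "w": 1.0},
    "w": {"v": 1.0},
}
labels = ssl_labels(path, {"u": 1.0, "w": -1.0})
```

The unlabeled middle node ends up exactly between the two opposing labels, and the labeled nodes are pulled toward their neighbor, which is the "spring" behavior the slide alludes to.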

30 Guilt-by-Association Techniques
– Random Walk with Restarts (RWR): Google
– Semi-supervised Learning (SSL)
– Belief Propagation (BP): Bayesian

31 Belief Propagation: iterative message-based method. 1st round, 2nd round, ... until the stop criterion is fulfilled. “Propagation matrix” (class of sender × class of receiver), homophily case: [Figure: matrix over the classes PL and AI with entries such as 0.9 and 0.1.] [Pearl ’82; Yedidia+ ’02; … ; Gonzalez+ ’09; Chechetka+ ’10]

32 Belief Propagation: iterative message-based method. “Propagation matrix” (class of sender × class of receiver): homophily or heterophily. [Figure: example matrices with entries 0.9, 0.1 and 0.3, 0.7, 0.9, 0.1.] [Pearl ’82; Yedidia+ ’02; … ; Gonzalez+ ’09; Chechetka+ ’10]

33 Belief Propagation Equations: message(i −> j) ≈ belief(i) × homophily strength. [Figure: nodes i and j with example beliefs 0.9, 0.1 and 0.2, 0.8.]

34 Belief Propagation Equations: the belief of node i is computed from its prior belief and the messages from its neighbors.
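
The equations themselves did not survive in the transcript. For reference, the standard sum-product updates for a pairwise model are given below, as commonly stated following Yedidia et al.; the notation (φ for prior beliefs, ψ for the propagation matrix, N(i) for the neighbors of i, k for a normalization constant) is an assumption and may differ from the slides.

```latex
\begin{align*}
m_{ij}(x_j) &\leftarrow \sum_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j)
               \prod_{n \in N(i) \setminus j} m_{ni}(x_i)
  && \text{(message from $i$ to $j$)} \\
b_i(x_i) &= k\, \phi_i(x_i) \prod_{j \in N(i)} m_{ji}(x_i)
  && \text{(belief of $i$: prior times incoming messages)}
\end{align*}
```

The product over N(i) \ j in the message is, up to normalization, the belief of i with j's own contribution removed, which is why the message can be summarized as "belief of the sender, modulated by the propagation matrix".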

35 Fast Belief Propagation: BP is approximated by Linearized BP. Original BP [Yedidia+]: non-linear. FaBP [Koutra+]: linear, a linear system in the prior beliefs, the adjacency matrix, and the node degrees (d1, d2, d3, …). [Koutra+ PKDD’11: Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms]
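
A minimal sketch of the linear system behind FaBP, [I + a D − c' A] b_h = φ_h (the form given in the tutorial's correspondence table), solved by fixed-point iteration. The slides do not spell out a and c'; the sketch assumes a = 4h²/(1 − 4h²) and c' = 2h/(1 − 4h²) for a homophily factor h, as recalled from the PKDD'11 paper, and assumes h is small enough for the iteration to converge. The function name and the toy graph are hypothetical.

```python
def fabp(adj, priors, h=0.1, iterations=200):
    """adj: node -> list of neighbors (undirected); priors: node -> centered prior belief."""
    a = 4 * h * h / (1 - 4 * h * h)      # assumed definition of a
    c_prime = 2 * h / (1 - 4 * h * h)    # assumed definition of c'
    b = {u: priors.get(u, 0.0) for u in adj}
    for _ in range(iterations):
        # Fixed point of b = phi + c' A b - a D b, i.e. (I + a D - c' A) b = phi.
        b = {
            u: priors.get(u, 0.0)
            + c_prime * sum(b[v] for v in adj[u])
            - a * len(adj[u]) * b[u]
            for u in adj
        }
    return b

# Hypothetical path u - v - w where only u has a (slightly positive) prior belief.
path = {"u": ["v"], "v": ["u", "w"], "w": ["v"]}
beliefs = fabp(path, {"u": 0.1})
```

On this toy path the belief decays by a factor of 2h = 0.2 per hop, so the sign of each final belief gives the predicted class and its magnitude the confidence.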

36 Qualitative Comparison [Koutra+ PKDD’11]
GBA Method   Heterophily   Scalability   Convergence
RWR          ✗             ✓             ✓
SSL          ✗             ✓             ✓
BP           ✓             ✓             ?
FaBP         ✓             ✓             ✓

37 Correspondence of Methods: RWR ≈ SSL ≈ BP (Random Walk with Restarts ≈ Semi-supervised Learning ≈ Belief Propagation) [Koutra+ PKDD’11]
Method   Matrix                  Unknown   Known
RWR      [I − c A D^-1]          × x       = (1 − c) y
SSL      [I + a (D − A)]         × x       = y
FaBP     [I + a D − c' A]        × b_h     = φ_h

38 Extension to multiple classes [Gatterbauer+ VLDB’15]
Method   Matrix                     Unknown    Known
RWR      [I − c A D^-1]             × x        = (1 − c) y
SSL      [I + a (D − A)]            × x        = y
FaBP     [I + a D − c' A]           × b_h      = φ_h
LinBP    [I + H² ⊗ D − H ⊗ A]       × vec(F)   = vec(X)
[Figure: example propagation matrices, a 3 × 3 matrix with rows (0.7, 0.2, 0.1), (0.2, 0.6, 0.2), (0.1, 0.2, 0.7) and a 2 × 2 matrix with entries 0.9 and 0.1.]

39 Extension to multiple classes [Gatterbauer+ VLDB’15]
Method   Matrix                     Unknown    Known
RWR      [I − c A D^-1]             × x        = (1 − c) y
SSL      [I + a (D − A)]            × x        = y
FaBP     [I + a D − c' A]           × b_h      = φ_h
LinBP    [I + H² ⊗ D − H ⊗ A]       × vec(F)   = vec(X)
LinBP*   [I − H ⊗ A]                × vec(F)   = vec(X)
[Figure: example 3 × 3 propagation matrix with rows (0.7, 0.2, 0.1), (0.2, 0.6, 0.2), (0.1, 0.2, 0.7).]

40 Applications of Guilt-by-Association Approaches

41 Center-Piece Subgraph (CePS). Q: How to find a hub for the black nodes? A: Proximity! Input: original graph; output: CePS. The red node (the “CePS guy”) maximizes Prox(A, Red) × Prox(B, Red) × Prox(C, Red). [Tong+ KDD’06]
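
The scoring rule on this slide can be sketched as follows: given proximity scores from each query node to every candidate, pick the candidate with the largest product. The function name and the proximity values are made up for illustration; in practice the scores would come from a proximity measure such as RWR, and the full CePS method also extracts a connecting subgraph, which is not shown here.

```python
from math import prod

def center_piece(proximity, queries):
    """Return the non-query node with the largest product of proximities to all queries."""
    candidates = set.intersection(*(set(proximity[q]) for q in queries)) - set(queries)
    # sorted() only makes tie-breaking deterministic.
    return max(sorted(candidates), key=lambda n: prod(proximity[q][n] for q in queries))

# Hypothetical scores Prox(query, node) for three query nodes A, B, C.
prox = {
    "A": {"hub": 0.30, "x": 0.50, "y": 0.01},
    "B": {"hub": 0.25, "x": 0.02, "y": 0.40},
    "C": {"hub": 0.20, "x": 0.03, "y": 0.05},
}
best = center_piece(prox, ["A", "B", "C"])
```

The product acts like a logical AND: a node that is very close to one query but far from the others (like x or y here) loses to a node that is moderately close to all of them.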

42 CePS Example [Tong+ KDD’06]

43 Related Work: Recommendation [APOLO, Chau+ ’11]. [Figure: nodes of initial interest; relevance indicated by color saturation.]

44 Fraud detection: share of internet crime complaints in 2009 [Pandit, Chau+ ’07]
Email scams                16.6%
Non-delivery merchandise   11.9%
Fee fraud                   9.8%
Identity theft              8.2%
Overpayment fraud           7.3%
Misc fraud                  6.3%
Spam                        6.2%
Auction fraud               5.7%
…

45 Fraud detection with Belief Propagation [Pandit, Chau+ ’07]. Node classes: Fraudster (F), Accomplice (A), Honest (H); fraudsters and accomplices form a near-bipartite core. Data: 66,130 users, 795,320 transactions. Found bipartite cores with confirmed fraudsters!

46 Even More Applications
– Clustering [Ding+ KDD 2007]
– Email management [Minkov+ CEAS 06]
– Business Process Management [Qu+ 2008]
– ProSIN: Listen to clients’ comments [Tong+ 2008]
– TANGENT: Broaden Users’ Horizon [Oonuma & Tong+ 2008]
– Ghost Edge: Within Network Classification [Gallagher & Tong+ KDD08 b]
– …

47 Roadmap
Node Roles
Node Proximity
– Graph-theoretic Approaches
– Effective Conductance
– Guilt-by-association techniques
– Summary

48 Summary
Node Roles and Proximity are complementary.
Node Proximity:
– Building block for many applications!
– Many (wrong) ways to define similarities.
– Guilt-by-association techniques and effective conductance are similar (and recommended).

49 What we will cover next

50 References: Guilt-by-association techniques (in reverse chronological order)
Wolfgang Gatterbauer, Stephan Guennemann, Danai Koutra, Christos Faloutsos. Linearized and Single-Pass Belief Propagation. Proceedings of the VLDB Endowment, Volume 8(4) (VLDB'15), August 2015. [code]
Danai Koutra, Tai-You Ke, U. Kang, Duen Horng Chau, Hsing-Kuo Kenneth Pao, and Christos Faloutsos. 2011. Unifying guilt-by-association approaches: theorems and fast algorithms. ECML PKDD'11. [code]
Duen Horng Chau, Aniket Kittur, Jason I Hong, Christos Faloutsos. Apolo: making sense of large network data by combining rich user interaction and machine learning. In SIGCHI, 167-176, 2011.
W. Cohen. (2007) Graph Walks and Graphical Models. Draft.
H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. To appear in SDM 2008.
B. Gallagher, H. Tong, T. Eliassi-Rad, C. Faloutsos. Using Ghost Edges for Classification in Sparsely Labeled Networks. KDD 2008.

51 References
H. Tong, H. Qu, and H. Jamjoom. Measuring Proximity on Graphs with Side Information. Proceedings of ICDM 2008.
S. Pandit, D.H. Chau, S. Wang, C. Faloutsos. Netprobe: a fast and scalable system for fraud detection in online auction networks. In WWW, 201-210, 2007.
H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast direction-aware proximity for graph mining. In KDD, 747-756, 2007.
S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW, 571-580, 2007.
F. Fouss, A. Pirotte, J.-M. Renders, & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355-369, 2007.

52 References
H. Tong, B. Gallagher, C. Faloutsos, & T. Eliassi-Rad. (2007) Fast best-effort pattern matching in large attributed graphs. In KDD, 737-746, 2007.
Einat Minkov, William W. Cohen, and Andrew Y. Ng. 2006. Contextual search and name disambiguation in email using graphs. ACM SIGIR '06.
H. Tong & C. Faloutsos. (2006) Center-piece subgraphs: problem definition and fast solutions. In KDD, 404-413, 2006.
H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random Walk with Restart and Its Applications. In ICDM, 613-622, 2006.
A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14-23, 2006.
J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM, 418-425, 2005.
J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD, 653-658, 2004.

53 References
C. Faloutsos, K. S. McCurley, and A. Tomkins. Fast discovery of connection subgraphs. In Proc. 10th ACM SIGKDD conference, pages 118–127, 2004.
David Liben-Nowell and Jon Kleinberg. 2003. The link prediction problem for social networks. CIKM '03.
Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. 2003. Understanding belief propagation and its generalizations. In Exploring artificial intelligence in the new millennium, Gerhard Lakemeyer and Bernhard Nebel (Eds.).
T.H. Haveliwala. (2002) Topic-Sensitive Pagerank. In WWW, 517-526, 2002.
Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual Web search engine. In World Wide Web 7 (WWW7).
Peter G. Doyle and J. Laurie Snell. Random walks and electric networks. The Mathematical Association of America, 1984.

54 References: Graph Theoretic Techniques
Yinglong Zhang, Cuiping Li, Hong Chen, and Likun Sheng. Fast SimRank Computation over Disk-Resident Graphs. Lecture Notes in Computer Science, Database Systems for Advanced Applications. 2013.
Cuiping Li, Jiawei Han, Guoming He, Xin Jin, Yizhou Sun, Yintao Yu, and Tianyi Wu. 2010. Fast computation of SimRank for static and dynamic information networks. In Proceedings of the 13th International Conference on Extending Database Technology (EDBT '10). ACM, 465-476.
Pei Li, Hongyan Liu, Jeffrey Xu Yu, Jun He, and Xiaoyong Du. 2010. Fast single-pair SimRank computation. In Proc. of the SIAM Intl. Conf. on Data Mining (SDM 2010).
Ryan N. Lichtenwalter, Jake T. Lussier, and Nitesh V. Chawla. 2010. New perspectives and methods in link prediction. KDD '10.
Dmitry Lizorkin, Pavel Velikhov, Maxim Grinev, and Denis Turdakov. 2010. Accuracy estimate and optimization techniques for SimRank computation.
Weiren Yu, Xuemin Lin, and Jiajin Le. 2010. A Space and Time Efficient Algorithm for SimRank Computation. 12th International Asia-Pacific Web Conference (APWEB), pp. 164-170.

55 References
Ioannis Antonellis, Hector Garcia Molina, and Chi Chao Chang. 2008. Simrank++: query rewriting through link analysis of the click graph. Proc. VLDB Endow. 1, 1 (August 2008), 408-421.
Yehuda Koren, Stephen C. North, and Chris Volinsky. 2007. Measuring and extracting proximity graphs in networks. ACM TKDD 1, 3, Article 12 (Dec 2007).
Dániel Fogaras and Balázs Rácz. 2005. Scaling link-based similarity search. WWW '05.
Vincent D. Blondel, Anahí Gajardo, Maureen Heymans, Pierre Senellart, and Paul Van Dooren. 2004. A Measure of Similarity between Graph Vertices: Applications to Synonym Extraction and Web Searching. SIAM Rev. 46, 4 (April 2004), 647-666.
Glen Jeh and Jennifer Widom. 2002. SimRank: a measure of structural-context similarity. ACM SIGKDD 2002.

