# Node and Graph Similarity: Theory and Applications


Node and Graph Similarity: Theory and Applications. Danai Koutra (CMU), Tina Eliassi-Rad (Rutgers), Christos Faloutsos (CMU). ICDM 2014, Monday, December 15th, 2014, Shenzhen, China. Copyright for the tutorial materials is held by the authors. The authors grant IEEE ICDM permission to distribute the materials through its website.

D. Koutra & T. Eliassi-Rad & C. Faloutsos, ICDM’14 Tutorial. Part 2a: Graph similarity with known node correspondence.

What to remember:

- Numerous applications: network monitoring, anomaly detection, network intrusion detection, behavioral studies.
- Although it seems an easy problem, it is not! Some measures are counter-intuitive. DeltaCon [Koutra+, SDM’13], which is based on node proximity, satisfies several intuitive properties.
- There are multiple measures, but which one should we use? It depends on the application. The guide of [Soundarajan+, SDM’14] brings good news here.

Roadmap:

- Known node correspondence
  - Simple features
  - Complex features
  - Visualization
  - Summary
- Unknown node correspondence

Problem definition (graph similarity). Given (i) two graphs, G_A and G_B, with the same nodes and different edge sets, and (ii) the node correspondence, find a similarity score s ∈ [0, 1].


Applications: classification (e.g., do two brains have different wiring?) and discontinuity detection across a sequence of daily graphs (Day 1 through Day 5).

More applications: intrusion detection, and behavioral patterns (e.g., comparing the Facebook message graph with the wall-to-wall network).

## Known node correspondence: simple features

Is there an obvious solution?

One solution: Edge Overlap (EO), the number of edges that G_A and G_B have in common (normalized or not).
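A minimal sketch of edge overlap, assuming undirected, unweighted graphs given as edge lists over shared node IDs (the known-correspondence setting). The normalized variant divides twice the shared-edge count by the total edge count; that is one common choice, assumed here, and other normalizations exist.

```python
def edge_set(edges):
    """Canonicalize an undirected edge list into a set of (min, max) tuples."""
    return {(min(u, v), max(u, v)) for u, v in edges}


def edge_overlap(edges_a, edges_b, normalized=True):
    """Count the edges shared by G_A and G_B.

    With normalized=True, return 2|E_A ∩ E_B| / (|E_A| + |E_B|), a value in [0, 1].
    """
    ea, eb = edge_set(edges_a), edge_set(edges_b)
    common = len(ea & eb)
    if not normalized:
        return common
    total = len(ea) + len(eb)
    return 2 * common / total if total else 1.0


g_a = [(0, 1), (1, 2), (2, 3)]
g_b = [(1, 0), (2, 3), (3, 4)]   # (1, 0) is the same undirected edge as (0, 1)
print(edge_overlap(g_a, g_b, normalized=False))  # 2
print(edge_overlap(g_a, g_b))
```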

… but consider the “barbell” graph: EO(B10, mB10) == EO(B10, mmB10). Edge overlap scores both modified versions of the barbell (G_B and G_B') the same against the original G_A.
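To make the barbell issue concrete, the sketch below builds two cliques joined by a single bridge edge and compares deleting one clique edge with deleting the bridge. The clique size k = 5 and the node numbering are our own choices, not necessarily the slide's B10. Both variants share the same number of edges with the original, so EO cannot tell them apart, yet only the second one disconnects the graph.

```python
from itertools import combinations


def barbell(k):
    """Two k-cliques (nodes 0..k-1 and k..2k-1) joined by the bridge edge (k-1, k)."""
    left = combinations(range(k), 2)
    right = combinations(range(k, 2 * k), 2)
    return set(left) | set(right) | {(k - 1, k)}


def edge_overlap(ea, eb):
    """Unnormalized edge overlap: number of shared edges."""
    return len(ea & eb)


def is_connected(n, edges):
    """Iterative DFS from node 0; True if every node 0..n-1 is reached."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


k = 5
b = barbell(k)
minus_clique_edge = b - {(0, 1)}    # small connectivity change
minus_bridge = b - {(k - 1, k)}     # splits the graph into two components

print(edge_overlap(b, minus_clique_edge), edge_overlap(b, minus_bridge))  # 20 20
print(is_connected(2 * k, minus_clique_edge), is_connected(2 * k, minus_bridge))  # True False
```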

Other solutions?

Two graphs G_A and G_B are similar if …

1. “… they share many vertices and/or edges.” Vertex/Edge Overlap: O(|V| + |V'| + |E| + |E'|).
2. “… the rankings of their vertices are similar.” Vertex Ranking (VR): the rank correlation of the nodes' PageRank scores, O(|V| + |V'|).
3. “… their edge weights are similar.” Weighted distance: O(|E| + |E'|).

[Papadimitriou, Dasdan, Garcia-Molina ’10; Bunke ’06; Shoubridge+ ’02; Dickinson+ ’04]
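A simplified sketch of vertex ranking, assuming undirected graphs that share node IDs 0..n-1. It scores each node with PageRank (power iteration; the damping 0.85 and 100 iterations are conventional defaults chosen here) and returns the plain Spearman rank correlation of the two score vectors. The cited work may weight or normalize the correlation differently, so treat this as an illustration of the idea, not the exact published measure.

```python
import math


def pagerank(n, edges, d=0.85, iters=100):
    """PageRank by power iteration on an undirected graph with nodes 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    pr = [1.0 / n] * n
    for _ in range(iters):
        # Score held by isolated (dangling) nodes is spread uniformly.
        dangling = sum(pr[u] for u in range(n) if not adj[u])
        new = [(1.0 - d) / n + d * dangling / n] * n
        for u in range(n):
            if adj[u]:
                share = d * pr[u] / len(adj[u])
                for w in adj[u]:
                    new[w] += share
        pr = new
    return pr


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy) if vx and vy else 0.0


def vertex_ranking_similarity(n, edges_a, edges_b):
    """Rank correlation in [-1, 1] of the PageRank scores of G_A and G_B."""
    return spearman(pagerank(n, edges_a), pagerank(n, edges_b))


star_a = [(0, 1), (0, 2), (0, 3), (0, 4)]   # hub is node 0
star_b = [(4, 0), (4, 1), (4, 2), (4, 3)]   # hub is node 4
print(vertex_ranking_similarity(5, star_a, star_b))  # -0.25
```

If a score in [0, 1] is needed, one option (our assumption) is to rescale with (ρ + 1) / 2.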

4. “… they have similar subgraphs.” Maximum Common Subgraph (MCS), NP-complete in general; Vertex MCS distance and Edge MCS distance.
5. “… we need few node/edge additions/deletions to transform G_A into G_B.” (Weighted) Graph Edit Distance.

[Bunke ’06; Shoubridge+ ’02; Dickinson+ ’04; Bunke+ ’98, ’06; Riesen ’09; Gao ’10; Fankhauser ’11; Kapsabelis+ ’07]
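Once node identities are fixed, the maximum common subgraph no longer requires an NP-complete search: it is simply the shared vertices and shared edges. The sketch below assumes that setting (unique node labels, unweighted undirected edges) and unit edit costs; the weighted cost functions of the cited work would replace the plain counts.

```python
def canon(edges):
    """Undirected edges as a set of (min, max) tuples."""
    return {(min(u, v), max(u, v)) for u, v in edges}


def graph_edit_distance(va, ea, vb, eb):
    """Unit-cost edit distance with known node correspondence: every vertex or
    edge present in only one graph costs one insertion or deletion."""
    return len(set(va) ^ set(vb)) + len(canon(ea) ^ canon(eb))


def vertex_mcs_distance(va, vb):
    """1 - |V_A ∩ V_B| / max(|V_A|, |V_B|)."""
    va, vb = set(va), set(vb)
    return 1 - len(va & vb) / max(len(va), len(vb), 1)


def edge_mcs_distance(ea, eb):
    """1 - |E_A ∩ E_B| / max(|E_A|, |E_B|)."""
    ea, eb = canon(ea), canon(eb)
    return 1 - len(ea & eb) / max(len(ea), len(eb), 1)


va, ea = {0, 1, 2, 3}, [(0, 1), (1, 2), (2, 3)]
vb, eb = {0, 1, 2, 3, 4}, [(0, 1), (2, 3), (3, 4)]
print(graph_edit_distance(va, ea, vb, eb))  # 3
print(vertex_mcs_distance(va, vb), edge_mcs_distance(ea, eb))
```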

6. “… they have similar fingerprints.” Signature similarity. Example: the b-bit fingerprint of G_A is 10101, the b-bit fingerprint of G_B is 00101, and their Hamming distance is 1. [Papadimitriou, Dasdan, Garcia-Molina ’10]
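A SimHash-style sketch of signature similarity, assuming each graph is summarized by a dictionary of weighted features (here, its edges with weight 1 unless given). Each feature is hashed to b bits with SHA-256 (our choice of hash), weights are added or subtracted per bit, and the sign of each total gives the fingerprint bit; similarity is 1 minus the normalized Hamming distance. The cited paper's choice of features and weights may differ; `edge_features` is a hypothetical helper.

```python
import hashlib


def feature_bits(feature, b):
    """Deterministic b-bit hash of a feature (SHA-256 truncated to b bits)."""
    digest = hashlib.sha256(repr(feature).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << b) - 1)


def simhash(weighted_features, b=64):
    """b-bit fingerprint: bit i is 1 iff the weighted vote for bit i is positive."""
    if not 1 <= b <= 256:
        raise ValueError("b must be between 1 and 256 for SHA-256")
    votes = [0.0] * b
    for feature, weight in weighted_features.items():
        h = feature_bits(feature, b)
        for i in range(b):
            votes[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(b) if votes[i] > 0)


def signature_similarity(features_a, features_b, b=64):
    """1 - Hamming(fingerprint_A, fingerprint_B) / b, a value in [0, 1]."""
    diff = simhash(features_a, b) ^ simhash(features_b, b)
    return 1 - bin(diff).count("1") / b


def edge_features(edges, weights=None):
    """Hypothetical helper: one feature per undirected edge, default weight 1.
    Keys of `weights` must be canonical (min, max) tuples."""
    weights = weights or {}
    feats = {}
    for u, v in edges:
        e = (min(u, v), max(u, v))
        feats[e] = weights.get(e, 1.0)
    return feats


fa = edge_features([(0, 1), (1, 2), (2, 3)])
fb = edge_features([(0, 1), (1, 2), (3, 4)])
print(signature_similarity(fa, fa), signature_similarity(fa, fb))
```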

Event detection [Bunke+ ’06]: the MCS distance (with |G| = |V|) between consecutive daily graphs, plotted over days.
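A minimal sketch of this kind of event detection, assuming a sequence of daily edge sets over the same nodes: compute the edge MCS distance between consecutive days and flag the transitions whose distance exceeds the mean plus k standard deviations. The threshold rule and the toy data are hypothetical and chosen only for illustration; they are not taken from [Bunke+ ’06].

```python
from statistics import mean, pstdev


def edge_mcs_distance(ea, eb):
    """1 - |E_A ∩ E_B| / max(|E_A|, |E_B|) for edge sets with known node identities."""
    return 1 - len(ea & eb) / max(len(ea), len(eb), 1)


def flag_changes(snapshots, k=1.0):
    """Return the consecutive-day distances and the indices t of the transitions
    (day t -> day t+1) whose distance exceeds mean + k * population std.
    Needs at least two snapshots."""
    dists = [edge_mcs_distance(a, b) for a, b in zip(snapshots, snapshots[1:])]
    threshold = mean(dists) + k * pstdev(dists)
    return dists, [t for t, d in enumerate(dists) if d > threshold]


usual = {(0, 1), (1, 2), (2, 3), (3, 4)}
unusual = {(0, 2), (0, 4), (1, 3), (2, 4)}   # a day with a completely different edge set
days = [usual] * 4 + [unusual] + [usual] * 4

dists, flagged = flag_changes(days)
print(flagged)  # [3, 4]
```

The two flagged transitions are the ones into and out of the unusual day (index 4, 0-based).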

Application: web graph anomaly detection [Papadimitriou, Dasdan, Garcia-Molina ’10].

## Known node correspondence: complex features

Graph kernels, the idea [Vishwanathan]: (1) compute graph substructures in polynomial time; (2) compare them to find sim(G_A, G_B). Source: http://mloss.org/software/view/139/

Fast subtree kernel [Shervashidze+ ’09 NIPS, JMLR’11], based on the Weisfeiler-Lehman test for graph isomorphism. Starting from labeled graphs, each iteration (1) builds, for every node, the sorted list of its neighbors' labels, prefixed by its own label; (2) compresses each resulting string into a new label with a hash function; and (3) relabels the nodes. Cost: O(mh) per graph pair, for m edges and h iterations.
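A compact sketch of the Weisfeiler-Lehman subtree kernel, assuming each graph is given as an adjacency dict plus a dict of initial node labels. A Python dict shared by both graphs plays the role of the label-compression hash, and the kernel is the dot product of the label-count vectors collected over iterations 0..h. Sorting the neighbor labels adds a log factor that an optimized implementation can avoid, so this sketch does not match the O(mh) bound exactly.

```python
from collections import Counter


def wl_features(adj, labels, h, table):
    """Label counts of one graph over WL iterations 0..h.

    adj: dict node -> list of neighbors; labels: dict node -> initial label.
    table: dict shared across graphs, mapping (iteration, label, sorted neighbor
    labels) signatures to compressed integer labels.
    """
    current = dict(labels)
    features = Counter((0, lab) for lab in current.values())
    for it in range(1, h + 1):
        relabeled = {}
        for v in adj:
            signature = (it, current[v], tuple(sorted(current[u] for u in adj[v])))
            if signature not in table:
                table[signature] = len(table)   # compress to a fresh label
            relabeled[v] = table[signature]
        current = relabeled
        features.update((it, lab) for lab in current.values())
    return features


def wl_subtree_kernel(g1, g2, h=2):
    """Dot product of the WL label-count vectors of g1 and g2, each an (adj, labels) pair."""
    table = {}
    f1 = wl_features(g1[0], g1[1], h, table)
    f2 = wl_features(g2[0], g2[1], h, table)
    return sum(count * f2[key] for key, count in f1.items())


triangle = ({0: [1, 2], 1: [0, 2], 2: [0, 1]}, {0: "a", 1: "a", 2: "a"})
path = ({0: [1], 1: [0, 2], 2: [1]}, {0: "a", 1: "a", 2: "a"})
print(wl_subtree_kernel(triangle, path), wl_subtree_kernel(triangle, triangle))  # 12 27
```

With identical initial labels, the two graphs agree at iteration 0 and only partially at iteration 1 (the middle node of the path looks like a triangle node), which is why the cross-kernel value is smaller than the self-kernel value.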

Graph kernels, applications [Ralaivola+ ’05, Borgwardt+ ’05]: aligning chemical compounds; function prediction. Source: http://www.ra.cs.uni-tuebingen.de/forschung/molsim/welcome_e.html

Other graph kernels:

- RWR [Kashima+ ’03, Gaertner+ ’03, Vishwanathan ’10]
- Shortest-path kernels [Borgwardt & Kriegel ’05]
- Cyclic-path kernels [Horvath+ ’04]
- Depth-first search kernels [Swamidass+ ’05]
- Subtree kernels [Shervashidze+ ’09 NIPS, JMLR’11; Ralaivola+ ’05]
- Graphlet/subgraph kernels [Shervashidze+ ’09; Thoma+ ’10]
- All-paths kernels [Airola+ ’08]
- …

… Many similarity functions can be defined … What properties should a good similarity function have?

Axioms [Koutra, Faloutsos, Vogelstein ’13]:

- A1. Identity property: sim(G, G) = 1.
- A2. Symmetric property: sim(G_A, G_B) = sim(G_B, G_A).
- A3. Zero property: sim(·, ·) = 0.

Desired properties [Koutra, Faloutsos, Vogelstein ’13]:

- Intuitiveness: P1. Edge importance; P2. Weight awareness; P3. Edge-“submodularity”; P4. Focus awareness.
- Scalability.

P1. Edge importance: the creation of disconnected components matters more than small connectivity changes. [Koutra, Faloutsos, Vogelstein ’13]

P2. Weight awareness: the bigger the edge weight, the more the edge change matters (e.g., removing an edge with w = 5 should matter more than removing one with w = 1). [Koutra, Faloutsos, Vogelstein ’13]

P3. Edge-“submodularity” (“diminishing returns”): the sparser the graphs, the more important a ‘fixed’ change is, illustrated with pairs (G_A, G_B) of graphs on n = 5 nodes. [Koutra, Faloutsos, Vogelstein ’13]

P4. Focus awareness: targeted changes are more important than random changes of the same extent. [Koutra, Faloutsos, Vogelstein ’13]

How do state-of-the-art methods fare? [Koutra, Vogelstein, Faloutsos ’13]

| Metric | P1: edge importance | P2: weight awareness | P3: diminishing returns | P4: focus awareness |
|---|---|---|---|---|
| Vertex/Edge Overlap | ✗ | ✗ | ✗ | ? |
| Graph Edit Distance (XOR) | ✗ | ✗ | ✗ | ? |
| Signature Similarity | ✗ | ✔ | ✗ | ? |
| λ-distance (adjacency matrix) | ✗ | ✔ | ✗ | ? |
| λ-distance (graph Laplacian) | ✗ | ✔ | ✗ | ? |
| λ-distance (normalized Laplacian) | ✗ | ✔ | ✗ | ? |

Is there a method that satisfies these properties? Yes: DeltaCon.

DeltaCon, details [Koutra, Faloutsos, Vogelstein ’13]: (1) find the pairwise node influence matrices, S_A and S_B; (2) find the similarity between S_A and S_B.

How? Using FaBP (fast belief propagation), which has a sound theoretical background (MLE on marginals). Intuition: for small ε, neighboring influence is attenuated with distance. 1-hop neighbors contribute with weight ε, 2-hop neighbors with ε², and so on; since 0 < ε < 1, we have ε > ε² > …
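For reference, the FaBP-based influence matrix used by DeltaCon can be written as follows, as we understand the DeltaCon paper (A: adjacency matrix, D: diagonal degree matrix, I: identity). Dropping the ε²D term for small ε and expanding the inverse as a geometric series exposes the attenuation described above: walks of length k contribute with weight ε^k. The series form assumes ε < 1/λ_max(A) so that it converges.

```latex
\mathbf{S} = \left[\mathbf{I} + \epsilon^{2}\mathbf{D} - \epsilon\mathbf{A}\right]^{-1}
\;\approx\; \left[\mathbf{I} - \epsilon\mathbf{A}\right]^{-1}
= \mathbf{I} + \epsilon\mathbf{A} + \epsilon^{2}\mathbf{A}^{2} + \epsilon^{3}\mathbf{A}^{3} + \cdots
```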

Our solution, DeltaCon (details): (1) find the pairwise node influence, S_A and S_B; (2) find the similarity between S_A and S_B, e.g., sim(S_A, S_B) = 0.3 in the slide's example. [Koutra, Faloutsos, Vogelstein ’13]
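A small, dense sketch of the exact (quadratic) version, DeltaCon0 in the comparison table, in plain Python. For each graph it builds S = [I + ε²D − εA]^{-1} by Gauss-Jordan inversion, compares the two matrices with the root Euclidean (Matusita) distance d, and maps d to a similarity 1 / (1 + d). Assumptions: undirected, unweighted graphs on nodes 0..n-1 without self-loops, and a single ε = 1 / (1 + max degree) shared by both graphs; check the paper for its exact choices.

```python
import math


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (dense lists of floats)."""
    n = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def max_degree(edges):
    """Largest node degree in an undirected edge set (0 for no edges)."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return max(counts.values(), default=0)


def influence_matrix(n, edges, eps):
    """S = [I + eps^2 D - eps A]^{-1} for an undirected, unweighted edge set."""
    m = [[0.0] * n for _ in range(n)]
    degree = [0] * n
    for u, v in edges:
        m[u][v] -= eps
        m[v][u] -= eps
        degree[u] += 1
        degree[v] += 1
    for i in range(n):
        m[i][i] += 1.0 + eps * eps * degree[i]
    return invert(m)


def deltacon0(n, edges_a, edges_b):
    """DeltaCon0-style similarity: 1 / (1 + RootED(S_A, S_B))."""
    ea = {(min(u, v), max(u, v)) for u, v in edges_a}
    eb = {(min(u, v), max(u, v)) for u, v in edges_b}
    eps = 1.0 / (1.0 + max(max_degree(ea), max_degree(eb)))   # shared eps (assumption)
    sa, sb = influence_matrix(n, ea, eps), influence_matrix(n, eb, eps)
    # Root Euclidean distance; S has nonnegative entries, max() guards against rounding.
    d = math.sqrt(sum(
        (math.sqrt(max(x, 0.0)) - math.sqrt(max(y, 0.0))) ** 2
        for row_a, row_b in zip(sa, sb)
        for x, y in zip(row_a, row_b)
    ))
    return 1.0 / (1.0 + d)


path = [(0, 1), (1, 2), (2, 3)]
cut = [(0, 1), (2, 3)]
print(deltacon0(4, path, path), deltacon0(4, path, cut))
```

With this ε, each row of I + ε²D − εA is strictly diagonally dominant, so the matrix is always invertible; the n × n inverse is what makes this version quadratic.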

… but computing S_A and S_B this way costs O(n²) … Faster? A faster variant is given in the paper. Code: http://www.cs.cmu.edu/~dkoutra/CODE/deltacon.zip [Koutra, Faloutsos, Vogelstein ’13]
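One way to avoid the n × n matrices, sketched under our own assumptions: randomly partition the nodes into g groups (the same partition for both graphs) and, for each group, solve [I + ε²D − εA] x = e_group, where e_group is the group's indicator vector, with a Jacobi (FaBP-style power) iteration that costs O(m) per sweep. The resulting n × g matrices are then compared with the root Euclidean distance. The paper's actual speed-up may differ in its details; the group count `g`, the `seed`, and the tolerance are hypothetical parameters.

```python
import math
import random


def max_degree(edges):
    """Largest node degree in an undirected edge set (0 for no edges)."""
    counts = {}
    for u, v in edges:
        counts[u] = counts.get(u, 0) + 1
        counts[v] = counts.get(v, 0) + 1
    return max(counts.values(), default=0)


def solve_group(adj, eps, b, tol=1e-10, max_iter=100_000):
    """Jacobi iteration for [I + eps^2 D - eps A] x = b, i.e.
    x <- b + eps * A x - eps^2 * D x (a contraction when eps = 1 / (1 + max degree))."""
    x = b[:]
    for _ in range(max_iter):
        new = [
            b[i] + eps * sum(x[j] for j in adj[i]) - eps * eps * len(adj[i]) * x[i]
            for i in range(len(b))
        ]
        if max(abs(p - q) for p, q in zip(new, x)) < tol:
            return new
        x = new
    return x


def grouped_influence(n, edges, eps, groups):
    """n x g matrix (list of rows); column k approximates S times the indicator of group k."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    cols = [solve_group(adj, eps, [1.0 if i in grp else 0.0 for i in range(n)]) for grp in groups]
    return [[col[i] for col in cols] for i in range(n)]


def deltacon_grouped(n, edges_a, edges_b, g=2, seed=0):
    """Approximate DeltaCon-style similarity 1 / (1 + RootED) using g random node groups."""
    ea = {(min(u, v), max(u, v)) for u, v in edges_a}
    eb = {(min(u, v), max(u, v)) for u, v in edges_b}
    nodes = list(range(n))
    random.Random(seed).shuffle(nodes)
    groups = [set(nodes[k::g]) for k in range(g)]   # same partition for both graphs
    eps = 1.0 / (1.0 + max(max_degree(ea), max_degree(eb)))
    sa = grouped_influence(n, ea, eps, groups)
    sb = grouped_influence(n, eb, eps, groups)
    d = math.sqrt(sum(
        (math.sqrt(max(x, 0.0)) - math.sqrt(max(y, 0.0))) ** 2
        for row_a, row_b in zip(sa, sb)
        for x, y in zip(row_a, row_b)
    ))
    return 1.0 / (1.0 + d)


path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
broken = [(0, 1), (1, 2), (3, 4), (4, 5)]
print(deltacon_grouped(6, path, path), deltacon_grouped(6, path, broken))
```

The iteration converges because each row of εA − ε²D has absolute sum ε·d_i·(1 + ε) < 1 for this ε, but convergence can be slow on graphs with high maximum degree.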

Comparison of methods revisited [Koutra, Faloutsos, Vogelstein ’13]

| Metric | P1: edge importance | P2: weight awareness | P3: diminishing returns | P4: focus awareness |
|---|---|---|---|---|
| Vertex/Edge Overlap | ✗ | ✗ | ✗ | ? |
| Graph Edit Distance (XOR) | ✗ | ✗ | ✗ | ? |
| Signature Similarity | ✗ | ✔ | ✗ | ? |
| DeltaCon0 | ✔ | ✔ | ✔ | ✔ |
| DeltaCon | ✔ | ✔ | ✔ | ✔ |

Temporal anomaly detection [Koutra, Faloutsos, Vogelstein ’13]: nodes are employees and edges are email exchanges. Compute the similarities sim_1 to sim_4 between consecutive days (Day 1 through Day 5).

Figure: temporal anomaly detection, similarity between consecutive days (annotated event: Feb 4, Lay resigns). [Koutra, Faloutsos, Vogelstein ’13]

Brain connectivity graph clustering [Koutra, Faloutsos, Vogelstein ’13]: 114 brain graphs, where nodes are 70 cortical regions and edges are connections; attributes include gender, IQ, age, …

Brain connectivity graph clustering, result: t-test p-value = 0.0057. [Koutra, Faloutsos, Vogelstein ’13]

## Known node correspondence: visualization

Tested visual encodings [Alper+ ’13, CHI]: augmenting the graph drawings or the adjacency matrices to show the differences. User-study result: for bigger and sparser graphs (40–80 nodes, low density), matrices are better.
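A toy, text-only illustration of augmenting an adjacency matrix with the differences, assuming two undirected graphs over the same n nodes. The symbol scheme is our own and is not one of the encodings evaluated by Alper et al.: '=' marks an edge in both graphs, '-' an edge only in G_A (removed), '+' an edge only in G_B (added), and '.' no edge.

```python
def adjacency_diff(n, edges_a, edges_b):
    """One string per matrix row: '=' in both, '-' only in G_A, '+' only in G_B, '.' in neither."""
    ea = {(min(u, v), max(u, v)) for u, v in edges_a}
    eb = {(min(u, v), max(u, v)) for u, v in edges_b}
    rows = []
    for i in range(n):
        cells = []
        for j in range(n):
            e = (min(i, j), max(i, j))
            if e in ea and e in eb:
                cells.append("=")
            elif e in ea:
                cells.append("-")
            elif e in eb:
                cells.append("+")
            else:
                cells.append(".")
        rows.append("".join(cells))
    return rows


for row in adjacency_diff(3, [(0, 1), (1, 2)], [(0, 1), (0, 2)]):
    print(row)
# .=+
# =.-
# +-.
```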

More on visualization for large graphs:

- HoneyComb [van Ham+ ’09]
- Reference graph [Andrews ’09]
- Interactive comparison [Hascoet+ ’12]
- General principles [Gleicher+ ’11]
- …

## Known node correspondence: summary

A guide to selecting a measure [Soundarajan, Gallagher, Eliassi-Rad, SDM’14].

A guide to selecting a measure, findings [Soundarajan, Gallagher, Eliassi-Rad, SDM’14]:

- Q1. Are the graph similarity methods correlated? Yes, much more than expected!
- Q2. Are there groups of methods that behave comparably? Some complex methods are very similar to simpler methods (cf. RWR ≈ BP ≈ SSL [Koutra+ PKDD’11]).
- Q3. How can we get a single consensus method? NetSimile and RWR are often close to the consensus.


References (in reverse chronological order):

- S. Soundarajan, B. Gallagher, and T. Eliassi-Rad. 2014. A Guide to Selecting a Network Similarity Method. SDM 2014.
- D. Koutra, J. T. Vogelstein, and C. Faloutsos. 2013. DELTACON: A Principled Massive-Graph Similarity Function. SDM 2013: 162–170. Code: http://www.cs.cmu.edu/~dkoutra/CODE/deltacon.zip
- Stefan Fankhauser, Kaspar Riesen, and Horst Bunke. 2011. Speeding up graph edit distance computation through fast bipartite matching. GbRPR 2011.
- Xinbo Gao, Bing Xiao, Dacheng Tao, and Xuelong Li. 2010. A survey of graph edit distance. Pattern Analysis and Applications 13(1): 113–129.
- Panagiotis Papadimitriou, Ali Dasdan, and Hector Garcia-Molina. 2010. Web graph similarity for anomaly detection. Journal of Internet Services and Applications 1(1): 19–30.

- Kaspar Riesen and Horst Bunke. 2009. Approximate graph edit distance computation by means of bipartite graph matching.
- Kelly Marie Kapsabelis, Peter John Dickinson, and Kutluyil Dogancay. 2007. Investigation of graph edit distance cost functions for detection of network anomalies. ANZIAM J. 48 (CTAC2006): 436–449.
- H. Bunke, P. J. Dickinson, M. Kraetzl, and W. D. Wallis. 2006. A Graph-Theoretic Approach to Enterprise Network Dynamics (PCS). Birkhäuser.
- P. Shoubridge, M. Kraetzl, W. D. Wallis, and H. Bunke. 2002. Detection of Abnormal Change in a Time Series of Graphs. Journal of Interconnection Networks (JOIN) 3(1–2): 85–101.
- Horst Bunke and Kim Shearer. 1998. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters 19(3–4): 255–259.

- A. Kelmans. 1976. Comparison of graphs by their number of spanning trees. Discrete Mathematics 16(3): 241–261.

Graph kernels (for more references, see the “Other graph kernels” list above):

- U. Kang, H. Tong, and J. Sun. 2012. Fast random walk graph kernel. SDM 2012.
- Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, and Karsten M. Borgwardt. 2011. Weisfeiler-Lehman Graph Kernels. J. Mach. Learn. Res. 12: 2539–2561.
- N. Shervashidze and K. M. Borgwardt. 2009. Fast subtree kernels on graphs. In Advances in Neural Information Processing Systems, pages 1660–1668.
- A. Airola, S. Pyysalo, J. Björne, T. Pahikkala, F. Ginter, and T. Salakoski. 2008. All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning. BMC Bioinformatics 9(Suppl 11): S2.

Visualization:

- Basak Alper, Benjamin Bach, Nathalie Henry Riche, Tobias Isenberg, and Jean-Daniel Fekete. 2013. Weighted graph comparison techniques for brain connectivity analysis. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’13).
- Mountaz Hascoët and Pierre Dragicevic. 2012. Interactive graph matching and visual comparison of graphs and clustered graphs. In Proceedings of the International Working Conference on Advanced Visual Interfaces (AVI ’12).
- Michael Gleicher, Danielle Albers, Rick Walker, Ilir Jusufi, Charles D. Hansen, and Jonathan C. Roberts. 2011. Visual comparison for information visualization.
- K. Andrews, M. Wohlfahrt, and G. Wurzinger. 2009. Visual graph comparison. In Information Visualisation, 2009 13th International Conference, 62–67.

- Frank van Ham, Hans-Jörg Schulz, and Joan M. Dimicco. 2009. Honeycomb: Visual analysis of large scale social networks. In Proceedings of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part II (INTERACT ’09).

