# DeltaCon: A Principled Massive-Graph Similarity Function

School of Computer Science, Carnegie Mellon University, and Duke University. DeltaCon: A Principled Massive-Graph Similarity Function. Danai Koutra, Joshua T. Vogelstein, Christos Faloutsos. SDM, 2-5 May 2013, Texas-Austin, USA.

Problem Definition: Graph Similarity. Given: (i) two graphs, $G_A$ and $G_B$, with the same nodes and different edge sets; (ii) the node correspondence. Find: a similarity score $s \in [0,1]$. © Danai Koutra (CMU) - SDM'13

Motivation (1). 1. Classification: different brain wiring? 2. Discontinuity detection across consecutive graphs (Day 1, Day 2, Day 3, Day 4, Day 5).

Motivation (2). 3. Behavioral patterns: FB message graph vs. wall-to-wall network. 4. Intrusion detection.

Problem: Graph Similarity. Is there any obvious solution?

One Solution: Edge Overlap (EO), the number of edges that $G_A$ and $G_B$ have in common (normalized or not).

… but "barbell"…: EO(B10, mB10) == EO(B10, mmB10). [Figure: the barbell graph $G_A$ compared with two modified versions, $G_B$ and $G_{B'}$.]
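The barbell counterexample can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, the overlap is normalized by the size of the edge union (one possible choice, since the slide says "normalized or not"), and the two modified graphs are assumed to differ in whether the removed edge is the bridge or an edge inside a clique.

```python
from itertools import combinations


def barbell(k):
    """Edge set of a barbell graph: two k-cliques joined by one bridge edge."""
    left = range(k)
    right = range(k, 2 * k)
    edges = {frozenset(e) for e in combinations(left, 2)}
    edges |= {frozenset(e) for e in combinations(right, 2)}
    edges.add(frozenset((k - 1, k)))  # the bridge between the two cliques
    return edges


def edge_overlap(edges_a, edges_b):
    """Common edges divided by the size of the edge union (assumed normalization)."""
    return len(edges_a & edges_b) / len(edges_a | edges_b)


b10 = barbell(5)                              # barbell with 10 nodes
no_bridge = b10 - {frozenset((4, 5))}         # bridge removed: graph is disconnected
no_clique_edge = b10 - {frozenset((0, 1))}    # one edge inside a clique removed

# Both modifications remove exactly one edge, so edge overlap cannot tell them apart.
print(edge_overlap(b10, no_bridge), edge_overlap(b10, no_clique_edge))
```

Both calls return the same score, even though only the first modification splits the graph into two components.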

Contributions

- Theory: axioms; desired properties.
- Practice: the DeltaCon ("Delta Connectivity") algorithm; real-world applications; experiments on synthetic & real graphs.

Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments; Related Work; Conclusions.

Intuition (1). STEP 1: Compute the pairwise node influence matrices, $S_A$ and $S_B$, of $G_A$ and $G_B$.

Intuition (2). STEP 2: Find the similarity between $S_A$ and $S_B$, e.g., $\mathrm{sim}(S_A, S_B) = 0.3$.

… many similarity functions can be defined… But what properties should a good similarity function have?

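Axioms. A1. Identity property: $\mathrm{sim}(G_A, G_A) = 1$. A2. Symmetric property: $\mathrm{sim}(G_A, G_B) = \mathrm{sim}(G_B, G_A)$. A3. Zero property: $\mathrm{sim}(G_A, G_B) = 0$ when $G_A$ is a complete graph and $G_B$ is an empty graph (in the paper, this holds in the limit of a large number of nodes).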

Desired Properties: Intuitiveness and Scalability. Intuitiveness covers four properties:

- P1. Edge Importance: Creation of disconnected components matters more than small connectivity changes.
- P2. Weight Awareness: The bigger the edge weight, the more the edge change matters (e.g., removing an edge with w=5 vs. an edge with w=1).
- P3. Edge-"Submodularity" ("Diminishing Returns"): The sparser the graphs, the more important a "fixed" change is (illustrated on graphs $G_A$, $G_B$ with n=5 nodes).
- P4. Focus Awareness: Targeted changes are more important than random changes of the same extent (illustrated by $G_A$ vs. a targeted $G_{B'}$ and a random $G_B$).

How do state-of-the-art methods fare?

| Metric | P1 (edge) | P2 (weight) | P3 (returns) | P4 (focus) |
|---|---|---|---|---|
| Vertex/Edge Overlap | ✗ | ✗ | ✗ | ? |
| Graph Edit Distance (XOR) | ✗ | ✗ | ✗ | ? |
| Signature Similarity | ✗ | ✔ | ✗ | ? |
| λ-distance (adjacency matrix) | ✗ | ✔ | ✗ | ? |
| λ-distance (graph Laplacian) | ✗ | ✔ | ✗ | ? |
| λ-distance (normalized Laplacian) | ✗ | ✔ | ✗ | ? |
| DeltaCon0 | ✔ | ✔ | ✔ | ✔ |
| DeltaCon | ✔ | ✔ | ✔ | ✔ |

Proposed algorithm: DeltaCon0 (base algorithm). ① Find the pairwise node influence matrices, $S_A$ and $S_B$.

STEP 1: How to compute node influence? A1: PageRank. A2: Personalized Random Walk with Restart (RWR). A3: Lazy RWR. A4: "Electrical network analogy" (resistances). A5: Belief Propagation (FaBP). …

STEP 1: Intuition of BP (background). BP is an iterative message-based method. [Figure: messages spreading over Iteration 1 and Iteration 2 from a labeled node, e.g., a CS person.]

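STEP 1: Fast BP (1) (background). For each node $i$, FaBP solves a linear system, $[I + \epsilon^2 D - \epsilon A]\,\vec{s}_i = \vec{e}_i$, where $A$ is the adjacency matrix, $D$ is the diagonal degree matrix (entries $d_1, d_2, d_3$ in the slide's 3-node example), $\vec{e}_i$ is the $i$-th unit vector, $\epsilon$ is the strength of influence between neighbors, and $\vec{s}_i$ is the final influence from node $i$. The formulation is similar to RWR.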

STEP 1: Fast BP (2). Solving the system for every node $i$ gives one influence vector per node (the $i$-th row), e.g., $\begin{pmatrix} 1 & 0.2 & 0.1 \\ 0.3 & 1 & 0.2 \\ 0 & 0.5 & 1 \end{pmatrix}$; OR, in matrix form, the pairwise influence matrix: $S = [I + \epsilon^2 D - \epsilon A]^{-1}$.
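A minimal NumPy sketch of the pairwise influence matrix follows. It assumes the closed form used in the DeltaCon paper, $S = [I + \epsilon^2 D - \epsilon A]^{-1}$; the function name, the default `eps=0.1`, and the dense inverse are illustrative choices for tiny graphs, not the authors' implementation.

```python
import numpy as np


def influence_matrix(adj, eps=0.1):
    """Pairwise node influence S = (I + eps^2 D - eps A)^-1 for a small dense graph."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = np.diag(adj.sum(axis=1))  # diagonal degree matrix D
    return np.linalg.inv(np.eye(n) + eps**2 * deg - eps * adj)


# Path graph 0 - 1 - 2 (made-up example)
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
S = influence_matrix(path)
print(np.round(S, 3))
```

For this path graph the entry for the direct neighbors (0, 1) is larger than the entry for the 2-hop pair (0, 2), which is the attenuation that the slides attribute to FaBP.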

STEP 1: Why FaBP? (details) 1) Sound theoretical background (MLE on marginals). 2) Fast: linear on the edges. 3) Attenuating neighboring influence. Intuition: for small $\epsilon$ ($0 < \epsilon < 1$), the 1-hop, 2-hop, … neighbors are weighted by $\epsilon > \epsilon^2 > \dots$
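The attenuation can be made explicit with a Neumann series. The expansion below assumes the matrix form from the DeltaCon paper, $S = [I + \epsilon^2 D - \epsilon A]^{-1}$, and that $\epsilon$ is small enough for the series to converge (spectral radius of $\epsilon A - \epsilon^2 D$ below 1):

```latex
S = \left[ I - (\epsilon A - \epsilon^2 D) \right]^{-1}
  = \sum_{k \ge 0} (\epsilon A - \epsilon^2 D)^k
  = I + \epsilon A + \epsilon^2 (A^2 - D) + O(\epsilon^3)
```

The term $\epsilon A$ carries the 1-hop neighbors and $\epsilon^2 A^2$ the 2-hop paths, so each extra hop is discounted by another factor of $\epsilon$.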

Proposed algorithm: DeltaCon0 (base algorithm). ① Apply FaBP to find the pairwise influence matrices, $S_A$ and $S_B$. ② Find the distance between them, using the Matusita distance: $d(S_A, S_B) = \sqrt{\sum_{i,j} \left( \sqrt{s_{A,ij}} - \sqrt{s_{B,ij}} \right)^2}$. ③ Find the similarity, $\mathrm{sim}(S_A, S_B) = \frac{1}{1 + d(S_A, S_B)}$.
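The base algorithm can be sketched end to end as below. This is a hedged illustration, not the authors' code: it assumes the paper's formulas (influence matrix $[I + \epsilon^2 D - \epsilon A]^{-1}$, Matusita distance $d$, similarity $1/(1+d)$), picks $\epsilon = 1/(1 + \text{max degree})$ as an assumption, and uses hypothetical helper names and a dense inverse suitable only for small graphs.

```python
import numpy as np


def influence_matrix(adj, eps):
    """S = (I + eps^2 D - eps A)^-1 (assumed FaBP closed form)."""
    n = adj.shape[0]
    deg = np.diag(adj.sum(axis=1))
    return np.linalg.inv(np.eye(n) + eps**2 * deg - eps * adj)


def matusita_distance(s_a, s_b):
    """Root Euclidean distance: sqrt(sum_ij (sqrt(a_ij) - sqrt(b_ij))^2)."""
    # Clip tiny negative round-off before taking square roots.
    root_a = np.sqrt(np.clip(s_a, 0.0, None))
    root_b = np.sqrt(np.clip(s_b, 0.0, None))
    return float(np.sqrt(((root_a - root_b) ** 2).sum()))


def deltacon0(adj_a, adj_b):
    """Similarity in (0, 1] between two graphs on the same, aligned node set."""
    adj_a = np.asarray(adj_a, dtype=float)
    adj_b = np.asarray(adj_b, dtype=float)
    # Assumption: eps = 1 / (1 + max degree over both graphs).
    max_deg = max(adj_a.sum(axis=1).max(), adj_b.sum(axis=1).max())
    eps = 1.0 / (1.0 + max_deg)
    d = matusita_distance(influence_matrix(adj_a, eps), influence_matrix(adj_b, eps))
    return 1.0 / (1.0 + d)


def barbell_adj(k):
    """Adjacency matrix of two k-cliques joined by the bridge (k-1, k)."""
    adj = np.zeros((2 * k, 2 * k))
    adj[:k, :k] = 1
    adj[k:, k:] = 1
    np.fill_diagonal(adj, 0)
    adj[k - 1, k] = adj[k, k - 1] = 1
    return adj


def without_edge(adj, u, v):
    """Copy of adj with the undirected edge (u, v) removed."""
    out = adj.copy()
    out[u, v] = out[v, u] = 0
    return out


b10 = barbell_adj(5)
no_bridge = without_edge(b10, 4, 5)        # disconnects the two cliques
no_clique_edge = without_edge(b10, 0, 1)   # small connectivity change

print(deltacon0(b10, no_bridge), deltacon0(b10, no_clique_edge))
```

On the barbell example, removing the bridge yields a lower similarity than removing an edge inside a clique, unlike edge overlap, which scores both changes the same.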

… but $O(n^2)$ … Can it be faster?

Proposed Algorithm: DeltaCon, STEP 1 (faster algorithm).

- 1a. Create $g$ disjoint and covering node groups (illustrated on a 4-node graph with adjacency matrix $A$).
- 1b. For each group $i$, find the node-group influence (FaBP).
- 1c. Create the node-group influence matrices, $S'_A$ and $S'_B$.

[Figure (intuition): e.g., for group 1, the node-group influence column of $S'_A$ (rows: nodes 1-4; columns: groups) shown next to the pairwise matrix $S_A$, annotated "row-wise".]
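A hedged sketch of the grouped variant is given below. It follows the description in the DeltaCon paper as far as I can tell: nodes are split at random into $g$ groups, one FaBP system is solved per group with the group's indicator vector as the right-hand side, and the resulting $n \times g$ matrices are compared with the Matusita distance. The function names, the seeded random partition, the choice $\epsilon = 1/(1 + \text{max degree})$, and the dense solver are assumptions; a dense solve does not give the linear running time claimed on the slides, which would need a sparse iterative solver.

```python
import numpy as np


def node_group_influence(adj, groups, eps):
    """Solve (I + eps^2 D - eps A) X = seeds, one seed column per node group."""
    n = adj.shape[0]
    system = np.eye(n) + eps**2 * np.diag(adj.sum(axis=1)) - eps * adj
    seeds = np.zeros((n, len(groups)))
    for col, members in enumerate(groups):
        seeds[members, col] = 1.0  # indicator vector of the group
    return np.linalg.solve(system, seeds)  # n x g node-group influence matrix


def deltacon(adj_a, adj_b, g, seed=0):
    """Grouped similarity; both graphs share the same random node partition."""
    adj_a = np.asarray(adj_a, dtype=float)
    adj_b = np.asarray(adj_b, dtype=float)
    n = adj_a.shape[0]
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(n), g)  # disjoint and covering
    # Assumption: eps = 1 / (1 + max degree over both graphs).
    max_deg = max(adj_a.sum(axis=1).max(), adj_b.sum(axis=1).max())
    eps = 1.0 / (1.0 + max_deg)
    s_a = node_group_influence(adj_a, groups, eps)
    s_b = node_group_influence(adj_b, groups, eps)
    diff = np.sqrt(np.clip(s_a, 0.0, None)) - np.sqrt(np.clip(s_b, 0.0, None))
    d = float(np.sqrt((diff ** 2).sum()))  # Matusita distance
    return 1.0 / (1.0 + d)


# Made-up example: a 6-node cycle and the same cycle with one edge removed.
cycle = np.zeros((6, 6))
for i in range(6):
    cycle[i, (i + 1) % 6] = cycle[(i + 1) % 6, i] = 1
broken = cycle.copy()
broken[0, 1] = broken[1, 0] = 0

print(deltacon(cycle, broken, g=3), deltacon(cycle, broken, g=6))
```

With `g` equal to the number of nodes every group is a single node, so the score coincides with the base algorithm; fewer groups can only merge columns, which is why the grouped score is never lower.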

Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications (ENRON: anomaly detection; Brain Graphs: clustering); Experiments; Conclusions.

Temporal Anomaly Detection in ENRON (1). Nodes: employees. Edges: email exchange. DeltaCon similarities are computed between consecutive timestamps: the graphs of Day 1, Day 2, Day 3, Day 4, Day 5 yield $\mathrm{sim}_1$, $\mathrm{sim}_2$, $\mathrm{sim}_3$, $\mathrm{sim}_4$.

Temporal Anomaly Detection in ENRON (2). [Figure: similarity between consecutive days, plotted over time; annotated "IMR".]
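One way to turn a series of consecutive-day similarities into anomaly flags is a control chart. The sketch below reads the slide's "IMR" as an individuals and moving range chart and uses the textbook limits (mean ± 2.66 × average moving range); the exact limits used by the authors are not given on the slide, the function names are hypothetical, and the similarity values are made up.

```python
from statistics import mean


def imr_limits(values):
    """Individuals/moving-range control limits: mean +/- 2.66 * mean moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    spread = 2.66 * mean(moving_ranges)  # 2.66 = 3 / 1.128, the standard I-MR constant
    return center - spread, center + spread


def flag_anomalies(similarities):
    """Indices t where sim(day t, day t+1) falls below the lower control limit."""
    lower, _ = imr_limits(similarities)
    return [t for t, s in enumerate(similarities) if s < lower]


# Made-up similarity scores between consecutive daily graphs.
sims = [0.91, 0.90, 0.92, 0.89, 0.91, 0.90, 0.40, 0.90, 0.92, 0.91, 0.89, 0.90]
print(flag_anomalies(sims))
```

Only the lower limit is used here because an unusually low similarity is what signals a change between two consecutive graphs.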

Brain Connectivity Graph Clustering (1). 114 aligned connectomes (FMRI). Nodes: 70 cortical regions. Edges: connections. Attributes: gender, IQ, age, …

Brain Connectivity Graph Clustering (2). ① Pairwise DeltaCon similarities. ② Hierarchical clustering (Ward's linkage). ③ t-test / ANOVA for given attributes.
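The three-step pipeline can be sketched with SciPy as follows. Everything concrete here is an assumption for illustration: the similarity matrix and the attribute values are made up, similarities are converted to distances as `1 - sim`, and Ward's linkage is applied directly to those distances even though it is formally defined for Euclidean distances.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import ttest_ind


def cluster_graphs(similarity, n_clusters=2):
    """Ward-linkage clustering from a symmetric matrix of pairwise similarities."""
    # Assumption: turn similarities in [0, 1] into distances via 1 - sim.
    distance = 1.0 - np.asarray(similarity, dtype=float)
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Step 1 stand-in: made-up pairwise similarities for six graphs (two tight groups).
sim = np.full((6, 6), 0.3)
sim[:3, :3] = 0.9
sim[3:, 3:] = 0.9
np.fill_diagonal(sim, 1.0)

# Made-up per-subject attribute values.
attribute = np.array([10.0, 11.0, 12.0, 20.0, 21.0, 22.0])

# Step 2: hierarchical clustering into two clusters.
labels = cluster_graphs(sim)

# Step 3: two-sample t-test of the attribute across the two clusters.
result = ttest_ind(attribute[labels == labels[0]], attribute[labels != labels[0]])
print(labels, result.pvalue)
```

With more than two clusters, the t-test would be replaced by an ANOVA, e.g. `scipy.stats.f_oneway`.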

Brain Connectivity Graph Clustering (3). t-test / ANOVA for given attributes: the clusters separate High CCI from Low CCI, p-value = 0.0057.

Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments (Scalability); Conclusions.

Scalability. Dataset: Kronecker graphs. DeltaCon is linear on the edges + groups: $O(g \times n + g \times (m_1 + m_2))$. [Figure: runtime (min) vs. number of edges, $\max\{m_1, m_2\}$, in $G_A$ and $G_B$, with points annotated by the number of nodes; slope = 1.]

State-of-the-art Approaches

- Vertex/Edge Overlap [Papadimitriou, Dasdan, Garcia-Molina. JISA'10]
- Graph Edit Distance [Papadimitriou, Dasdan, Garcia-Molina. JISA'10]
- Signature Similarity (SimHash algorithm) [Papadimitriou, Dasdan, Garcia-Molina. JISA'10]
- λ-distance [Peabody '03; Bunke, Dickinson, Kraetzl, Wallis '06]
- …

Conclusions

- Theory: axioms; desired properties.
- Practice: the DeltaCon algorithm, which is principled (axioms), intuitive (properties), and scalable (linear on input); real-world applications (temporal anomaly detection + brain scans classification); experiments on synthetic & real graphs.

Thank you!

Backup slide (1): What if the correspondence is unknown? Graph matching, then DeltaCon (…work in progress…). Global feature extraction + comparison, e.g., λ-distance [Peabody '03], [Macindoe & Richards '10]. Local feature extraction + aggregation + comparison [Berlingerio et al. '12]. …

Backup slide (2): Bounds. Lemma (lower bound): $\mathrm{sim}_{DC_0}(G_1, G_2) \le \mathrm{sim}_{DC}(G_1, G_2)$. Conjecture (upper bound): Johnson-Lindenstrauss lemma.

Backup slide (3): Sensitivity to the number of groups.

Backup slide (5): Datasets

| Dataset | # nodes | # edges |
|---|---|---|
| Synthetic graphs | 5-10 | 4-90 |
| Kronecker graphs | 6K - 1.6M | 66K - 67.1M |
| Brain Graphs | 70 | 800-1208 |
| Enron | 36,692 | 367,662 |
| Epinions | 131,828 | 841,372 |
| Email EU | 265,214 | 420,045 |
| Web Google | 875,714 | 5,105,039 |
| AS Skitter | 1,696,415 | 11,095,298 |
