Published by Malcolm Peake. Modified over 3 years ago.

1
DeltaCon: A Principled Massive-Graph Similarity Function. Danai Koutra, Joshua T. Vogelstein, Christos Faloutsos. School of Computer Science, Carnegie Mellon University; Duke University. SDM, 2-5 May 2013, Austin, Texas, USA.

2
CMU Duke. Problem Definition: Graph Similarity. Given: (i) two graphs, G_A and G_B, with the same nodes and different edge sets; (ii) the node correspondence. Find: a similarity score s ∈ [0,1]. © Danai Koutra (CMU) - SDM'13

3
Problem Definition: Graph Similarity. Given: (a) two graphs, G_A and G_B, with the same nodes and different edge sets; (b) the node correspondence. Find: a similarity score s ∈ [0,1].

4
Motivation (1). 1. Classification: different brain wiring? 2. Discontinuity detection over a sequence of graphs (Day 1, Day 2, Day 3, Day 4, Day 5).

5
Motivation (2). 3. Behavioral patterns: FB message graph vs. wall-to-wall network. 4. Intrusion detection.

6
Problem: Graph Similarity. Is there any obvious solution?

7
One Solution: Edge Overlap (EO), the number of edges common to G_A and G_B (normalized or not).

8
… but consider the “barbell” graph: EO(B10, mB10) == EO(B10, mmB10). [Figure: G_A compared with G_B and with G_B’.]
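The barbell objection can be reproduced on a toy graph. The sketch below is illustrative only: it assumes a Jaccard-style normalization (the slide leaves the normalization open), and the 6-node barbell and the function name `edge_overlap` are hypothetical stand-ins for the slide's B10 example.

```python
def edge_overlap(edges_a, edges_b):
    """Edge overlap of two undirected graphs given as edge lists.

    Assumption: normalized as |common edges| / |union of edges| (Jaccard).
    """
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy barbell: two triangles joined by the bridge edge (2, 3).
barbell = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]

# Variant 1 drops the bridge and disconnects the graph.
without_bridge = [e for e in barbell if e != (2, 3)]
# Variant 2 drops an edge inside a triangle; the graph stays connected.
without_clique_edge = [e for e in barbell if e != (0, 1)]

eo_bridge = edge_overlap(barbell, without_bridge)
eo_clique = edge_overlap(barbell, without_clique_edge)
print(eo_bridge, eo_clique)  # the two scores are identical
```

Both variants share six of seven edges with the original, so edge overlap cannot tell a disconnecting change from a minor one.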

9
Contributions. Theory: axioms; desired properties. Practice: the DeltaCon (Delta Connectivity) algorithm; real-world applications; experiments on synthetic & real graphs.

10
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments; Related Work; Conclusions.

11
Intuition (1). STEP 1: Compute the pairwise node influence matrices, S_A and S_B. [Figure: graphs G_A and G_B with their matrices S_A and S_B.]

12
Intuition (2). STEP 2: Find the similarity between S_A and S_B. [Figure: matrices S_A and S_B.]

13
Intuition (2). STEP 2: Find the similarity between S_A and S_B. Example: sim(S_A, S_B) = 0.3.

14
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments; Related Work; Conclusions.

15
… many similarity functions can be defined… But what properties should a good similarity function have?

16
Axioms. A1. Identity property: sim(G_A, G_A) = 1. A2. Symmetric property: sim(G_A, G_B) = sim(G_B, G_A). A3. Zero property: sim( · , · ) = 0 [the two graph icons used as arguments on the slide are not recoverable].

17
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments; Related Work; Conclusions.

18
Desired Properties (1). Intuitiveness: P1. Edge Importance; P2. Weight Awareness; P3. Edge-“Submodularity”; P4. Focus Awareness. Scalability.

19
Desired Properties (2). P1, Edge Importance: the creation of disconnected components matters more than small connectivity changes.

20
Desired Properties (3). P2, Weight Awareness: the bigger the edge weight, the more the edge change matters. [Figure: removing an edge with w = 5 vs. removing an edge with w = 1.]

21
Desired Properties (4). P3, Edge-“Submodularity” (“diminishing returns”): the sparser the graphs, the more important a “fixed” change is. [Figure: two pairs (G_A, G_B) on n = 5 nodes.]

22
Desired Properties (5). P4, Focus Awareness: targeted changes are more important than random changes of the same extent. [Figure labels: G_A; targeted; G_B’; random; G_B.]

23
How do state-of-the-art methods fare?
Metric                          P1 (edge)  P2 (weight)  P3 (returns)  P4 (focus)
Vertex/Edge Overlap             ✗          ✗            ✗             ?
Graph Edit Distance (XOR)       ✗          ✗            ✗             ?
Signature Similarity            ✗          ✔            ✗             ?
λ-distance (adjacency matrix)   ✗          ✔            ✗             ?
λ-distance (graph Laplacian)    ✗          ✔            ✗             ?
λ-distance (normalized Lapl.)   ✗          ✔            ✗             ?
DeltaCon0                       ✔          ✔            ✔             ✔
DeltaCon                        ✔          ✔            ✔             ✔

24
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Experiments; Applications; Related Work; Conclusions.

25
Proposed algorithm: DeltaCon0 (base algorithm). ① Find the pairwise node influence, S_A and S_B.

26
STEP 1: How to compute node influence? A1: PageRank. A2: Personalized Random Walk with Restart (RWR). A3: Lazy RWR. A4: “Electrical network analogy” (resistances). A5: Belief Propagation (FaBP). …

27
STEP 1: Intuition of BP (background). Belief propagation is an iterative message-based method. [Figure: Iteration 1 and Iteration 2 of message passing from a labeled node, e.g., a CS person.]

28
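STEP 1: Fast BP (1) (background). [Figure: the FaBP linear system for a small example, showing an identity matrix, a degree matrix with entries d1, d2, d3, a 0/1 adjacency matrix, an unknown vector (marked ?), and a 0/1 indicator vector for the i-th row.] Similar to RWR.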

29
STEP 1: Fast BP (1) (background). [Figure: the same FaBP linear system, now annotated with the strength of influence between neighbors.] Similar to RWR.

30
STEP 1: Fast BP (1) (background). [Figure: the same FaBP linear system, annotated with the strength of influence between neighbors and the final influence from node i.] Similar to RWR.

31
STEP 1: Fast BP (2). [Figure: the same FaBP linear system, solved for the i-th row.] OR, for all nodes at once, the pairwise influence matrix, e.g., with rows (1, 0.2, 0.1), (0.3, 1, 0.2), (0, 0.5, 1).

32
STEP 1: Why FaBP? (details) 1) Sound theoretical background (MLE on marginals). 2) Fast: linear on the edges. 3) Attenuating neighboring influence.

33
STEP 1: Why FaBP? (intuition) 1) Sound theoretical background (MLE on marginals). 2) Fast: linear on the edges. 3) Attenuating neighboring influence: for small ε (0 < ε < 1), the influence over 1 hop, 2 hops, … shrinks as ε > ε² > ...
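The FaBP formula itself did not survive extraction. The sketch below assumes the linearized form in which FaBP is usually stated, S = (I + ε²D − εA)⁻¹, with D the degree matrix and A the adjacency matrix. The function name, the value ε = 0.1, and the dense Gauss-Jordan inversion are illustrative choices only; this is not the authors' implementation, which the slides describe as linear on the edges.

```python
def fabp_influence(n, edges, eps=0.1):
    """Return S = (I + eps^2 * D - eps * A)^-1 as a list of rows.

    Assumes an undirected, unweighted graph on nodes 0..n-1 and an eps
    small enough that the matrix is invertible.
    """
    # Build M = I + eps^2 * D - eps * A.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        m[u][v] -= eps
        m[v][u] -= eps
        m[u][u] += eps * eps
        m[v][v] += eps * eps
    # Augment with the identity and run Gauss-Jordan elimination
    # with partial pivoting; the right half ends up holding M^-1.
    aug = [row + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Path graph 0 - 1 - 2: node 1 is one hop from node 0, node 2 is two hops.
S = fabp_influence(3, [(0, 1), (1, 2)], eps=0.1)
print(S[0])  # influence of node 0 on itself, on node 1, and on node 2
```

On this path graph the first row decays roughly like 1, ε, ε², which is the attenuation the slide describes.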

34
Proposed algorithm: DeltaCon0 (base algorithm). ① Find the pairwise influence (FaBP), S_A and S_B. ② Find the distance: d(S_A, S_B) = Matusita distance.

35
Proposed algorithm: DeltaCon0 (base algorithm). ① Apply FaBP to find the pairwise influence matrices, S_A and S_B. ② Find the distance: d(S_A, S_B) = Matusita distance. ③ Find the similarity, sim(S_A, S_B). [The formulas on the slide are lost.]
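Steps ② and ③ can be sketched on two given influence matrices. The slide's formulas are lost, so the code assumes the standard definition of the Matusita distance (the Euclidean distance between element-wise square roots) and a 1/(1 + d) mapping from distance to a similarity in (0, 1]; the function names and the toy matrices are hypothetical.

```python
import math


def matusita_distance(s_a, s_b):
    """Root Euclidean distance between two equally sized matrices.

    Assumes all entries are non-negative, so the square roots are real.
    """
    return math.sqrt(sum((math.sqrt(x) - math.sqrt(y)) ** 2
                         for row_a, row_b in zip(s_a, s_b)
                         for x, y in zip(row_a, row_b)))


def similarity(s_a, s_b):
    """Map the distance d to a score in (0, 1]; assumed form 1 / (1 + d)."""
    return 1.0 / (1.0 + matusita_distance(s_a, s_b))


# Toy influence matrices: in s_b the two nodes no longer influence each other.
s_a = [[1.0, 0.25], [0.25, 1.0]]
s_b = [[1.0, 0.0], [0.0, 1.0]]

print(matusita_distance(s_a, s_b))
print(similarity(s_a, s_b))
```

Under these assumptions identical matrices score exactly 1 and swapping the arguments does not change the score, in line with axioms A1 and A2.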

36
… but O(n²) … faster?

37
Proposed Algorithm: DeltaCon, STEP 1 (1) (faster algorithm). 1a. Create g disjoint & covering node groups. [Figure: a 4-node example graph and its adjacency matrix A.]

38
Proposed Algorithm: DeltaCon, STEP 1 (2) (faster algorithm). 1a. Create g disjoint & covering node groups. 1b. For group i, find the node-group influence (FaBP).

39
Proposed Algorithm: DeltaCon, STEP 1 (3) (intuition). 1b. E.g., for group 1, find the node-group influence (FaBP). [Figure: S_A (rows 1 to 4) collapsed row-wise into S’_A, which has one column per group.]

40
Proposed Algorithm: DeltaCon, STEP 1 (4) (faster algorithm). 1a. Create g disjoint & covering node groups. 1b. For group i, find the node-group influence (FaBP). 1c. Create the node-group influence matrices, S’_A and S’_B (rows: nodes 1 to 4; columns: groups).

41
Proposed Algorithm: DeltaCon (5) (faster algorithm). 1a. Create g disjoint & covering node groups. 1b. For group i, find the node-group influence (FaBP). 1c. Create the node-group influence matrices, S’_A and S’_B (rows: nodes 1 to 4; columns: groups).
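Steps 1a to 1c can be sketched as follows. The code assumes the linearized FaBP system (I + ε²D − εA) s = (indicator vector of the group) and solves it with a simple fixed-point iteration, so each sweep touches every edge once. The random grouping, the function names, ε = 0.05, and the iteration count are illustrative assumptions, not details confirmed by the slides.

```python
import random


def random_groups(n, g, seed=0):
    """Step 1a: split nodes 0..n-1 into g disjoint, covering groups."""
    nodes = list(range(n))
    random.Random(seed).shuffle(nodes)
    return [nodes[k::g] for k in range(g)]


def node_group_influence(n, edges, groups, eps=0.05, iters=50):
    """Steps 1b and 1c: return the n x g node-group influence matrix S'.

    Column k approximates the solution of
    (I + eps^2 * D - eps * A) s = indicator(group k),
    assuming eps is small enough for the iteration to converge.
    """
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    columns = []
    for group in groups:
        start = [0.0] * n
        for i in group:
            start[i] = 1.0
        s = start[:]
        for _ in range(iters):
            # Fixed-point update: s <- start + (eps * A - eps^2 * D) s
            s = [start[i] + eps * sum(s[j] for j in nbrs[i])
                 - eps * eps * len(nbrs[i]) * s[i] for i in range(n)]
        columns.append(s)
    # Transpose so that rows are nodes and columns are groups.
    return [[col[i] for col in columns] for i in range(n)]


# Toy barbell: two triangles joined by the bridge (2, 3), split into g = 2 groups.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
groups = random_groups(6, 2)
S_prime = node_group_influence(6, edges, groups)
print(groups)
print(S_prime)
```

S’ has n × g entries instead of n × n, which is where the saving over the base algorithm comes from.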

42
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications (ENRON: anomaly detection; Brain Graphs: clustering); Experiments; Conclusions.

43
Temporal Anomaly Detection in ENRON (1). Nodes: employees. Edges: email exchange. DeltaCon similarities of consecutive timestamps: the graphs for Day 1 to Day 5 yield sim_1, sim_2, sim_3, sim_4.

44
Temporal Anomaly Detection in ENRON (2). [Figure: similarity plotted over consecutive days; annotation: IMR.]
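One way to turn a series of consecutive-day similarities into anomaly flags is sketched below. It reads the "IMR" annotation as an individuals and moving-range control chart, which is an assumption, and uses the textbook lower limit (mean minus 2.66 times the average moving range). The similarity values are made up for illustration and are not the ENRON results.

```python
def moving_range_flags(sims):
    """Flag similarity scores that fall below an individuals-chart limit.

    Returns (lower_limit, flagged_indices). The 2.66 factor is the usual
    individuals-chart constant; its use here is an assumption.
    """
    ranges = [abs(b - a) for a, b in zip(sims, sims[1:])]
    mean_range = sum(ranges) / len(ranges)
    center = sum(sims) / len(sims)
    lower = center - 2.66 * mean_range
    return lower, [i for i, s in enumerate(sims) if s < lower]


# Hypothetical similarities between consecutive daily graphs.
sims = [0.90, 0.91, 0.89, 0.90, 0.92, 0.90, 0.40,
        0.90, 0.91, 0.89, 0.90, 0.91]
lower, flagged = moving_range_flags(sims)
print(lower, flagged)
```

Only the sharp drop at index 6 falls below the limit; the ordinary day-to-day fluctuation does not.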

45
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications (ENRON: anomaly detection; Brain Graphs: clustering); Experiments; Related Work; Conclusions.

46
Brain Connectivity Graph Clustering (1). 114 aligned connectomes (FMRI). Nodes: 70 cortical regions. Edges: connections. Attributes: gender, IQ, age, …

47
Brain Connectivity Graph Clustering (2). ① Pairwise DeltaCon similarities. ② Hierarchical clustering (Ward’s linkage). ③ t-test / ANOVA for given attributes.

48
Brain Connectivity Graph Clustering (3). [Figure: two clusters, labeled High CCI and Low CCI.] t-test / ANOVA for given attributes: p-value = 0.0057.

49
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments (Scalability); Conclusions.

50
Scalability. Dataset: Kronecker graphs. DeltaCon is linear on the edges + groups: O(g×n + g×(m_1 + m_2)), where n is the number of nodes and m_1, m_2 are the numbers of edges in G_A and G_B. [Figure: runtime (min) vs. number of edges, with # of edges = max{m_1, m_2}; slope = 1.]

51
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments; Related Work; Conclusions.

52
State-of-the-art Approaches. Vertex/Edge Overlap [Papadimitriou, Dasdan, Garcia-Molina. JISA’10]; Graph Edit Distance [Papadimitriou, Dasdan, Garcia-Molina. JISA’10]; Signature Similarity (SimHash algorithm) [Papadimitriou, Dasdan, Garcia-Molina. JISA’10]; λ-distance [Peabody ’03; Bunke, Dickinson, Kraetzl, Wallis ’06]; …

53
Roadmap: Intuition; Axioms & Properties; Proposed Algorithm: DeltaCon; Applications; Experiments; Related Work; Conclusions.

54
Conclusions. Theory: axioms; desired properties. Practice: the DeltaCon algorithm is principled (axioms), intuitive (properties), and scalable (linear on input). Real-world applications: temporal anomaly detection + brain scans classification. Experiments on synthetic & real graphs.

55
Thank you!

56
Backup slide (1): What if unknown correspondence? Graph matching + then DeltaCon (…work in progress…). Global feature extraction + comparison, e.g., λ-distance [Peabody ’03], [Macindoe & Richards ’10]. Local feature extraction + aggregation + comparison [Berlingerio et al. ’12]. …

57
Backup slide (2): Bounds. Lemma (lower bound): sim_DC0(G1, G2) ≤ sim_DC(G1, G2). Conjecture (upper bound): Johnson-Lindenstrauss lemma.

58
Backup slide (3): # of groups - sensitivity.

59
Backup slide (5): Datasets
Dataset            # nodes         # edges
Synthetic graphs   5-10            4-90
Kronecker graphs   6K-1.6M         66K-67.1M
Brain Graphs       70              800-1208
Enron              36,692          367,662
Epinions           131,828         841,372
Email EU           265,214         420,045
Web Google         875,714         5,105,039
AS Skitter         1,696,415       11,095,298
