School of Computer Science Carnegie Mellon University BiG-Align: Fast Bipartite Graph Alignment Danai Koutra Hanghang Tong David Lubensky IEEE ICDM, 7-10.


1 School of Computer Science, Carnegie Mellon University. BiG-Align: Fast Bipartite Graph Alignment. Danai Koutra, Hanghang Tong, David Lubensky. IEEE ICDM, 7-10 December 2013, Dallas, Texas, USA.

2 Can we identify users across social networks? Same or “similar” users? Danai Koutra (CMU)

3 More applications? Protein-protein alignment; chemical compound comparison; IR: synonym extraction; link prediction & viral marketing; optical character recognition; structure matching in DB; wiki translation.

4 RoadMap: Problem Definition; What’s different?; BiG-Align; Uni-Align; Conclusions.

5 Problem Definition. INPUT: A, B. [Figure: bipartite graphs A and B, each linking users to groups.]

6 Problem Definition. INPUT: A, B. OUTPUT: P and … (permutation matrices). [Figure: bipartite graphs A and B, each linking users to groups; P (users): A → B.]

7 Problem Definition. INPUT: A, B. OUTPUT: P and Q (permutation matrices) s.t. $\min \|PAQ - B\|_F^2$, where PAQ is the permutation of users/groups in A. Graph isomorphism: HARD (P or NP-complete?). Subgraph isomorphism: NP-complete. And now what? Constraints / relaxations. [Figure: bipartite graphs A and B, each linking users to groups; P (users): A → B; Q (groups): A → B.]

8 Problem Definition: constraints. INPUT: A, B. OUTPUT: P, Q correspondence matrices s.t. $\min \|PAQ - B\|_F^2$. [Figure: P (users): A → B; Q (groups): A → B.]

9 Problem Definition: constraints. INPUT: A, B. OUTPUT: P, Q correspondence matrices s.t. $\min \|PAQ - B\|_F^2$. CONSTRAINTS: (a) $P_{ij}$, $Q_{ij}$ = probabilities (not a 1-1 mapping); (b) sparse matrices P and Q (more efficient for large-scale graphs). [Figure: P (users): A → B; Q (groups): A → B.]
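
As a minimal sketch of the objective on this slide, the snippet below evaluates $\|PAQ - B\|_F^2$ with NumPy on a toy graph. The function name `alignment_cost` and the 3 users × 2 groups example are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def alignment_cost(P, A, Q, B):
    """Squared Frobenius norm ||P A Q - B||_F^2."""
    R = P @ A @ Q - B
    return float(np.sum(R * R))

# Hypothetical toy bipartite graph: 3 users (rows) x 2 groups (columns).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Build B from A by swapping users 0 and 2 and swapping the two groups.
P_true = np.array([[0.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 0.0, 0.0]])
Q_true = np.array([[0.0, 1.0],
                   [1.0, 0.0]])
B = P_true @ A @ Q_true

cost_true = alignment_cost(P_true, A, Q_true, B)            # correct alignment
cost_identity = alignment_cost(np.eye(3), A, np.eye(2), B)  # no re-ordering
```

The correct permutations give zero cost, while leaving A as-is does not. Under constraint (a), P and Q may also hold fractional entries, and the same function applies unchanged.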

10 RoadMap: Problem Definition; What’s different?; BiG-Align; Uni-Align; Conclusions.

11 What’s different? Focus on bipartite graphs. [Diagram label: BiG-Align vs. other approaches.]

12 What’s different? Focus on bipartite graphs. New optimization problem/constraints. [Diagram label: BiG-Align vs. other approaches.]

13 What’s different? Focus on bipartite graphs. New optimization problem/constraints. The hope is that the specific graph structure will lead to more accurate graph alignment. [Diagram label: BiG-Align vs. other approaches.]

14 Why bipartite graphs? (1) Ubiquitous, e.g., users-files, authors-papers, customers-products, users-msg/groups, user-movie rating graphs.

15 Why bipartite graphs? (1) Ubiquitous, e.g., users-files, authors-papers, customers-products, users-msg/groups, user-movie rating graphs. (2) Coupled alignment: individual & community-level. [Figure labels: nodes, communities.]

16 Why bipartite graphs? (1) Ubiquitous, e.g., users-files, authors-papers, customers-products, users-msg/groups, user-movie rating graphs. (2) Coupled alignment: individual & community-level. (3) Conversion of a unipartite graph to bipartite → clustering + (2). [Figure labels: nodes, communities.]

17 Why bipartite graphs? (1) Ubiquitous, e.g., users-files, authors-papers, customers-products, users-msg/groups, user-movie rating graphs. (2) Coupled alignment: individual & community-level. (3) Conversion of a unipartite graph to bipartite → clustering + (2). (4) General formulation: (a) match clouds of points (point-feature graph); (b) tensors (e.g., time-evolving, or other 3rd dimension). [Figure labels: users, time; nodes, communities.]

18 RoadMap: Problem Definition; What’s different?; BiG-Align; Uni-Align; Conclusions.

19 BiG-Align: algorithm. DETAILS. Alternating, projected gradient descent:
    initialize(P)
    initialize(Q)
    until convergence:
        $P_{k+1} = P_k - \eta_P \, \partial f(P_k, Q_k)/\partial P$
        valid_projection($P_{k+1}$)
        $Q_{k+1} = Q_k - \eta_Q \, \partial f(P_{k+1}, Q_k)/\partial Q$
        valid_projection($Q_{k+1}$)
        update($\eta_P$, $\eta_Q$)

20 BiG-Align: algorithm. DETAILS. Pseudocode as on slide 19: initialize(P), initialize(Q); until convergence: gradient step on P, valid_projection(P), gradient step on Q, valid_projection(Q), update(ηP, ηQ). Highlighted: Probabilistic Constraint.

21 BiG-Align: algorithm. DETAILS. Pseudocode as on slide 19: initialize(P), initialize(Q); until convergence: gradient step on P, valid_projection(P), gradient step on Q, valid_projection(Q), update(ηP, ηQ). Highlighted: Sparsity Constraint.

22 BiG-Align: algorithm. DETAILS. Pseudocode as on slide 19: initialize(P), initialize(Q); until convergence: gradient step on P, valid_projection(P), gradient step on Q, valid_projection(Q), update(ηP, ηQ). Highlighted: Sparsity Constraint, $\min f = \min \|PAQ - B\|_F^2 + \lambda \sum_{i,j} P_{ij} + \mu \sum_{i,j} Q_{ij}$.
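
The loop on these slides can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' Matlab implementation: `valid_projection` is assumed to clip entries to [0, 1], P and Q start uniform, the step sizes are small constants instead of the `update(ηP, ηQ)` line search, and the names `cost` and `big_align_sketch` are hypothetical. The gradients follow from the penalized objective $f = \|PAQ - B\|_F^2 + \lambda \sum P_{ij} + \mu \sum Q_{ij}$.

```python
import numpy as np

def cost(P, Q, A, B, lam, mu):
    """f = ||P A Q - B||_F^2 + lam * sum(P) + mu * sum(Q)."""
    R = P @ A @ Q - B
    return float(np.sum(R * R) + lam * P.sum() + mu * Q.sum())

def valid_projection(M):
    # Assumption: project back onto the box [0, 1] by clipping each entry.
    return np.clip(M, 0.0, 1.0)

def big_align_sketch(A, B, lam=0.01, mu=0.01, eta_p=0.01, eta_q=0.01,
                     max_iter=2000, tol=1e-10):
    n, m = A.shape
    P = np.full((n, n), 1.0 / n)   # assumption: uniform initialization
    Q = np.full((m, m), 1.0 / m)
    history = [cost(P, Q, A, B, lam, mu)]
    for _ in range(max_iter):
        # Gradient step on P with Q fixed: df/dP = 2 (PAQ - B)(AQ)^T + lam.
        AQ = A @ Q
        grad_P = 2.0 * (P @ AQ - B) @ AQ.T + lam
        P = valid_projection(P - eta_p * grad_P)
        # Gradient step on Q with the updated P: df/dQ = 2 (PA)^T (PAQ - B) + mu.
        PA = P @ A
        grad_Q = 2.0 * PA.T @ (PA @ Q - B) + mu
        Q = valid_projection(Q - eta_q * grad_Q)
        history.append(cost(P, Q, A, B, lam, mu))
        if abs(history[-2] - history[-1]) < tol:
            break
    return P, Q, history

# Hypothetical toy input: B is A with users 0 and 2 swapped and groups swapped.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
B = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
P, Q, history = big_align_sketch(A, B)
```

The constant steps are chosen small enough for this toy input that the cost never increases. On larger graphs a fixed step can thrash or crawl, which is the motivation for the variable step discussed on the later slides.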

23 RoadMap: Problem Definition; What’s different?; BiG-Align (Optimizations); Uni-Align; Conclusions.

24 BiG-Align: Optimizations. DETAILS. Pseudocode as on slide 19: initialize(P), initialize(Q); until convergence: gradient step on P, valid_projection(P), gradient step on Q, valid_projection(Q), update(ηP, ηQ). Label: alternating, projected gradient descent.

25 BiG-Align: Optimizations. DETAILS. Pseudocode as on slide 19: initialize(P), initialize(Q); until convergence: gradient step on P, valid_projection(P), gradient step on Q, valid_projection(Q), update(ηP, ηQ). Label: alternating, projected gradient descent.

26 Optimization 1: Structurally equivalent nodes. DETAILS. Aggregation to super-nodes. [Figure: Graph A.]
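
One way to picture the aggregation step: nodes whose adjacency rows are identical are merged into a single super-node. The sketch below assumes "structurally equivalent" means exactly equal rows of the biadjacency matrix; the function name `aggregate_equivalent_rows` and the toy matrix are hypothetical.

```python
from collections import defaultdict

def aggregate_equivalent_rows(A):
    """Merge rows with identical neighborhoods into super-nodes.

    Returns the reduced matrix (one row per super-node) and, for each
    super-node, the list of original row indices it represents.
    """
    groups = defaultdict(list)
    for i, row in enumerate(A):
        groups[tuple(row)].append(i)      # identical rows share one key
    members = list(groups.values())       # first-seen order is preserved
    reduced = [list(A[g[0]]) for g in members]
    return reduced, members

# Hypothetical users x groups matrix: users 0, 1 and 3 belong to group 0 only.
A = [[1, 0],
     [1, 0],
     [0, 1],
     [1, 0]]
reduced, members = aggregate_equivalent_rows(A)
```

Here four users collapse into two super-nodes, so the alignment works on a smaller matrix and the `members` lists can expand the result back to the original users afterwards.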

27 BiG-Align: Optimizations. DETAILS. Pseudocode as on slide 19: initialize(P), initialize(Q); until convergence: gradient step on P, valid_projection(P), gradient step on Q, valid_projection(Q), update(ηP, ηQ). Label: alternating, projected gradient descent.

28 Optimization 2: Initialization of P and Q. DETAILS. Why is the initialization important? [Figure: objective curve with a global minimum and local minima.]

29 Optimization 2: Initialization of P and Q. DETAILS. Social networks are structured: the degree distribution is power-law-like. [Figure: log(degree) vs. ranked nodes.]

30 Optimization 2: Initialization of P and Q. DETAILS. Network-inspired initialization of P: 1-1 matching of the top k nodes; 1-1 matching of clusters of degrees. [Figure: user degrees of G_A and G_B against the degree rank of each node, with a knee at k; the top k nodes and the degree clusters 1, 2, …, n of the two graphs are paired.]
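
A rough sketch of such a degree-based initialization is given below. Several details are assumptions made for illustration: the knee `k` is passed in rather than detected, a "cluster" is a run of nodes with equal degree, clusters are paired by rank, and each node in a B-cluster spreads its probability uniformly over the paired A-cluster. The names `group_by_degree` and `degree_based_init` are hypothetical.

```python
def group_by_degree(nodes, deg):
    """Split nodes (already sorted by decreasing degree) into equal-degree runs."""
    clusters = []
    for v in nodes:
        if clusters and deg[clusters[-1][0]] == deg[v]:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters

def degree_based_init(deg_a, deg_b, k):
    """Return P with P[i_b][i_a] = initial match probability of B-node i_b to A-node i_a."""
    n_a, n_b = len(deg_a), len(deg_b)
    P = [[0.0] * n_a for _ in range(n_b)]
    order_a = sorted(range(n_a), key=lambda i: -deg_a[i])
    order_b = sorted(range(n_b), key=lambda i: -deg_b[i])
    # 1-1 matching of the top-k nodes by degree rank.
    for ia, ib in zip(order_a[:k], order_b[:k]):
        P[ib][ia] = 1.0
    # 1-1 matching of degree clusters among the remaining nodes.
    clusters_a = group_by_degree(order_a[k:], deg_a)
    clusters_b = group_by_degree(order_b[k:], deg_b)
    for ca, cb in zip(clusters_a, clusters_b):
        weight = 1.0 / len(ca)
        for ib in cb:
            for ia in ca:
                P[ib][ia] = weight
    return P

# Hypothetical user degrees in the two graphs.
deg_a = [5, 1, 3, 1, 1]
deg_b = [1, 5, 1, 3, 1]
P = degree_based_init(deg_a, deg_b, k=2)
```

The two highest-degree users are matched one-to-one, and the three degree-1 users of each graph share a uniform block, which already gives a sparse starting point for P.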

31 BiG-Align: Optimizations. DETAILS. Pseudocode as on slide 19: initialize(P), initialize(Q); until convergence: gradient step on P, valid_projection(P), gradient step on Q, valid_projection(Q), update(ηP, ηQ). Label: alternating, projected gradient descent.

32 Optimization 3: Steps of gradient descent. DETAILS. Constant step: thrashing or slow convergence.

33 Optimization 3: Steps of gradient descent. DETAILS. Variable step with line search: strategy for local optimum. Closed formulas: $\eta_P = \arg\min f(\eta_P) = g_1(P, Q, A, B)$, $\eta_Q = \arg\min f(\eta_Q) = g_2(P, Q, A, B)$.

34 Optimization 3: Steps of gradient descent. DETAILS. Variable step with line search: strategy for local optimum. BiG-Align-Exact: computes the steps at every iteration. Closed formulas: $\eta_P = \arg\min f(\eta_P) = g_1(P, Q, A, B)$, $\eta_Q = \arg\min f(\eta_Q) = g_2(P, Q, A, B)$.
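
The slides do not spell out $g_1$ and $g_2$. As a hedged illustration of why a closed form exists: if the projection is ignored, $f(P - \eta G, Q)$ is a quadratic in $\eta$ for $G = \partial f/\partial P$, and setting its derivative to zero gives $\eta_P = \|G\|_F^2 / (2\,\|G A Q\|_F^2)$. The sketch below implements that unprojected step; the function names and the random test data are assumptions, and the formula may differ from the one used in the talk.

```python
import numpy as np

def f(P, Q, A, B, lam, mu):
    """Penalized objective ||P A Q - B||_F^2 + lam * sum(P) + mu * sum(Q)."""
    R = P @ A @ Q - B
    return float(np.sum(R * R) + lam * P.sum() + mu * Q.sum())

def grad_P(P, Q, A, B, lam):
    """df/dP = 2 (PAQ - B)(AQ)^T + lam."""
    AQ = A @ Q
    return 2.0 * (P @ AQ - B) @ AQ.T + lam

def exact_step_P(P, Q, A, B, lam):
    """Minimizer of eta -> f(P - eta * G, Q), ignoring the projection step."""
    G = grad_P(P, Q, A, B, lam)
    D = G @ A @ Q
    denom = 2.0 * np.sum(D * D)
    return float(np.sum(G * G) / denom) if denom > 0 else 0.0

# Hypothetical random instance, used only to check the step numerically.
rng = np.random.default_rng(0)
A, B = rng.random((4, 3)), rng.random((4, 3))
P, Q = rng.random((4, 4)), rng.random((3, 3))
lam, mu = 0.1, 0.1

eta = exact_step_P(P, Q, A, B, lam)
G = grad_P(P, Q, A, B, lam)

def f_along(e):
    """Objective after a step of size e along the negative gradient."""
    return f(P - e * G, Q, A, B, lam, mu)
```

Evaluating `f_along` at `eta`, at slightly smaller and larger steps, and at zero shows that `eta` is the best step along the gradient direction. The step for Q follows by symmetry with $\partial f/\partial Q$.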

35 Optimization 3: Steps of gradient descent. DETAILS. But: slow change in the steps. [Figure: step size (η) vs. iterations.]

36 Optimization 3: Steps of gradient descent. DETAILS. But: slow change in the steps. BiG-Align-Skip: compute the η’s every m (= 500) iterations. [Figure: step size (η) vs. iterations.]

37 RoadMap: Problem Definition; What’s different?; BiG-Align (Experiments); Uni-Align; Conclusions.

38 Experimental Setup. Implementation: Matlab. Dataset: IMDB movie-genre graph and subgraphs (1027 movies × 27 genres). Setup: random permutations; noise level: %. Annotations: Ground truth; Simulate real-world applications.
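
A setup of this kind can be simulated along the following lines. The noise model (flipping each entry independently with probability `noise`) and the helper name `permuted_noisy_copy` are assumptions; the slide specifies neither the noise model nor the noise levels used, and the random matrix here only stands in for the IMDB data.

```python
import numpy as np

def permuted_noisy_copy(A, noise, rng):
    """Return a row/column-permuted copy of the 0/1 matrix A with flipped entries.

    The permutations serve as ground truth for evaluating an alignment.
    """
    n, m = A.shape
    row_perm = rng.permutation(n)
    col_perm = rng.permutation(m)
    B = A[row_perm][:, col_perm].copy()
    # Assumed noise model: flip each entry independently with probability `noise`.
    flips = rng.random(B.shape) < noise
    B[flips] = 1 - B[flips]
    return B, row_perm, col_perm

rng = np.random.default_rng(42)
A = (rng.random((6, 4)) < 0.4).astype(int)   # stand-in for a movie-genre graph
B, row_perm, col_perm = permuted_noisy_copy(A, noise=0.1, rng=rng)
```

Because the permutations are known, the accuracy of a recovered P or Q can be measured by comparing its largest entry in each row against `row_perm` or `col_perm`.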

39 State-of-the-art. BACKGROUND. ① Umeyama’s algorithm [Umeyama88]: SVD-based. ② NMF-based approach [Ding+08]: builds on top of Umeyama’s approach. ③ Net-Align [Bayati+09]: Belief Propagation.

40 State-of-the-art. BACKGROUND. ① Umeyama’s algorithm [Umeyama88]: SVD-based. ② NMF-based approach [Ding+08]: builds on top of Umeyama’s approach. ③ Net-Align [Bayati+09]: Belief Propagation. Bi-partite → Uni-partite.

41 BiG-Align: Accuracy vs. Runtime. [Figure: accuracy vs. runtime; marker size related to graph size; methods: Umeyama, NetAlign, NMF-based, BiG-Align-skip, BiG-Align-exact.]

42 BiG-Align: Accuracy vs. Runtime. BiG-Align improves both speed and accuracy. [Figure: accuracy vs. runtime; methods: Umeyama, NetAlign, NMF-based, BiG-Align-skip, BiG-Align-exact.]

43 BiG-Align: Accuracy w.r.t. noise. [Figure: accuracy vs. noise level; methods: BiG-Align-exact, BiG-Align-skip, NMF-based, NetAlign-deg, NetAlign-full, Umeyama.]

44 BiG-Align: Accuracy w.r.t. noise. BiG-Align improves the accuracy for almost all levels of noise. [Figure: accuracy vs. noise level; methods: BiG-Align-exact, BiG-Align-skip, NMF-based, NetAlign-deg, NetAlign-full, Umeyama.]

45 RoadMap: Problem Definition; What’s different?; BiG-Align; Uni-Align; Conclusions.

46 Algorithm: Uni-Align. DETAILS. Feature matrix: n nodes × d features (node degree, clustering coeff., …). $\min \|PAQ - B\|_F^2$. [Diagram labels: fixed, P.]

47 Algorithm: Uni-Align. DETAILS. Feature matrix: n nodes × d features. $\min \|PAQ - B\|_F^2$. P = g*(A, B, S, U): closed-form solution, with SVD $A = U S V^T$; cost $O(n d^2)$.
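
The exact form of g* is not given on the slide. To illustrate how an SVD yields a closed-form P, the sketch below makes two assumptions: Q is held at the identity (the same d features describe both graphs), and the sparsity penalty is dropped, so the problem reduces to the least-squares fit $\min_P \|PA - B\|_F^2$ with minimum-norm solution $P = B A^{+}$. The function name `least_squares_alignment` is hypothetical, and this is not claimed to be the slide's g*.

```python
import numpy as np

def least_squares_alignment(A, B, rcond=1e-12):
    """Minimum-norm solution of min_P ||P A - B||_F^2 via the thin SVD of A.

    A and B are n x d feature matrices with n >= d. The thin SVD costs
    O(n d^2), in line with the complexity quoted on the slide.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) Vt
    cutoff = rcond * s.max()
    s_inv = np.array([1.0 / x if x > cutoff else 0.0 for x in s])
    A_pinv = (Vt.T * s_inv) @ U.T                      # d x n pseudoinverse
    return B @ A_pinv                                  # n x n

# Hypothetical check: B holds the same feature rows as A, in shuffled order.
rng = np.random.default_rng(7)
A = rng.random((6, 3))
perm = rng.permutation(6)
B = A[perm]
P = least_squares_alignment(A, B)
```

When A has full column rank, `P @ A` reproduces B exactly. The resulting P is dense and generally not a permutation, so a real pipeline would still need sparsification or a matching step on top of it.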

48 RoadMap: Problem Definition; What’s different?; BiG-Align; Uni-Align (Experiments); Conclusions.

49 Uni-Align. Dataset: Facebook friendship graph (64K users). Setup: uni-partite → bi-partite graph. Feature extraction: node degree; egonet degree; edges in egonet; mean degree of node’s neighbors. [Figure: egonet.]
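
A minimal sketch of the feature extraction, assuming an undirected graph stored as a dict of neighbor sets. It computes three of the four listed features; "egonet degree" is left out because the slide does not define it. The function name `egonet_features` and the toy graph are hypothetical.

```python
def egonet_features(adj):
    """Per-node features: (degree, edges in egonet, mean degree of neighbors).

    adj maps each node to the set of its neighbors (undirected, no self-loops).
    The egonet of v is v, its neighbors, and all edges among them.
    """
    features = {}
    for v, nbrs in adj.items():
        degree = len(nbrs)
        ego = nbrs | {v}
        # Every edge inside the egonet is counted once from each endpoint.
        ego_edges = sum(len(adj[u] & ego) for u in ego) // 2
        mean_nbr_degree = (sum(len(adj[u]) for u in nbrs) / degree) if degree else 0.0
        features[v] = (degree, ego_edges, mean_nbr_degree)
    return features

# Hypothetical friendship graph: a triangle 0-1-2 with a pendant node 3 on node 2.
adj = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2},
}
feats = egonet_features(adj)
```

Stacking these tuples row by row gives the n nodes × d features matrix, which turns the unipartite friendship graph into a bipartite node-feature graph.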

50 Uni-Align: Accuracy vs. Runtime. Uni-Align, followed by Net-Align, is more accurate and faster than other approaches. [Figure: accuracy vs. runtime; methods: NMF-based, NetAlign, Umeyama, Uni-Align.]

51 Uni-Align: Runtime. Uni-Align is 2× to 31,700× faster, depending on graph size. [Figure: runtime; methods: Umeyama, Uni-Align, NMF-based, NetAlign-deg, NetAlign-full.]

52 RoadMap: Problem Definition; What’s different?; BiG-Align; Uni-Align; Conclusions.

53 Conclusions. Formulation: new problem / constraints.

54 Conclusions. Formulation: new problem / constraints. Algorithms: BiG-Align (optimized alternating projected gradient descent); Uni-Align (alignment for uni-partite graphs).

55 Conclusions. Formulation: new problem / constraints. Algorithms: BiG-Align (optimized alternating projected gradient descent); Uni-Align (alignment for uni-partite graphs). Evaluations: more accurate and efficient.

56 Beyond BiG-Align: Multi-way Linkage. (1) All build upon BiG-Align. (2) Led to 7 patents. S1: Dynamic Graph Linkage; S2: Community-level Linkage; S3: Hetero. Graph Linkage; S4: Multi-relational DB Linkage.

57 Thank you!

