VoG: Summarizing and Understanding Large Graphs. Danai Koutra. Carnegie Mellon University; Korea Advanced Institute of Science and Technology.


1 VoG: Summarizing and Understanding Large Graphs. Danai Koutra, Jilles Vreeken, U Kang, Christos Faloutsos. Carnegie Mellon University; Korea Advanced Institute of Science and Technology. SDM, 24-26 April 2014, Philadelphia, USA. © 2014, Danai Koutra

2 Problem Definition: Graph Summarization. Given: a graph. Find: a succinct summary with possibly overlapping subgraphs ≈ important graph structures.

3 Why graph summarization? Visualization. Guiding attention.

4 Why graph summarization? Graph understanding.

5 Application: Wikipedia controversy. Nodes: wiki editors. Edges: co-edited. I don’t see anything!

6 Application: Wikipedia controversy. Nodes: wiki editors. Edges: co-edited. Stars: admins, bots, heavy users. Bipartite cores: edit wars (Kiev vs. Kyiv, vandals).

7 Roadmap: Main Idea; Proposed Algorithm: VoG; VoG: Step-by-Step; Experiments; Conclusions.

8 Main Idea. 1) Use a graph vocabulary. 2) The best graph summary corresponds to optimal compression (MDL): the shortest lossless description.

9 Minimum Description Length Principle (background). Given a set of models $\mathcal{M}$, the best model $M \in \mathcal{M}$ is $\arg\min_{M} L(M) + L(D \mid M)$, where $L(M)$ is the number of bits for $M$ and $L(D \mid M)$ is the number of bits for the data using $M$.

10 Minimum Description Length Principle (example). A simple model $a_1 x + a_0$ versus a complex model $a_{10} x^{10} + a_9 x^9 + \dots + a_0$, compared by $L(M) + L(D \mid M)$, where $L(D \mid M)$ covers the errors. [Example adapted from Akoglu’12]
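To make the two-part score $L(M) + L(D \mid M)$ concrete, here is a minimal sketch. It does not reproduce the polynomial example from the slide; it swaps in a simpler Bernoulli model for a 0/1 sample. The function names, the model class, and the choice of 2k bits for a parameter with k bits of precision are all illustrative assumptions.

```python
import math


def data_bits(sample, p):
    """L(D|M): bits to encode a 0/1 sample with a Bernoulli(p) code."""
    return sum(-math.log2(p if x else 1.0 - p) for x in sample)


def mdl_select(sample, max_precision=6):
    """Return (total_bits, precision, p) minimizing L(M) + L(D|M).

    Assumption: a model is a probability p = j / 2**k. It costs k bits
    to state the precision k (unary) plus k bits to state j, so
    L(M) = 2k. Finer models fit better but cost more bits to describe.
    """
    best = None
    for k in range(1, max_precision + 1):
        model_bits = 2 * k
        for j in range(1, 2 ** k):
            p = j / 2 ** k
            total = model_bits + data_bits(sample, p)
            if best is None or total < best[0]:
                best = (total, k, p)
    return best
```

For a sample with twelve 1s and four 0s, the coarse model p = 0.5 is cheap to state but encodes the data poorly, while very fine grids pay more for the model than they save on the data.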

11 Formally: Minimum Graph Description. Given: a graph $G$ with adjacency matrix $A$, and a vocabulary $\Omega$. Find: the model $M$ such that $\min L(G, M) = \min L(M) + L(E)$. [Figure labels: model $M$, adjacency $A$, error $E$]

12 Roadmap: Main Idea; Proposed Algorithm: VoG; VoG: Step-by-Step; Experiments; Conclusions.

13 VoG: Overview. [Figure labels: argmin, ≈, ≈?]

14 VoG: Overview. [Figure labels: some criterion, Summary]

15 Roadmap: Main Idea; Proposed Algorithm: VoG; VoG: Step-by-Step; Experiments; Conclusions.

16 We need candidate structures… How can we get them?

17 Step 1: Graph Decomposition. Could use ANY graph decomposition method. We adapted a node reordering method: SlashBurn.

18 SlashBurn-based Graph Decomposition. Slash the top-k hubs, burn their edges. Repeat on the remaining GCC (giant connected component). The resulting pieces are the candidate structures. Notice that the structures can overlap! [SlashBurn: U Kang and Christos Faloutsos. ICDM’11] [Figure: graph before and after, with the GCC marked]
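The slash-and-burn loop can be sketched as follows. This is a simplified illustration, not the SlashBurn reference implementation: it assumes an undirected graph given as a dict of neighbour sets, and it assumes that each slashed hub together with its current neighbours, plus every small component split off from the GCC, counts as a candidate structure. The function names and the `min_size` parameter are hypothetical.

```python
from collections import deque


def connected_components(adj, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps


def slashburn_candidates(adj, k=1, min_size=2):
    """Return candidate node sets from a SlashBurn-style decomposition."""
    alive = set(adj)
    candidates = []
    while len(alive) >= min_size:
        # Slash: pick the k highest-degree nodes of the current subgraph.
        hubs = sorted(alive, key=lambda u: (-len(adj[u] & alive), u))[:k]
        for hub in hubs:
            # Assumption: a hub plus its current neighbours is a candidate.
            candidates.append({hub} | (adj[hub] & alive))
        alive -= set(hubs)
        # Burn: removing the hubs' edges splits the graph into components.
        comps = connected_components(adj, alive)
        if not comps:
            break
        gcc = max(comps, key=len)
        for comp in comps:
            if comp is not gcc and len(comp) >= min_size:
                candidates.append(comp)
        # Repeat on the remaining giant connected component.
        alive = gcc
    return candidates
```

Because a hub's neighbours stay in the graph after the hub is slashed, the same node can appear in several candidates, which is the overlap the slide points out.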

19 We got candidate structures. Now, how can we ‘label’ them?

20 Step 2: Graph Labeling. [Figure labels: ≈?, argmin, ≈]

21 Graph Representation (details). Hub? “Best” node split? “Best” node ordering? Missing edges?

22 Graph Representation: star structure (details). Hub: top-degree node. Spokes: the rest. Encoding cost: $L_{\mathbb{N}}(|st|-1) + \log n + \log \binom{n-1}{|st|-1} + L(E^+) + L(E^-)$, where the terms encode, in order, the number of spokes, the hub ID, the spoke IDs, and the errors (labelled extra and missing on the slide). [Figure: star, n = 7]
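As a rough numeric check of the first three terms of the star cost, the sketch below evaluates them in bits. It assumes that $L_{\mathbb{N}}$ is Rissanen's universal code for integers (log-star plus the constant $\log_2 2.865064$) and that logarithms are base 2. The error terms are left out because the slide does not specify their encoding. The function names are hypothetical.

```python
import math


def universal_int_bits(z):
    """L_N(z) for an integer z >= 1, assuming Rissanen's universal code."""
    if z < 1:
        raise ValueError("z must be >= 1")
    bits = math.log2(2.865064)  # normalizing constant c0
    term = math.log2(z)
    while term > 0:  # log* z: sum only the positive iterated logs
        bits += term
        term = math.log2(term)
    return bits


def star_bits(n, num_spokes):
    """Bits for a star in an n-node graph, without the error terms."""
    return (
        universal_int_bits(num_spokes)                 # number of spokes
        + math.log2(n)                                 # hub ID
        + math.log2(math.comb(n - 1, num_spokes))      # spoke IDs
    )
```

With n = 7 and 6 spokes the spoke-ID term vanishes, since there is only one way to choose all 6 remaining nodes; a star with 3 spokes in the same graph costs more because the spoke set must be identified.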

23 Graph Representation: bipartite graph structure (details). Max bipartite graph: NP-hard. Heuristic: Belief Propagation with heterophily for node classification (blue/red). Encoding cost: $\dots + \log n + \log \binom{n-1}{|st|-1} + L(E^+) + L(E^-)$, with terms for the number of blue nodes, the number of red nodes, their IDs, and the errors (labelled extra and missing on the slide).

24 Graph Representation: chain structure (details). Longest path: NP-hard. Heuristic: BFS + local search. The encoding cost includes the error terms (labelled extra and missing on the slide).
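The BFS part of the chain heuristic can be sketched as a double sweep: run BFS from any node to a farthest node, then run BFS again from there and read off the path to the new farthest node. This is an illustrative sketch under that assumption, not the authors' code, and it leaves out the local search step the slide mentions. The function names are hypothetical.

```python
from collections import deque


def bfs_farthest(adj, source):
    """Return (a farthest node from source, BFS parent map)."""
    parent = {source: None}
    queue = deque([source])
    last = source
    while queue:
        u = queue.popleft()
        last = u  # BFS dequeues in non-decreasing distance order
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return last, parent


def long_path(adj, start):
    """Heuristic long path: two BFS sweeps, no local search."""
    end_a, _ = bfs_farthest(adj, start)
    end_b, parent = bfs_farthest(adj, end_a)
    path = []
    node = end_b
    while node is not None:  # walk parents back to end_a
        path.append(node)
        node = parent[node]
    return path[::-1]
```

On a tree this returns a longest path; on general graphs it returns a shortest path between two far-apart nodes, which is only a starting point for a chain.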

25 Step 2: Graph Labeling. [Figure labels: ≈?, argmin, ≈]

26 Step 3: Summary Assembly. [Figure label: Summary]

27 Concepts (details). Savings (compression gain) = # bits as structure − # bits as noise.

28 Step 3: Summary Assembly. [Figure label: Summary]

29 Concepts: summary encoding cost (details). $L(M) = L_{\mathbb{N}}(|M|+1) + \log \binom{|M|+1}{|\Omega|+1} + \sum_{s \in M} \left( -\log P(x(s) \mid M) + L(s) \right)$. The first two terms encode the number of structures per type. For each structure $s$, $-\log P(x(s) \mid M)$ encodes its type and $L(s)$ is its encoding length (its connectivity).

30 Step 3: Summary Assembly (details). [Figure labels: L(M), structures, …]
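The slides leave the selection criterion for summary assembly open ("some criterion"). One possible criterion, shown purely as an assumption, is a greedy pass: rank the candidates by bits as structure minus bits as noise, then keep a candidate only if it lowers the total description length. The flat per-edge noise cost and the tuple layout of a candidate are hypothetical simplifications, not the encoding from the talk.

```python
def greedy_summary(candidates, edges, noise_bits_per_edge):
    """Greedily pick structures that shrink the total description length.

    candidates: list of (name, structure_bits, covered_edges) tuples.
    edges: set of all edges in the graph.
    Assumption: every edge not covered by a chosen structure costs a
    flat `noise_bits_per_edge` as error.
    """
    def total_bits(chosen):
        covered = set().union(*(c[2] for c in chosen)) if chosen else set()
        model = sum(c[1] for c in chosen)
        error = noise_bits_per_edge * len(edges - covered)
        return model + error

    # Bits as structure minus bits as noise; most negative goes first.
    ranked = sorted(
        candidates, key=lambda c: c[1] - noise_bits_per_edge * len(c[2])
    )
    summary, best = [], total_bits([])
    for cand in ranked:
        cost = total_bits(summary + [cand])
        if cost < best:  # keep only candidates that save bits overall
            summary.append(cand)
            best = cost
    return summary, best
```

A candidate that overlaps heavily with structures already in the summary tends to be rejected, because it adds its full model cost while covering few new edges.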

31 Roadmap: Main Idea; Encoding Schema; Proposed Algorithm: VoG; Experiments; Conclusions.

32 Application: Enron. Top-3 stars: klay / kenneth.lay @enron.com. Top-1 NBC: ski excursion.

33 Runtime. VoG is near-linear in the number of edges of the input graph.

34 Roadmap: Main Idea; Encoding Schema; Proposed Algorithm: VoG; Experiments; Conclusions.

35 Conclusions. Formulation: info-theoretic graph summarization approach. Algorithm: VoG is near-linear in the number of edges. Experiments on real graphs.

36 Code: www.cs.cmu.edu/~dkoutra/SRC/vog.tar

37 Thank you! Questions? www.cs.cmu.edu/~dkoutra/pub.htm danai@cs.cmu.edu

