VoG: Summarizing and Understanding Large Graphs. Danai Koutra. Carnegie Mellon University; Korea Advanced Institute of Science and Technology.

Presentation transcript:

VoG: Summarizing and Understanding Large Graphs. Danai Koutra, Jilles Vreeken, U Kang, Christos Faloutsos. Carnegie Mellon University; Korea Advanced Institute of Science and Technology. SDM, April 2014, Philadelphia, USA. © 2014, Danai Koutra

Problem Definition: Graph Summarization. Given: a graph. Find: a succinct summary with possibly overlapping subgraphs (≈ important graph structures).

Why graph summarization? Visualization; guiding attention.

Why graph summarization? Graph understanding.

Application: Wikipedia controversy. Nodes: wiki editors. Edges: co-edited. "I don't see anything!"

Application: Wikipedia controversy. Nodes: wiki editors. Edges: co-edited. Stars: admins, bots, heavy users. Bipartite cores: edit wars (figure labels: Kiev vs. Kyiv; vandals).

Roadmap: Main Idea; Proposed Algorithm: VoG; VoG: Step-by-Step; Experiments; Conclusions.

Main Idea. 1) Use a graph vocabulary. 2) Best graph summary: optimal compression (MDL), i.e., the shortest lossless description.

Minimum Description Length Principle (background). Given a set of models $\mathcal{M}$, the best model $M \in \mathcal{M}$ is $\arg\min_{M \in \mathcal{M}} \, [L(M) + L(D \mid M)]$, where $L(M)$ is the number of bits for M and $L(D \mid M)$ is the number of bits for the data using M.
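As a small illustration of the two-part cost L(M) + L(D|M), the sketch below picks between two toy models for a 0/1 sequence. It is not part of the talk: the coin models, the `precision_bits` parameter, and the function names are assumptions made for the example, and the cost of naming which model was chosen is ignored.

```python
import math

def bernoulli_data_bits(bits, p):
    """L(D|M): bits needed to encode a 0/1 sequence under a coin with P(1) = p."""
    return sum(-math.log2(p if b else 1.0 - p) for b in bits)

def mdl_select(bits, precision_bits=4):
    """Return (name, total_bits) of the model minimising L(M) + L(D|M)."""
    candidates = {}
    # Model "fair": no parameter to transmit, so L(M) = 0 (an assumption).
    candidates["fair"] = 0.0 + bernoulli_data_bits(bits, 0.5)
    # Model "biased": P(1) is quantised to precision_bits bits, so L(M) = precision_bits.
    levels = 2 ** precision_bits
    k = round(sum(bits) / len(bits) * levels)
    k = min(max(k, 1), levels - 1)  # keep the probability strictly inside (0, 1)
    candidates["biased"] = precision_bits + bernoulli_data_bits(bits, k / levels)
    best = min(candidates, key=candidates.get)
    return best, candidates[best]
```

On a balanced sequence the extra parameter does not pay for itself and the fair coin wins; on a heavily skewed sequence the biased coin compresses the data enough to justify its own description cost.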

Minimum Description Length Principle (example). Compare a line, $a_1 x + a_0$, with a degree-10 polynomial, $a_{10} x^{10} + a_9 x^9 + \dots + a_0$: $L(M)$ is the cost of the model and $L(D \mid M)$ the cost of its errors. [Example adapted from Akoglu'12]

Formally: Minimum Graph Description. Given: a graph G with adjacency matrix A, and a vocabulary Ω. Find: the model M that minimizes $L(G, M) = L(M) + L(E)$, where E is the error of the model M with respect to the adjacency matrix A.
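One way to read the error E is as the set of node pairs on which the model and the adjacency matrix disagree. The sketch below assumes exactly that; it is an illustration, not the authors' code, and the function names are hypothetical. Edges are undirected pairs.

```python
def normalise(edges):
    """Turn an iterable of undirected (u, v) pairs into a set of frozensets."""
    return {frozenset(edge) for edge in edges}

def error_edges(model_edges, actual_edges):
    """Split the disagreement between model and graph into two edge sets.

    extra:   edges the model claims that are absent from the graph
    missing: edges of the graph that the model does not describe
    """
    model, actual = normalise(model_edges), normalise(actual_edges)
    return model - actual, actual - model
```

For example, describing the graph with edges (0,1), (0,2), (1,2) as a star with hub 0 and spokes 1, 2, 3 leaves one extra edge, (0,3), and one missing edge, (1,2).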

Roadmap: Main Idea; Proposed Algorithm: VoG; VoG: Step-by-Step; Experiments; Conclusions.

VoG: Overview. [Figure: candidate subgraphs are approximated (≈) by vocabulary structures, chosen by argmin.]

VoG: Overview. [Figure: structures are selected into the summary by some criterion.]

Roadmap: Main Idea; Proposed Algorithm: VoG; VoG: Step-by-Step; Experiments; Conclusions.

We need candidate structures… How can we get them?

Step 1: Graph Decomposition. Could use ANY graph decomposition method. We adapted a node reordering method: SlashBurn.

SlashBurn-based Graph Decomposition. Slash the top-k hubs and burn their edges; repeat on the remaining giant connected component (GCC). This yields the candidate structures. Notice that the structures can overlap! [Figure: before and after.] [SlashBurn: U Kang and Christos Faloutsos. ICDM'11]
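The slash-and-burn loop can be sketched as follows. This is a simplified illustration, not the SlashBurn or VoG implementation: it assumes an undirected graph given as a dict mapping each node to a set of neighbours, treats each removed hub's neighbourhood and each small component as a candidate, and the function names are hypothetical.

```python
from collections import deque

def connected_components(adj, nodes):
    """Connected components (as sets) of the subgraph induced by `nodes`."""
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def slashburn_candidates(adj, k=1):
    """Return candidate node sets found by repeatedly removing the top-k hubs."""
    alive, candidates = set(adj), []
    while len(alive) > k:
        # Slash: take the k highest-degree nodes of the current graph.
        hubs = sorted(alive, key=lambda u: (-len(adj[u] & alive), u))[:k]
        for hub in hubs:
            ego = {hub} | (adj[hub] & alive)
            if len(ego) > 1:
                candidates.append(ego)  # hub plus its current neighbours
        alive -= set(hubs)  # burn: the hubs' edges disappear with them
        comps = connected_components(adj, alive)
        if not comps:
            break
        gcc = max(comps, key=len)
        # Small components that fell off become candidates too.
        candidates.extend(c for c in comps if c is not gcc and len(c) > 1)
        alive = gcc  # repeat on the remaining giant connected component
    return candidates
```

Because a hub's neighbourhood is recorded before the hub is removed, later candidates can share nodes with earlier ones, which is the overlap the slide points out.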

We got candidate structures. Now, how can we 'label' them?

Step 2: Graph Labeling. [Figure: (1) compare each candidate with the vocabulary structures (≈?); (2) take the argmin.]

Graph Representation (details). Which node is the hub? What is the "best" node split? What is the "best" node ordering (1 … n)? Which edges are missing?

Graph Representation (details): star structure. Hub: top-degree node. Spokes: the rest. Encoding cost: $L_{\mathbb{N}}(|st|-1) + \log n + \log\binom{n-1}{|st|-1} + L(E^+) + L(E^-)$, where the terms encode, in order, the number of spokes, the hub ID, the spoke IDs, and the errors (extra and missing edges). [Figure: example star, labeled 6 and n = 7.]
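The structural part of the star cost (everything except the error terms) can be computed directly. The sketch below assumes that L_N is Rissanen's universal code length for positive integers, which the slide does not spell out, and that `star_size` counts the hub plus its spokes; the function names are hypothetical.

```python
import math

def universal_int_bits(n):
    """Assumed L_N(n): universal code length in bits for an integer n >= 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    bits = math.log2(2.865064)  # normalising constant of the universal code
    term = math.log2(n)
    while term > 0:  # log* n: add the iterated logarithms while they stay positive
        bits += term
        term = math.log2(term)
    return bits

def star_structure_bits(n, star_size):
    """Bits for a star in an n-node graph, excluding the error terms."""
    spokes = star_size - 1
    return (
        universal_int_bits(spokes)              # number of spokes
        + math.log2(n)                          # hub ID
        + math.log2(math.comb(n - 1, spokes))   # which nodes are the spokes
    )
```

When the star covers the whole graph (for instance n = 7 with 6 spokes), the binomial term vanishes because there is only one way to choose the spokes.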

Graph Representation (details): bipartite graph structure. Finding the maximum bipartite graph is NP-hard. Heuristic: belief propagation with heterophily for node classification (blue/red). Encoding cost: $[\dots] + \log n + \log(\dots) + L(E^+) + L(E^-)$, covering the number of blue nodes, the number of red nodes, their IDs, and the errors (extra and missing edges).

Graph Representation (details): chain structure. Finding the longest path is NP-hard. Heuristic: BFS + local search. The encoding cost includes the errors (extra and missing edges). [Figure: node ordering 1 … n.]
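The BFS half of the chain heuristic can be sketched as a double sweep: run BFS from some node to find a far-away endpoint, then run BFS again from that endpoint and read off the path. This is an assumption about how the BFS step might look, not the authors' code; the local-search refinement is omitted, and the graph format (dict of neighbour sets, connected, sortable node IDs) and function names are hypothetical.

```python
from collections import deque

def bfs_farthest(adj, source):
    """Return (farthest node from source, BFS parent map)."""
    parent = {source: None}
    queue = deque([source])
    last = source
    while queue:
        u = queue.popleft()
        last = u  # BFS dequeues in nondecreasing distance, so the last one is farthest
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return last, parent

def approx_long_path(adj, start):
    """Double-sweep BFS: a path between two far-apart nodes, as a list of nodes."""
    end_a, _ = bfs_farthest(adj, start)
    end_b, parent = bfs_farthest(adj, end_a)
    path, node = [], end_b
    while node is not None:  # walk the parent pointers back to end_a
        path.append(node)
        node = parent[node]
    return path
```

The returned path is a shortest path between its endpoints, so it is only a starting point; a local search would then try to lengthen it.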

Step 2: Graph Labeling. [Figure: each candidate is compared with the vocabulary structures (≈?) and labeled by argmin.]

Step 3: Summary Assembly. [Figure: summary.]

Concepts (details). Savings (compression gain): the difference between the number of bits needed to encode a subgraph as a structure and the number of bits needed to encode it as noise.

Step 3: Summary Assembly. [Figure: summary.]

Concepts (details): summary encoding cost. $L(M) = L_{\mathbb{N}}(|M|+1) + \log\binom{|M|+1}{|\Omega|+1} + \sum_{s \in M} \big(-\log P(x(s) \mid M) + L(s)\big)$. The first term encodes the number of structures, the second the number of structures per type, and the sum covers, for each structure s, its type and its encoding length (its connectivity).

Step 3: Summary Assembly (details). [Figure labels: L(M), structures, …]

Roadmap: Main Idea; Encoding Schema; Proposed Algorithm: VoG; Experiments; Conclusions.

Application: Enron. Top-3 stars (figure label: klay). Top-1 NBC: ski excursion.

Runtime. VoG is near-linear in the number of edges of the input graph.

Roadmap: Main Idea; Encoding Schema; Proposed Algorithm: VoG; Experiments; Conclusions.

Conclusions. Formulation: information-theoretic graph summarization approach. Algorithm: VoG is near-linear in the number of edges. Experiments on real graphs.

Code

Thank you! Questions?