NetMine: Mining Tools for Large Graphs Deepayan Chakrabarti Yiping Zhan Daniel Blandford Christos Faloutsos Guy Blelloch.

Slides:



Advertisements
Similar presentations
1 Generating Network Topologies That Obey Power LawsPalmer/Steffan Carnegie Mellon Generating Network Topologies That Obey Power Laws Christopher R. Palmer.
Advertisements

1 Dynamics of Real-world Networks Jure Leskovec Machine Learning Department Carnegie Mellon University
CMU SCS Large Graph Mining - Patterns, Explanations and Cascade Analysis Christos Faloutsos CMU.
Power Laws By Cameron Megaw 3/11/2013. What is a Power Law?
Analysis and Modeling of Social Networks Foudalis Ilias.
CMU SCS Copyright: C. Faloutsos (2012)# : Multimedia Databases and Data Mining Lecture #27: Graph mining - Communities and a paradox Christos.
The Connectivity and Fault-Tolerance of the Internet Topology
Lecture 21 Network evolution Slides are modified from Jurij Leskovec, Jon Kleinberg and Christos Faloutsos.
Kronecker Graphs: An Approach to Modeling Networks Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani Presented.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
On Power-Law Relationships of the Internet Topology Michalis Faloutsos Petros Faloutsos Christos Faloutsos.
1 Modularity and Community Structure in Networks* Final project *Based on a paper by M.E.J Newman in PNAS 2006.
Emergence of Scaling in Random Networks Barabasi & Albert Science, 1999 Routing map of the internet
CMU SCS Copyright: C. Faloutsos (2014)# : Multimedia Databases and Data Mining Lecture #29: Graph mining - Generators & tools Christos Faloutsos.
Mining and Searching Massive Graphs (Networks)
CMU SCS C. Faloutsos (CMU)#1 Large Graph Algorithms Christos Faloutsos CMU McGlohon, Mary Prakash, Aditya Tong, Hanghang Tsourakakis, Babis Akoglu, Leman.
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Lecture 21: Spectral Clustering
Spectral Clustering 指導教授 : 王聖智 S. J. Wang 學生 : 羅介暐 Jie-Wei Luo.
Social Networks and Graph Mining Christos Faloutsos CMU - MLD.
Neighborhood Formation and Anomaly Detection in Bipartite Graphs Jimeng Sun Huiming Qu Deepayan Chakrabarti Christos Faloutsos Speaker: Jimeng Sun.
© 2011 IBM Corporation IBM Research SIAM-DM 2011, Mesa AZ, USA, Non-Negative Residual Matrix Factorization w/ Application to Graph Anomaly Detection Hanghang.
Sampling from Large Graphs. Motivation Our purpose is to analyze and model social networks –An online social network graph is composed of millions of.
On Power-Law Relationships of the Internet Topology CSCI 780, Fall 2005.
Analysis of the Internet Topology Michalis Faloutsos, U.C. Riverside (PI) Christos Faloutsos, CMU (sub- contract, co-PI) DARPA NMS, no
Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions.
CMU SCS Graph and stream mining Christos Faloutsos CMU.
A scalable multilevel algorithm for community structure detection
1 AutoPart: Parameter-Free Graph Partitioning and Outlier Detection Deepayan Chakrabarti
Fast Isocontouring For Improved Interactivity Chandrajit L. Bajaj Valerio Pascucci Daniel R. Schikore.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 May 14, 2006
Application of Graph Theory to OO Software Engineering Alexander Chatzigeorgiou, Nikolaos Tsantalis, George Stephanides Department of Applied Informatics.
Doubling Dimension in Real-World Graphs Melitta Lorraine Geistdoerfer Andersen.
CS8803-NS Network Science Fall 2013
Network Measures Social Media Mining. 2 Measures and Metrics 2 Social Media Mining Network Measures Klout.
On Power-Law Relationships of the Internet Topology.
CMU SCS Large Graph Mining - Patterns, Tools and Cascade Analysis Christos Faloutsos CMU.
Alan Frieze Charalampos (Babis) E. Tsourakakis WAW June ‘12 WAW '121.
Topic 13 Network Models Credits: C. Faloutsos and J. Leskovec Tutorial
Graph and Topological Structure Mining on Scientific Articles Fan Wang, Ruoming Jin, Gagan Agrawal and Helen Piontkivska The Ohio State University The.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P0-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
Data Analysis in YouTube. Introduction Social network + a video sharing media – Potential environment to propagate an influence. Friendship network and.
COM1721: Freshman Honors Seminar A Random Walk Through Computing Lecture 2: Structure of the Web October 1, 2002.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
Semantic Wordfication of Document Collections Presenter: Yingyu Wu.
Andreas Papadopoulos - [DEXA 2015] Clustering Attributed Multi-graphs with Information Ranking 26th International.
Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.
R-MAT: A Recursive Model for Graph Mining Deepayan Chakrabarti Yiping Zhan Christos Faloutsos.
Data Structures and Algorithms in Parallel Computing Lecture 7.
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
Speaker : Yu-Hui Chen Authors : Dinuka A. Soysa, Denis Guangyin Chen, Oscar C. Au, and Amine Bermak From : 2013 IEEE Symposium on Computational Intelligence.
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
Cmpe 588- Modeling of Internet Emergence of Scale-Free Network with Chaotic Units Pulin Gong, Cees van Leeuwen by Oya Ünlü Instructor: Haluk Bingöl.
EE327 Final Representation Qianyang Peng F
Cohesive Subgraph Computation over Large Graphs
PEGASUS: A PETA-SCALE GRAPH MINING SYSTEM
NetMine: Mining Tools for Large Graphs
Community detection in graphs
Graph and Tensor Mining for fun and profit
CS120 Graphs.
Large Graph Mining: Power Tools and a Practitioner’s guide
15-826: Multimedia Databases and Data Mining
Part 1: Graph Mining – patterns
Lecture 13 Network evolution
R-MAT: A Recursive Model for Graph Mining
Graph and Tensor Mining for fun and profit
Graph and Tensor Mining for fun and profit
Lecture 21 Network evolution
Modelling and Searching Networks Lecture 2 – Complex Networks
Presentation transcript:

NetMine: Mining Tools for Large Graphs Deepayan Chakrabarti Yiping Zhan Daniel Blandford Christos Faloutsos Guy Blelloch

2 Introduction Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01] ► Graphs are ubiquitous

3 Graph “Patterns” How does a real-world graph look like? Patterns/“Laws”  Degree distributions/power-laws Count vs Outdegree Power Laws

4 Graph “Patterns” How does a real-world graph look like? Patterns/“Laws”  Degree distributions/power-laws  “Small-world”  “Scree” plots  and others… Hop-plot Effective Diameter

5 Graph “Patterns” Why do we like them?  They capture interesting properties of graphs.  They provide “condensed information” about the graph.  They are needed to build/test realistic graph generators (useful for simulation studies extrapolations/sampling )  They help detect abnormalities and outliers.

6 Graph Patterns Degree-distributions  “power laws” “Small-world”  small diameter “Scree” plots … What else?

7 Our Work The NetMine toolkit  contains all the patterns mentioned before, and adds: The “min-cut” plot  a novel pattern which carries interesting information about the graph. A-plots  a tool to quickly find suspicious subgraphs/nodes.

8 Outline Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions

9 “Min-cut” plot What is a min-cut? Minimizes the number of edges cut Two partitions of almost equal size Size of mincut = 2

10 “Min-cut” plot Do min-cuts recursively. log (# edges) log (mincut-size / #edges) N nodes Mincut size = sqrt(N)

11 “Min-cut” plot Do min-cuts recursively. log (# edges) log (mincut-size / #edges) N nodes New min-cut

12 “Min-cut” plot Do min-cuts recursively. log (# edges) log (mincut-size / #edges) N nodes New min-cut Slope = -0.5 For a d-dimensional grid, the slope is -1/d

13 “Min-cut” plot log (# edges) log (mincut-size / #edges) Slope = -1/d For a d-dimensional grid, the slope is -1/d log (# edges) log (mincut-size / #edges) For a random graph, the slope is 0

14 “Min-cut” plot Min-cut sizes have important effects on graph properties, such as  efficiency of divide-and-conquer algorithms  compact graph representation  difference of the graph from well-known graph types for example, slope = 0 for a random graph

15 “Min-cut” plot What does it look like for a real-world graph? log (# edges) log (mincut-size / #edges) ?

16 Experiments Datasets:  Google Web Graph: 916,428 nodes and 5,105,039 edges  Lucent Router Graph: Undirected graph of network routers from 112,969 nodes and 181,639 edges  User  Website Clickstream Graph: 222,704 nodes and 952,580 edges

17 Experiments Used the METIS algorithm [ Karypis, Kumar, 1995] log (# edges) log (mincut-size / #edges) Google Web graph Values along the y- axis are averaged We observe a “lip” for large edges Slope of -0.4, corresponds to a 2.5- dimensional grid! Slope~ -0.4

18 Experiments Same results for other graphs too… log (# edges) log (mincut-size / #edges) Lucent Router graphClickstream graph Slope~ Slope~ -0.45

19 Observations Linear slope for some range of values “Lip” for high #edges Far from random graphs (because slope ≠ 0)

20 Outline Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions

21 A-plots How can we find abnormal nodes or subgraphs?  Visualization but most graph visualization techniques do not scale to large graphs!

22 A-plots However, humans are pretty good at “eyeballing” data Our idea:  Sort the adjacency matrix in novel ways  and plot the matrix  so that patterns become visible to the user We will demonstrate this on the SCAN+Lucent Router graph (284,805 nodes and 898,492 edges)

23 A-plots Three types of such plots for undirected graphs…  RV-RV (RankValue vs RankValue)  Sort nodes based on their “network value” (~first eigenvector) Rank of Network Value

24 A-plots Three types of such plots for undirected graphs…  RD-RD (RankDegree vs RankDegree)  Sort nodes based on their degree Rank of Degree of node

25 A-plots Three types of such plots for undirected graphs…  D-RV (Degree vs RankValue)  Sort nodes according to “network value”, and show their corresponding degree

26 RV-RV plot (RankValue vs RankValue) We can see a “teardrop” shape and also some blank “stripes” and a strong diagonal  (even though there are no self-loops)! Rank of Network Value Stripes

27 RV-RV plot (RankValue vs RankValue) Rank of Network Value The “teardrop” structure can be explained by degree-1 and degree-2 nodes NV 1 = 1/λ * NV 2 12

28 RV-RV plot (RankValue vs RankValue) Rank of Network Value Strong diagonal  nodes are more likely to connect to “similar” nodes

29 RD-RD (RankDegree vs RankDegree) Rank of Degree of node Isolated dots  due to 2-node isolated components

30 D-RV (Degree vs RankValue) Rank of Network Value Degree

31 D-RV (Degree vs RankValue) Rank of Network ValueDegree Why?

32 Explanation of “Spikes” and “Stripes” RV-RV plot had stripes; D-RV plot shows spikes. Why? “Stripe” nodes  degree-2 nodes connecting only to the “Spike” nodes “Spike” nodes  high degree, but all edges to “Stripe” nodes Stripe

33 A-plots They helped us detect a buried abnormal subgraph in a large real-world dataset which can then be taken to the domain experts.

34 Outline Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions

35 Conclusions We presented  “Min-cut” plot A novel graph pattern with relevance for many algorithms and applications  A-plots which help us find interesting abnormalities  All the methods are scalable  Their usage was demonstrated on large real-world graph datasets

36 RV-RV plot (RankValue vs RankValue) We can see a “teardrop” shape and also some blank “stripes” and a strong diagonal. Rank of Network Value

37 RD-RD (RankDegree vs RankDegree) Rank of Degree of node Isolated dots  due to 2-node isolated components