Download presentation

Presentation is loading. Please wait.

1
NetMine: Mining Tools for Large Graphs Deepayan Chakrabarti Yiping Zhan Daniel Blandford Christos Faloutsos Guy Blelloch

2
2 Introduction Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01] ► Graphs are ubiquitous

3
3 Graph “Patterns” How does a real-world graph look like? Patterns/“Laws” Degree distributions/power-laws Count vs Outdegree Power Laws

4
4 Graph “Patterns” How does a real-world graph look like? Patterns/“Laws” Degree distributions/power-laws “Small-world” “Scree” plots and others… Hop-plot Effective Diameter

5
5 Graph “Patterns” Why do we like them? They capture interesting properties of graphs. They provide “condensed information” about the graph. They are needed to build/test realistic graph generators (useful for simulation studies extrapolations/sampling ) They help detect abnormalities and outliers.

6
6 Graph Patterns Degree-distributions “power laws” “Small-world” small diameter “Scree” plots … What else?

7
7 Our Work The NetMine toolkit contains all the patterns mentioned before, and adds: The “min-cut” plot a novel pattern which carries interesting information about the graph. A-plots a tool to quickly find suspicious subgraphs/nodes.

8
8 Outline Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions

9
9 “Min-cut” plot What is a min-cut? Minimizes the number of edges cut Two partitions of almost equal size Size of mincut = 2

10
10 “Min-cut” plot Do min-cuts recursively. log (# edges) log (mincut-size / #edges) N nodes Mincut size = sqrt(N)

11
11 “Min-cut” plot Do min-cuts recursively. log (# edges) log (mincut-size / #edges) N nodes New min-cut

12
12 “Min-cut” plot Do min-cuts recursively. log (# edges) log (mincut-size / #edges) N nodes New min-cut Slope = -0.5 For a d-dimensional grid, the slope is -1/d

13
13 “Min-cut” plot log (# edges) log (mincut-size / #edges) Slope = -1/d For a d-dimensional grid, the slope is -1/d log (# edges) log (mincut-size / #edges) For a random graph, the slope is 0

14
14 “Min-cut” plot Min-cut sizes have important effects on graph properties, such as efficiency of divide-and-conquer algorithms compact graph representation difference of the graph from well-known graph types for example, slope = 0 for a random graph

15
15 “Min-cut” plot What does it look like for a real-world graph? log (# edges) log (mincut-size / #edges) ?

16
16 Experiments Datasets: Google Web Graph: 916,428 nodes and 5,105,039 edges Lucent Router Graph: Undirected graph of network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges www.isi.edu/scan/mercator/maps.html User Website Clickstream Graph: 222,704 nodes and 952,580 edges

17
17 Experiments Used the METIS algorithm [ Karypis, Kumar, 1995] log (# edges) log (mincut-size / #edges) Google Web graph Values along the y- axis are averaged We observe a “lip” for large edges Slope of -0.4, corresponds to a 2.5- dimensional grid! Slope~ -0.4

18
18 Experiments Same results for other graphs too… log (# edges) log (mincut-size / #edges) Lucent Router graphClickstream graph Slope~ -0.57 Slope~ -0.45

19
19 Observations Linear slope for some range of values “Lip” for high #edges Far from random graphs (because slope ≠ 0)

20
20 Outline Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions

21
21 A-plots How can we find abnormal nodes or subgraphs? Visualization but most graph visualization techniques do not scale to large graphs!

22
22 A-plots However, humans are pretty good at “eyeballing” data Our idea: Sort the adjacency matrix in novel ways and plot the matrix so that patterns become visible to the user We will demonstrate this on the SCAN+Lucent Router graph (284,805 nodes and 898,492 edges)

23
23 A-plots Three types of such plots for undirected graphs… RV-RV (RankValue vs RankValue) Sort nodes based on their “network value” (~first eigenvector) Rank of Network Value

24
24 A-plots Three types of such plots for undirected graphs… RD-RD (RankDegree vs RankDegree) Sort nodes based on their degree Rank of Degree of node

25
25 A-plots Three types of such plots for undirected graphs… D-RV (Degree vs RankValue) Sort nodes according to “network value”, and show their corresponding degree

26
26 RV-RV plot (RankValue vs RankValue) We can see a “teardrop” shape and also some blank “stripes” and a strong diagonal (even though there are no self-loops)! Rank of Network Value Stripes

27
27 RV-RV plot (RankValue vs RankValue) Rank of Network Value The “teardrop” structure can be explained by degree-1 and degree-2 nodes NV 1 = 1/λ * NV 2 12

28
28 RV-RV plot (RankValue vs RankValue) Rank of Network Value Strong diagonal nodes are more likely to connect to “similar” nodes

29
29 RD-RD (RankDegree vs RankDegree) Rank of Degree of node Isolated dots due to 2-node isolated components

30
30 D-RV (Degree vs RankValue) Rank of Network Value Degree

31
31 D-RV (Degree vs RankValue) Rank of Network ValueDegree Why?

32
32 Explanation of “Spikes” and “Stripes” RV-RV plot had stripes; D-RV plot shows spikes. Why? “Stripe” nodes degree-2 nodes connecting only to the “Spike” nodes “Spike” nodes high degree, but all edges to “Stripe” nodes Stripe

33
33 A-plots They helped us detect a buried abnormal subgraph in a large real-world dataset which can then be taken to the domain experts.

34
34 Outline Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions

35
35 Conclusions We presented “Min-cut” plot A novel graph pattern with relevance for many algorithms and applications A-plots which help us find interesting abnormalities All the methods are scalable Their usage was demonstrated on large real-world graph datasets

36
36 RV-RV plot (RankValue vs RankValue) We can see a “teardrop” shape and also some blank “stripes” and a strong diagonal. Rank of Network Value

37
37 RD-RD (RankDegree vs RankDegree) Rank of Degree of node Isolated dots due to 2-node isolated components

Similar presentations

© 2019 SlidePlayer.com Inc.

All rights reserved.

To make this website work, we log user data and share it with processors. To use this website, you must agree to our Privacy Policy, including cookie policy.

Ads by Google