DOULION: Counting Triangles in Massive Graphs with a Coin

1 DOULION: Counting Triangles in Massive Graphs with a Coin
Charalampos (Babis) Tsourakakis, Carnegie Mellon University. KDD ’09, Paris. Joint work with U Kang, Gary L. Miller, and Christos Faloutsos. DOULION, KDD 09

2 Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra

3 Why is Triangle Counting important?
Clustering coefficient and transitivity ratio. Social network analysis: “Friends of friends are friends” (a triangle on A, B, C) [WF94]. Hidden thematic structure of the Web (Eckmann and Moses, PNAS [EM02]). Motif detection (e.g., [YPSB05]). Web spam detection (Becchetti et al., KDD ’08 [BBCG08]).
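The two graph statistics named first are both simple functions of triangle counts. The sketch below uses the standard textbook definitions; the adjacency-dict representation (node mapped to a set of neighbours) and the function names are illustrative assumptions, not code from the talk.

```python
from itertools import combinations

def triangles_at(adj, v):
    """Number of edges among the neighbours of v, i.e. triangles containing v."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])

def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves adjacent."""
    d = len(adj[v])
    return triangles_at(adj, v) / (d * (d - 1) / 2) if d >= 2 else 0.0

def transitivity(adj):
    """3 * (#triangles) / (#connected triples).

    Summing triangles_at over all nodes counts every triangle 3 times, and a node
    of degree d is the centre of C(d, 2) connected triples.
    """
    closed = sum(triangles_at(adj, v) for v in adj)
    triples = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    return closed / triples if triples else 0.0

# Triangle A, B, C ("friends of friends are friends") plus a pendant node D.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(transitivity(adj))            # 3 closed triples out of 5
print(local_clustering(adj, "C"))   # 1 of C's 3 neighbour pairs is connected
```

Both quantities need the number of triangles, which is why fast triangle counting matters for these applications.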

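4 Personal Motivation [CET08]: eigenvalues of the adjacency matrix
The number of triangles can be read off the spectrum of the adjacency matrix, $\Delta = \frac{1}{6}\sum_i \lambda_i^3$. [Figure: Political Blogs; eigenvalues of the adjacency matrix against the i-th eigenvector.] Keep only 3!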
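A small sketch of the spectral identity behind this slide: $\operatorname{trace}(A^3)$ counts every triangle 6 times, and $\operatorname{trace}(A^3) = \sum_i \lambda_i^3$. Truncating the sum to the k eigenvalues of largest magnitude is the idea behind keeping "only 3"; the function names and the choice of NumPy's dense `eigvalsh` (rather than a Lanczos solver for large graphs) are assumptions for illustration.

```python
import numpy as np

def triangles_exact(A):
    """trace(A^3) counts each triangle 6 times (3 start vertices x 2 directions)."""
    return int(round(np.trace(A @ A @ A) / 6))

def triangles_from_spectrum(A, k=None):
    """Delta = (1/6) * sum_i lambda_i^3, optionally keeping only the k eigenvalues
    of largest absolute value (an approximation when k < n)."""
    lam = np.linalg.eigvalsh(A)  # eigenvalues of the symmetric adjacency matrix
    if k is not None:
        lam = lam[np.argsort(-np.abs(lam))[:k]]
    return float(np.sum(lam ** 3) / 6)

# K4 has eigenvalues 3, -1, -1, -1 and exactly 4 triangles.
A = np.ones((4, 4)) - np.eye(4)
print(triangles_exact(A), triangles_from_spectrum(A), triangles_from_spectrum(A, k=1))
```

On K4 the single top eigenvalue gives 27/6 = 4.5 against the true 4; on real graphs with a few dominant eigenvalues the truncation can be much tighter.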

5 Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra

6 Counting methods
Dense graphs, fast: time $O(n^{2.37})$, space $O(n^2)$.
Dense graphs, low space: time $O(n^3)$, space $O(m)$.
Sparse graphs, fast: time $O(m^{0.7}n^{1.2} + n^{2+o(1)})$, space $\Theta(n^2)$ (eventually).
Sparse graphs, low space: time e.g. O( n ), space $\Theta(m)$.
Matrix multiplication is not practical (M. Latapy, Theory and Experiments).
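For the low-space column, a common exact approach is a "forward"-style edge iterator that uses only $O(m)$ extra memory. The sketch below is one such method, in the spirit of the algorithms Latapy studies, and not necessarily the exact one tabulated on the slide; it assumes an edge list with comparable (e.g. integer) node ids.

```python
def count_triangles_forward(edges):
    """Exact triangle count with O(m) extra space.

    Nodes are ranked by (degree, id) and each edge is oriented from the lower- to
    the higher-ranked endpoint. A triangle a < b < c (by rank) is then found exactly
    once, as the common out-neighbour c of the oriented edge (a, b).
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    out = {v: {w for w in nbrs if rank[w] > rank[v]} for v, nbrs in adj.items()}
    return sum(len(out[u] & out[v]) for u in out for v in out[u])

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_triangles_forward(k4))
```

Orienting by degree keeps every out-neighbourhood small, which bounds the running time by $O(m^{3/2})$ while storing nothing beyond the oriented adjacency lists.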

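7 Naive Sampling
Take r independent samples of three distinct vertices. Each sampled triple is of type $T_0$, $T_1$, $T_2$, or $T_3$, according to how many edges (0 to 3) it spans. Set X = 1 if the triple is a triangle ($T_3$) and X = 0 otherwise ($T_0$, $T_1$, $T_2$).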

8 Naive Sampling
With r independent samples of three distinct vertices, the estimate of $T_3$ holds with probability at least $1-\delta$ once r is large enough. This works for graphs with many triangles (e.g., $T_3$ on the order of $n^2\log n$), but is prohibitive for graphs with $T_3 = o(n^2)$.
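A minimal sketch of the naive estimator, assuming nodes are labelled 0..n-1 and edges are stored as a set of `frozenset` pairs (both representation choices are ours). Since $E[X] = T_3/\binom{n}{3}$, the sample mean of X times $\binom{n}{3}$ is an unbiased estimate of $T_3$.

```python
import random
from math import comb

def naive_sampling_estimate(n, edge_set, r, rng):
    """Estimate T3 from r uniform samples of three distinct vertices.

    X = 1 if the sampled triple spans all three edges (type T3), else 0.
    """
    hits = 0
    for _ in range(r):
        a, b, c = rng.sample(range(n), 3)
        if all(frozenset(e) in edge_set for e in ((a, b), (b, c), (a, c))):
            hits += 1
    return hits / r * comb(n, 3)
```

The weakness is visible in the formula: when $T_3$ is a tiny fraction of $\binom{n}{3}$, almost every sample returns X = 0, so r must grow roughly like $\binom{n}{3}/T_3$ before any triangle is seen.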

9 Buriol, Frahling, Leonardi, Marchetti-Spaccamela, Sohler [BFLSS06]
Sample uniformly at random an edge (i,j) and a node k in V-{i,j}, then check whether the edges (i,k) and (j,k) exist in E(G). Repeat for the desired number of samples.
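An offline sketch of this edge-plus-node sampling idea (the original work is a streaming algorithm; this is only the estimator). A triangle is detected by 3 of the $m(n-2)$ possible (edge, node) pairs, so a sample succeeds with probability $3T_3/(m(n-2))$. Node labels 0..n-1 and the function name are assumptions.

```python
import random

def buriol_estimate(n, edges, r, rng):
    """Estimate T3 by sampling an edge (i, j) and a node k in V - {i, j}."""
    edges = list(edges)
    m = len(edges)
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    hits = 0
    for _ in range(r):
        i, j = rng.choice(edges)
        k = rng.randrange(n)
        while k == i or k == j:  # rejection sampling: k uniform over V - {i, j}
            k = rng.randrange(n)
        if k in adj[i] and k in adj[j]:  # both (i,k) and (j,k) are in E(G)
            hits += 1
    # P(hit) = 3*T3 / (m*(n-2)), so invert to estimate T3.
    return hits / r * m * (n - 2) / 3
```

Compared with naive sampling, conditioning on an existing edge raises the success probability from $T_3/\binom{n}{3}$ to $3T_3/(m(n-2))$, which helps on sparse graphs.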

10 Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra

11 Our Sampling Approach
For each edge of G(V,E), toss a coin that comes up HEADS with probability p. HEADS! Edge (i,j) “survives” and receives weight 1/p.

12 Our Sampling Approach
TAILS! Edge (k,m) of G(V,E) “dies”, i.e., it is removed from the graph.
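The coin-tossing step on its own is a one-liner. In this sketch (function name and edge-list input are assumptions) each edge survives independently with probability p and carries weight 1/p, so the total surviving weight is an unbiased estimate of m, and a surviving triangle carries weight $(1/p)^3$.

```python
import random

def sparsify(edges, p, rng):
    """Toss a biased coin per edge: HEADS (probability p) keeps the edge with
    weight 1/p, TAILS drops it. Returns {edge: weight} for surviving edges."""
    return {e: 1.0 / p for e in edges if rng.random() < p}

path = [(i, i + 1) for i in range(10000)]
kept = sparsify(path, 0.5, random.Random(42))
print(len(kept), sum(kept.values()))  # about 5000 edges, total weight about 10000
```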

13 Sampling approach

14 Our Sampling Approach on $K_n$
[Figure: initially, $K_n$; after sparsification with $p = 0.5$, a weighted $G_{n,0.5}$; the triangle count in expectation.]
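A quick simulation of this slide's setting, as a sanity check rather than a proof: sparsifying $K_n$ with p = 0.5 yields a sample of $G_{n,0.5}$, and averaging the weighted triangle count over many trials should approach $\binom{n}{3}$. The helper names, n = 12, and the trial count are arbitrary choices for illustration.

```python
import random
from itertools import combinations
from math import comb

def doulion_on_complete_graph(n, p, rng):
    # Each of the C(n, 2) edges of K_n survives independently with probability p,
    # which gives a sample of the random graph G(n, p).
    kept = {e for e in combinations(range(n), 2) if rng.random() < p}
    # Count surviving triangles; each one carries weight (1/p)^3.
    surviving = sum(1 for a, b, c in combinations(range(n), 3)
                    if (a, b) in kept and (a, c) in kept and (b, c) in kept)
    return surviving / p ** 3

def average_estimate(n, p, trials, seed=0):
    rng = random.Random(seed)
    return sum(doulion_on_complete_graph(n, p, rng) for _ in range(trials)) / trials

print(comb(12, 3), average_estimate(12, 0.5, 2000))  # 220 vs. an average close to 220
```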

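15 Mean and Variance
Let X be the random variable given by our estimate and Δ the number of triangles. Then $E[X] = \Delta$. For the variance, split $\Delta = k + (\Delta - k)$, where k is the number of non-edge-disjoint triangles.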
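A worked sketch of where the mean and variance come from, assuming $X = p^{-3}\sum_t Z_t$, where $Z_t = 1$ iff all three edges of triangle t survive. We write s (our notation, not the slide's k) for the number of unordered pairs of distinct triangles that share an edge; two distinct triangles share at most one edge, and edge-disjoint triangles are independent.

```latex
\begin{align*}
\mathbb{E}[X] &= p^{-3}\sum_{t}\Pr[Z_t = 1] = p^{-3}\,\Delta\,p^{3} = \Delta\\
\operatorname{Var}[Z_t] &= p^{3}-p^{6}, \qquad
\operatorname{Cov}[Z_t, Z_{t'}] =
  \begin{cases} p^{5}-p^{6} & \text{if } t, t' \text{ share one edge (5 distinct edges)},\\
                0 & \text{if } t, t' \text{ are edge-disjoint}\end{cases}\\
\operatorname{Var}[X] &= p^{-6}\bigl[\Delta\,(p^{3}-p^{6}) + 2s\,(p^{5}-p^{6})\bigr]
  = \Delta\Bigl(\frac{1}{p^{3}}-1\Bigr) + 2s\Bigl(\frac{1}{p}-1\Bigr)
\end{align*}
```

The first term depends only on Δ; the second grows with how much the triangles overlap, which is why edge-sharing triangles are treated separately.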

16 Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra

17 Doulion and NodeIterator
Sparsify first, and then use Node Iterator to count triangles. Node Iterator: consider each node and count how many edges exist among its neighbors.
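A compact sketch of the two steps on this slide, assuming an edge list of a simple undirected graph with hashable node ids (function names are ours): keep each edge with probability p, run Node Iterator on what survives, and rescale by $1/p^3$.

```python
import random
from itertools import combinations

def node_iterator(adj):
    """Exact count: for each node, count the edges among its neighbours.
    Each triangle is seen once from each of its 3 corners, hence // 3."""
    seen = sum(1 for v in adj for a, b in combinations(adj[v], 2) if b in adj[a])
    return seen // 3

def doulion(edges, p, rng):
    """Sparsify (keep each edge with probability p), count triangles in the
    sparsified graph with Node Iterator, and rescale by 1/p^3."""
    adj = {}
    for u, v in edges:
        if rng.random() < p:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return node_iterator(adj) / p ** 3

k5 = [(i, j) for i in range(5) for j in range(i + 1, 5)]
print(doulion(k5, 1.0, random.Random(0)))  # p = 1 reduces to the exact count
```

Any exact counter could replace Node Iterator in the second step; the rescaling argument does not depend on it.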

18 Expected Speedup
Expected speedup: $1/p^2$.
Proof: let R be the running time of Node Iterator after the sparsification and compute its expectation; the ratio of the original running time to $E[R]$ gives the expected speedup.
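One way to fill in the steps, assuming Node Iterator's cost is proportional to $\sum_v d(v)^2$ (it examines every pair of neighbours of every node) and that each sparsified degree $d'(v)$ is Binomial$(d(v), p)$:

```latex
\begin{align*}
R_{\text{orig}} &\propto \sum_{v} d(v)^{2}, \qquad d'(v) \sim \operatorname{Bin}(d(v), p)\\
\mathbb{E}[d'(v)^{2}] &= \operatorname{Var}[d'(v)] + \mathbb{E}[d'(v)]^{2} = p(1-p)\,d(v) + p^{2}\,d(v)^{2}\\
\mathbb{E}[R] &\propto p^{2}\sum_{v} d(v)^{2} + p(1-p)\sum_{v} d(v) = p^{2}\sum_{v} d(v)^{2} + 2p(1-p)\,m\\
\frac{R_{\text{orig}}}{\mathbb{E}[R]} &\approx \frac{1}{p^{2}} \qquad \text{when } p^{2}\sum_{v} d(v)^{2} \gg 2p(1-p)\,m
\end{align*}
```

The condition in the last line holds on graphs whose work is dominated by high-degree nodes, which is typical of the heavy-tailed real-world graphs in the experiments.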

19 Some results (I)
[Figures: datasets of size ~3M, ~35M and ~400K, ~2.1M.]

20 Some results (II)
[Figures: datasets of size ~3.1M, ~37M and ~3.6M, ~42M.]

21 Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra

22 Conclusions
A new sampling approach that counts triangles approximately. A basic analysis of the estimate (expectation, variance, expected speedup). Experiments on many real-world datasets, showing that for constant p we get high-quality estimates and constant speedups of $1/p^2$.

23 Question
Can p be smaller than a constant? How small can we afford p to be while still guaranteeing concentration? Could, e.g., p be as small as 1/ ???
Motivation (speedup $1/p^2$): p = 0.001 gives a speedup of $10^6$; p = 0.005 gives $4\cdot10^4$; p = 0.01 gives $10^4$.

24 Outline: Motivation, Related Work, Proposed Method, Results, Conclusion, Extra

25 Approximate Triangle Counting
Approximate Triangle Counting, arXiv preprint, by C.E. Tsourakakis, M.N. Kolountzakis, and G.L. Miller.

26 Theorem (C.E. Tsourakakis, Kolountzakis, Miller 2009)
How to choose p? Mildness; pick p=1; concentration.

27 Practitioner’s Guide: Wikipedia 2005 (1.6M nodes, 18.5M edges)
Pick p=1/ and keep doubling p until concentration appears. [Figure panels: concentration appears; concentration becomes stronger.]
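A sketch of the doubling rule, under loudly stated assumptions: the starting value `p0` is left to the caller (the slide's exact choice is not recoverable here), and "concentration" is operationalised, hypothetically, as a relative standard deviation below `tol` across `runs` independent Doulion estimates. Function names and defaults are ours.

```python
import random
from statistics import mean, pstdev

def doulion_estimate(edges, p, rng):
    """Keep each edge with probability p, count triangles exactly, rescale by 1/p^3."""
    adj = {}
    for u, v in edges:
        if rng.random() < p:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    # Summing common neighbours over edges (u < v) counts each triangle 3 times.
    t = sum(len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v) // 3
    return t / p ** 3

def choose_p(edges, p0, runs=5, tol=0.05, seed=0):
    """Start at p0 and keep doubling p until the estimates concentrate
    (hypothetical criterion: relative std < tol), or until p reaches 1."""
    assert p0 > 0
    rng = random.Random(seed)
    p = p0
    while True:
        ests = [doulion_estimate(edges, p, rng) for _ in range(runs)]
        mu = mean(ests)
        if p >= 1.0 or (mu > 0 and pstdev(ests) / mu < tol):
            return p, mu
        p = min(1.0, 2 * p)

k8 = [(i, j) for i in range(8) for j in range(i + 1, 8)]
print(choose_p(k8, 0.01))
```

At p = 1 every run is exact, so the loop always terminates; on large graphs the hope is that concentration appears long before that.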

28 “Bad” Instances
Remove edge (1,2). Remove any weighted edge, with w sufficiently large.

29 Thanks! http://www.cs.cmu.edu/~ctsourak/projects.html
Code and datasets are available (Hadoop, MATLAB, and Java implementations, along with small real-world graphs; all datasets used are on the web).
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software environment and the complete set of instructions which generated the figures.” Buckheit and Donoho [BD95]

30 References
[BBCG08] Becchetti, Boldi, Castillo, Gionis. Efficient semi-streaming algorithms for local triangle counting in massive graphs.
[YPSB05] Ye, Peyser, Spencer, Bader. Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast.

31 References
[EM02] Eckmann, Moses. Curvature of co-links uncovers hidden thematic layers in the World Wide Web.

32 References
[CET08] C. Tsourakakis. Fast Counting of Triangles in Large Real-World Networks: Algorithms and Laws.
[BD95] Buckheit, Donoho. Wavelab and reproducible research.

33 References
[WF94] Wasserman, Faust. Social Network Analysis: Methods and Applications.
[BFLSS06] Buriol, Frahling, Leonardi, Marchetti-Spaccamela, Sohler. Counting triangles in data streams.

34 Doulion

