# Fast counting of triangles in large networks without counting: Algorithms and laws


Charalampos E. Tsourakakis, School of Computer Science, Carnegie Mellon University. Fast counting of triangles in large networks without counting: Algorithms and laws. ICDM, Dec. '08.

## Triangle related problems

Given an undirected, simple graph G(V,E), a triangle is a set of three vertices such that any two of them are connected by an edge of the graph.

Related problems, in order of increasing generality:

- Decide if a graph is triangle-free.
- Count the total number of triangles Δ(G).
- Count the number of triangles Δ(v) that vertex v participates in.
- List the triangles that each vertex v participates in.

Our focus is on the two counting problems.

## Why is Triangle Counting important?

From the graph mining perspective:

- Clustering coefficient
- Transitivity ratio
- Social network analysis fact: "Friends of friends are friends" [WF94]

Other applications include:

- Hidden thematic structure of the Web [EM02]
- Motif detection, e.g. in biological networks [YPSB05]
- Web spam detection [BPCG08]

## Outline

- Related Work
- Proposed Method
  - Theorems
  - Algorithms
  - Explaining efficiency
- Experiments
- Triangle-related Laws
- Triangles in Kronecker Graphs
- Conclusions

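## Related Work

| Graphs | Algorithm | Time complexity | Space complexity |
|---|---|---|---|
| Dense | Fast | $O(n^{2.37})$ | $O(n^2)$ |
| Dense | Low space | $O(n^3)$ | $O(m) = O(n^2)$ |
| Sparse | Fast | $O(m^{0.7} n^{1.2} + n^{2+o(1)})$ | $O(n^2)$ (eventually) |
| Sparse | Low space | e.g. $O(n\,d_{\max}^2)$ | $O(m)$ |

Here $n$ is the number of vertices, $m$ the number of edges, and $d_{\max}$ the maximum degree.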


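## Theorem [EigenTriangle]

Theorem 1. Let $\Delta(G)$ be the number of triangles in graph G(V,E) and let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of its adjacency matrix $A_G$. Then

$$\Delta(G) = \frac{1}{6} \sum_{i=1}^{n} \lambda_i^3 .$$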
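The identity in Theorem 1 follows from a standard trace argument. A short derivation, assuming $A_G$ is the symmetric 0/1 adjacency matrix of a simple graph (zero diagonal) and writing $\Delta(i)$ for the number of triangles through vertex $i$:

```latex
\begin{align*}
(A_G^3)_{ii} &= \#\{\text{closed walks of length 3 from } i \text{ to } i\} = 2\,\Delta(i), \\
\operatorname{tr}(A_G^3) &= \sum_{i=1}^{n} (A_G^3)_{ii} = \sum_{i=1}^{n} \lambda_i^3, \\
\sum_{i=1}^{n} 2\,\Delta(i) &= 6\,\Delta(G)
\quad\Longrightarrow\quad
\Delta(G) = \frac{1}{6} \sum_{i=1}^{n} \lambda_i^3 .
\end{align*}
```

The factor 6 appears because each triangle is traversed from each of its three vertices in two directions.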

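## Theorem [EigenTriangleLocal]

Theorem 2. Let $\Delta(i)$ be the number of triangles that vertex i participates in, let $\vec{u}_j$ be the j-th (unit-norm) eigenvector of the adjacency matrix, with eigenvalue $\lambda_j$, and let $u_{j,i}$ be the i-th entry of $\vec{u}_j$. Then

$$\Delta(i) = \frac{1}{2} \sum_{j=1}^{n} \lambda_j^3 \, u_{j,i}^2 .$$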
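Both theorems can be checked numerically on a small graph. The sketch below is an illustration only: it assumes NumPy is available, uses a full dense eigendecomposition rather than anything from the talk, and the example graph (K4 plus one pendant vertex) and the function name `triangles_from_spectrum` are made up for this purpose.

```python
import numpy as np

def triangles_from_spectrum(A):
    """Return (total, per_vertex) triangle counts from the full spectrum of A."""
    # eigh is for symmetric matrices; the columns of U are orthonormal eigenvectors.
    lam, U = np.linalg.eigh(A)
    cubes = lam ** 3
    total = cubes.sum() / 6.0              # Theorem 1
    # Theorem 2: row i of U**2 holds the squared i-th entries of every eigenvector.
    per_vertex = (U ** 2) @ cubes / 2.0
    return total, per_vertex

# Hypothetical example graph: K4 on vertices 0..3 plus a pendant vertex 4 attached to 0.
A = np.zeros((5, 5))
for a, b in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]:
    A[a, b] = A[b, a] = 1.0

total, per_vertex = triangles_from_spectrum(A)
print(int(round(total)), np.rint(per_vertex).astype(int).tolist())
```

K4 contains 4 triangles and each of its vertices lies in 3 of them, while the pendant vertex lies in none, which is what the rounded spectral counts should reproduce.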


## EigenTriangle Algorithm (interactively)

- "I want to compute the number of triangles!"
- "Use Lanczos to compute the first two eigenvalues, please!"
- "Is the cube of the second one significantly smaller than the cube of the first?" "No." "Iterate then!"
- After some iterations (hopefully few!): "Compute the k-th eigenvalue. Is its cube much smaller than the sum of the cubes so far?" "Yes!"
- The algorithm terminates. The estimated number of triangles is the sum of the cubes of the $\lambda_i$'s divided by 6.

## EigenTriangle Algorithm
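A minimal sketch of the procedure from the interactive walkthrough, together with its per-vertex counterpart (EigenTriangleLocal). Several things here are assumptions rather than the author's implementation: NumPy's dense `eigh` stands in for Lanczos to keep the example short, `tol` is a hypothetical stopping threshold, and `top_eigenpairs` is a helper name invented for this sketch. On a large sparse graph, a Lanczos-based routine such as `scipy.sparse.linalg.eigsh` with `which='LM'` could supply the top eigenpairs instead.

```python
import numpy as np

def top_eigenpairs(A, tol):
    """Eigenpairs ordered by |eigenvalue|, truncated by the stopping rule."""
    lam, U = np.linalg.eigh(A)            # dense stand-in for Lanczos (assumption)
    order = np.argsort(-np.abs(lam))      # largest magnitude first
    lam, U = lam[order], U[:, order]
    running = 0.0
    for k, value in enumerate(lam, start=1):
        running += value ** 3
        # Stop once the newest cube is negligible next to the running sum.
        if k > 1 and abs(value ** 3) <= tol * abs(running):
            break
    return lam[:k], U[:, :k]

def eigen_triangle(A, tol=0.05):
    """Estimate the total number of triangles."""
    lam, _ = top_eigenpairs(A, tol)
    return float(np.sum(lam ** 3) / 6.0)

def eigen_triangle_local(A, tol=0.05):
    """Estimate the number of triangles through each vertex."""
    lam, U = top_eigenpairs(A, tol)
    return (U ** 2) @ (lam ** 3) / 2.0

# Hypothetical example: the complete graph K5, which has exactly 10 triangles.
A = np.ones((5, 5)) - np.eye(5)
print(eigen_triangle(A, tol=0.05))        # truncated estimate
print(eigen_triangle(A, tol=0.0))         # tol=0 uses the whole spectrum
print(eigen_triangle_local(A, tol=0.0))
```

For K5 the spectrum is 4 and four copies of -1, so with `tol=0.05` the loop should stop after two eigenvalues and return about 10.5, against the exact count of 10 obtained with `tol=0.0`.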

## EigenTriangleLocal Algorithm

Why are these two algorithms efficient on power law networks?

## Typical Spectra of Power Law Networks

[Figure: spectra of two networks. Panels: Airports; Political blogs.]

## 1st Reason: Top Eigenvalues of Power-Law Graphs

Very important for us because:

- Few eigenvalues contribute a lot!
- Cubes amplify this even more!
- Lanczos converges fast due to large spectral gaps [GL89]!

## 1st Reason: Top Eigenvalues of Power-Law Graphs

Faloutsos, Faloutsos and Faloutsos [FFF99] were among the first to observe that the top eigenvalues follow a power law. Some years later, Mihail & Papadimitriou [MP02] and Chung, Lu and Vu [CLV03] gave an explanation of this fact.

## 2nd Reason: Bulk of eigenvalues

The bulk of the eigenvalues is almost symmetric around 0, so the sum of their cubes almost cancels out. In the Political Blogs example, omit the bulk and keep only 3 eigenvalues.


## Datasets

| Nodes | Edges | Description |
|---|---|---|
| ~75K | ~405K | Epinions network |
| ~404K | ~2.1M | Flickr |
| ~27K | ~341K | Arxiv Hep-Th |
| ~1K | ~17K | Political blogs |
| ~13K | ~148K | Reuters news |
| ~3M | 35M | Wikipedia 2006-Sep-05 |
| ~3.15M | ~37M | Wikipedia 2006-Nov-04 |
| ~13.5K | ~37.5K | AS Oregon |
| ~23.5K | ~47.5K | CAIDA AS 2004 to 2008 (means over 151 timestamps) |


The datasets fall into five groups:

- Social networks: Epinions network, Flickr
- Co-authorship network: Arxiv Hep-Th
- Information networks: Political blogs, Reuters news
- Web graphs: Wikipedia 2006-Sep-05, Wikipedia 2006-Nov-04
- Internet graphs: AS Oregon, CAIDA AS 2004 to 2008

The largest graph, Wikipedia 2006-Nov-04, has ~3.15M nodes and ~37M edges.

## Competitor: Node Iterator

Node Iterator algorithm: for each node, look at its neighbors, then check how many edges exist among them. Complexity: $O(n\,d_{\max}^2)$, where $d_{\max}$ is the maximum degree.

We report the results as the speedup vs. Node Iterator.
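The baseline can be sketched in a few lines. This is an illustrative version, not the code used in the experiments; the function name `node_iterator`, the adjacency-set representation, and the example graph are assumptions.

```python
from itertools import combinations

def node_iterator(adj):
    """Count triangles by checking every pair of neighbours of every vertex.

    `adj` maps each vertex to the set of its neighbours (undirected, simple graph).
    """
    per_vertex = {}
    for v, neighbours in adj.items():
        # Each connected pair of neighbours closes one triangle through v.
        per_vertex[v] = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    total = sum(per_vertex.values()) // 3   # every triangle is seen from its 3 vertices
    return total, per_vertex

# Hypothetical example: K4 on vertices 0..3 plus a pendant vertex 4 attached to 0.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
print(node_iterator(adj))
```

A vertex of degree d contributes d(d-1)/2 pair checks, which is where the quadratic dependence on the degree comes from.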

## Results: #Eigenvalues vs. Speedup

[Figure: #eigenvalues vs. speedup.]

## Results: #Edges vs. Speedup

[Figure: #edges vs. speedup.] Observe the trend.

## Some interesting observations

- 6.2 is the typical rank needed for at least 95% accuracy.
- Speedups are between 33.7x and 1159x. The mean speedup is 250x.
- Notice the increasing speedup as the size of the network grows.

## Evaluating the Local Counting Method

[Figure: triangles node i participates in, according to our estimation.]

## #Eigenvalues vs. ρ for three networks

[Figure: #eigenvalues vs. ρ.] With 2-3 eigenvalues, the results are almost ideal!


## Triangle Participation Power Law (TPPL)

[Figure: EPINIONS. Axes: δ = #triangles; count of nodes participating in δ triangles.]

## Triangle Participation Power Law (TPPL)

[Figure panels: HEP_TH (co-authorship); Flickr.]

## Degree Triangle Power Law (DTPL)

[Figure: EPINIONS. Axes: d, all degrees appearing in the graph; mean #Δs over all nodes with degree d.]

## Degree Triangle Power Law (DTPL)

[Figure panels: Flickr; Reuters.]
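The quantities on the axes of the TPPL and DTPL plots can be computed directly from per-vertex triangle counts. A small sketch, assuming the counts are already available (from any exact or approximate counter); the function name `triangle_laws` and the example inputs are hypothetical, and the log-log plotting and slope fitting are left out.

```python
from collections import Counter, defaultdict

def triangle_laws(adj, per_vertex):
    """Return the data behind the TPPL and DTPL plots.

    tppl[delta]  = number of vertices participating in exactly delta triangles
    dtpl[degree] = mean triangle count over all vertices with that degree
    """
    tppl = Counter(per_vertex.values())
    by_degree = defaultdict(list)
    for v, neighbours in adj.items():
        by_degree[len(neighbours)].append(per_vertex[v])
    dtpl = {d: sum(counts) / len(counts) for d, counts in by_degree.items()}
    return dict(tppl), dtpl

# Hypothetical example: K4 on vertices 0..3 plus a pendant vertex 4 attached to 0.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0}}
per_vertex = {0: 3, 1: 3, 2: 3, 3: 3, 4: 0}
tppl, dtpl = triangle_laws(adj, per_vertex)
print(tppl, dtpl)
```

On a real network, both dictionaries would be plotted on log-log axes to look for the straight-line behavior that the laws describe.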

## Observations on TPPL & DTPL

TPPL: many nodes participate in few triangles, and few nodes participate in many triangles.

## Observations on TPPL & DTPL

DTPL:

- A power law fits the Degree-Triangle plot nicely.
- The slope is the opposite of the slope of the degree distribution (slope complementarity).


## Kronecker graphs

Kronecker graphs are a model for generating graphs that mimic properties of real-world networks. The basic operation is the Kronecker product [LCKF05].

Start from an initiator graph with adjacency matrix

$$A^{[0]} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}.$$

Taking the Kronecker product with the initiator gives $A^{[1]}$, then $A^{[2]}$, and so on. Repeating k times gives $A^{[k]}$.

## Triangles in Kronecker Graphs

Theorem [KroneckerTRC]. Let $B = A^{[k]}$ be the k-th Kronecker product of A, and let $\Delta(G_A)$, $\Delta(G_B)$ be the total number of triangles in $G_A$, $G_B$. Then the following equality holds:

$$\Delta(G_B) = 6^k \, \Delta(G_A)^{k+1} .$$
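The closed formula can be sanity-checked on small initiators. It rests on the fact that the eigenvalues of a Kronecker product are products of the factors' eigenvalues, so trace(B³) = trace(A³)^(k+1). The sketch below is an illustration under stated assumptions: matrices are plain lists of lists, the initiator has no self-loops, and the helper names `kron` and `triangles` are invented here.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def triangles(M):
    """Triangles in a simple undirected graph: trace(M^3) / 6."""
    n = len(M)
    M2 = [[sum(M[i][k] * M[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    trace_M3 = sum(M2[i][k] * M[k][i] for i in range(n) for k in range(n))
    return trace_M3 // 6

A = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]    # the triangle initiator A^[0]
B = A
for k in range(1, 3):
    B = kron(B, A)                        # B is now A^[k]
    # Compare the direct count with 6^k * triangles(A)^(k+1).
    print(k, triangles(B), 6 ** k * triangles(A) ** (k + 1))
```

The two printed counts should agree for each k. An initiator with self-loops would need separate treatment, since trace(M³)/6 then no longer counts triangles.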


## Conclusions

- Triangles can be approximated with high accuracy in power law networks using a small, constant number of eigenvalues.
- The method is easily parallelizable (matrix-vector multiplications only) and converges fast due to large spectral gaps.
- New triangle-related power laws.
- Closed formula for triangles in Kronecker graphs.

## Future Work

- Import into Hadoop: PEGASUS (Peta-Graph Mining).
- On-going work with U Kang and Christos Faloutsos, in collaboration with Yahoo! Research.

## Acknowledgements

- Christos Faloutsos and Ioannis Koutis, for the helpful discussions
- Maria Tsiarli, for the PEGASUS logo


## References

- [WF94] Wasserman, Faust: "Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences)"
- [EM02] Eckmann, Moses: "Curvature of co-links uncovers hidden thematic layers in the World Wide Web"
- [YPSB05] Ye, Peyser, Spencer, Bader: "Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast"
- [BPCG08] Becchetti, Boldi, Castillo, Gionis: "Efficient Semi-Streaming Algorithms for Local Triangle Counting in Massive Graphs"
- [LCKF05] Leskovec, Chakrabarti, Kleinberg, Faloutsos: "Realistic, Mathematically Tractable Graph Generation and Evolution using Kronecker Multiplication"
- [FFF99] Faloutsos, Faloutsos, Faloutsos: "On power-law relationships of the Internet topology"
- [MP02] Mihail, Papadimitriou: "On the Eigenvalue Power Law"
- [CLV03] Chung, Lu, Vu: "Spectra of Random Graphs with given expected degrees"
- [GL89] Golub, Van Loan: "Matrix Computations"

For more references, paper and slides: http://www.cs.cmu.edu/~ctsourak

Questions?
