1
CMU SCS
Large Graph Algorithms
Christos Faloutsos, CMU
With: Mary McGlohon, Aditya Prakash, Hanghang Tong, Babis Tsourakakis, Leman Akoglu, Polo Chau, U Kang
OpenCirrus'10

2
Graphs - why should we care?
- Internet Map [lumeta.com]
- Food Web [Martinez ’91]
- Protein Interactions [genomebiology.com]
- Friendship Network [Moody ’01]
ICDM-LDMTA 2009

3
Graphs - why should we care?
- IR: bipartite graphs (doc-terms), with documents D1 ... DN and terms T1 ... TM
- Web: hypertext graph
- Social networking sites (Facebook, Twitter)
- Users posing and answering questions
- Click-streams (user-page bipartite graph)
- ... and more: any M:N db relationship

4
Our goal: a one-stop solution for mining huge graphs
- PEGASUS project (PEta GrAph mining System)
- www.cs.cmu.edu/~pegasus
- Open-source code and papers

5
Outline - Algorithms & results
Status per task (columns: Centralized | Hadoop/PEGASUS)
- Degree Distr.: old
- Pagerank: old
- Diameter/ANF: old | DONE
- Conn. Comp: old | DONE
- Triangles: DONE
- Visualization: STARTED

6
HADI for diameter estimation
Radius Plots for Mining Tera-byte Scale Graphs. U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec. SDM’10.
- Naively: diameter needs O(N**2) space and up to O(N**3) time, which is prohibitive (N ~ 1B)
- Our HADI: linear on E (~10B)
  - Near-linear scalability wrt # machines
  - Several optimizations -> 5x faster
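To make the cost of the naive approach concrete, here is a minimal sketch on a toy graph. It is not HADI: it simply runs one BFS per node, which is exactly the all-pairs style of work that becomes prohibitive at N ~ 1B. The function names, the toy graph, and the definition of a node's radius as its largest hop distance to any reachable node are assumptions made for illustration.

```python
from collections import Counter, deque


def bfs_eccentricity(adj, source):
    """Largest hop distance from `source` to any node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def naive_radius_plot(adj):
    """One BFS per node, i.e. O(N * (N + E)) time overall (hypothetical helper)."""
    radii = {u: bfs_eccentricity(adj, u) for u in adj}
    # Histogram: radius value -> number of nodes with that radius.
    return radii, Counter(radii.values())


# Toy undirected path graph 0 - 1 - 2 - 3 - 4 (illustrative data only).
toy_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
radii, radius_histogram = naive_radius_plot(toy_graph)
diameter = max(radii.values())
```

The histogram returned here has the same shape as a radius plot (radius versus count), but only for graphs small enough to traverse from every node.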

7
YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges)
- Largest publicly available graph ever studied.
- ???? ?? 19+? [Barabasi+]
[Plot axes: Radius, Count]

8
YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges)
- Effective diameter: surprisingly small.
- Multi-modality: probably mixture of cores.

10
Radius plot of the GCC (giant connected component) of YahooWeb.

11
Running time: Kronecker and Erdős-Rényi graphs with billions of edges.

13
Generalized Iterated Matrix Vector Multiplication (GIMV)
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ICDM 2009, Miami, Florida, USA. Best Application Paper (runner-up).

14
Generalized Iterated Matrix Vector Multiplication (GIMV)
Matrix-vector multiplication (iterated) underlies:
- PageRank
- Proximity (RWR)
- Diameter
- Connected components
- (eigenvectors, Belief Prop., ...)
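A minimal single-machine sketch of the idea, assuming the generalization works by swapping out the three pieces of an ordinary matrix-vector product: the per-entry multiply, the row-wise sum, and the write-back into the vector. The parameter names `combine2`, `combine_all`, and `assign`, the helper names, and the toy data are assumptions of this sketch and do not appear in the slides; the Hadoop implementation is not shown.

```python
def gim_v_step(entries, vector, combine2, combine_all, assign):
    """One generalized matrix-vector step over sparse entries (i, j, m_ij)."""
    partial = {i: [] for i in vector}
    for i, j, m_ij in entries:
        partial[i].append(combine2(m_ij, vector[j]))
    return {i: assign(vector[i], combine_all(partial[i])) for i in vector}


def gim_v_iterate(entries, vector, combine2, combine_all, assign, max_iters=100):
    """Repeat the step until the vector stops changing (or max_iters is hit)."""
    for _ in range(max_iters):
        new_vector = gim_v_step(entries, vector, combine2, combine_all, assign)
        if new_vector == vector:
            break
        vector = new_vector
    return vector


# Ordinary matrix-vector product: multiply, sum, overwrite.
matrix_entries = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]  # [[1, 2], [3, 4]]
product = gim_v_step(
    matrix_entries, {0: 1, 1: 1},
    combine2=lambda m, v: m * v,
    combine_all=sum,
    assign=lambda old, new: new,
)

# Connected components: propagate the minimum node id along edges.
undirected_edges = [(0, 1), (1, 2), (3, 4)]  # node 5 is isolated
adjacency_entries = [(u, v, 1) for u, v in undirected_edges]
adjacency_entries += [(v, u, 1) for u, v in undirected_edges]
component_of = gim_v_iterate(
    adjacency_entries, {node: node for node in range(6)},
    combine2=lambda m, v: v,
    combine_all=lambda xs: min(xs, default=float("inf")),
    assign=min,
)
```

Only the three small functions change between the two uses; the iteration loop and the data layout stay the same, which is what makes a single distributed primitive attractive.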

15
Example: GIM-V At Work
Connected Components
[Plot axes: Size, Count]

16
Example: GIM-V At Work
Connected Components
[Plot axes: Size, Count]
- 300-size components x 500. Why?
- 1100-size components x 65. Why?

17
Example: GIM-V At Work
Connected Components
[Plot axes: Size, Count]
- Suspicious financial-advice sites (no longer in existence)
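The size-versus-count view used in this example can be sketched as follows: label each node with its component, count nodes per component, then count components per size. The labeling here uses a plain union-find for brevity, which is a substitute technique and not GIM-V; the function names and the toy edge list are assumptions for illustration.

```python
from collections import Counter


def component_labels(num_nodes, edges):
    """Label every node with the smallest node id in its component (union-find)."""
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[max(root_u, root_v)] = min(root_u, root_v)
    return [find(x) for x in range(num_nodes)]


def size_count_distribution(labels):
    """Map component size -> number of components with that size."""
    sizes = Counter(labels)         # component label -> component size
    return Counter(sizes.values())  # component size -> how many such components


# Toy data: components {0, 1, 2}, {3, 4}, {5, 6}, and the isolated node 7.
toy_edges = [(0, 1), (1, 2), (3, 4), (5, 6)]
labels = component_labels(8, toy_edges)
distribution = size_count_distribution(labels)
```

An unusually high count at one particular size, like the spikes flagged in this example, shows up as a large value in `distribution`.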

19
Triangles
Real social networks have a lot of triangles
ASONAM 2009

20
Triangles
Real social networks have a lot of triangles
- Friends of friends are friends
Q1: How to compute quickly?
Q2: Any patterns?

21
Triangles: Computations [Tsourakakis ICDM 2008]
Q: Can we do that quickly?
Triangles are expensive to compute (3-way join; several approx. algos)

22
Triangles: Computations [Tsourakakis ICDM 2008]
But: triangles are expensive to compute (3-way join; several approx. algos)
Q: Can we do that quickly?
A: Yes! $\#\text{triangles} = \frac{1}{6} \sum_i \lambda_i^3$, where the $\lambda_i$ are the eigenvalues of the adjacency matrix (and, because of skewness, we only need the top few eigenvalues!)
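A small sanity check of the identity, assuming an undirected simple graph. The sum of the cubed eigenvalues of the adjacency matrix A equals trace(A^3), and each triangle contributes six closed walks of length 3, so the sketch below computes trace(A^3) / 6 directly and compares it with brute-force enumeration. It does not compute eigenvalues and does not demonstrate the top-few-eigenvalues approximation; the helper names and the toy graph are assumptions for illustration.

```python
from itertools import combinations


def adjacency_matrix(num_nodes, edges):
    """Dense 0/1 adjacency matrix of an undirected simple graph."""
    a = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


def matmul(x, y):
    """Plain dense product of two square matrices of equal size."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def triangles_via_trace(a):
    """trace(A^3) / 6, which equals (1/6) * sum of cubed eigenvalues of A."""
    a_cubed = matmul(matmul(a, a), a)
    return sum(a_cubed[i][i] for i in range(len(a))) // 6


def triangles_brute_force(a):
    """Check every unordered node triple (the expensive 3-way join)."""
    return sum(1 for i, j, k in combinations(range(len(a)), 3)
               if a[i][j] and a[j][k] and a[i][k])


# Toy graph: two triangles, {0, 1, 2} and {2, 3, 4}, sharing node 2.
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
toy_matrix = adjacency_matrix(5, toy_edges)
count_trace = triangles_via_trace(toy_matrix)
count_brute = triangles_brute_force(toy_matrix)
```

On a large graph one would avoid forming A^3 at all and instead estimate the sum from a handful of the largest-magnitude eigenvalues.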

23
Triangles: Computations [Tsourakakis ICDM 2008]
1000x+ speed-up, high accuracy

24
Triangles
Easy to implement on Hadoop: it only needs eigenvalues (working on it, using Lanczos)

26
Triangle Law: #1 [Tsourakakis ICDM 2008]
[Plot panels: ASN, HEP-TH, Epinions]
X-axis: # of triangles a node participates in
Y-axis: count of such nodes

27
Triangle Law: #2 [Tsourakakis ICDM 2008]
[Plot panels: SN, Reuters, Epinions]
X-axis: degree
Y-axis: mean # triangles
Notice: slope ~ degree exponent (insets)
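The quantities plotted in the two triangle laws can be computed on a small graph as sketched below: the number of triangles each node participates in, the count of nodes per triangle value (law #1), and the mean number of triangles per degree (law #2). This is exact counting by neighbor intersection, not the eigenvalue method; the function names and the toy graph are assumptions for illustration.

```python
from collections import Counter, defaultdict


def triangles_per_node(adj):
    """Triangles each node participates in; adj maps node -> set of neighbors."""
    counts = {}
    for u, neighbors in adj.items():
        # Every triangle through u is a pair of adjacent neighbors, seen twice.
        linked_pairs = sum(len(adj[v] & neighbors) for v in neighbors)
        counts[u] = linked_pairs // 2
    return counts


def triangle_participation_histogram(adj):
    """Law #1 axes: # triangles a node participates in -> count of such nodes."""
    return Counter(triangles_per_node(adj).values())


def mean_triangles_by_degree(adj):
    """Law #2 axes: degree -> mean # triangles over nodes of that degree."""
    per_node = triangles_per_node(adj)
    by_degree = defaultdict(list)
    for u, t in per_node.items():
        by_degree[len(adj[u])].append(t)
    return {d: sum(ts) / len(ts) for d, ts in by_degree.items()}


# Toy graph: triangles {0, 1, 2} and {2, 3, 4} sharing node 2.
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
per_node = triangles_per_node(toy_adj)
law_1 = triangle_participation_histogram(toy_adj)
law_2 = mean_triangles_by_degree(toy_adj)
```

Plotting either result on log-log axes for a real network would be the next step toward reproducing the kind of plots described on these slides.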

29
Visualization: ShiftR
Supporting Ad Hoc Sensemaking: Integrating Cognitive, HCI, and Data Mining Approaches. Aniket Kittur, Duen Horng (‘Polo’) Chau, Christos Faloutsos, Jason I. Hong. Sensemaking Workshop at CHI 2009, April 4-5, Boston, MA, USA.

31
Conclusions
One-stop shopping for large graph mining: www.cs.cmu.edu/~pegasus
Akoglu, Leman; Chau, Polo; Kang, U; McGlohon, Mary; Tsourakakis, Babis
THANKS: NSF, Yahoo (M45), LLNL
