1 SCS CMU. Colibri: Fast Mining of Large Static and Dynamic Graphs. Speaker: Hanghang Tong. 2009-3-31, Speaking Skill Requirement.

2 Joint work with Spiros Papadimitriou, Jimeng Sun, Christos Faloutsos, and Philip S. Yu.

3 Graphs are everywhere! Q: How to find patterns, e.g., communities, anomalies, etc.?

4 Motivation. Q: How to find patterns, e.g., communities, anomalies, etc.? A: Low-Rank Approximation (LRA) for the adjacency matrix of the graph: $A \approx L \times M \times R$.

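5 LRA for Graph Mining: Communities. [Figure: author-conference graph with authors John, Tom, Bob, Carl, Van, Roy and conferences KDD, ICDM, RECOMB, ISMB. Adjacency matrix $A \approx L \times M \times R$, where L gives the author clusters, R gives the conference clusters, and M gives the cluster interaction.]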

6 LRA for Graph Mining: Anomalies. [Figure: the same author-conference graph and factorization $A \approx L \times M \times R$ (author clusters, cluster interaction, conference clusters).] The reconstruction error is high, so 'Carl' is abnormal.
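
The anomaly idea on this slide can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the toy adjacency matrix is hypothetical (it is not the data in the slide's figure), and a truncated SVD stands in for whichever low-rank method produces L, M, and R.

```python
import numpy as np

authors = ["John", "Tom", "Bob", "Carl", "Van", "Roy"]
conferences = ["KDD", "ICDM", "RECOMB", "ISMB"]

# Hypothetical toy adjacency matrix (rows: authors, columns: conferences).
# Carl is the only author who publishes in both groups of conferences.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

def low_rank_factors(A, k):
    """Return (L, M, R) with A ~ L @ M @ R, here via a rank-k truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :]

L, M, R = low_rank_factors(A, k=2)
A_approx = L @ M @ R

# Per-author reconstruction error: norm of each row of the residual.
row_error = np.linalg.norm(A - A_approx, axis=1)
most_abnormal = authors[int(np.argmax(row_error))]
```

With two clusters (k = 2), the row that fits neither cluster is the one the approximation reconstructs worst, which is the sense in which a high reconstruction error flags an anomaly.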

7 Challenges. Prob. 1: Given a static graph A, (C1) how to get (L, M, R) efficiently, in both time and space? (C2) How to get (L, M, R) intuitively, i.e., easy to interpret? Prob. 2: Given a dynamic graph $A_t$ (t = 1, 2, …), (C3) how to get $(L_t, M_t, R_t)$ dynamically, i.e., track patterns over time?

8 Roadmap. Motivation; Existing Methods (SVD, CUR/CX); Proposed Methods: Colibri; Experimental Results; Conclusion.

9 Matrix & Vector. Matrix $B = [b_1, b_2]$; $b_1$, $b_2$ are vectors in 3-d space! [Figure: B with columns labeled by conferences (SIGMOD, ICML) and rows labeled by authors (Philip Yu, John Smith, William Cohen); surviving entries: 3 1 1 0.]

10 Column Space of a Matrix. $B = [b_1, b_2]$; $b_1$, $b_2$ are vectors in 3-d space! Example: VLDB = SIGMOD - ICML = [2 0 0]', so VLDB lies in the column space spanned by SIGMOD and ICML.

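11 Projection, Projection Matrix & Core Matrix. $\tilde{v} = B \times (B^T B)^{+} \times B^T \times v$, where v is an arbitrary vector, $\tilde{v}$ is the projection of v, $B (B^T B)^{+} B^T$ is the projection matrix of B, and $(B^T B)^{+}$ is the core matrix. [Figure: vectors ICML, SIGMOD, KDD and the projection of KDD.]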
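
A minimal sketch of this projection in NumPy follows. The matrix B and vector v are hypothetical values chosen for illustration, not the slide's data, and the function names are likewise assumptions.

```python
import numpy as np

def core_matrix(B):
    """Core matrix (B^T B)^+, using the Moore-Penrose pseudo-inverse."""
    return np.linalg.pinv(B.T @ B)

def projection_matrix(B):
    """Projection matrix onto the column space of B: B (B^T B)^+ B^T."""
    return B @ core_matrix(B) @ B.T

# Hypothetical example: two basis vectors in 3-d space.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
v = np.array([1.0, 2.0, 0.0])   # an arbitrary vector

P = projection_matrix(B)
v_tilde = P @ v                  # projection of v onto the column space of B
residual = v - v_tilde           # component of v orthogonal to that space
```

Two properties are worth checking on any such P: it is idempotent (projecting twice changes nothing), and the residual is orthogonal to every column of B.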

13 Roadmap. Motivation; Existing Methods (SVD, CUR/CX); Proposed Methods: Colibri; Experimental Results; Conclusion.

14 Singular Value Decomposition (SVD). $A \approx U \times \Sigma \times V^T$, where $A = [a_1, a_2, a_3, \ldots, a_m]$ is n x m, $U = [u_1, \ldots, u_k]$ holds the left singular vectors, and $V = [v_1, \ldots, v_k]$ holds the right singular vectors.

15 SVD: Characteristics. #1: Find the left matrix U, where [equation not preserved in the transcript]. #2: Project A onto the column space of U: $\tilde{A} = U U^T A$, where $U U^T$ is the projection matrix of the column space of U (the columns of U are orthonormal).

16 SVD: advantages. Optimal low-rank approximation in both the $L_2$ and the Frobenius ($L_F$) norm: for any rank-k matrix $A_k$, $\|A - \tilde{A}\|_{2,F} \le \|A - A_k\|_{2,F}$, where $\tilde{A}$ denotes the rank-k SVD approximation of A.
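
The two SVD characteristics and the optimality claim can be checked numerically. The sketch below is illustrative only: the random matrix, the rank k, and the competing rank-k approximation (a projection onto the first k columns of A) are all assumptions chosen for the demonstration.

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k truncated SVD: returns U_k, singular values s_k, and V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.random((8, 6))          # hypothetical data
k = 2

U, s, Vt = truncated_svd(A, k)
A_svd = U @ np.diag(s) @ Vt     # rank-k SVD approximation

# Characteristic #2: the same matrix is obtained by projecting A onto
# the column space of U (U has orthonormal columns, so P = U U^T).
A_proj = U @ U.T @ A

# Some other rank-k approximation: project A onto its first k columns.
C = A[:, :k]
A_other = C @ np.linalg.pinv(C) @ A

err_svd_fro = np.linalg.norm(A - A_svd, "fro")
err_other_fro = np.linalg.norm(A - A_other, "fro")
err_svd_2 = np.linalg.norm(A - A_svd, 2)      # spectral norm
err_other_2 = np.linalg.norm(A - A_other, 2)
```

The SVD errors are never larger than those of the competing approximation, in either norm, which is the content of the inequality on this slide.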

17 SVD: drawbacks. Efficiency: time; space ((U, V) are dense). Interpretation. [Figure: 1st and 2nd singular vectors.]

18 SVD: drawbacks. Dynamic: not easy. [Figure: 1st and 2nd singular vectors at time t and at time t+1; axes X and Y.]

19 Roadmap. Motivation; Existing Methods (SVD, CUR/CX); Proposed Methods: Colibri; Experimental Results; Conclusion.

20 CUR (CX) decomposition [Drineas+ 2005]. $A \approx C \times U \times R$, with A: n x m. Sample columns from A to form C; project A onto the column space of C.
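
The two steps on this slide (sample columns, then project) can be sketched as below. This is a simplified illustration under stated assumptions, not the algorithm of [Drineas+ 2005] in full: columns are sampled with replacement with probability proportional to their squared norm, the middle factor is taken to be the core matrix $(C^T C)^{+}$, and the right factor is $C^T A$. The function name, the test matrix, and the `rcond` cutoff are hypothetical choices.

```python
import numpy as np

def cx_sketch(A, c, rng, rcond=1e-10):
    """Sample c columns of A and project A onto their span.

    Returns (C, U, R, idx) with A ~ C @ U @ R, where U = (C^T C)^+ and
    R = C^T A. Sampling with replacement means C may contain duplicates.
    """
    col_norms_sq = (A ** 2).sum(axis=0)
    probs = col_norms_sq / col_norms_sq.sum()
    idx = rng.choice(A.shape[1], size=c, replace=True, p=probs)
    C = A[:, idx]
    # The explicit cutoff keeps the pseudo-inverse stable when C has
    # duplicate (linearly dependent) columns.
    U = np.linalg.pinv(C.T @ C, rcond=rcond)
    R = C.T @ A
    return C, U, R, idx

rng = np.random.default_rng(1)
A = rng.random((8, 6))           # hypothetical data
C, U, R, idx = cx_sketch(A, c=4, rng=rng)
A_approx = C @ U @ R             # projection of A onto the column space of C
```

Because sampling is done with replacement, the same column can appear in C more than once; the pseudo-inverse absorbs this, but the duplicates still cost time and space.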

21 CUR (CX): advantages. Quality: near-optimal. Efficiency (better than SVD): time (c is the number of sampled columns); space ((C, R) are sparse). Interpretation.

22 CUR (CX): drawbacks. Redundancy in C, wasting both time and space. [Figure: the sampled columns contain 3 copies of green, 2 copies of red, and 2 copies of purple; purple = 0.5*green + red, …]

23 A Redundant Column Does Not Help. [Figure: KDD projected onto the span of {ICML, SIGMOD} and onto the span of {ICML, SIGMOD, VLDB}; the projection $\widetilde{KDD}$ is the same in both cases.] Observations on VLDB: #1: it does not help $\widetilde{KDD}$; #2: it wastes time and space.
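
The observation can be verified with a few lines of NumPy. The vectors below are hypothetical: only the relation VLDB = SIGMOD - ICML = [2 0 0]' comes from the earlier slide, and the individual values for SIGMOD, ICML, and KDD are assumptions chosen to satisfy it.

```python
import numpy as np

def project(B, v, rcond=1e-10):
    """Project v onto the column space of B via the core matrix (B^T B)^+."""
    core = np.linalg.pinv(B.T @ B, rcond=rcond)
    return B @ core @ (B.T @ v)

sigmod = np.array([3.0, 1.0, 0.0])   # hypothetical values
icml = np.array([1.0, 1.0, 0.0])     # hypothetical values
vldb = sigmod - icml                 # redundant: a combination of the other two
kdd = np.array([2.0, 1.0, 1.0])      # hypothetical values

kdd_without_vldb = project(np.column_stack([sigmod, icml]), kdd)
kdd_with_vldb = project(np.column_stack([sigmod, icml, vldb]), kdd)
```

Adding VLDB leaves the projection of KDD unchanged, while the core matrix grows from 2 x 2 to 3 x 3 and becomes singular, which is why a pseudo-inverse is needed in the second call.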

24 CUR (CX): drawbacks. Dynamic: not easy. [Figure: the sampled matrix C at time t and at time t+1.]

25 Roadmap. Motivation; Existing Methods; Proposed Method: Colibri (Colibri-S for static graphs, Prob. 1; Colibri-D for dynamic graphs, Prob. 2); Experimental Results; Conclusion.

26 Colibri-S: Basic Idea. [Figure: the original matrix; the CUR (CX) sample with 3 copies of green, 2 copies of red, and 2 copies of purple (purple = 0.5*green + red, …); and the Colibri-S factors L x M x R.] We want the columns in L to be linearly independent!

27 A Pictorial Comparison. [Figure panels: SVD (1st and 2nd singular vectors), CUR [Drineas+ 2005] (with # of copies), Colibri-S [Tong+ 2008]. Axes: X: SVM, Y: Optimization. Dark dot: selected.]

28 Input: the initially sampled matrix C. Output: L (the linearly independent columns), the core matrix $M = (L^T L)^{-1}$, and $R = L^T \times A$. Q: How to find L and M from C efficiently?

29 A: Find L and M iteratively! For each column v in the initially sampled matrix C: project it onto the current L; if v is redundant, discard v; otherwise, expand L and M.
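
The iteration on this slide can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name, the redundancy threshold `eps` (applied to squared norms), and the test matrix are assumptions, and the core-matrix update uses the standard block-inverse identity.

```python
import numpy as np

def colibri_s_sketch(C, eps=1e-8):
    """Keep only linearly independent columns of C.

    Returns (L, M, kept) where L holds the kept columns, M = (L^T L)^{-1}
    is maintained incrementally, and kept lists the kept column indices.
    """
    n, c = C.shape
    L = np.empty((n, 0))
    M = np.empty((0, 0))
    kept = []
    for j in range(c):
        v = C[:, j]
        y = M @ (L.T @ v)              # coefficients of v in the current basis
        residual = v - L @ y           # part of v outside the span of L
        delta = residual @ residual    # squared norm of the residual
        if delta <= eps * (v @ v):     # redundant (or zero) column: discard
            continue
        k = L.shape[1]
        M_new = np.empty((k + 1, k + 1))
        M_new[:k, :k] = M + np.outer(y, y) / delta
        M_new[:k, k] = -y / delta
        M_new[k, :k] = -y / delta
        M_new[k, k] = 1.0 / delta
        L = np.column_stack([L, v])    # expand L with the new column
        M = M_new                      # expand M without a full re-inversion
        kept.append(j)
    return L, M, kept

# Hypothetical sample echoing the color example: 3 copies of green,
# 2 copies of red, 2 copies of purple = 0.5*green + red.
green = np.array([1.0, 0.0, 2.0, 0.0])
red = np.array([0.0, 1.0, 0.0, 1.0])
purple = 0.5 * green + red
C = np.column_stack([green, green, red, green, purple, red, purple])

L, M, kept = colibri_s_sketch(C)
```

Only the first green and the first red column survive; every other column is either a copy or a linear combination of those two, so it is discarded after a single projection.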

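30 Update Core Matrix. [Figure: $M_{old}$ is the core matrix for the columns {SIGMOD, ICML}; after the column KDD is added, what is $M_{new}$ for {SIGMOD, ICML, KDD}? $\widetilde{KDD}$ is the projection of KDD onto the span of SIGMOD and ICML.]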

31 Update Core Matrix. Theorem [Tong et al. 2008]: $M_{new}$ is obtained from $M_{old}$ by a block update [equation garbled in the transcript]. We only need to know $\widetilde{KDD}$ and [symbol not preserved]!
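
The block update on this slide is consistent with the standard block-inverse (Schur complement) identity, written out below as a hedged sketch. The symbols are assumptions, not the slide's notation: L is the current set of independent columns with $M_{old} = (L^T L)^{-1}$, v is the new column (KDD in the example), y holds its projection coefficients so that $L y$ is the projection of v, and $\delta$ is the squared norm of the residual.

```latex
\[
M_{\text{new}}
= \bigl([L,\, v]^{T} [L,\, v]\bigr)^{-1}
= \begin{bmatrix}
    M_{\text{old}} + \dfrac{y\,y^{T}}{\delta} & -\dfrac{y}{\delta} \\[2ex]
    -\dfrac{y^{T}}{\delta} & \dfrac{1}{\delta}
  \end{bmatrix},
\qquad
y = M_{\text{old}} L^{T} v,
\quad
\delta = \lVert v - L y \rVert^{2}.
\]
```

Since $\delta = v^T v - v^T L M_{\text{old}} L^T v$ is exactly the Schur complement of $L^T L$ in the enlarged Gram matrix, no new matrix inversion is required: the update needs only the projection of v and its residual.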

32 Colibri-S vs. CUR (CX). Quality: Colibri-S = CUR (CX). Time: Colibri-S is better than or equal to CUR (CX). Space: Colibri-S is better than or equal to CUR (CX). Interpretation: Colibri-S = CUR (CX).

33 A Pictorial Comparison. [Figure: scatter plot; each dot is a document. Axes: X: SVM, Y: Optimization.]

34 A Pictorial Comparison: SVD. [Figure: the same scatter plot with the 1st and 2nd singular vectors drawn; each dot is a document. Axes: X: SVM, Y: Optimization.]

35 A Pictorial Comparison: CUR [Drineas+ 2005]. [Figure: the same scatter plot; each dot is a document. Axes: X: SVM, Y: Optimization. Numeric labels in the figure: 2, 2, 2, 4, 3, 1.]

36 A Pictorial Comparison: Colibri-S [Tong+ 2008]. [Figure: the same scatter plot; each dot is a document. Axes: X: SVM, Y: Optimization.]

37 A Pictorial Comparison. [Figure panels: SVD (1st and 2nd singular vectors), CUR [Drineas+ 2005], CMD [Sun+ 2007], Colibri-S [Tong+ 2008].]

38 Roadmap. Motivation; Existing Methods; Proposed Method: Colibri (Colibri-S for static graphs, Prob. 1; Colibri-D for dynamic graphs, Prob. 2); Experimental Results; Conclusion.

39 Problem Definitions. Given: $A_1, A_2, A_3, \ldots$ (e.g., author-conference graphs). Find: $(L_1, M_1, R_1), (L_2, M_2, R_2), (L_3, M_3, R_3), \ldots$

40 Colibri-D for dynamic graphs. [Figure: the initially sampled matrix at time t, with $(L_t, M_t, R_t)$ known, and at time t+1, with $(L_{t+1}, M_{t+1}, R_{t+1})$ unknown.] Q: How to update L and M efficiently?

41 Colibri-D: How-To. [Figure: the initially sampled matrix at time t and at time t+1, with columns marked as selected, redundant, or changed from t; $(L_t, M_t, R_t)$ is known and $(L_{t+1}, M_{t+1}, R_{t+1})$ is to be found.]

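42 Colibri-D: How-To. Step 1: from $(L_t, M_t)$, get $(\tilde{L}, \tilde{M})$ for the subspace spanned by the blue columns at t+1, i.e., the unchanged columns. Step 2: from $(\tilde{L}, \tilde{M})$, get $(L_{t+1}, M_{t+1})$. [Figure: the initially sampled matrix at t and t+1, with selected and redundant columns marked.]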

43 Get Core Matrix for Unchanged Columns. At time t: $M_t = [(L_t)' \times L_t]^{-1}$. At time t+1: $\tilde{M}_t = [(\tilde{L}_t)' \times \tilde{L}_t]^{-1} = ?$, where $\tilde{L}_t$ holds the unchanged columns of $L_t$.

44 Get Core Matrix for Unchanged Columns. Theorem [Tong et al. 2008]: $\tilde{M}_t$ can be computed from $M_t$ [equation garbled in the transcript]. We only need a matrix inverse of the same size as the number of changed columns in $L_t$!
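
One way to obtain a core matrix for the unchanged columns while inverting only a block as large as the number of changed columns is the standard Schur-complement identity for a partitioned inverse. The sketch below uses that identity; it is an assumption that this matches the exact form of the theorem, and the function name, the random test matrix, and the choice of changed columns are hypothetical.

```python
import numpy as np

def core_for_unchanged(M, changed):
    """Core matrix of the unchanged columns, given M = (L^T L)^{-1}.

    Partition M into kept (k) and changed (c) blocks. Then
    (L_k^T L_k)^{-1} = M_kk - M_kc @ inv(M_cc) @ M_ck,
    so the only matrix inverted has the size of the changed set.
    """
    size = M.shape[0]
    changed = np.asarray(changed, dtype=int)
    keep = np.setdiff1d(np.arange(size), changed)
    M_kk = M[np.ix_(keep, keep)]
    M_kc = M[np.ix_(keep, changed)]
    M_cc = M[np.ix_(changed, changed)]
    # M is symmetric, so M_ck is the transpose of M_kc.
    M_tilde = M_kk - M_kc @ np.linalg.solve(M_cc, M_kc.T)
    return M_tilde, keep

rng = np.random.default_rng(0)
L_t = rng.random((10, 5))                 # hypothetical independent columns
M_t = np.linalg.inv(L_t.T @ L_t)          # core matrix at time t

M_tilde, keep = core_for_unchanged(M_t, changed=[1, 3])
L_tilde = L_t[:, keep]                    # unchanged columns of L_t
```

When few columns change between t and t+1, the solve against `M_cc` is small, which is where the savings over recomputing the inverse from scratch come from.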

45 Comparison: SVD, CUR vs. Colibri. Wish list (rows): efficiency, interpretation, dynamics. Methods (columns): SVD [Golub+ 1989], CUR/CX [Drineas+ 2005], Colibri [Tong+ 2008]. [Table cell marks not preserved in the transcript.]

46 Roadmap. Motivation; Existing Methods; Proposed Method: Colibri; Experimental Results; Conclusion.

47 Experimental Setup. Dataset: network traffic; 21,837 sources/destinations; 1,222 consecutive hours (~2 months); 22,800 edges per hour. Accuracy and space cost: [definitions not preserved in the transcript].

48 Performance of Colibri-S. [Figures: time and space for SVD, CUR, CMD, and ours.] Accuracy: same (91%+). Time: 12x of CMD, 28x of CUR. Space: ~1/3 of CMD, ~10% of CUR.

49 Performance of Colibri-D. [Figure: time vs. # of changed columns for CMD (prior best method), Colibri-S, and Colibri-D.] Colibri-D achieves up to 112x speedups. Dataset: network traffic; 21,837 nodes; 1,220 hours; 22,800 edges/hr.

50 Conclusion. Colibri-S: for static graphs; removes redundancy; up to 52x speedup; 2/3 space saving; no quality loss. Colibri-D: for dynamic graphs; leverages "smoothness"; up to 112x faster than the best known methods.

51 Q&A. Thank you! www.cs.cmu.edu/~htong

52 How many columns do we need? The number of columns/rows is polynomial in k, $\log(1/\epsilon)$, and $1/\delta$; with probability $1 - \epsilon$, $\|A - CUR\| \le \|A - A_k\| + \delta \|A\|$.

