
SCS CMU Proximity Tracking on Time-Evolving Bipartite Graphs. Speaker: Hanghang Tong. Joint work with Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos.


1 SCS CMU Proximity Tracking on Time-Evolving Bipartite Graphs. Speaker: Hanghang Tong. Joint work with Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos. SIAM Conference on Data Mining, Apr. 24-26, 2008, Atlanta.

2 SCS CMU 2 Graphs are everywhere!

3 SCS CMU Graph Mining: the big picture. Graph/Global Level; Subgraph/Community Level; Node Level. We are here!

4 SCS CMU 4 Proximity on Graph: What? a.k.a Relevance, Closeness, ‘Similarity’…

5 SCS CMU Link Prediction. Q: How to predict the existence of a link? A: Proximity! [Liben-Nowell+ 2003] [Figure: proximity histograms (density vs. Prox(i → j) + Prox(j → i)) for a set of deleted links and for a set of absent links.] Proximity is effective at telling ‘deleted’ edges from absent edges!

6 SCS CMU Neighborhood Search on graphs. [Figure: conference-author bipartite graph.] Q: What is the most related conference to ICDM? A: Proximity! [Sun+ ICDM 2005]

7 SCS CMU 7 Example

8 SCS CMU Automatic Image Caption. [Figure: graph linking images, regions, and keywords (Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass), plus a test image.] Q: How to assign keywords to the test image? A: Proximity! [Pan+ 2004]

9 SCS CMU Center-Piece Subgraph (CePS). [Figure: input: original graph; output: CePS, with the "CePS guy".] Q: How to find a hub for the black nodes? A: Proximity! [Tong+ KDD 2006]

10 SCS CMU Best-Effort Pattern Match. [Figure: input: data graph and query graph; output: matching subgraph.] Q: How to find a matching subgraph? A: Proximity! [Tong+ KDD 2007]

11 SCS CMU Challenge. Graphs are evolving over time! –New nodes/edges show up; –Existing nodes/edges die out; –Edge weights change… Q: How to generalize everything? A: Track Proximity!

12 SCS CMU Trend analysis on graph level. [Figure: rank of influential-ness vs. year for M. Jordan, G. Hinton, C. Koch, T. Sejnowski.]

13 SCS CMU 13 Roadmap Motivation Prox. On Static Graphs Prox. On Time-Evolving Graphs Experimental Results Conclusion

14 SCS CMU Random walk with restart. [Figure: a 12-node example graph queried at Node 4, with the ranking vector over Node 1 to Node 12; scores range from 0.02 to 0.22.] More red, more relevant. Nearby nodes, higher scores.

15 SCS CMU Computing RWR. [Figure: the 12-node example graph with its query node, and the RWR equation relating the ranking vector (n x 1), the adjacency matrix (n x n), the starting vector (n x 1), and the restart probability.]
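The RWR computation named on this slide can be sketched with a plain power iteration. This is a minimal sketch, not the speakers' code: it assumes the common formulation r = c W r + (1 - c) e, where W is the column-normalized adjacency matrix, e is the starting vector for the query, and c (0.9 here, an assumed value) is the probability of following an edge rather than restarting. The function name `rwr` and the toy path graph are made up for illustration.

```python
def rwr(adj, query, c=0.9, tol=1e-12, max_iter=10000):
    """Random walk with restart by power iteration (illustrative sketch).

    adj[u][v] is the weight of the edge between u and v.
    c is the assumed probability of following an edge; 1 - c restarts at `query`.
    """
    n = len(adj)
    # Column-normalize so that W[u][v] is the probability of stepping v -> u.
    col_sums = [sum(adj[u][v] for u in range(n)) for v in range(n)]
    W = [[adj[u][v] / col_sums[v] if col_sums[v] else 0.0 for v in range(n)]
         for u in range(n)]
    e = [1.0 if u == query else 0.0 for u in range(n)]  # starting vector
    r = e[:]                                            # ranking vector
    for _ in range(max_iter):
        new_r = [c * sum(W[u][v] * r[v] for v in range(n)) + (1 - c) * e[u]
                 for u in range(n)]
        if max(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


# Toy graph: a path 0 - 1 - 2, queried at the middle node.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = rwr(path, query=1)
```

On this toy graph the two end nodes receive equal scores by symmetry, and the scores sum to 1, which makes the result easy to sanity-check by hand.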

16 SCS CMU Q: Given query i, how to solve it? [Figure: the RWR equation with the ranking vector unknown; adjacency matrix, starting vector, query.]

17 SCS CMU RWR on Bipartite Graph. [Figure: author-conf. matrix, n authors x m conferences.] Observation: n >> m! Examples: 1. DBLP: 400k authors, 3.5k confs; 2. NetFlix: 2.7M users, 18k movies.

18 SCS CMU RWR on Skewed bipartite graphs. Q: Given query i, how to solve it? [Figure: the adjacency matrix in block form, with zero diagonal blocks and off-diagonal blocks Ar and Ac; n authors, m confs.]

19 SCS CMU BB_Lin: Pre-Computation [Tong+ 06]. Step 1: M = Ac × Ar, the 2-step RWR for conferences (m conferences, n authors). Step 2: all Conf-Conf prox. scores. Cost examples: –NetFlix: 1.5 hr for pre-computation; –DBLP: a few minutes.

20 SCS CMU BB_Lin: Pre-Computation [Tong+ 06]. Step 1: M = Ac × Ar, the 2-step RWR for conferences (m conferences, n authors). Step 2: all Conf-Conf prox. scores.

21 SCS CMU BB_Lin: Pre-Computation [Tong+ 06]. Step 1: M = Ac × Ar, the 2-step RWR for conferences. Step 2: all Conf-Conf prox. scores. Cost examples: –NetFlix: 1.5 hr for pre-computation; –DBLP: a few minutes. [Figure labels: Ac/Ar, E edges; m x m.]
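A small sketch may help make the two pre-computation steps concrete. It is not the BB_Lin code and its normalization is an assumption: starting from r = c W r + (1 - c) e on the bipartite graph and eliminating the author block leaves an m x m system whose inverse, (I - c^2 M)^-1, holds all conference-conference information. The names `P`, `Q`, `precompute_core`, and the toy matrix `B` are hypothetical; `Q` and `P` play the roles of the slide's Ac and Ar under a column-stochastic convention.

```python
def normalize_columns(mat):
    """Scale each column to sum to 1 (all-zero columns stay zero)."""
    rows, cols = len(mat), len(mat[0])
    sums = [sum(mat[i][j] for i in range(rows)) for j in range(cols)]
    return [[mat[i][j] / sums[j] if sums[j] else 0.0 for j in range(cols)]
            for i in range(rows)]


def matmul(a, b):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def invert(mat):
    """Gauss-Jordan inverse with partial pivoting; fine for a small m x m matrix."""
    n = len(mat)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def precompute_core(B, c=0.9):
    """B is an n x m author-conference weight matrix; returns (P, Q, core)."""
    Bt = [list(col) for col in zip(*B)]
    P = normalize_columns(B)    # n x m: one step conference -> author
    Q = normalize_columns(Bt)   # m x n: one step author -> conference
    M = matmul(Q, P)            # Step 1: m x m two-step walk among conferences
    m = len(M)
    # Step 2: invert the small m x m system once, off-line.
    core = invert([[(1.0 if i == j else 0.0) - c * c * M[i][j]
                    for j in range(m)] for i in range(m)])
    return P, Q, core


# Toy data: 3 authors x 2 conferences.
B = [[1, 0],
     [1, 1],
     [0, 1]]
P, Q, core = precompute_core(B, c=0.9)
```

The point of the design is that only an m x m matrix is inverted, which is cheap when n >> m as in the DBLP and NetFlix examples.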

22 SCS CMU BB_Lin: On-Line Stage. (Base) Case 1: Conf - Conf. Read out! [Figure: authors, conferences; Ac/Ar, E edges.]

23 SCS CMU BB_Lin: On-Line Stage. Case 2: Au - Conf. 1 matrix-vec! [Figure: authors, conferences; Ac/Ar, E edges.]

24 SCS CMU BB_Lin: On-Line Stage. Case 3: Au - Au. 2 matrix-vec! [Figure: authors, conferences; Ac/Ar, E edges.]
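The three on-line cases can be sketched as follows. This is an illustration under stated assumptions, not the BB_Lin implementation: it assumes r = c W r + (1 - c) e with column-stochastic blocks `P` (conference to author) and `Q` (author to conference), and a pre-computed core matrix (I - c^2 Q P)^-1. All function names and the toy matrices are hypothetical; the 2 x 2 closed-form inverse is only there to keep the example self-contained.

```python
def inv2x2(a):
    """Closed-form inverse of a 2 x 2 matrix (enough for the toy example)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def conf_to_conf(core, c, i, j):
    """Case 1: score of conference j for query conference i; a read-out."""
    return (1 - c) * core[j][i]


def author_to_confs(Q, core, c, a):
    """Case 2: conference scores for query author a; one matrix-vector product."""
    m = len(core)
    q_col = [Q[k][a] for k in range(m)]  # column a of Q, no multiplication needed
    return [(1 - c) * c * sum(core[j][k] * q_col[k] for k in range(m))
            for j in range(m)]


def author_to_authors(P, Q, core, c, a):
    """Case 3: author scores for query author a; a second matrix-vector product."""
    r_conf = author_to_confs(Q, core, c, a)
    r_auth = [c * sum(P[b][k] * r_conf[k] for k in range(len(r_conf)))
              for b in range(len(P))]
    r_auth[a] += 1 - c  # restart mass lands on the query author
    return r_auth


# Toy example: 3 authors, 2 conferences, assumed c = 0.9.
c = 0.9
P = [[0.5, 0.0], [0.5, 0.5], [0.0, 0.5]]   # n x m
Q = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]     # m x n
M = [[sum(Q[i][a] * P[a][j] for a in range(3)) for j in range(2)]
     for i in range(2)]
core = inv2x2([[(1.0 if i == j else 0.0) - c * c * M[i][j] for j in range(2)]
               for i in range(2)])

case1 = conf_to_conf(core, c, 0, 1)
case2 = author_to_confs(Q, core, c, 0)
case3 = author_to_authors(P, Q, core, c, 0)
```

For an author query, the conference scores and author scores together sum to 1, which is a quick consistency check on the three cases.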

25 SCS CMU BB_Lin: Examples. NetFlix dataset (2.7M users x 18k movies): –1.5 hr for pre-computation; –<1 sec for on-line. DBLP dataset (400k authors x 3.5k confs): –A few minutes for pre-computation; –<0.01 sec for on-line.

26 SCS CMU 26 Roadmap Motivation Prox. On Static Graphs Prox. On Time-Evolving Graphs Experimental Results Conclusion

27 SCS CMU Challenges. BB_Lin is good for skewed bipartite graphs –for NetFlix (2.7M nodes and 100M edges) –On-line cost for a query: a fraction of a second, w/ 1.5 hr pre-computation for the m x m core matrix. But… what if the graph is evolving over time? –New edges/nodes arrive; edge weights increase… –On-line cost: the 1.5 hr itself becomes a part of this!

28 SCS CMU Q: How to update the core matrix? [Figure: the graph at t=0 and at t=1.]

29 SCS CMU Update the core matrix (Step 1, Step 2). M~ = Ac~ × Ar~ = M + rank-2 update.

30 SCS CMU Update: General Case. E’ edges changed, involving n’ authors and m’ confs. M~ = Ac~ × Ar~ (n authors, m conferences).

31 SCS CMU Update: General Case. Observation: –the rank of the update is small! –Real example (DBLP Post): 1258 time steps; E’ up to ~20,000; min(n’, m’) <= 132. [Figure: our algorithm; n authors, m conferences.]
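The transcript does not preserve the update formula itself. One standard way to exploit a low-rank change is the Sherman-Morrison identity, applied once per rank-one term; it is swapped in here for illustration and is not claimed to be the authors' algorithm. The sketch assumes the core matrix is (I - c^2 M)^-1 and that M changes to M + x y^T. The names `rank_one_update` and `low_rank_update`, the vectors, and c = 0.9 are all hypothetical.

```python
def identity(m):
    """m x m identity matrix as a list of lists."""
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]


def rank_one_update(core, x, y, c):
    """Core matrix after M becomes M + x y^T, via Sherman-Morrison.

    Input core is (I - c^2 M)^-1; output is (I - c^2 (M + x y^T))^-1.
    Costs O(m^2) instead of the O(m^3) of a fresh inversion.
    """
    m = len(core)
    cx = [sum(core[i][k] * x[k] for k in range(m)) for i in range(m)]  # core @ x
    yc = [sum(y[k] * core[k][j] for k in range(m)) for j in range(m)]  # y^T @ core
    denom = 1.0 - c * c * sum(y[k] * cx[k] for k in range(m))
    return [[core[i][j] + c * c * cx[i] * yc[j] / denom for j in range(m)]
            for i in range(m)]


def low_rank_update(core, xs, ys, c):
    """Apply a rank-k change, given as k rank-one terms, one term at a time."""
    for x, y in zip(xs, ys):
        core = rank_one_update(core, x, y, c)
    return core


# Toy example: start from M = 0 (so the core is the identity), add a rank-2 change.
c = 0.9
m = 3
xs = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
ys = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
core = low_rank_update(identity(m), xs, ys, c)

# Check against the definition: core @ (I - c^2 M_new) should be the identity.
M_new = [[sum(x[i] * y[j] for x, y in zip(xs, ys)) for j in range(m)]
         for i in range(m)]
A_new = [[(1.0 if i == j else 0.0) - c * c * M_new[i][j] for j in range(m)]
         for i in range(m)]
residual = max(abs(sum(core[i][k] * A_new[k][j] for k in range(m))
                   - (1.0 if i == j else 0.0))
               for i in range(m) for j in range(m))
```

When the rank of the change is small, as the slide observes for DBLP, a handful of O(m^2) updates replaces a full re-inversion of the m x m system.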

32 SCS CMU 32 Roadmap Motivation Prox. On Static Graphs Prox. On Time-Evolving Graphs Experimental Results Conclusion

33 SCS CMU pTrack. [Given] –(1) a large, skewed, time-evolving bipartite graph, –(2) the query nodes of interest. [Track] –(1) the top-k most related nodes for each query node at each time step t; –(2) the proximity score (or rank of proximity) between any two query nodes at each time step t.
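The two tracking outputs can be sketched once proximity scores are available per time step. This is a minimal sketch assuming the scores for one query node arrive as one dictionary per time step; the function names, node names, and numbers below are made up for illustration and are not results from the talk.

```python
import heapq


def track_top_k(scores_over_time, k):
    """For each time step, return the k nodes with the highest proximity."""
    return [heapq.nlargest(k, scores, key=scores.get)
            for scores in scores_over_time]


def track_rank(scores_over_time, node):
    """For each time step, return the 1-based proximity rank of `node`."""
    ranks = []
    for scores in scores_over_time:
        ordered = sorted(scores, key=scores.get, reverse=True)
        ranks.append(ordered.index(node) + 1)
    return ranks


# Made-up proximity scores from one query node at two time steps.
scores_over_time = [
    {"conf_a": 0.40, "conf_b": 0.35, "conf_c": 0.25},
    {"conf_a": 0.30, "conf_b": 0.20, "conf_c": 0.50},
]
top2 = track_top_k(scores_over_time, 2)
rank_of_c = track_rank(scores_over_time, "conf_c")
```

Here `top2` holds the top-2 list per step and `rank_of_c` shows how one node's rank moves over time, which mirrors tracking outputs (1) and (2).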

34 SCS CMU Philip S. Yu’s Top-5 conferences up to each year. 1992: ICDE, ICDCS, SIGMETRICS, PDIS, VLDB. 1997: CIKM, ICDCS, ICDE, SIGMETRICS, ICMCS. 2002: KDD, SIGMOD, ICDM, CIKM, ICDCS. 2007: ICDM, KDD, ICDE, SDM, VLDB. Topic annotations: Databases, Performance, Distributed Sys.; Databases, Data Mining. DBLP (Au. x Conf.): 400k authors, 3.5k confs, 20 yrs.

35 SCS CMU KDD’s rank w.r.t. VLDB over the years. [Figure: prox. rank vs. year.] Data Mining and Databases are more and more relevant!

36 SCS CMU cTrack. [Given] –(1) a large, skewed, time-evolving graph, –(2) the query nodes of interest. [Track] –(1) the top-k most central nodes at each time step t; –(2) the centrality score (or rank of centrality) for each query node at each time step t.

37 SCS CMU 10 most influential authors in the NIPS community up to each year. Author-paper bipartite graph from NIPS 1987-1999: 3k, 1740 papers, 2037 authors, spread over 13 years. [Figure highlights: T. Sejnowski, M. Jordan.]

38 SCS CMU Fast-Single-Update. [Figure: log(time) (seconds) per dataset, with our method marked; annotated speedups: 176x and 40x.]

39 SCS CMU Fast-Batch-Update. [Figure: time (seconds) vs. min(n’, m’) and vs. E’, with our method marked.] 15x speed-up on average!

40 SCS CMU Conclusion. Trend analysis on graph level: –pTrack/cTrack. Scalable for evolving graphs.

41 SCS CMU 41 Thank you! www.cs.cmu.edu/~htong

