SCS CMU Proximity Tracking on Time-Evolving Bipartite Graphs Speaker: Hanghang Tong Joint Work with Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos.

Presentation transcript:

SCS CMU Proximity Tracking on Time-Evolving Bipartite Graphs. Speaker: Hanghang Tong. Joint Work with Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos. Apr , 2008, Atlanta. SIAM Conference on Data Mining

SCS CMU 2 Graphs are everywhere!

SCS CMU 3 Graph Mining: the big picture. Graph/Global Level; Subgraph/Community Level; Node Level. We are here!

SCS CMU 4 Proximity on Graph: What? a.k.a Relevance, Closeness, ‘Similarity’…

SCS CMU 5 Link Prediction. Q: How to predict the existence of a link? A: Proximity! [Liben-Nowell ] Figure: proximity histograms (density vs. Prox(i → j) + Prox(j → i)) for a set of deleted links and for a set of absent links. Prox. is effective at separating ‘deleted’ edges from absent edges!

SCS CMU 6 Neighborhood Search on graphs (Author-Conference bipartite graph). Q: What is the most related conference to ICDM? A: Proximity! [Sun+ ICDM2005]

SCS CMU 7 Example

SCS CMU 8 Automatic Image Caption. Graph layers: Image, Region, Keyword (Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass), plus the test image. Q: How to assign keywords to the test image? A: Proximity! [Pan+ 2004]

SCS CMU 9 Center-Piece Subgraph (CePS). Input: Original Graph. Output: CePS (with the "CePS guy"). Q: How to find a hub for the black nodes? A: Proximity! [Tong+ KDD 2006]

SCS CMU 10 Best-Effort Pattern Match. Input: Data Graph, Query Graph. Output: Matching Subgraph. Q: How to find the matching subgraph? A: Proximity! [Tong+ KDD 2007]

SCS CMU 11 Challenge: Graphs are evolving over time! – New nodes/edges show up; – Existing nodes/edges die out; – Edge weights change… Q: How to generalize everything? A: Track Proximity!

SCS CMU 12 Trend analysis on graph level. Figure: rank of influential-ness vs. year for M. Jordan, G. Hinton, C. Koch, T. Sejnowski.

SCS CMU 13 Roadmap Motivation Prox. On Static Graphs Prox. On Time-Evolving Graphs Experimental Results Conclusion

SCS CMU 14 Random walk with restart. Query: Node 4. Figure: ranking vector with one score per node (more red, more relevant); nearby nodes get higher scores.

SCS CMU 15 Computing RWR. Labels: ranking vector (n x 1), adjacency matrix (n x n), starting vector (n x 1), restart p, query.
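The slide names a ranking vector, an adjacency matrix, a starting vector and a restart probability. A minimal sketch of how these fit together, assuming the commonly used formulation r = c·W·r + (1 − c)·e, where W is the column-normalized adjacency matrix, e is the starting vector for the query node, and 1 − c is the restart probability. The function name `rwr`, the value of `c`, and the toy path graph are illustrative and not taken from the slides:

```python
def rwr(adj, query, c=0.5, tol=1e-12, max_iter=10_000):
    """Random walk with restart by fixed-point iteration.

    Assumed formulation (not spelled out in the transcript):
        r = c * W * r + (1 - c) * e
    with W the column-normalized adjacency matrix and e the indicator
    vector of the query node.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0
          for j in range(n)] for i in range(n)]
    e = [0.0] * n
    e[query] = 1.0
    r = e[:]
    for _ in range(max_iter):
        nxt = [c * sum(w[i][j] * r[j] for j in range(n)) + (1 - c) * e[i]
               for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(nxt, r))
        r = nxt
        if delta < tol:  # stop once the vector no longer changes
            break
    return r


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data), queried at node 0.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = rwr(path, query=0)
```

On this toy graph the scores fall off with distance from the query node, which is the "nearby nodes, higher scores" behaviour described on the previous slide. With a much smaller restart probability the scores drift toward the degree distribution instead, so the choice of `c` matters.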

SCS CMU 16 Q: Given query i, how to solve it? Known: adjacency matrix, starting vector (query). Unknown (?): ranking vector.

SCS CMU 17 RWR on Bipartite Graph. Author-Conf. matrix: n authors x m conferences. Observation: n >> m! Examples: 1. DBLP: 400k authors, 3.5k confs; 2. NetFlix: 2.7M users, 18k movies.

SCS CMU 18 RWR on Skewed bipartite graphs. Q: Given query i, how to solve it? The adjacency matrix has two non-zero blocks: Ar (n authors x m confs) and Ac (m confs x n authors).

SCS CMU 19 BB_Lin: Pre-Computation [Tong+ 06]. Step 1: M = Ac x Ar (2-step RWR for conferences; m conferences, n authors). Step 2: all Conf-Conf prox. scores. Cost (examples): – NetFlix: 1.5hr for pre-computation; – DBLP: a few minutes.

SCS CMU 20 BB_Lin: Pre-Computation [Tong+ 06]. Step 1: M = Ac x Ar (2-step RWR for conferences; m conferences, n authors). Step 2: all Conf-Conf prox. scores.

SCS CMU 21 BB_Lin: Pre-Computation [Tong+ 06]. Step 1: M = Ac x Ar (2-step RWR for conferences; Ac/Ar have E edges). Step 2: all Conf-Conf prox. scores (m x m). Cost (examples): – NetFlix: 1.5hr for pre-computation; – DBLP: a few minutes.
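The two pre-computation steps can be sketched as follows. Step 1 (M = Ac x Ar) is on the slide; the Step 2 formula is not in the transcript, so this sketch assumes the core matrix is (I − c²·M)⁻¹, which is what block-wise inversion of I − c·W gives when Ar and Ac are row-normalized. The names `bb_lin_precompute`, `B` and the value of `c` are hypothetical:

```python
def matmul(a, b):
    """Dense matrix product of two lists-of-rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(mat):
    """Gauss-Jordan inverse with partial pivoting (fine for a small m x m matrix)."""
    n = len(mat)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def row_normalize(mat):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in mat:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def bb_lin_precompute(b, c):
    """b: n x m author-conference weights. Returns (Ar, Ac, core matrix)."""
    ar = row_normalize(b)                                # n x m
    ac = row_normalize([list(col) for col in zip(*b)])   # m x n
    m_mat = matmul(ac, ar)                               # Step 1: m x m
    m = len(m_mat)
    system = [[(1.0 if i == j else 0.0) - c * c * m_mat[i][j]
               for j in range(m)] for i in range(m)]
    core = inverse(system)                               # Step 2 (assumed form)
    return ar, ac, core


# Hypothetical toy data: 4 authors x 2 conferences.
B = [[1, 0], [2, 1], [0, 3], [1, 1]]
ar, ac, core = bb_lin_precompute(B, c=0.9)
```

The point of the design is that only an m x m matrix is inverted, which is cheap when n >> m; the n x n system over all authors is never formed.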

SCS CMU 22 BB_Lin: On-Line Stage. (Base) Case 1: Conf - Conf. Read out! (Ac/Ar: E edges)

SCS CMU 23 BB_Lin: On-Line Stage. Case 2: Au - Conf. 1 matrix-vec! (Ac/Ar: E edges)

SCS CMU 24 BB_Lin: On-Line Stage. Case 3: Au - Au. 2 matrix-vec! (Ac/Ar: E edges)
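The three on-line cases can be sketched as below. This assumes row-normalized Ar and Ac, a core matrix (I − c²·Ac·Ar)⁻¹, and proximity defined as (1 − c)·(I − c·W)⁻¹ with W built from the Ar and Ac blocks; none of these formulas appear in the transcript, and all function names are hypothetical. The brute-force function exists only to check the three shortcuts against a full inversion:

```python
def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(mat):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(mat)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def normalized(b):
    """Row-normalized Ar (n x m) and Ac (m x n)."""
    def rows(mat):
        return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in mat]
    return rows(b), rows([list(col) for col in zip(*b)])


def identity_minus(scale, mat):
    size = len(mat)
    return [[(1.0 if p == q else 0.0) - scale * mat[p][q] for q in range(size)]
            for p in range(size)]


def query_conf(core, c, i):
    """Case 1: conference i to all conferences. Read out one row."""
    return [(1 - c) * v for v in core[i]]


def query_author_to_confs(ar, core, c, a):
    """Case 2: author a to all conferences. One vector-matrix product."""
    return [(1 - c) * c * v for v in matmul([ar[a]], core)[0]]


def query_author_to_authors(ar, ac, core, c, a):
    """Case 3: author a to all authors. Two vector-matrix products."""
    via_confs = matmul(matmul([ar[a]], core), ac)[0]
    return [(1 - c) * ((1.0 if b == a else 0.0) + c * c * v)
            for b, v in enumerate(via_confs)]


def brute_force(ar, ac, c):
    """Reference only: invert the full (n + m) x (n + m) system."""
    n, m = len(ar), len(ac)
    w = [[0.0] * (n + m) for _ in range(n + m)]
    for a in range(n):
        for j in range(m):
            w[a][n + j] = ar[a][j]
            w[n + j][a] = ac[j][a]
    return [[(1 - c) * v for v in row] for row in inverse(identity_minus(c, w))]


# Hypothetical toy data: 4 authors x 2 conferences.
C = 0.9
B = [[1, 0], [2, 1], [0, 3], [1, 1]]
ar, ac = normalized(B)
core = inverse(identity_minus(C * C, matmul(ac, ar)))
full = brute_force(ar, ac, C)
n = len(B)
```

Each shortcut touches only the m x m core matrix and one row of Ar (plus Ac in the author-author case), which is why the slides report sub-second on-line cost.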

SCS CMU 25 BB_Lin: Examples. NetFlix dataset (2.7M users x 18k movies): – 1.5hr for pre-computation; – <1 sec on-line. DBLP dataset (400k authors x 3.5k confs): – a few minutes for pre-computation; – <0.01 sec on-line.

SCS CMU 26 Roadmap Motivation Prox. On Static Graphs Prox. On Time-Evolving Graphs Experimental Results Conclusion

SCS CMU 27 Challenges. BB_Lin is good for skewed bipartite graphs: – for NetFlix (2.7M nodes and 100M edges); – on-line cost for a query: a fraction of a second w/ 1.5 hr pre-computation for the m x m core matrix. But… what if the graph is evolving over time? – New edges/nodes arrive; edge weights increase… – On-line cost: the 1.5hr pre-computation itself becomes a part of this!

SCS CMU 28 Q: How to update the core matrix from t=0 to t=1?

SCS CMU 29 Update the core matrix. Step 1: M~ = Ac~ x Ar~ = M + (rank-2 update). Step 2: [equation not preserved in the transcript]
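One way to realize a rank-2 update for a single changed edge is sketched below. It assumes row-normalized Ar and Ac that are re-normalized after the change, a core matrix of the form (I − c²·M)⁻¹, and the Sherman-Morrison-Woodbury identity for the second step; the slides do not show the actual formulas, so the construction of X and Y here is an assumption and may differ from the authors'. All names are hypothetical:

```python
def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def inverse(mat):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(mat)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(mat)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def normalized(b):
    """Row-normalized Ar (n x m) and Ac (m x n)."""
    def rows(mat):
        return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in mat]
    return rows(b), rows([list(col) for col in zip(*b)])


def identity_minus(scale, mat):
    size = len(mat)
    return [[(1.0 if p == q else 0.0) - scale * mat[p][q] for q in range(size)]
            for p in range(size)]


def core_matrix(b, c):
    """From-scratch core matrix (assumed form): inverse of I - c^2 * Ac * Ar."""
    ar, ac = normalized(b)
    return inverse(identity_minus(c * c, matmul(ac, ar)))


def single_edge_update(b, core, c, i, j, delta):
    """Add `delta` to edge (author i, conference j) and update the core matrix
    with a 2 x 2 inversion instead of a fresh m x m inversion."""
    n, m = len(b), len(core)
    ar, ac = normalized(b)
    b_new = [row[:] for row in b]
    b_new[i][j] += delta
    ar_new, ac_new = normalized(b_new)
    d_r = [ar_new[i][k] - ar[i][k] for k in range(m)]   # only row i of Ar changes
    d_c = [ac_new[j][a] - ac[j][a] for a in range(n)]   # only row j of Ac changes
    # M_new = M + X * Y with X (m x 2) and Y (2 x m): a rank-2 update.
    x = [[1.0 if k == j else 0.0, ac[k][i]] for k in range(m)]
    y = [[sum(d_c[a] * ar[a][k] for a in range(n)) + d_c[i] * d_r[k]
          for k in range(m)],
         d_r]
    # Woodbury: new = core + c^2 * core X (I - c^2 Y core X)^-1 Y core
    core_x = matmul(core, x)
    small = inverse(identity_minus(c * c, matmul(y, core_x)))
    corr = matmul(matmul(core_x, small), matmul(y, core))
    new_core = [[core[p][q] + c * c * corr[p][q] for q in range(m)]
                for p in range(m)]
    return b_new, new_core


# Hypothetical toy data: 5 authors x 3 conferences.
C = 0.9
B = [[1, 0, 2], [2, 1, 0], [0, 3, 1], [1, 1, 0], [0, 0, 4]]
core_old = core_matrix(B, C)
B_new, core_fast = single_edge_update(B, core_old, C, i=0, j=1, delta=2.0)
core_slow = core_matrix(B_new, C)
max_gap = max(abs(x - y) for rf, rs in zip(core_fast, core_slow)
              for x, y in zip(rf, rs))
```

The fast path and the from-scratch recomputation agree up to floating-point error on this toy example; the saving comes from replacing the m x m inversion with a 2 x 2 one.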

SCS CMU 30 Update: General Case. E' edges changed, involving n' authors and m' confs. M~ = Ac~ x Ar~ (n authors, m conferences).

SCS CMU 31 Update: General Case. Observation: – the rank of the update is small! – Real example (DBLP Post): 1258 time steps; E' up to ~20,000; min(n', m') <= 132. Our algorithm exploits this (n authors, m conferences).
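A small worked example of why the rank stays small: the matrix of edge changes has non-zeros only in the n' affected author rows and m' affected conference columns, so its rank is at most min(n', m') no matter how large E' is. The sketch below only checks this bound on the edge-change matrix; how the slides turn it into an update of M is not preserved in the transcript. The data and the helper `matrix_rank` are hypothetical:

```python
def matrix_rank(mat, eps=1e-12):
    """Rank via Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in row] for row in mat]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        if rank == len(rows):
            break
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < eps:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [v - factor * p for v, p in zip(rows[r], rows[rank])]
        rank += 1
    return rank


# Hypothetical batch: 5 changed edges given as (author, conference, weight change).
n_authors, n_confs = 6, 4
changed = [(0, 1, 1.0), (0, 2, 2.0), (3, 1, 1.0), (3, 2, 5.0), (3, 3, 1.0)]

delta = [[0.0] * n_confs for _ in range(n_authors)]
for a, j, w in changed:
    delta[a][j] += w

n_prime = len({a for a, _, _ in changed})   # authors involved
m_prime = len({j for _, j, _ in changed})   # conferences involved
rank = matrix_rank(delta)
```

Here E' = 5 edges touch n' = 2 authors and m' = 3 conferences, and the edge-change matrix has rank 2 = min(n', m').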

SCS CMU 32 Roadmap Motivation Prox. On Static Graphs Prox. On Time-Evolving Graphs Experimental Results Conclusion

SCS CMU 33 pTrack. [Given] – (1) a large, skewed, time-evolving bipartite graph; – (2) the query nodes of interest. [Track] – (1) top-k most related nodes for each query node at each time step t; – (2) the proximity score (or rank of proximity) between any two query nodes at each time step t.
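The two tracking outputs can be sketched as a thin layer over per-time-step proximity scores. The sketch assumes the scores for one query node are already available at each time step as a dictionary; how they are computed is outside this snippet. The names `track_top_k`, `rank_of` and the toy numbers are made up for illustration:

```python
import heapq


def track_top_k(score_stream, k):
    """Yield (t, top-k nodes) for each (t, {node: proximity}) in the stream."""
    for t, scores in score_stream:
        yield t, heapq.nlargest(k, scores, key=scores.get)


def rank_of(scores, node):
    """1-based rank of `node` by proximity (1 = most related)."""
    return 1 + sum(1 for v in scores.values() if v > scores[node])


# Made-up proximity scores from one query node at two time steps.
stream = [
    (0, {"conf_a": 0.40, "conf_b": 0.30, "conf_c": 0.10}),
    (1, {"conf_a": 0.30, "conf_b": 0.20, "conf_c": 0.35}),
]
top2 = list(track_top_k(stream, k=2))
ranks_of_c = [rank_of(scores, "conf_c") for _, scores in stream]
```

In this toy stream `conf_c` moves from rank 3 to rank 1, the kind of trend the tracking problem is meant to surface.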

SCS CMU 34 Philip S. Yu’s Top-5 conferences up to each year. DBLP (Au. x Conf.): 400k authors, 3.5k confs, 20 yrs. Four top-5 lists, in slide order (year labels not preserved):
1. ICDE, ICDCS, SIGMETRICS, PDIS, VLDB
2. CIKM, ICDCS, ICDE, SIGMETRICS, ICMCS
3. KDD, SIGMOD, ICDM, CIKM, ICDCS
4. ICDM, KDD, ICDE, SDM, VLDB
Topic labels: Databases, Performance, Distributed Sys.; then Databases, Data Mining.

SCS CMU 35 KDD’s Rank wrt. VLDB over years. Figure: prox. rank vs. year. Data Mining and Databases are more and more relevant!

SCS CMU 36 cTrack. [Given] – (1) a large, skewed, time-evolving graph; – (2) the query nodes of interest. [Track] – (1) top-k most central nodes at each time step t; – (2) the centrality score (or rank of centrality) for each query node at each time step t.

SCS CMU 37 Most influential authors in the NIPS community up to each year. Author-paper bipartite graph from NIPS: k papers, 2037 authors, spread over 13 years. Highlighted: T. Sejnowski, M. Jordan.

SCS CMU 38 Fast-Single-Update. Figure: log(time) (seconds) per dataset; speedups of 176x and 40x annotated for our method.

SCS CMU 39 Fast-Batch-Update. Figure: time (seconds) vs. min(n’, m’) and vs. E’. Our method: 15x speed-up on average!

SCS CMU 40 Conclusion. Trend analysis on graph level: – pTrack/cTrack; – scalable for evolving graphs.

SCS CMU 41 Thank you!