Fast Random Walk with Restart and Its Applications

Presentation transcript:

Fast Random Walk with Restart and Its Applications
Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan
ICDM 2006, Dec. 18-22, Hong Kong

Motivating Questions
Q: How to measure the relevance? A: Random walk with restart.
Q: How to do it efficiently? A: This talk tries to answer!

Random walk with restart
[Figure: example graph with nodes 1-12]

Random walk with restart
[Figure: the example graph with nodes 1-12 and its ranking vector (Node 4 listed first, then the remaining nodes); the scores shown range from 0.02 to 0.22.]
Nearby nodes, higher scores. More red, more relevant.

Automatic Image Caption
Q: Given images captioned {Sea, Sun, Sky, Wave}, {Cat, Forest, Grass, Tiger}, ..., what are the keywords {?, ?, ?} for a new image?
A: RWR! [Pan KDD2004]

[Figure: graph with three kinds of nodes (Region, Image, Keyword); keywords are Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass; the test image is added to the graph.]

[Figure: the same Region / Image / Keyword graph.]
Keywords returned for the test image: {Grass, Forest, Cat, Tiger}

Neighborhood Formulation
[Figure: bipartite graph of Conference and Author nodes]
Q: What is the most related conference to ICDM?
A: RWR! [Sun ICDM2005]

NF: example

Center-Piece Subgraph (CePS)
[Figure: the original graph (black: query nodes) and the resulting CePS]
Q: How to get from the original graph to the CePS?
A: RWR! [Tong KDD 2006]

CePS: Example

Other Applications
Content-based Image Retrieval [He]
Personalized PageRank [Jeh], [Widom], [Haveliwala]
Anomaly Detection (for node; link) [Sun]
Link Prediction [Getoor], [Jensen]
Semi-supervised Learning [Zhu], [Zhou]
…

Roadmap
Background (RWR: Definitions; RWR: Algorithms)
Basic Idea
FastRWR (Pre-Compute Stage; On-Line Stage)
Experimental Results
Conclusion

Computing RWR
[Equation figure relating the ranking vector (n x 1), the adjacency matrix (n x n), the starting vector (n x 1) and the restart probability, illustrated on the example graph with nodes 1-12.]
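
The slide's equation did not survive in this transcript. As a hedged reconstruction in the notation commonly used for RWR (all symbols here are assumptions: \tilde{W} is the column-normalized adjacency matrix, e_i is the starting vector with a single 1 at query node i, r_i is the ranking vector, and 1 - c is the restart probability), the fixed point and its closed form would read:

```latex
\[
r_i = c\,\tilde{W}\,r_i + (1-c)\,e_i
\quad\Longrightarrow\quad
r_i = (1-c)\,(I - c\,\tilde{W})^{-1} e_i
\]
```

Under that reading, the ranking vector for query i is a scaled column of the inverse of Q = I - c\tilde{W}, which is why the later slides focus on computing and storing Q^-1.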

Beyond RWR
Maxwell Equation for Web! [Chakrabarti]
P-PageRank [Haveliwala]
Semi-supervised (SM) Learning [Zhou, Zhu]
RL in CBIR [He]
RWR [Pan, Sun]
PageRank [Haveliwala]
Fast RWR finds the root solution!

Q: Given query i, how to solve it?

OnTheFly
[Figure: the example graph with nodes 1-12 and the ranking scores computed at query time]
No pre-computation / light storage
Slow on-line response: O(mE)
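
A minimal sketch of what an on-the-fly computation could look like, assuming the standard RWR iteration r <- c*W*r + (1-c)*e. The function names, the default c = 0.9, and the tolerance are illustrative choices, not taken from the slides. Each pass costs one matrix-vector product, which is where a cost of the form O(mE) (m iterations over E edges, on a sparse matrix) would come from.

```python
import numpy as np

def column_normalize(A):
    """Scale each column of the adjacency matrix to sum to 1."""
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # leave all-zero columns untouched
    return A / col_sums

def rwr_on_the_fly(W, i, c=0.9, tol=1e-9, max_iter=1000):
    """Iterate r <- c*W@r + (1-c)*e_i until the L1 change drops below tol.

    Assumption: (1 - c) plays the role of the restart probability.
    """
    n = W.shape[0]
    e = np.zeros(n)
    e[i] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Toy 4-node graph (hypothetical, not the 12-node example on the slide).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = column_normalize(A)
r = rwr_on_the_fly(W, i=0)
```

Nothing is stored between queries, so every new query node repeats the whole iteration.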

PreCompute [Haveliwala]
[Figure: the example graph with nodes 1-12 and the pre-computed matrix R, whose rows and columns are indexed by the nodes]

PreCompute
Fast on-line response
Heavy pre-computation/storage cost: O(n^3) pre-computation, O(n^2) storage
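
To make the trade-off concrete, here is a small sketch of the pre-compute-everything strategy, assuming the closed form r_i = (1-c) * (I - c*W)^-1 * e_i. The names precompute_all and query are hypothetical, as is the toy graph. The inversion is the O(n^3) step and the dense matrix R is the O(n^2) storage.

```python
import numpy as np

def precompute_all(W, c=0.9):
    """Off-line: a single n x n inversion; the result holds every ranking vector."""
    n = W.shape[0]
    return (1 - c) * np.linalg.inv(np.eye(n) - c * W)

def query(R, i):
    """On-line: the ranking vector for query node i is column i of R."""
    return R[:, i]

# Toy 4-node path graph 0-1-2-3, column-normalized (hypothetical example).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = A / A.sum(axis=0)
R = precompute_all(W)
r2 = query(R, 2)
```

The on-line step is a column lookup, but R is dense even when W is sparse, which is the storage problem the slide points at.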

Q: How to balance on-line and off-line cost?

Roadmap (next: Basic Idea)

Basic Idea
[Figure: the example graph with nodes 1-12 processed in three steps]
Find Community
Fix the remaining
Combine

Pre-computational stage
Q: Efficiently compute and store Q^-1
A: A few small, instead of ONE BIG, matrix inversions

On-Line Query Stage
Q: Efficiently recover one column of Q^-1
A: A few, instead of MANY, matrix-vector multiplications

Roadmap (next: FastRWR, Pre-Compute Stage)

Pre-compute Stage
p1: B_Lin Decomposition
  P1.1 partition
  P1.2 low-rank approximation
p2: Q matrices
  P2.1 computing (for each partition)
  P2.2 computing (for concept space)

P1.1: partition
[Figure: the example graph with nodes 1-12 and its adjacency matrix with nodes reordered by partition, split into within-partition links and cross-partition links]

P1.1: block-diagonal
[Figure: after reordering the nodes by partition, the within-partition links form a block-diagonal matrix]

P1.2: LRA (low-rank approximation) for W2, with |S| << |W2|
[Figure: the cross-partition links of the example graph and their low-rank approximation]
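
Steps P1.1 and P1.2 can be sketched as follows. This is only an illustration under stated assumptions: the partition labels are taken as given (the slides do not say how the partitions are found), and a truncated SVD stands in for whatever low-rank approximation the authors use. The names split_by_partition and low_rank are hypothetical.

```python
import numpy as np

def split_by_partition(W, labels):
    """Return (W1, W2): within-partition entries and cross-partition entries.

    With nodes ordered by partition, W1 is block-diagonal.
    """
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]  # True where both ends share a partition
    W1 = np.where(same, W, 0.0)
    W2 = W - W1
    return W1, W2

def low_rank(W2, t):
    """Rank-t truncated SVD, returned as (U, S, V) with W2 ~= U @ S @ V."""
    u, s, vt = np.linalg.svd(W2, full_matrices=False)
    return u[:, :t], np.diag(s[:t]), vt[:t, :]
```

The small t x t matrix S is what the slide's |S| << |W2| refers to in this reading: only U, S and V need to be kept for the cross-partition links.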

[Figure: the matrix written as a sum of two parts, following steps P1.1 and P1.2]

p2.1 Computing

Comparing: Computing Time and Storage Cost
Example: 100,000 nodes; 100 partitions
Computing time: 10,000x faster!
Storage cost: 100x saving!
[Figure: a block-diagonal matrix with blocks Q1,1, Q1,2, ..., Q1,k]
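
The two ratios follow from simple counting, assuming dense inversion costs on the order of n^3 operations, dense storage costs n^2 entries, and the k partitions are equally sized (all three are assumptions for this back-of-the-envelope check). With n = 100,000 and k = 100:

```latex
\[
\frac{n^3}{k\,(n/k)^3} = k^2 = 100^2 = 10^4,
\qquad
\frac{n^2}{k\,(n/k)^2} = k = 100
\]
```

That is, inverting 100 blocks of 1,000 nodes each takes about 10^11 operations instead of 10^15, and storing them takes about 10^8 entries instead of 10^10.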

Q: How to fix the green portions?
[Figure: the approximate matrix plus the portions still to be fixed]

p2.2 Computing
[Equation figure involving the inverse of the block-diagonal matrix with blocks Q1,1, Q1,2, ..., Q1,k together with U and V, illustrated on the example graph with nodes 1-12]

We have: [equation]
SM (Sherman-Morrison) Lemma says: [equation]
The terms of the result are labeled Communities and Bridges.
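
Both equations on this slide were lost in extraction. A hedged reconstruction, assuming the decomposition \tilde{W} \approx \tilde{W}_1 + U S V (block-diagonal within-partition part plus low-rank cross-partition part) and writing Q_1 = I - c\tilde{W}_1, applies the Sherman-Morrison(-Woodbury) identity as follows. The symbol \Lambda is introduced here for the small inner matrix and is an assumption of this sketch.

```latex
\[
(I - c\,\tilde{W}_1 - c\,U S V)^{-1}
  = Q_1^{-1} + c\,Q_1^{-1} U \Lambda V Q_1^{-1},
\qquad
\Lambda = \left(S^{-1} - c\,V Q_1^{-1} U\right)^{-1}
\]
```

One natural reading of the slide's labels is that the first term, Q_1^{-1}, accounts for the communities, and the second term, which passes through U, \Lambda and V, accounts for the bridges between them.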

Roadmap (next: FastRWR, On-Line Stage)

On-Line Stage
Q: Query + Pre-Computation -> Result?
A: SM lemma

On-Line Query Stage
Steps q1, q2, q3, q4, q5, q6 [equations not captured in this transcript]
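
Since the formulas for q1 to q6 are missing, the following is only a plausible sketch of an on-line stage built on the SM lemma, not the authors' exact steps. It assumes W ~= W1 + U @ S @ V, Q1 = I - c*W1, and a restart probability of 1 - c. The function names and the toy graph are hypothetical.

```python
import numpy as np

def precompute(W1, U, S, V, c=0.9):
    """Off-line: invert Q1 = I - c*W1 and the small matrix Lam.

    Q1 is inverted in one call here for brevity; with a block-diagonal W1
    each block could be inverted separately.
    """
    n = W1.shape[0]
    Q1_inv = np.linalg.inv(np.eye(n) - c * W1)
    Lam = np.linalg.inv(np.linalg.inv(S) - c * V @ Q1_inv @ U)
    return Q1_inv, Lam

def online_query(Q1_inv, U, Lam, V, i, c=0.9):
    """On-line: a handful of matrix-vector products per query."""
    r0 = Q1_inv[:, i]          # Q1^-1 e_i (community part)
    t = V @ r0                 # project into the low-rank space
    t = Lam @ t                # correct inside that space
    t = U @ t                  # map back to node space
    t = Q1_inv @ t             # propagate within communities
    return (1 - c) * (r0 + c * t)

# Toy graph: triangles {0,1,2} and {3,4,5} joined by the bridge edge 2-3.
A = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[a, b] = A[b, a] = 1.0
W = A / A.sum(axis=0)                              # column-normalize
labels = np.array([0, 0, 0, 1, 1, 1])
W1 = np.where(labels[:, None] == labels[None, :], W, 0.0)
W2 = W - W1
u, s, vt = np.linalg.svd(W2)
U, S, V = u[:, :2], np.diag(s[:2]), vt[:2, :]      # W2 has rank 2, so this is exact
Q1_inv, Lam = precompute(W1, U, S, V)
r_fast = online_query(Q1_inv, U, Lam, V, i=0)
r_exact = 0.1 * np.linalg.solve(np.eye(6) - 0.9 * W, np.eye(6)[:, 0])
```

In this toy case the rank-2 approximation of the cross-partition links is exact, so r_fast should agree with r_exact up to floating-point error; with a lower rank the result would only be approximate.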

Roadmap (next: Experimental Results)

Experimental Setup
Dataset: DBLP/authorship (Author-Paper), 315k nodes, 1,800k edges
Approx. Quality: Relative Accuracy
Application: Center-Piece Subgraph

Query Time vs. Pre-Compute Time
[Plot: log query time vs. log pre-compute time]
Quality: 90%+
On-line: up to 150x speedup
Pre-computation: two orders of magnitude saving

Query Time vs. Pre-Storage
[Plot: log query time vs. log storage]
Quality: 90%+
On-line: up to 150x speedup
Pre-storage: three orders of magnitude saving

Roadmap (next: Conclusion)

Conclusion
FastRWR
  Reasonable quality preservation (90%+)
  150x speed-up: query time
  Orders of magnitude saving: pre-compute & storage
More in the paper
  The variant of FastRWR and theoretical justification
  Implementation details: normalization, low-rank approximation, sparse
  More experiments: other datasets, other applications

Q&A
Thank you!
htong@cs.cmu.edu
www.cs.cmu.edu/~htong