Fast Direction-Aware Proximity for Graph Mining. KDD 2007, San Jose. Hanghang Tong, Yehuda Koren, Christos Faloutsos.

Presentation transcript:

1 Fast Direction-Aware Proximity for Graph Mining. KDD 2007, San Jose. Hanghang Tong, Yehuda Koren, Christos Faloutsos

2 Defining Direction-Aware Proximity (DAP): escape probability. Define a random walk (RW) on the graph. Esc_Prob(A -> B) = Prob(starting at A, the walk reaches B before returning to A). Esc_Prob = Pr(smile before cry). [Figure: nodes A and B and the remaining graph]
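The definition on this slide can be checked numerically on a toy graph. The sketch below is not taken from the slides: it assumes a row-normalized transition matrix stored as nested lists, and the function name `escape_probability` and the three-node graph are illustrative only. For every node k it computes the probability v[k] of reaching B before A by fixed-point iteration, then takes one step out of A.

```python
def escape_probability(P, a, b, num_iters=1000):
    """Estimate Esc_Prob(a -> b) for a row-normalized transition matrix P.

    v[k] is the probability that a walk starting at k reaches b before a.
    Boundary values: v[b] = 1 and v[a] = 0. Interior values are found by
    fixed-point iteration (a hypothetical helper, not the slides' method).
    """
    n = len(P)
    v = [0.0] * n
    v[b] = 1.0
    for _ in range(num_iters):
        new_v = v[:]
        for k in range(n):
            if k != a and k != b:
                new_v[k] = sum(P[k][t] * v[t] for t in range(n))
        v = new_v
    # One step out of a, then reach b before coming back to a.
    return sum(P[a][t] * v[t] for t in range(n))


# Toy directed graph (assumed for illustration): 0 = A, 1 = B, 2 = C.
P = [
    [0.0, 0.5, 0.5],  # A -> B or A -> C
    [1.0, 0.0, 0.0],  # B -> A
    [0.5, 0.5, 0.0],  # C -> A or C -> B
]
esc_ab = escape_probability(P, 0, 1)
esc_ba = escape_probability(P, 1, 0)
```

On this toy graph the two directions give different values, which is the sense in which the measure is direction-aware.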

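3 Esc_Prob(1->5) = [equation shown as an image; only the fragments "I -" and "+" survive]. P = [example matrix, not preserved]. P: transition matrix (row-normalized).

4 Intuition of Formula: P*P = [matrix product, not preserved]

5 Esc_Prob(1->5) = [same equation as on slide 3]. P: transition matrix (row-normalized).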

6 Challenges. Case 1: Medium-size graph – Matrix inversion is feasible, but… – What if we want many proximities? – Q: How to get all (n^2) proximities efficiently? – A: FastAllDAP! Case 2: Large graph – Matrix inversion is infeasible – Q: How to get one proximity efficiently? – A: FastOneDAP!

7 FastAllDAP Q1: How to efficiently compute all possible proximities on a medium size graph? – a.k.a. how to efficiently solve multiple linear systems simultaneously? Goal: reduce # of matrix inversions!

8 FastAllDAP: Observation. Need two different matrix inversions! [Figure: matrix P]

9 FastAllDAP: Rescue. Redundancy among different linear systems! [Figure: matrix P] Overlap between the two gray parts used for Prox(1 -> 5) and Prox(1 -> 6)!

10 FastAllDAP: Theorem. Theorem: [equation not preserved]. Proof: by the Sherman-Morrison (SM) lemma. Example: [not preserved]

11 FastAllDAP: Algorithm. Alg. – Compute Q – For i, j = 1, …, n, compute Prox(i -> j) [formula not preserved]. Computational savings: O(1) matrix inversions instead of O(n^2)! Example – w/ 1000 nodes, – 1M matrix inversions vs. 1 matrix inversion!
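A minimal sketch of the "invert once, then read off every pair" idea follows. Several things here are assumptions rather than content from the slides: Q is taken to be (I - cP)^{-1} with c = 1 - fly-out probability, and the closed form Prox(i -> j) = Q[i][j] / (Q[i][i] Q[j][j] - Q[i][j] Q[j][i]) is a standard identity for a walk that stops with probability 1 - c at each step, which may differ in form from the theorem on the slide. The names `invert` and `all_pairs_proximity` are hypothetical.

```python
def invert(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]


def all_pairs_proximity(P, c=0.9):
    """All n^2 proximities from a single inversion (assumed closed form)."""
    n = len(P)
    M = [[(1.0 if i == j else 0.0) - c * P[i][j] for j in range(n)]
         for i in range(n)]
    Q = invert(M)  # the only matrix inversion
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                denom = Q[i][i] * Q[j][j] - Q[i][j] * Q[j][i]
                prox[i][j] = Q[i][j] / denom
    return prox


# Toy graph (assumed): 0 -> {1, 2}, 1 -> 0, 2 -> {0, 1}.
P = [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
prox = all_pairs_proximity(P, c=0.9)
```

Each pair costs a constant number of arithmetic operations once Q is available, which is where the saving over one inversion per pair comes from.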

12 FastOneDAP Q1: How to efficiently compute one single proximity on a large size graph? – a.k.a. how to solve one linear system efficiently? Goal: avoid matrix inversion!

13 FastOneDAP: Observation Partial Info. (4 elements /2 cols ) of Q is enough!

14 FastOneDAP: Observation. Q: How to compute one column of Q? A: Taylor expansion. Reminder: i-th col of Q; [0, …, 0, 1, 0, …, 0]^T

15 FastOneDAP: Observation. Sparse matrix-vector multiplications! i-th col of Q; [0, …, 0, 1, 0, …, 0]^T [Figure: chain of matrix-vector products]

16 FastOneDAP: Iterative Alg. Alg. to estimate the i-th col of Q [algorithm listing not preserved]
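The iteration can be sketched as follows, under stated assumptions: Q is taken to be (I - cP)^{-1} = I + cP + (cP)^2 + …, P is stored sparsely as a dict mapping each row to its nonzero (column, probability) pairs, and the name `column_of_q` is hypothetical. The algorithm listing on the slide is not preserved, so this is an illustration of the Taylor-expansion idea rather than the authors' exact procedure.

```python
def column_of_q(P_sparse, n, i, c=0.9, num_iters=50):
    """Approximate column i of Q = (I - c P)^{-1} without inverting.

    Uses x <- c * P x + e_i, whose iterates are the partial sums of
    the series (I + cP + (cP)^2 + ...) e_i. Each step is one sparse
    matrix-vector product.
    """
    e_i = [0.0] * n
    e_i[i] = 1.0
    x = e_i[:]
    for _ in range(num_iters):
        # y = P x, touching only the stored nonzeros of each row
        y = [sum(p * x[t] for t, p in P_sparse.get(k, [])) for k in range(n)]
        x = [c * y[k] + e_i[k] for k in range(n)]
    return x


# Toy graph (assumed): 0 -> {1, 2}, 1 -> 0, 2 -> {0, 1}.
P_sparse = {
    0: [(1, 0.5), (2, 0.5)],
    1: [(0, 1.0)],
    2: [(0, 0.5), (1, 0.5)],
}
col_1 = column_of_q(P_sparse, 3, 1, num_iters=400)
```

With c < 1 the series converges geometrically, so the number of iterations trades accuracy against time. The example uses 400 iterations only to make the toy result tight.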

17 FastOneDAP: Property. Convergence guaranteed! Computational savings – Example: 100K nodes and 1M edges (50 iterations): 10,000,000x faster! Footnote: 1 col is enough! – (details in paper)

18 Esc_Prob is good, but… Issue #1: – 'Degree-1 node' effect. Issue #2: – Weakly connected pair. Need some practical modifications!

19 Issue #1: 'degree-1 node' effect [Faloutsos+] [Koren+]. No influence for degree-1 nodes (E, F)! – known as the 'pizza delivery guy' problem in undirected graphs. Solution: Universal Absorbing Boundary! Esc_Prob(a->b)=1

20 Universal Absorbing Boundary U-A-B is a black-hole! Footnote: fly-out probability = 0.1

21 Introducing Universal-Absorbing-Boundary Prox(a->b)=0.91 Prox(a->b)=0.74 Footnote: fly-out probability = 0.1 Esc_Prob(a->b)=1
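To see why a fly-out probability separates cases that plain escape probability scores identically, here is a small sketch. It assumes that at every step the walk is absorbed with probability `fly_out` and otherwise follows P. The function name and the two toy graphs are hypothetical, and they are not the graphs behind the 0.91 and 0.74 values on the slide.

```python
def damped_escape_probability(P, a, b, fly_out=0.1, num_iters=1000):
    """Esc_Prob(a -> b) when each step survives with prob. 1 - fly_out."""
    c = 1.0 - fly_out
    n = len(P)
    v = [0.0] * n  # v[k]: prob. of reaching b before a (or absorption)
    v[b] = 1.0
    for _ in range(num_iters):
        new_v = v[:]
        for k in range(n):
            if k != a and k != b:
                new_v[k] = c * sum(P[k][t] * v[t] for t in range(n))
        v = new_v
    return c * sum(P[a][t] * v[t] for t in range(n))


# Toy graphs (assumed). In both, a = 0 and b = 1.
direct = [[0.0, 1.0],
          [1.0, 0.0]]            # a -> b, b -> a
two_hop = [[0.0, 0.0, 1.0],
           [1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0]]      # a -> x, x -> b, b -> a

plain_direct = damped_escape_probability(direct, 0, 1, fly_out=0.0)
plain_two_hop = damped_escape_probability(two_hop, 0, 1, fly_out=0.0)
uab_direct = damped_escape_probability(direct, 0, 1, fly_out=0.1)
uab_two_hop = damped_escape_probability(two_hop, 0, 1, fly_out=0.1)
```

Without the boundary both graphs score 1. With it, the longer path is penalized because every extra hop is another chance to be absorbed.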

22 Issue #2: Weakly connected pair. Prox(A -> B) = Prox(B -> A) = 0. Solution: Partial symmetry!

23 Practical Modifications: Partial Symmetry. Before: Prox(A -> B) = Prox(B -> A) = 0. After: Prox(A -> B) = 0.081 > Prox(B -> A) = 0.009

24 Efficiency: FastAllDAP. [Plot: time (sec) vs. size of graph, Straight-Solver vs. FastAllDAP] 1,000x faster!

25 Efficiency: FastOneDAP. [Plot: time (sec) vs. size of graph, FastOneDAP vs. Straight-Solver] 1,0000x faster!

26 Link Prediction: existence
Dataset   Accuracy (DAP)   Accuracy (UDAP)
WL        65.40% (only one value survives; its column is unclear)
PC        79.60%           80.78%
AE        81.51%           80.60%
CN        86.71%           84.00%
EP        92.21%           92.09%

27 Link Prediction: direction. Q: Given the existence of the link, what is the direction of the link? A: Compare Prox(i -> j) and Prox(j -> i). >70% [Plot: density of Prox(i -> j) - Prox(j -> i)]
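The comparison rule can be written in a few lines. This is a sketch only: the function name and return labels are hypothetical, and the two input values are the Prox(A -> B) and Prox(B -> A) figures quoted on slide 23, reused here as sample inputs with A as i and B as j.

```python
def predict_direction(prox_ij, prox_ji):
    """Guess the link direction from the two directed proximities."""
    if prox_ij > prox_ji:
        return "i->j"
    if prox_ji > prox_ij:
        return "j->i"
    return "undecided"  # equal scores carry no directional signal


guess = predict_direction(0.081, 0.009)
```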