SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong 2008-4-10 15-826 Guest Lecture.

Presentation transcript:

SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong Guest Lecture

SCS CMU 2 Graphs are everywhere!

SCS CMU 3 Food-web: example

SCS CMU 4 Graph Mining: the big picture Graph/Global Level Subgraph/ Community Level Node Level We are here!

SCS CMU 5 Proximity on Graph: What? a.k.a Relevance, Closeness, ‘Similarity’…

SCS CMU 6 Proximity is the main tool behind… Link prediction [Liben-Nowell+], [Tong+] Ranking [Haveliwala], [Chakrabarti+] Management [Minkov+] Image caption [Pan+] Neighborhood Formation [Sun+] Conn. subgraph [Faloutsos+], [Tong+], [Koren+] Pattern match [Tong+] Collaborative Filtering [Fouss+] Many more… Will return to this later

SCS CMU 7 Roadmap Motivation Part I: Definitions Part II: Fast Solutions Part III: Applications Conclusion Basic: RWR Variants Asymmetry of Prox. Group Prox Prox w/ Attributes Prox w/ Time

SCS CMU 8 Why not shortest path? ‘pizza delivery guy’ problem ‘multi-facet’ relationship Some ``bad’’ proximities

SCS CMU 9 Why not max. netflow? No punishment on long paths Some ``bad’’ proximities

SCS CMU 10 Why not ``effective conductance”? Some ``bad’’ proximities ‘pizza delivery guy’ problem

SCS CMU 11 What is a ``good’’ Proximity? Multiple Connections Quality of connection Direct & Indirect Conns Length, Degree, Weight… …

SCS CMU Random walk with restart

SCS CMU 13 Random walk with restart (figure: ranking vector of all nodes for query Node 4) More red, more relevant Nearby nodes, higher scores

SCS CMU 14 Why is RWR a good score? It accounts for all paths from i to j with length 1, all paths from i to j with length 2, all paths from i to j with length 3, …
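The equation on this slide did not survive extraction. As a sketch, assuming the RWR score matrix has the standard form $Q = (I - cW)^{-1}$ with a normalized adjacency matrix $W$ and a damping factor $0 < c < 1$ (both symbols are assumptions here, not taken from the slide), the Neumann series makes the path-based reading explicit:

```latex
Q = (I - cW)^{-1} = \sum_{k=0}^{\infty} c^{k} W^{k}
  = I + cW + c^{2}W^{2} + c^{3}W^{3} + \cdots
```

Entry $(i, j)$ of $W^{k}$ aggregates the weights of all length-$k$ paths between $i$ and $j$, so $Q(i, j)$ rewards many short, heavy paths and discounts longer ones by $c^{k}$.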

SCS CMU 15 Roadmap Motivation Part I: Definitions Part II: Fast Solutions Part III: Applications Conclusion Basic: RWR Variants Asymmetry of Prox. Group Prox Prox w/ Attributes Prox w/ Time

SCS CMU 16 Variant: escape probability Define Random Walk (RW) on the graph Esc_Prob(A -> B) –Prob (starting at A, reaches B before returning to A) Esc_Prob = Pr (smile before cry) A B the remaining graph

SCS CMU 17 Other Variants Other measure by RWs –Commute Time/Hitting Time [Fouss+] –SimRank [Jeh+] Equivalence of Random Walks –Electric Networks: EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] –String Systems Katz [Katz], [Huang+], [Scholkopf+] Matrix-Forest-based Alg [Chebotarev+]

SCS CMU 18 Other Variants Other measure by RWs –Commute Time/Hitting Time [Fouss+] –SimRank [Jeh+] Equivalence of Random Walks –Electric Networks: EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] –String Systems Katz [Katz], [Huang+], [Scholkopf+] Matrix-Forest-based Alg [Chebotarev+] All are related to, or similar to, random walk with restart!

SCS CMU 19 Roadmap Motivation Part I: Definitions Part II: Fast Solutions Part III: Applications Conclusion Basic: RWR Variants Asymmetry of Prox. Group Prox Prox w/ Attributes Prox w/ Time

SCS CMU 20 Asymmetry of Proximity [Tong+ KDD07 a] What is Prox from A to B? What is Prox from B to A? What is Prox between A and B?

SCS CMU 21 Asymmetry also exists in un-directed graphs Hanghang’s most important conf. is KDD The most important author in KDD is... So is love… Hanghang KDD

SCS CMU 22 Roadmap Motivation Part I: Definitions Part II: Fast Solutions Part III: Applications Conclusion Basic: RWR Variants Asymmetry of Prox. Group Prox Prox w/ Attributes Prox w/ Time

SCS CMU 23 Group Proximity [Tong+ 2007] Q: How close are Accountants to SECs? A: Prob (starting at any RED, reaches any GREEN before touching any RED again)

SCS CMU 24 Proximity on Attribute Graphs What is the proximity from node 7 to 10? If we know that…

SCS CMU 25 Sol: Augmented graphs

SCS CMU 26 Attributes on nodes/edges (ER graph) [Chakrabarti+ WWW07] skip WroteSentReceived In-Replied-toCited Works

SCS CMU 27 Proximity w/ Time Sol #1: treat time as a categorical attr. [Minkov+] Sol #2: aggregate slice matrices [Tong+] Time Global aggregation Sliding window Exponential emphasis
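The three aggregation schemes named for Sol #2 can be illustrated with a minimal sketch. The function names, the `window` and `decay` parameters, and the exact weighting in the exponential variant are assumptions made for illustration; the slide gives only the scheme names.

```python
import numpy as np

def global_aggregation(slices):
    """Sum every time-slice adjacency matrix seen so far."""
    return sum(slices)

def sliding_window(slices, window):
    """Sum only the most recent `window` slices."""
    return sum(slices[-window:])

def exponential_emphasis(slices, decay):
    """Down-weight older slices: M_t = decay * M_(t-1) + S_t (assumed form)."""
    aggregated = np.zeros_like(slices[0], dtype=float)
    for current in slices:
        aggregated = decay * aggregated + current
    return aggregated

# Three toy slices of a 2-node graph, oldest first.
slices = [1.0 * np.eye(2), 2.0 * np.eye(2), 3.0 * np.eye(2)]
m_global = global_aggregation(slices)
m_window = sliding_window(slices, window=2)
m_exp = exponential_emphasis(slices, decay=0.5)
```

Any of the aggregated matrices can then be normalized and fed to the same proximity computation as a static graph.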

SCS CMU 28 Summary of Part I Goal: Summarize multiple … relationships Solutions –Basic: Random Walk with Restart –Property: Asymmetry –Variants: Esc_Prob and many others. –Generalization: Group Prox.; w/ Attr.; w/ Time

SCS CMU 29 Motivation Part I: Definitions Part II: Fast Solutions Part III: Applications Conclusion Roadmap B_Lin: RWR FastAllDAP: Esc_Prob BB_Lin: Skewed BGs FastUpdate: Time-Evolving

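SCS CMU 30 Preliminary: Sherman–Morrison Lemma If: $\tilde{A} = A + u v^{T}$ Then: $\tilde{A}^{-1} = A^{-1} - \frac{A^{-1} u v^{T} A^{-1}}{1 + v^{T} A^{-1} u}$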
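A quick numerical check of the lemma, in its standard rank-one form (if A is invertible and 1 + v^T A^{-1} u is nonzero, the inverse of A + u v^T equals A^{-1} minus A^{-1} u v^T A^{-1} divided by that scalar). The helper name and the random test matrix are illustrative choices.

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    """Return inv(A + u v^T) given inv(A), without a fresh inversion."""
    Au = A_inv @ u                      # A^{-1} u
    vA = v @ A_inv                      # v^T A^{-1}
    denom = 1.0 + v @ Au                # scalar 1 + v^T A^{-1} u
    if abs(denom) < 1e-12:
        raise ValueError("update makes the matrix singular")
    return A_inv - np.outer(Au, vA) / denom

rng = np.random.default_rng(0)
A = rng.random((5, 5)) + 5.0 * np.eye(5)   # well-conditioned test matrix
u = rng.random(5)
v = rng.random(5)

updated = sherman_morrison_update(np.linalg.inv(A), u, v)
direct = np.linalg.inv(A + np.outer(u, v))
```

The update costs O(n^2) per rank-one change instead of the O(n^3) of a new dense inversion, which is why it recurs in the fast solutions later in the lecture.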

SCS CMU SM Lemma: Applications RLS –and almost any algorithm in time series! Leave-one-out cross validation for LS Kalman filtering Incremental matrix decomposition … and all the fast sols we will introduce! 31

SCS CMU 32 Computing RWR Ranking vector (n x 1) Adjacency matrix (n x n) Starting vector (n x 1) Restart p
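The slide's equation was lost in extraction. The sketch below assumes the common formulation r = c W r + (1 - c) e, where W is the column-normalized adjacency matrix, e is the starting (indicator) vector, and 1 - c is the restart probability; the symbol c and the function names are assumptions. It compares the iterative solution with the closed form r = (1 - c)(I - cW)^{-1} e.

```python
import numpy as np

def column_normalize(A):
    """Scale each column of a nonnegative adjacency matrix to sum to 1."""
    A = np.asarray(A, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0       # leave all-zero columns untouched
    return A / col_sums

def rwr_iterative(W, start, c=0.9, tol=1e-12, max_iter=10_000):
    """Iterate r <- c W r + (1 - c) e until successive iterates agree."""
    e = np.zeros(W.shape[0])
    e[start] = 1.0
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * (W @ r) + (1.0 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

def rwr_closed_form(W, start, c=0.9):
    """Solve (I - c W) r = (1 - c) e directly."""
    n = W.shape[0]
    e = np.zeros(n)
    e[start] = 1.0
    return (1.0 - c) * np.linalg.solve(np.eye(n) - c * W, e)

# Toy undirected graph on 4 nodes.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]])
W = column_normalize(A)
r_iter = rwr_iterative(W, start=0)
r_exact = rwr_closed_form(W, start=0)
```

The iterative version needs no pre-computation but repeats matrix-vector products per query; the closed form corresponds to inverting once and reading out a column.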

SCS CMU 33 Beyond RWR P-PageRank [Haveliwala] PageRank [Haveliwala] RWR [Pan, Sun] SM Learning [Zhou, Zhu] RL in CBIR [He] Fast RWR (B_Lin) Finds the Root Solution ! : Maxwell Equation for Web! [Chakrabarti]

SCS CMU 34 Beyond RWR RWR is the building block for computing… –Escape Probability (augmented w/ sink) [Tong+] –.. -> Effective Conductance -> Resistance Dist. -> Commute Time –MRF (special structure) [Cohen] Similar idea to B_Lin to compute other measurements

SCS CMU 35 Q: Given query i, how to solve it? ? ? Adjacency matrix Starting vector

SCS CMU OntheFly: No pre-computation/ light storage Slow on-line response O(mE)

SCS CMU 37 PreCompute [Haveliwala] R:

SCS CMU 38 PreCompute: Fast on-line response Heavy pre-computation/storage cost O(n^3) / O(n^2)

SCS CMU 39 Q: How to Balance? On-line Off-line

SCS CMU 40 B_Lin: Basic Idea [Tong+] Find Community Fix the remaining Combine

SCS CMU 41 Pre-computational stage Q: Efficiently compute and store Q A: A few small, instead of ONE BIG, matrix inversions

SCS CMU 42 On-Line Query Stage Q: Efficiently recover one column of Q A: A few, instead of MANY, matrix-vector multiplications

SCS CMU 43 Pre-compute Stage p1: B_Lin Decomposition –P1.1 partition –P1.2 low-rank approximation p2: Q matrices –P2.1 computing (for each partition) –P2.2 computing (for concept space)

SCS CMU 44 P1.1: partition Within-partition linkscross-partition links skip

SCS CMU 45 P1.1: block-diagonal skip

SCS CMU 46 P1.2: LRA for |S| << |W 2 | ~ skip

SCS CMU 47 + = skip

SCS CMU 48 p2.1 Computing c skip

SCS CMU 49 Comparing and Computing Time –100,000 nodes; 100 partitions –Computing is 10,000x faster! Storage Cost –100x saving! Q 1,1 Q 1,2 Q 1,k = skip
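The savings quoted on this slide can be checked with back-of-the-envelope arithmetic. The sketch assumes equal-sized partitions, cubic cost for dense inversion, and quadratic storage for a dense inverse; these cost models are assumptions, not figures from the slide.

```python
n_nodes = 100_000
n_parts = 100
part_size = n_nodes // n_parts          # 1,000 nodes per partition

# Time: one big inversion vs. one small inversion per partition.
full_inversion_cost = n_nodes ** 3
blockwise_inversion_cost = n_parts * part_size ** 3
time_ratio = full_inversion_cost // blockwise_inversion_cost

# Storage: one dense n x n inverse vs. the diagonal blocks only.
full_storage = n_nodes ** 2
blockwise_storage = n_parts * part_size ** 2
storage_ratio = full_storage // blockwise_storage
```

Under these assumptions the block-wise approach is 10,000 times cheaper to compute and 100 times cheaper to store.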

SCS CMU 50 Q: How to fix the green portions? + ~ ~ ~ + ? skip

SCS CMU 51 p2.2 Computing: U V = _ Q 1,1 Q 1,2 Q 1,k skip

SCS CMU 52 SM Lemma says: We have: Communities Bridges skip

SCS CMU 53 On-Line Stage Q + Query Result ? A (SM lemma) Pre-Computation skip

SCS CMU 54 On-Line Query Stage q1: q2: q3: q4: q5: q6: skip
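The pre-compute and on-line stages can be sketched end to end. The version below assumes r = (1 - c)(I - cW)^{-1} e, splits W into within-partition links W1 and cross-partition links W2, factors W2 by a truncated SVD, and combines the pieces through the matrix-inversion (Sherman–Morrison–Woodbury) identity. The function names, the hand-picked partitions, and the use of SVD for the low-rank step are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def blin_precompute(W, partitions, rank, c=0.9):
    """Invert each diagonal block, then build the small 'bridge' core matrix."""
    n = W.shape[0]
    W1 = np.zeros((n, n))               # within-partition links
    Q1 = np.zeros((n, n))               # block-diagonal inverse of (I - c W1)
    for nodes in partitions:
        block = np.ix_(nodes, nodes)
        W1[block] = W[block]
        Q1[block] = np.linalg.inv(np.eye(len(nodes)) - c * W[block])
    # Low-rank approximation of the cross-partition links: W2 ~ U diag(s) Vt.
    U, s, Vt = np.linalg.svd(W - W1)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank, :]
    core = np.linalg.inv(np.diag(1.0 / s) - c * (Vt @ Q1 @ U))
    return Q1, U, core, Vt

def blin_query(pre, start, c=0.9):
    """On-line stage: a handful of matrix-vector products, no inversion."""
    Q1, U, core, Vt = pre
    e = np.zeros(Q1.shape[0])
    e[start] = 1.0
    q = Q1 @ e
    return (1.0 - c) * (q + c * (Q1 @ (U @ (core @ (Vt @ q)))))

# Two triangles (communities) joined by a single bridge edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for a, b in edges:
    A[a, b] = A[b, a] = 1.0
W = A / A.sum(axis=0)                   # column-normalized adjacency

pre = blin_precompute(W, partitions=[[0, 1, 2], [3, 4, 5]], rank=2)
r_blin = blin_query(pre, start=0)
e0 = np.zeros(6)
e0[0] = 1.0
r_direct = 0.1 * np.linalg.solve(np.eye(6) - 0.9 * W, e0)
```

In this toy graph the cross-partition part has rank 2, so the result matches the direct solve; with a rank smaller than the true rank the answer becomes an approximation.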

SCS CMU 55 skip

SCS CMU 56 Query Time vs. Pre-Compute Time Log Query Time Log Pre-compute Time Quality: 90%+ On-line: Up to 150x speedup Pre-computation: Two orders saving

SCS CMU 57 Motivation Part I: Definitions Part II: Fast Solutions Part III: Applications Conclusion Roadmap B_Lin: RWR FastAllDAP: Esc_Prob BB_Lin: Skewed BGs FastUpdate: Time-Evolving

SCS CMU 58 FastAllDAP [Tong+] Footnote: augmented w/ universal sink as practical modification A B the remaining graph Q: How to compute –Esc_Prob = Pr (smile before cry)?

SCS CMU 59 Solving DAP (Straight-forward way) One matrix inversion, one proximity! 1 x (n-2) (n-2) x (n-2) 1-c: fly-out probability (to black-hole)

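SCS CMU 60 $\mathrm{Esc\_Prob}(1 \to 5) = c^{2}\, P(1, \mathcal{I})\, (I - c\, P(\mathcal{I}, \mathcal{I}))^{-1}\, P(\mathcal{I}, 5) + c\, P(1, 5)$, with $\mathcal{I}$ the set of nodes other than 1 and 5 P: Transition matrix (row norm.)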
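A minimal sketch of the straightforward solver, under stated assumptions: P is the row-normalized transition matrix, the walk survives each step with probability c (flying out to the sink with probability 1 - c), and the escape probability from i to j sums the direct step and all detours through the remaining n - 2 nodes. The function name and the toy graph are illustrative.

```python
import numpy as np

def esc_prob_direct(P, i, j, c=0.9):
    """Pr(walk from i reaches j before returning to i), one inversion per pair."""
    n = P.shape[0]
    rest = [k for k in range(n) if k not in (i, j)]
    P_i_rest = P[i, rest]                       # 1 x (n-2)
    P_rest_j = P[rest, j]                       # (n-2) x 1
    P_rest = P[np.ix_(rest, rest)]              # (n-2) x (n-2)
    G = np.linalg.inv(np.eye(n - 2) - c * P_rest)
    return c * c * P_i_rest @ G @ P_rest_j + c * P[i, j]

# Undirected path graph 0 - 1 - 2, row-normalized.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
prox_0_to_2 = esc_prob_direct(P, 0, 2)
prox_0_to_1 = esc_prob_direct(P, 0, 1)
prox_1_to_0 = esc_prob_direct(P, 1, 0)
```

Even on this undirected graph the score is asymmetric: the end node must pass through the middle node, while the middle node reaches a given end node only half the time.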

SCS CMU 61 Challenges Case 1: Medium Size Graph –Matrix inversion is feasible, but… –What if we want many proximities? –Q: How to get all (n^2) proximities efficiently? –A: FastAllDAP! Case 2: Large Size Graph –Matrix inversion is infeasible –Q: How to get one proximity efficiently? –A: FastOneDAP! skip

SCS CMU 62 FastAllDAP Q1: How to efficiently compute all possible proximities on a medium size graph? –a.k.a. how to efficiently solve multiple linear systems simultaneously? Goal: reduce # of matrix inversions!

SCS CMU 63 FastAllDAP: Observation Need two different matrix inversions! P=

SCS CMU 64 FastAllDAP: Rescue Redundancy among different linear systems! P= Overlap between two gray parts! Prox(1 -> 5) Prox(1 -> 6)

SCS CMU 65 FastAllDAP: Theorem Theorem: Proof: by SM Lemma Example:

SCS CMU 66 FastAllDAP: Algorithm Alg. –Compute Q –For i,j =1,…, n, compute Prox(i -> j) Computational Savings O(1) instead of O(n^2) matrix inversions! Example –w/ 1000 nodes, –1M matrix inversions vs. 1 matrix inversion!
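The theorem's formula did not survive extraction, so the sketch below uses an identity derived here from first principles rather than copied from the slide. With G = (I - cP)^{-1} (expected visit counts of the walk with fly-out), a renewal argument gives Prox(i -> j) = G[i, j] / (G[i, i] G[j, j] - G[i, j] G[j, i]). One inversion then yields every pair; the function name is hypothetical.

```python
import numpy as np

def all_pairs_esc_prob(P, c=0.9):
    """All n^2 escape probabilities from a single matrix inversion."""
    n = P.shape[0]
    G = np.linalg.inv(np.eye(n) - c * P)
    diag = np.diag(G)
    denom = np.outer(diag, diag) - G * G.T      # G_ii G_jj - G_ij G_ji
    prox = np.zeros((n, n))
    off_diag = ~np.eye(n, dtype=bool)           # skip i == j
    prox[off_diag] = G[off_diag] / denom[off_diag]
    return prox

# Same path graph 0 - 1 - 2, row-normalized.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
prox = all_pairs_esc_prob(P)
```

On the three-node path this gives 0.9 from an end node to the middle, 0.45 in the reverse direction, and 0.405 end to end, matching what the per-pair definition yields by hand.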

SCS CMU 67 FastAllDAP Size of Graph Time (sec) Straight-Solver FastAllDAP 1,000x faster!

SCS CMU 68 Motivation Part I: Definitions Part II: Fast Solutions Part III: Applications Conclusion Roadmap B_Lin: RWR FastAllDAP: Esc_Prob BB_Lin: Skewed BGs FastUpdate: Time-Evolving

SCS CMU RWR on Bipartite Graph 69 n m authors Conferences Author-Conf. Matrix Observation: n >> m! Examples: 1. DBLP: 400k aus, 3.5k confs 2. NetFlix: 2.7M usrs, 18k mvs

SCS CMU 70 RWR on Skewed bipartite graphs Q: Given query i, how to solve it? (figure: n x m bipartite adjacency structure with normalized blocks Ar and Ac)

SCS CMU 71 BB_Lin: Pre-Computation [Tong+ 06] Step 1: M = Ac x Ar (2-step RWR for Conferences) Step 2: All Conf-Conf Prox. Scores Cost: Examples –NetFlix: 1.5hr for pre-computation; –DBLP: a few minutes

SCS CMU 72 BB_Lin: Pre-Computation [Tong+ 06] Step 1: M = Ac x Ar (2-step RWR for Conferences) Step 2: All Conf-Conf Prox. Scores

SCS CMU 73 BB_Lin: Pre-Computation [Tong+ 06] Step 1: M = Ac x Ar (2-step RWR for Conferences) Step 2: All Conf-Conf Prox. Scores Cost: Examples –NetFlix: 1.5hr for pre-computation; –DBLP: a few minutes (Ac/Ar: E edges; M: m x m)

SCS CMU BB_Lin: On-Line Stage 74 Ac/Ar E edges Case 1: - Conf - Conf authors Conferences Read out !

SCS CMU BB_Lin: On-Line Stage 75 Ac/Ar E edges Case 2: - Au - Conf authors Conferences 1 matrix-vec!

SCS CMU BB_Lin: On-Line Stage 76 Ac/Ar E edges Case 3: - Au - Au authors Conferences 2 matrix-vec!
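The three on-line cases can be reproduced with a small sketch. It assumes RWR of the form r = (1 - c)(I - cW)^{-1} e on the (n + m)-node bipartite graph with column-normalized blocks, and uses block inversion so that only an m x m core matrix is ever inverted. The names `conf_to_author` and `author_to_conf` stand in for the slide's Ar/Ac and are assumptions about their roles.

```python
import numpy as np

def bb_lin_precompute(B, c=0.9):
    """B is the n x m author-conference weight matrix; invert only m x m."""
    B = np.asarray(B, dtype=float)
    conf_to_author = B / B.sum(axis=0)                      # n x m
    author_to_conf = (B / B.sum(axis=1, keepdims=True)).T   # m x n
    m = B.shape[1]
    core = np.linalg.inv(np.eye(m) - c * c * author_to_conf @ conf_to_author)
    return conf_to_author, author_to_conf, core

def conf_to_conf(pre, j, c=0.9):
    """Case 1: read out a column of the core matrix."""
    return (1 - c) * pre[2][:, j]

def author_to_conf_scores(pre, a, c=0.9):
    """Case 2: one matrix-vector product."""
    _, a2c, core = pre
    return (1 - c) * c * (core @ a2c[:, a])

def author_to_author(pre, a, c=0.9):
    """Case 3: two matrix-vector products."""
    c2a, a2c, core = pre
    e = np.zeros(c2a.shape[0])
    e[a] = 1.0
    return (1 - c) * (e + c * c * (c2a @ (core @ a2c[:, a])))

B = np.array([[1, 0], [2, 1], [0, 3], [1, 1], [4, 0]])   # 5 authors, 2 confs
pre = bb_lin_precompute(B)
n, m = B.shape
W = np.zeros((n + m, n + m))          # full graph, authors first
W[:n, n:] = pre[0]
W[n:, :n] = pre[1]
Q_full = (1 - 0.9) * np.linalg.inv(np.eye(n + m) - 0.9 * W)
```

The full (n + m) x (n + m) inverse is built here only to check the shortcuts; in the skewed setting (n much larger than m) it is exactly what the method avoids.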

SCS CMU 77 BB_Lin: Examples NetFlix dataset (2.7M users x 18k movies) –1.5hr for pre-computation; –<1 sec for on-line DBLP dataset (400k authors x 3.5k confs) –A few minutes for pre-computation –<0.01 sec for on-line

SCS CMU 78 Motivation Part I: Definitions Part II: Fast Solutions Part III: Applications Conclusion Roadmap B_Lin: RWR FastAllDAP: Esc_Prob BB_Lin: Skewed BGs FastUpdate: Time-Evolving

SCS CMU 79 Challenges BB_Lin is good for skewed bipartite graphs –for NetFlix (2.7M nodes and 100M edges) –w/ 1.5 hr pre-computation for m x m core matrix –fraction of seconds for on-line query But…what if the graph is evolving over time –New edges/nodes arrive; edge weights increase… –1.5hr itself becomes a part of on-line cost!

SCS CMU 80 t=0 Q: How to update the core matrix? t=1 ~ ~ ?

SCS CMU Update the core matrix Step 1: Step 2: 81 M = Ac Ar X ~ ~ ~ ? M = X + Rank 2 update = + X
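The rank-2 claim can be checked numerically. The sketch assumes the core matrix is (I - c^2 M)^{-1} with M the product of the two normalized blocks; changing one edge alters one column of each block, so M changes by a matrix of rank at most 2 and the core matrix can be patched with the Woodbury form of the SM lemma. Function names and the SVD-based factoring of the change are illustrative assumptions.

```python
import numpy as np

def two_step_matrix(B):
    """M = (author->conf block) x (conf->author block), an m x m matrix."""
    B = np.asarray(B, dtype=float)
    conf_to_author = B / B.sum(axis=0)
    author_to_conf = (B / B.sum(axis=1, keepdims=True)).T
    return author_to_conf @ conf_to_author

def core_from_scratch(M, c=0.9):
    return np.linalg.inv(np.eye(M.shape[0]) - c * c * M)

def core_low_rank_update(core, X, Y, c=0.9):
    """Given core = inv(I - c^2 M), return inv(I - c^2 (M + X Y))."""
    k = X.shape[1]
    small = np.linalg.inv(np.eye(k) - c * c * (Y @ core @ X))   # k x k only
    return core + c * c * (core @ X @ small @ Y @ core)

rng = np.random.default_rng(1)
B_old = rng.integers(1, 5, size=(6, 4)).astype(float)   # 6 authors, 4 confs
B_new = B_old.copy()
B_new[2, 3] += 1.0                                       # one edge gains weight

M_old, M_new = two_step_matrix(B_old), two_step_matrix(B_new)
U, s, Vt = np.linalg.svd(M_new - M_old)
k = int((s > 1e-10).sum())                               # numerical rank of the change
X, Y = U[:, :k] * s[:k], Vt[:k, :]

core_updated = core_low_rank_update(core_from_scratch(M_old), X, Y)
core_direct = core_from_scratch(M_new)
```

Only a k x k system is inverted during the update, so the cost no longer depends on redoing the full m x m pre-computation.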

SCS CMU Update : General Case [Tong+ 2008] E’ edges changed Involves n’ authors, m’ confs. Observation 82 M = Ac Ar X ~ n authors m Conferences

SCS CMU 83 Update: General Case Observation: –the rank of update is small! Algorithm: –E’ edges changed –Involves n’ authors, m’ confs. –our Alg. –(details in the paper) n authors m Conferences

SCS CMU 84 FastOneUpdate 176x speedup 40x speedup Time (Seconds) Datasets

SCS CMU 85 Fast-Batch-Update Min (n’, m’)E’ Time (Seconds) 15x speed-up on average!

SCS CMU 86 Summary of Part II Goal: Efficiently Solve Linear System(s) Sols. –B_Lin: Approximate one large linear system –FastAllDAP: multiple inter-related linear systems –BB_Lin: the intrinsic complexity is small –FastUpdate: (smooth) dynamic linear system

SCS CMU 87 B_Lin FastAllDAP … BB_Lin … FastUpdate

SCS CMU 88 Motivation Part I: Definitions Part II: Fast Solutions Part III: Applications Conclusion Roadmap Link Prediction NF gCap CePS G-Ray pTrack/cTrack

SCS CMU 89 Link Prediction: existence no link with link density Prox (i -> j) + Prox (j -> i) Prox. is effective to distinguish red and blue!

SCS CMU 90 Link Prediction: direction Q: Given the existence of the link, what is the direction of the link? A: Compare prox(i -> j) and prox(j -> i) >70% Prox (i -> j) - Prox (j -> i) density

SCS CMU 91 Neighborhood Formation … … … … Conference Author Q: What is the most related conference to ICDM? A: RWR! [Sun ICDM2005]

SCS CMU 92 NF: example

SCS CMU 93 gCaP: Automatic Image Caption Q … SeaSunSkyWave {} {} CatForestGrassTiger {?, ?, ?,} ? A: RWR! [Pan KDD2004]

SCS CMU 94 Test Image SeaSunSkyWaveCatForestTigerGrass Image Keyword Region

SCS CMU 95 Test Image SeaSunSkyWaveCatForestTigerGrass Image Keyword Region {Grass, Forest, Cat, Tiger}

SCS CMU 96 Center-Piece Subgraph(CePS) ? Original Graph Black: query nodes CePS Q A: RWR! [Tong KDD 2006] Red: Max (Prox(Red, A) x Prox(Red, B) x Prox(Red, C)) CePS guy
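The AND-style scoring on this slide (the product of a node's proximity to every query node) can be sketched as follows. RWR is assumed as the proximity measure with the formulation r = (1 - c)(I - cW)^{-1} e; the toy graph and function names are hypothetical, and the sketch stops at scoring nodes without extracting a connecting subgraph.

```python
import numpy as np

def rwr(W, start, c=0.9):
    """RWR scores of all nodes with respect to one query node."""
    n = W.shape[0]
    e = np.zeros(n)
    e[start] = 1.0
    return (1.0 - c) * np.linalg.solve(np.eye(n) - c * W, e)

def and_scores(W, queries, c=0.9):
    """Score each node by the product of its proximities to every query."""
    scores = np.ones(W.shape[0])
    for q in queries:
        scores *= rwr(W, q, c)
    return scores

# Queries 0, 1, 2 all attach to hub 3; nodes 4 and 5 hang off single queries.
edges = [(0, 3), (1, 3), (2, 3), (0, 4), (1, 5)]
A = np.zeros((6, 6))
for a, b in edges:
    A[a, b] = A[b, a] = 1.0
W = A / A.sum(axis=0)                   # column-normalized adjacency

queries = [0, 1, 2]
scores = and_scores(W, queries)
candidates = [k for k in range(6) if k not in queries]
center_piece = max(candidates, key=lambda k: scores[k])
```

The hub wins among non-query nodes because it is close to all three queries, while the pendant nodes are close to only one.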

SCS CMU 97 CePS: Example

SCS CMU 98 K_SoftAnd: Relaxation of AND Asking AND query?  No Answer! Disconnected Communities Noise

SCS CMU 99 2_SoftAnd And 1_SoftAnd (OR) x 1e-4

SCS CMU 100 CePS: 2 Soft_AND Stat. DB

SCS CMU 101 OutputInput Attributed Data Graph Query Graph Matching Subgraph Graph X-Ray

SCS CMU 102 G-Ray: How to? matching node Goodness = Prox (12, 4) x Prox (4, 12) x Prox (7, 4) x Prox (4, 7) x Prox (11, 7) x Prox (7, 11) x Prox (12, 11) x Prox (11, 12)

SCS CMU 103 Effectiveness: star-query Query Result

SCS CMU 104 Effectiveness: line-query Query Result

SCS CMU 105 Query Result Effectiveness: loop-query

SCS CMU 106 pTrack [Given] –(1) a large, skewed time-evolving bipartite graph, –(2) the query nodes of interest [Track] –(1) top-k most related nodes for each query node at each time step t; –(2) the proximity score (or rank of proximity) between any two query nodes at each time step t (figure: Author A’s rank in KDD, by year)

SCS CMU 107 Philip S. Yu’s Top-5 conferences up to each year ICDE ICDCS SIGMETRICS PDIS VLDB CIKM ICDCS ICDE SIGMETRICS ICMCS KDD SIGMOD ICDM CIKM ICDCS ICDM KDD ICDE SDM VLDB Databases Performance Distributed Sys. Databases Data Mining

SCS CMU 108 KDD’s Rank wrt. VLDB over years Rank Year Data Mining and Databases are more and more relevant!

SCS CMU 109 cTrack [Given] –(1) a large, skewed time-evolving graph, –(2) the query nodes of interest [Track] –(1) top-k most central nodes at each time step t; –(2) the centrality score (or rank of centrality) for each query node at each time step t

SCS CMU 110 Ranking of Centrality up to each year (in NIPS) M. Jordan G.Hinton C. Koch T. Sejnowski Year Rank of Influential-ness

SCS CMU most influential authors up to each year Author-paper bipartite graph from NIPS k papers, 2037 authors, spreading over 13 years T. Sejnowski M. Jordan

SCS CMU 112 RWR Variants w/ Time w/ Attribute Group Prox. Definitions B_Lin FastAllDAP BB_Lin FastUpdate Computations Link Prediction NF gCap CePS G-Ray pTrack cTrack Applications Proximity On Graphs Weighted Multiple Relationship Efficiently Solve Linear System(s) Use Proximity as Building block

SCS CMU Take-home Messages Proximity Definitions –RWR –and a lot of variants Computations –SM Lemma 113

SCS CMU 114 References
L. Page, S. Brin, R. Motwani & T. Winograd. (1998) The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford Library.
T.H. Haveliwala. (2002) Topic-Sensitive PageRank. In WWW.
J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD.
C. Faloutsos, K.S. McCurley & A. Tomkins. (2004) Fast discovery of connection subgraphs. In KDD.
J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM.
W. Cohen. (2007) Graph Walks and Graphical Models. Draft.

SCS CMU References
P. Doyle & J. Snell. (1984) Random walks and electric networks, volume 22. Mathematical Association of America, New York.
Y. Koren, S.C. North & C. Volinsky. (2006) Measuring and extracting proximity in networks. In KDD, 245–255.
A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14–23.
S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW.
F. Fouss, A. Pirotte, J.-M. Renders & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3).

SCS CMU References
H. Tong & C. Faloutsos. (2006) Center-piece subgraphs: problem definition and fast solutions. In KDD.
H. Tong, C. Faloutsos & J.Y. Pan. (2006) Fast Random Walk with Restart and Its Applications. In ICDM.
H. Tong, Y. Koren & C. Faloutsos. (2007) Fast direction-aware proximity for graph mining. In KDD.
H. Tong, B. Gallagher, C. Faloutsos & T. Eliassi-Rad. (2007) Fast best-effort pattern matching in large attributed graphs. In KDD.
H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. To appear in SDM.

SCS CMU 117 Thank you!