CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-1 Large Graph Mining: Power Tools and a Practitioner’s guide Task 3: Recommendations & proximity Faloutsos,

Slides:



Advertisements
Similar presentations
CMU SCS : Multimedia Databases and Data Mining Lecture #20: SVD - part III (more case studies) C. Faloutsos.
Advertisements

BiG-Align: Fast Bipartite Graph Alignment
CMU SCS I2.2 Large Scale Information Network Processing INARC 1 Overview Goal: scalable algorithms to find patterns and anomalies on graphs 1. Mining Large.
Fast Algorithms for Querying and Mining Large Graphs Hanghang Tong Machine Learning Department Carnegie Mellon University
Dept. of Computer Science Rutgers Node and Graph Similarity : Theory and Applications Danai Koutra (CMU) Tina Eliassi-Rad (Rutgers) Christos Faloutsos.
Dept. of Computer Science Rutgers Node Similarity, Graph Similarity and Matching: Theory and Applications Danai Koutra (CMU) Tina Eliassi-Rad (Rutgers)
1 Social Influence Analysis in Large-scale Networks Jie Tang 1, Jimeng Sun 2, Chi Wang 1, and Zi Yang 1 1 Dept. of Computer Science and Technology Tsinghua.
Absorbing Random walks Coverage
DATA MINING LECTURE 12 Link Analysis Ranking Random walks.
CMU SCS C. Faloutsos (CMU)#1 Large Graph Algorithms Christos Faloutsos CMU McGlohon, Mary Prakash, Aditya Tong, Hanghang Tsourakakis, Babis Akoglu, Leman.
Fast Query Execution for Retrieval Models based on Path Constrained Random Walks Ni Lao, William W. Cohen Carnegie Mellon University
N EIGHBORHOOD F ORMATION AND A NOMALY D ETECTION IN B IPARTITE G RAPHS Jimeng Sun, Huiming Qu, Deepayan Chakrabarti & Christos Faloutsos Jimeng Sun, Huiming.
Fast Direction-Aware Proximity for Graph Mining KDD 2007, San Jose Hanghang Tong, Yehuda Koren, Christos Faloutsos.
SCS CMU Joint Work by Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos Speaker: Hanghang Tong Aug , 2008, Las Vegas.
Neighborhood Formation and Anomaly Detection in Bipartite Graphs Jimeng Sun Huiming Qu Deepayan Chakrabarti Christos Faloutsos Speaker: Jimeng Sun.
Multimedia Databases SVD II. Optimality of SVD Def: The Frobenius norm of a n x m matrix M is (reminder) The rank of a matrix M is the number of independent.
Multimedia Databases SVD II. SVD - Detailed outline Motivation Definition - properties Interpretation Complexity Case studies SVD properties More case.
Iterative Set Expansion of Named Entities using the Web Richard C. Wang and William W. Cohen Language Technologies Institute Carnegie Mellon University.
SCS CMU Proximity Tracking on Time- Evolving Bipartite Graphs Speaker: Hanghang Tong Joint Work with Spiros Papadimitriou, Philip S. Yu, Christos Faloutsos.
Scaling Personalized Web Search Glen Jeh, Jennfier Widom Stanford University Presented by Li-Tal Mashiach Search Engine Technology course (236620) Technion.
C-DEM: A Multi-Modal Query System for Drosophila Embryo Databases Fan Guo, Lei Li, Eric Xing, Christos Faloutsos Carnegie Mellon University {fanguo, leili,
1 Fast Dynamic Reranking in Large Graphs Purnamrita Sarkar Andrew Moore.
Measure Proximity on Graphs with Side Information Joint Work by Hanghang Tong, Huiming Qu, Hani Jamjoom Speaker: Mary McGlohon 1 ICDM 2008, Pisa, Italy15-19.
1 Fast Incremental Proximity Search in Large Graphs Purnamrita Sarkar Andrew W. Moore Amit Prakash.
The community-search problem and how to plan a successful cocktail party Mauro SozioAris Gionis Max Planck Institute, Germany Yahoo! Research, Barcelona.
Fast Random Walk with Restart and Its Applications
SCS CMU Joint Work by Hanghang Tong, Yasushi Sakurai, Tina Eliassi-Rad, Christos Faloutsos Speaker: Hanghang Tong Oct , 2008, Napa, CA CIKM 2008.
School of Electronics Engineering and Computer Science Peking University Beijing, P.R. China Ziqi Wang, Yuwei Tan, Ming Zhang.
Page 1 WEB MINING by NINI P SURESH PROJECT CO-ORDINATOR Kavitha Murugeshan.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P0-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
DATA MINING LECTURE 13 Absorbing Random walks Coverage.
1 Random Walks on Graphs: An Overview Purnamrita Sarkar, CMU Shortened and modified by Longin Jan Latecki.
2015/10/111 DBconnect: Mining Research Community on DBLP Data Osmar R. Zaïane, Jiyang Chen, Randy Goebel Web Mining and Social Network Analysis Workshop.
DATA MINING LECTURE 13 Pagerank, Absorbing Random Walks Coverage Problems.
KDD 2007, San Jose Fast Direction-Aware Proximity for Graph Mining Speaker: Hanghang Tong Joint work w/ Yehuda Koren, Christos Faloutsos.
Mining Social Networks for Personalized Prioritization Shinjae Yoo, Yiming Yang, Frank Lin, II-Chul Moon [KDD ’09] 1 Advisor: Dr. Koh Jia-Ling Reporter:
SCS CMU Proximity on Large Graphs Speaker: Hanghang Tong Guest Lecture.
Fast Random Walk with Restart and Its Applications Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan ICDM 2006 Dec , HongKong.
CMU SCS KDD '09Faloutsos, Miller, Tsourakakis P5-1 Large Graph Mining: Power Tools and a Practitioner’s guide Task 5: Graphs over time & tensors Faloutsos,
Tools and Algorithms for Querying and Mining Large Graphs Hanghang Tong Machine Learning Department Carnegie Mellon University
Guided Learning for Role Discovery (GLRD) Presented by Rui Liu Gilpin, Sean, Tina Eliassi-Rad, and Ian Davidson. "Guided learning for role discovery (glrd):
Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad
KDD 2007, San Jose Fast Direction-Aware Proximity for Graph Mining Speaker: Hanghang Tong Joint work w/ Yehuda Koren, Christos Faloutsos.
1 1 COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani.
Talk 2: Graph Mining Tools - SVD, ranking, proximity Christos Faloutsos CMU.
Page 1 PathSim: Meta Path-Based Top-K Similarity Search in Heterogeneous Information Networks Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, Tianyi.
1 Authors: Glen Jeh, Jennifer Widom (Stanford University) KDD, 2002 Presented by: Yuchen Bian SimRank: a measure of structural-context similarity.
Panther: Fast Top-k Similarity Search in Large Networks JING ZHANG, JIE TANG, CONG MA, HANGHANG TONG, YU JING, AND JUANZI LI Presented by Moumita Chanda.
Kijung Shin Jinhong Jung Lee Sael U Kang
Center-Piece Subgraphs: Problem definition and Fast Solutions Hanghang Tong Christos Faloutsos Carnegie Mellon University.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
CMU SCS KDD '09Faloutsos, Miller, Tsourakakis P8-1 Large Graph Mining: Power Tools and a Practitioner’s guide Task 8: hadoop and Tera/Peta byte graphs.
SCS CMU Speaker Hanghang Tong Colibri: Fast Mining of Large Static and Dynamic Graphs Speaking Skill Requirement.
Arizona State University Fast Eigen-Functions Tracking on Dynamic Graphs Chen Chen and Hanghang Tong - 1 -
CS 540 Database Management Systems Web Data Management some slides are due to Kevin Chang 1.
GRAPH AND LINK MINING 1. Graphs - Basics 2 Undirected Graphs Undirected Graph: The edges are undirected pairs – they can be traversed in any direction.
Extrapolation to Speed-up Query- dependent Link Analysis Ranking Algorithms Muhammad Ali Norozi Department of Computer Science Norwegian University of.
Topics In Social Computing (67810) Module 1 (Structure) Centrality Measures, Graph Clustering Random Walks on Graphs.
Large Graph Mining: Power Tools and a Practitioner’s guide
Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad
Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad
Large Graph Mining: Power Tools and a Practitioner’s guide
Speaker: Hanghang Tong Carnegie Mellon University
Jinhong Jung, Woojung Jin, Lee Sael, U Kang, ICDM ‘16
Graph and Tensor Mining for fun and profit
Large Graph Mining: Power Tools and a Practitioner’s guide
Learning to Rank Typed Graph Walks: Local and Global Approaches
Proximity in Graphs by Using Random Walks
PRSim: Sublinear Time SimRank Computation on Large Power-Law Graphs.
Presentation transcript:

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-1 Large Graph Mining: Power Tools and a Practitioner’s guide Task 3: Recommendations & proximity Faloutsos, Miller & Tsourakakis CMU

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-2 Outline Introduction – Motivation Task 1: Node importance Task 2: Community detection Task 3: Recommendations Task 4: Connection sub-graphs Task 5: Mining graphs over time Task 6: Virus/influence propagation Task 7: Spectral graph theory Task 8: Tera/peta graph mining: hadoop Observations – patterns of real graphs Conclusions

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-3 Acknowledgement : Most of the foils in ‘Task 3’ are by Hanghang TONG

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-4 Detailed outline Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-5 A B i i i i Motivation: Link Prediction Should we introduce Mr. A to Mr. B? ?

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-6 Motivation - recommendations customers Products / movies ‘ smith ’ Terminator 2 ??

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-7 Answer: proximity ‘yes’, if ‘A’ and ‘B’ are ‘close’ ‘yes’, if ‘smith’ and ‘terminator 2’ are ‘close’ QUESTIONS in this part: -How to measure ‘closeness’/proximity? -How to do it quickly? -What else can we do, given proximity scores?

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-8 How close is ‘A’ to ‘B’? a.k.a Relevance, Closeness, ‘Similarity’…

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-9 Why is it useful? Recommendation And many more Image captioning [Pan+] Conn. / CenterPiece subgraphs [Faloutsos+], [Tong+], [Koren+] and Link prediction [Liben-Nowell+], [Tong+] Ranking [Haveliwala], [Chakrabarti+] Management [Minkov+] Neighborhood Formulation [Sun+] Pattern matching [Tong+] Collaborative Filtering [Fouss+] …

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-10 Test Image SeaSunSkyWaveCatForestTigerGrass Image Keyword Region Automatic Image Captioning Q: How to assign keywords to the test image? A: Proximity! [Pan+ 2004]

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-11 Center-Piece Subgraph(CePS) Original Graph CePS Q: How to find hub for the black nodes? A: Proximity! [Tong+ KDD 2006] CePS guy Input Output

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-12 Detailed outline Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-13 How close is ‘A’ to ‘B’? Should be close, if they have - many, - short - ‘heavy’ paths

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-14 Why not shortest path? A: ‘pizza delivery guy’ problem Some ``bad’’ proximities

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-15 Why not max. netflow? A: No penalty for long paths Some ``bad’’ proximities

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-16 What is a ``good’’ Proximity? Multiple Connections Quality of connection Direct & In-directed Conns Length, Degree, Weight… …

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P Random walk with restart [Haveliwala’02]

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-18 Random walk with restart Node 4 Node 1 Node 2 Node 3 Node 4 Node 5 Node 6 Node 7 Node 8 Node 9 Node 10 Node 11 Node Ranking vector More red, more relevant Nearby nodes, higher scores

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-19 Why RWR is a good score? all paths from i to j with length 1 all paths from i to j with length 2 all paths from i to j with length 3 : adjacency matrix. c: damping factor i j

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-20 Detailed outline Problem dfn and motivation Solution: Random walk with restarts –variants Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-21 Variant: escape probability Define Random Walk (RW) on the graph Esc_Prob(CMU  Paris) –Prob (starting at CMU, reaches Paris before returning to CMU) CMU Paris the remaining graph Esc_Prob = Pr (smile before cry)

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-22 Other Variants Other measure by RWs –Community Time/Hitting Time [Fouss+] –SimRank [Jeh+] Equivalence of Random Walks –Electric Networks: EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] –Spring Systems Katz [Katz], [Huang+], [Scholkopf+] Matrix-Forest-based Alg [Chobotarev+]

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-23 Other Variants Other measure by RWs –Community Time/Hitting Time [Fouss+] –SimRank [Jeh+] Equivalence of Random Walks –Electric Networks: EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] –Spring Systems Katz [Katz], [Huang+], [Scholkopf+] Matrix-Forest-based Alg [Chobotarev+] All are “related to” or “similar to” random walk with restart!

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-24 Map of proximity measurements RWR Esc_Prob + Sink Hitting Time/ Commute Time Effective Conductance String System Regularized Un-constrained Quad Opt. Harmonic Func. Constrained Quad Opt. Mathematic Tools X out-degree “voltage = position” relax 4 ssp decides 1 esc_prob Katz Norma lize Physical Models

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-25 Notice: Asymmetry (even in undirected graphs) A B C D E C-> A : high A-> C: low

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-26 Summary of Proximity Definitions Goal: Summarize multiple relationships Solutions –Basic: Random Walk with Restarts [Haweliwala’02] [Pan+ 2004][Sun+ 2006][Tong+ 2006] –Properties: Asymmetry [Koren+ 2006][Tong+ 2007] [Tong+ 2008] –Variants: Esc_Prob and many others. [Faloutsos+ 2004] [Koren+ 2006][Tong+ 2007]

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-27 Detailed outline Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-28 Preliminary: Sherman–Morrison Lemma = If: = x

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-29 Preliminary: Sherman–Morrison Lemma = If: Then: + x

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-30 Sherman – Morrison Lemma – intuition: Given a small perturbation on a matrix A -> A’ We can quickly update its inverse

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-31 SM: The block-form A D B C And many other variants… Also known as Woodbury Identity Or…

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-32 SM Lemma: Applications RLS (Recursive least squares) –and almost any algorithm in time series! Kalman filtering Incremental matrix decomposition … … and all the fast solutions we will introduce!

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-33 Reminder: PageRank With probability 1-c, fly-out to a random node Then, we have p = c B p + (1-c)/n 1 => p = (1-c)/n [I - c B] -1 1

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-34 Ranking vector Starting vector Adjacency matrix Restart p p = c B p + (1-c)/n 1 The only difference

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-35 Computing RWR n x n n x 1 Ranking vector Starting vector Adjacency matrix 1 Restart p p = c B p + (1-c)/n 1

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-36 Q: Given query i, how to solve it? ? ? Adjacency matrix Starting vector Ranking vector Query

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P OntheFly: No pre-computation/ light storage Slow on-line response O(mE)

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P PreCompute R:R: c x Q Q

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-39 PreCompute: Fast on-line response Heavy pre-computation/storage cost O(n ) 3 2

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-40 Q: How to Balance? On-line Off-line

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-41 How to balance? Idea (‘B-Lin’) Break into communities Pre-compute all, within a community Adjust (with S.M.) for ‘bridge edges’ H. Tong, C. Faloutsos, & J.Y. Pan. Fast Random Walk with Restart and Its Applications. ICDM, , 2006.

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-42 B_Lin: Basic Idea [Tong+ ICDM 2006] Find Community Fix the remaining Combine

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P ~ ~ B_Lin: details W 1 : within community ~ W ~ Cross-community

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-44 B_Lin: details W ~ I – c ~ ~ I – c – cUSV W1W1 ~ Easy to be invertedLRA difference SM Lemma!

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-45 B_Lin: summary Pre-Computational Stage Q: A: A few small, instead of ONE BIG, matrices inversions On-Line Stage Q: Efficiently recover one column of Q A: A few, instead of MANY, matrix-vector multiplications Efficiently compute and store Q

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-46 Query Time vs. Pre-Compute Time Log Query Time Log Pre-compute Time Quality: 90%+ On-line: Up to 150x speedup Pre-computation: Two orders saving

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-47 Detailed outline Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-48 gCaP: Automatic Image Caption Q … SeaSunSkyWave {} {} CatForestGrassTiger {?, ?, ?,} A: Proximity! [Pan+ KDD2004]

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-49 Test Image SeaSunSkyWaveCatForestTigerGrass Image Keyword Region

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-50 Test Image SeaSunSkyWaveCatForestTigerGrass Image Keyword Region {Grass, Forest, Cat, Tiger}

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-51 C-DEM (Screen-shot)

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-52 C-DEM: Multi-Modal Query System for Drosophila Embryo Databases [Fan+ VLDB 2008]

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-53 Detailed outline Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-54 RWR on Bipartite Graph n m authors Conferences Author-Conf. Matrix Observation: n >> m! Examples: 1. DBLP: 400k aus, 3.5k confs 2. NetFlix: 2.7M usrs,18k mvs

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-55 Q: Given query i, how to solve it? RWR on Skewed bipartite graphs ? ? ….... ….. … n m ArAr ….... ….. …... AcAc m confs n aus

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-56 Idea: Pre-compute the smallest, m x m matrix Use it to compute the rest proximities, on the fly H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. Proximity Tracking on Time-Evolving Bipartite Graphs. SDM 2008.

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-57 BB_Lin: Examples DatasetOff-Line CostOn-Line Cost DBLPa few minutesfrac. of sec. NetFlix1.5 hours<0.01 sec. 400k authors x 3.5k conf.s 2.7m user x 18k movies

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-58 Detailed outline Problem dfn and motivation Solution: Random walk with restarts Efficient computation Case study: image auto-captioning Extensions: bi-partite graphs; tracking Conclusions

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-59 Problem: update E’ edges changed Involves n’ authors, m’ confs. n authors m Conferences

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-60 Solution: Use Sherman-Morrison to quickly update the inverse matrix

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-61 Fast-Single-Update 176x speedup 40x speedup log(Time) (Seconds) Datasets Our method

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-62 pTrack: Philip S. Yu’s Top-5 conferences up to each year ICDE ICDCS SIGMETRICS PDIS VLDB CIKM ICDCS ICDE SIGMETRICS ICMCS KDD SIGMOD ICDM CIKM ICDCS ICDM KDD ICDE SDM VLDB Databases Performance Distributed Sys. Databases Data Mining DBLP: (Au. x Conf.) - 400k aus, - 3.5k confs - 20 yrs

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-63 pTrack: Philip S. Yu’s Top-5 conferences up to each year ICDE ICDCS SIGMETRICS PDIS VLDB CIKM ICDCS ICDE SIGMETRICS ICMCS KDD SIGMOD ICDM CIKM ICDCS ICDM KDD ICDE SDM VLDB Databases Performance Distributed Sys. Databases Data Mining DBLP: (Au. x Conf.) - 400k aus, - 3.5k confs - 20 yrs

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-64 KDD’s Rank wrt. VLDB over years Prox. Rank Year Data Mining and Databases are getting closer & closer

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-65 cTrack:10 most influential authors in NIPS community up to each year Author-paper bipartite graph from NIPS k papers, 2037 authors, spreading over 13 years T. Sejnowski M. Jordan

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-66 Conclusions - Take-home messages Proximity Definitions –RWR –and a lot of variants Computation –Sherman–Morrison Lemma –Fast Incremental Computation Applications –Recommendations; auto-captioning; tracking –Center-piece Subgraphs (next) – management; anomaly detection, …

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-67 References L. Page, S. Brin, R. Motwani, & T. Winograd. (1998), The PageRank Citation Ranking: Bringing Order to the Web, Technical report, Stanford Library. T.H. Haveliwala. (2002) Topic-Sensitive PageRank. In WWW, , 2002 J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD, , 2004.

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-68 References C. Faloutsos, K. S. McCurley & A. Tomkins. (2002) Fast discovery of connection subgraphs. In KDD, , J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM, , W. Cohen. (2007) Graph Walks and Graphical Models. Draft.

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-69 References P. Doyle & J. Snell. (1984) Random walks and electric networks, volume 22. Mathematical Association America, New York. Y. Koren, S. C. North, and C. Volinsky. (2006) Measuring and extracting proximity in networks. In KDD, 245–255, A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14-23, 2006.

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-70 References S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW, , F. Fouss, A. Pirotte, J.-M. Renders, & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3),

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-71 References H. Tong & C. Faloutsos. (2006) Center-piece subgraphs: problem definition and fast solutions. In KDD, , H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random Walk with Restart and Its Applications. In ICDM, , H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast direction-aware proximity for graph mining. In KDD, , 2007.

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-72 References H. Tong, B. Gallagher, C. Faloutsos, & T. Eliassi- Rad. (2007) Fast best-effort pattern matching in large attributed graphs. In KDD, , H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. SDM 2008.

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-73 References B. Gallagher, H. Tong, T. Eliassi-Rad, C. Faloutsos. Using Ghost Edges for Classification in Sparsely Labeled Networks. KDD 2008 H. Tong, Y. Sakurai, T. Eliassi-Rad, and C. Faloutsos. Fast Mining of Complex Time-Stamped Events CIKM 08 H. Tong, H. Qu, and H. Jamjoom. Measuring Proximity on Graphs with Side Information. ICDM 2008

CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P3-74 Resources For software, papers, and ppt of presentations ial.htmlwww.cs.cmu.edu/~htong/tut/cikm2008/cikm_tutor ial.html For the CIKM’08 tutorial on graphs and proximity Again, thanks to Hanghang TONG for permission to use his foils in this part