Measure Proximity on Graphs with Side Information. Joint work by Hanghang Tong, Huiming Qu, Hani Jamjoom. Speaker: Mary McGlohon. ICDM 2008, Pisa, Italy, 15-19 December 2008.

Cyano: Process Collaboration Wiki. Q: How can we enable social recommendation in Cyano? More generally, how can we enable social recommendation?

Scoop: current recommendation system [Qu+ SCC 2008]. Given a node in a graph (e.g., a user node in a user-to-process graph), find:
1. [Ranking list] a list of recommended nodes that are most related to the query node (what to recommend);
2. [Connection subgraph] a connection subgraph that best explains the relationship between the query node and the recommended node(s) (why to recommend).
Proximity is the core of Scoop!

Challenges in Scoop: how to incorporate users' feedback (like/dislike)?
– Feedback on the ranking list: how to automatically adjust the ranking for query node 1?
– Feedback on the connection subgraph: given the current subgraph between nodes 1 and 10, how to modify it to weaken the links between 1 and 10 that involve node 5?
Q: How to incorporate such side information when measuring node proximity on graphs?

Isomorphic Settings of Scoop
Proximity is the main tool for:
– Neighborhood search
– Anomaly detection
– Pattern matching
– Image captioning
– …
Sources of side information are rich:
– Ratings in recommendation systems
– Opinion/sentiment in blog analysis
– Clickthrough data
– …

Roadmap: Motivations; Proximity w/o Side Information; Proximity w/ Side Information (ProSIN: method; Fast-ProSIN: fast solution); Experimental Results; Conclusion.

Proximity on Graphs: What is it? Also known as relevance, closeness, or ‘similarity’.

What is a ‘good’ proximity? It should account for multiple connections, both direct and indirect, and for the quality of each connection (length, degree, weight, …).

Solution: random walk with restart (RWR) [Pan+ KDD 2004]. [Figure: RWR ranking vector over nodes 1-11 for query node 4; the redder a node, the more relevant it is.] Nearby nodes get higher scores.
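A minimal sketch of RWR by power iteration, assuming a small dense weighted adjacency matrix `adj` (adj[i][j] is the weight of edge i -> j) that is column-normalized into W̃, and the update r ← cW̃r + (1−c)e with damping factor c. The names `normalize_columns` and `rwr` are hypothetical, not from the talk.

```python
def normalize_columns(adj):
    """Column-stochastic transition matrix: W[j][i] = adj[i][j] / out_degree(i)."""
    n = len(adj)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        out = sum(adj[i])
        if out > 0:  # nodes without out-links keep an all-zero column
            for j in range(n):
                W[j][i] = adj[i][j] / out
    return W


def rwr(adj, query, c=0.9, tol=1e-12, max_iter=1000):
    """Random walk with restart: iterate r <- c * W r + (1 - c) * e_query."""
    n = len(adj)
    W = normalize_columns(adj)
    e = [1.0 if k == query else 0.0 for k in range(n)]
    r = e[:]
    for _ in range(max_iter):
        new = [c * sum(W[j][i] * r[i] for i in range(n)) + (1 - c) * e[j]
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, r)) < tol:  # L1 change
            return new
        r = new
    return r
```

For example, on a 5-node undirected tree with edges 0-1, 0-2, 1-3, 2-4, c = 0.5 and query node 0, the scores are 7/12 for the query, 1/6 for each 1-hop neighbor and 1/24 for each 2-hop node: nearby nodes score higher.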

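Why is RWR a good score? The RWR proximity from node i to node j sums over all paths from i to j, discounting longer paths:
$$\mathbf{r}_i = (1-c)\,(\mathbf{I} - c\tilde{\mathbf{W}})^{-1}\mathbf{e}_i = (1-c)\left(\mathbf{e}_i + c\,\tilde{\mathbf{W}}\mathbf{e}_i + c^2\,\tilde{\mathbf{W}}^2\mathbf{e}_i + c^3\,\tilde{\mathbf{W}}^3\mathbf{e}_i + \cdots\right)$$
where entry j of $c\,\tilde{\mathbf{W}}\mathbf{e}_i$ covers all paths from i to j of length 1, $c^2\tilde{\mathbf{W}}^2\mathbf{e}_i$ all paths of length 2, $c^3\tilde{\mathbf{W}}^3\mathbf{e}_i$ all paths of length 3, and so on; $\tilde{\mathbf{W}}$ is the (normalized) adjacency matrix and c is the damping factor.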
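To make the path interpretation concrete, the sketch below sums the truncated series term by term: after k steps, `walk` holds W̃^k e_i, whose entry j aggregates all length-k walks from i to j (weighted by transition probabilities). It assumes a column-stochastic W̃ given as a dense list of lists; the function name `rwr_by_paths` and the truncation length `max_len` are illustrative choices.

```python
def rwr_by_paths(W, query, c=0.5, max_len=200):
    """Truncated series (1 - c) * sum_{k=0..max_len} c^k * W^k e_query."""
    n = len(W)
    walk = [1.0 if k == query else 0.0 for k in range(n)]  # W^0 e_query
    r = [0.0] * n
    weight = 1.0 - c                                        # (1 - c) * c^0
    for _ in range(max_len + 1):
        # Add the contribution of all walks of the current length.
        r = [rj + weight * wj for rj, wj in zip(r, walk)]
        # Extend every walk by one step: W^k e -> W^(k+1) e.
        walk = [sum(W[j][i] * walk[i] for i in range(n)) for j in range(n)]
        weight *= c
    return r
```

For instance, on a 5-node tree with edges 0-1, 0-2, 1-3, 2-4, c = 0.5 and query node 0, the sum converges to 7/12 at the query, 1/6 at each neighbor and 1/24 at each 2-hop node, the same values as the fixed point of r = cW̃r + (1−c)e.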

Proximity in Current Scoop. [Figure: user-process bipartite graph with users U1-U4 and processes P1-P5, with panels for the ranking list and the connection subgraph.] Initial result: P2, P3, P1.

ProSIN: Challenges. Given the query node, we want to boost the neighbors of node 4 and penalize the neighbors of node 6.

ProSIN: How? Use the side information to refine the graph!

ProSIN: Detailed Algorithm
Input:
– A weighted directed graph A
– Source node s and target node t
– Side information: the positive set P and the negative set N
Output:
– Proximity score from the source s to the target t
Method:
1. Add a link from the source node s to each positive node x.
2. Introduce a sink node into the graph.
3. For each negative node y: find its neighboring nodes; add a link from node y to the sink; add a link from each neighboring node of y to the sink.
4. Perform random walk with restart for the source node s on the refined graph.
5. Output the proximity score as the steady-state probability that the random particle finally stays at the target node t.
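The five steps above can be sketched in Python as follows. Several details are assumptions, since the slide does not fix them: every added link gets a hypothetical weight `w_fb` (default 1.0); a node's "neighbors" are the nodes linked to it in either direction in the original graph; the sink has no out-links, so probability that reaches it is absorbed; and the graph is a dict mapping (u, v) to an edge weight. The names `rwr_edges`, `prosin_scores`, `prosin` and `SINK` are hypothetical.

```python
from collections import defaultdict

SINK = "__sink__"  # hypothetical label for the added sink node


def rwr_edges(edges, source, c=0.5, tol=1e-12, max_iter=10_000):
    """RWR on a weighted directed graph {(u, v): weight}.
    Nodes without out-links (such as the sink) pass no mass onward."""
    nodes = {u for u, _ in edges} | {v for _, v in edges} | {source}
    out_w = defaultdict(float)
    for (u, _), w in edges.items():
        out_w[u] += w
    r = {x: 0.0 for x in nodes}
    r[source] = 1.0
    for _ in range(max_iter):
        new = {x: 0.0 for x in nodes}
        for (u, v), w in edges.items():
            new[v] += c * r[u] * w / out_w[u]
        new[source] += 1.0 - c  # restart at the source
        done = sum(abs(new[x] - r[x]) for x in nodes) < tol
        r = new
        if done:
            break
    return r


def prosin_scores(edges, s, positive=(), negative=(), c=0.5, w_fb=1.0):
    refined = defaultdict(float, edges)
    # Step 1: link the source to every positive node.
    for x in positive:
        refined[(s, x)] += w_fb
    # Steps 2-3: link each negative node and its neighbors to a sink.
    for y in negative:
        nbrs = {v for (u, v) in edges if u == y} | {u for (u, v) in edges if v == y}
        for z in {y} | nbrs:
            refined[(z, SINK)] += w_fb
    # Step 4: RWR from s on the refined graph.
    return rwr_edges(dict(refined), s, c)


def prosin(edges, s, t, positive=(), negative=(), c=0.5, w_fb=1.0):
    # Step 5: proximity = steady-state probability at the target t.
    return prosin_scores(edges, s, positive, negative, c, w_fb).get(t, 0.0)
```

On a 5-node undirected tree with edges 0-1, 0-2, 1-3, 2-4 (each stored in both directions), c = 0.5 and source 0: without feedback, nodes 1 and 2 tie at 1/6; a "no" on node 1 drops it to about 0.094, below node 2 (about 0.103); a "yes" on the 2-hop node 3 raises it from 1/24 to 5/37.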

Process management. Given a user-process graph with ‘U2’ as the query, which are the top 3 most related processes? Initial result (no feedback): P2, P3, P1. Updated result (‘no’ to ‘P2’): P3, P4, P5. [Figure: user-process bipartite graph with users U1-U4 and processes P1-P5.]

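Computing RWR:
$$\mathbf{r}_i = c\,\tilde{\mathbf{W}}\,\mathbf{r}_i + (1-c)\,\mathbf{e}_i$$
where $\mathbf{r}_i$ (n x 1) is the ranking vector, $\tilde{\mathbf{W}}$ (n x n) is the normalized adjacency matrix, $\mathbf{e}_i$ (n x 1) is the starting vector (1 at the query node i, 0 elsewhere), and 1-c is the restart probability.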

Q: Given a query node i, how do we solve this equation for the ranking vector $\mathbf{r}_i$, given the adjacency matrix and the starting vector?
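One direct answer, useful for contrast with OntheFly and NB_Lin, is to solve the linear system (I − cW̃) r_i = (1−c) e_i exactly. A small pure-Python sketch using Gaussian elimination with partial pivoting, assuming a dense column-normalized W̃ given as a list of lists; `solve` and `rwr_exact` are hypothetical names. On a dense graph this costs O(n^3) per query, which is why it does not scale.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def rwr_exact(W, i, c=0.5):
    """Exact RWR scores for query i: solve (I - c W) r = (1 - c) e_i."""
    n = len(W)
    A = [[(1.0 if a == b else 0.0) - c * W[a][b] for b in range(n)] for a in range(n)]
    rhs = [(1.0 - c) if a == i else 0.0 for a in range(n)]
    return solve(A, rhs)
```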

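OntheFly: starting from $\mathbf{e}_i$, iterate $\mathbf{r}_i \leftarrow c\,\tilde{\mathbf{W}}\mathbf{r}_i + (1-c)\,\mathbf{e}_i$ until convergence. No pre-computation and light storage, but slow on-line response: O(mE), where m is the number of iterations and E is the number of edges.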

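NB_Lin [Tong+ ICDM06]
Pre-compute stage:
– Step 1: low-rank approximation $\tilde{\mathbf{W}} \approx \mathbf{U}\mathbf{S}\mathbf{V}$
– Step 2: $\boldsymbol{\Lambda} = (\mathbf{S}^{-1} - c\,\mathbf{V}\mathbf{U})^{-1}$
On-line stage:
– $\mathbf{r}_i = (1-c)\,(\mathbf{e}_i + c\,\mathbf{U}\boldsymbol{\Lambda}\mathbf{V}\mathbf{e}_i)$: only 2 matrix-vector multiplications
This gives a fast response if $\tilde{\mathbf{W}}$ is known in advance; with side information, however, the desired (refined) graph is unknown at pre-compute time.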
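The on-line formula rests on the Woodbury matrix identity, (I − cUSV)^{-1} = I + cU(S^{-1} − cVU)^{-1}V. A minimal pure-Python sketch, assuming the low-rank factors are already given with S diagonal (in practice they would come from a truncated decomposition such as an SVD, not shown here); `nb_lin_precompute` and `nb_lin_query` are hypothetical names, not the authors' code.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def nb_lin_precompute(U, s, V, c):
    """Pre-compute Lambda = (S^{-1} - c V U)^{-1} (k x k), where S = diag(s)."""
    k, n = len(s), len(U)
    M = [[(1.0 / s[a] if a == b else 0.0) - c * sum(V[a][j] * U[j][b] for j in range(n))
          for b in range(k)] for a in range(k)]
    cols = [solve(M, [1.0 if a == b else 0.0 for a in range(k)]) for b in range(k)]
    return [[cols[b][a] for b in range(k)] for a in range(k)]  # Lambda[a][b]


def nb_lin_query(U, Lam, V, i, c):
    """On-line: r_i = (1 - c) * (e_i + c * U Lambda V e_i)."""
    n, k = len(U), len(Lam)
    v = [V[a][i] for a in range(k)]                                  # V e_i
    y = [sum(Lam[a][b] * v[b] for b in range(k)) for a in range(k)]  # Lambda V e_i
    return [(1.0 - c) * ((1.0 if j == i else 0.0) + c * sum(U[j][a] * y[a] for a in range(k)))
            for j in range(n)]
```

As a check, U = [[0.5, 0], [0.5, 0], [0, 0.5], [0, 0.5]], s = [1, 1] and V = [[0, 0, 1, 1], [1, 1, 0, 0]] reproduce exactly the normalized adjacency of the complete bipartite graph linking nodes 0 and 1 to nodes 2 and 3, so with c = 0.5 the query for node 0 returns the exact RWR scores [7/12, 1/12, 1/6, 1/6].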

How to rescue: Fast-ProSIN. [Figure: the graph before and after refinement.] The two graphs overlap a lot! So: pre-compute on the original graph, then update in the on-line stage.
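One standard way to "pre-compute on the original graph, then update on-line" is a low-rank correction. The sketch below is not Fast-ProSIN itself; it only illustrates the idea with the Sherman-Morrison formula for the simplest ProSIN change, positive feedback, which rewrites a single column of W̃ (the source's out-links). It assumes a small dense graph whose inverse Q = (I − cW̃)^{-1} has been pre-computed; all function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def precompute_inverse(W, c):
    """Offline: Q = (I - c W)^{-1} for the ORIGINAL column-normalized W."""
    n = len(W)
    A = [[(1.0 if a == b else 0.0) - c * W[a][b] for b in range(n)] for a in range(n)]
    cols = [solve(A, [1.0 if a == b else 0.0 for a in range(n)]) for b in range(n)]
    return [[cols[b][a] for b in range(n)] for a in range(n)]


def rwr_after_column_update(Q, s, a, c):
    """On-line: RWR scores for query s after column s of W changes by the
    vector a (W' = W + a e_s^T), via Sherman-Morrison instead of re-solving."""
    n = len(Q)
    q = [Q[i][s] for i in range(n)]                                # Q e_s
    g = [sum(Q[i][j] * a[j] for j in range(n)) for i in range(n)]  # Q a
    scale = c * q[s] / (1.0 - c * g[s])
    return [(1.0 - c) * (q[i] + scale * g[i]) for i in range(n)]
```

For a 5-node tree with edges 0-1, 0-2, 1-3, 2-4, c = 0.5 and source 0, a "yes" on node 3 changes column 0 of W̃ from (0, 1/2, 1/2, 0, 0) to (0, 1/3, 1/3, 1/3, 0); passing the difference as `a` yields the scores (21, 6, 4, 5, 1)/37 with one O(n^2) update instead of a new O(n^3) solve.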

Experimental Setup
Data sets:
– DBLP-AC: author-conference bipartite graph; 400K authors, 3.5K conferences, 1M edges
– DBLP-ML: co-authorship graph from ICML and NIPS; 4.5K nodes, 20K edges
– Coral: image-region-keyword graph; 52K nodes, 350K edges
We want to check:
– the effectiveness of ProSIN
– the efficiency of Fast-ProSIN

Interactive Neighborhood Search: what are the most related conferences w.r.t. KDD? (DBLP author-conference bipartite graph)

Initial results   No to ‘ICML’   Yes to ‘SIGIR’
ICDM              ICDM           SIGIR
ICML              SDM            TREC
SDM               PKDD           CIKM
VLDB              ICDE           ECIR
ICDE              VLDB           CLEF
SIGMOD            SIGMOD         ICDM
NIPS              PAKDD          JCDL
PKDD              CIKM           VLDB
IJCAI             SIGIR          ACL
PAKDD             WWW            ICDE

There are two main sub-communities in KDD: DBs (green) vs. Stat (red). Negative feedback on ICML excludes the other stats conferences (NIPS, IJCAI). Positive feedback on SIGIR brings in more IR (brown) conferences.

Connection Subgraph: Initial Result (between “Andrew McCallum” and “Yiming Yang”). [Figure: subgraph containing Andrew McCallum, Yiming Yang, Tom M. Mitchell, Seán Slattery, Rayid Ghani, Xuerui Wang, Rebecca Hutchinson, Jian Zhang, Zoubin Ghahramani and John D. Lafferty, with regions labeled Text Mining, Information Retrieval and Statistics.] There are two main connections between McCallum and Yang.

Connection Subgraph: After Feedback (between “Andrew McCallum” and “Yiming Yang”, but avoiding “Tom M. Mitchell”). [Figure: subgraph containing Andrew McCallum, Yiming Yang, Michael I. Jordan, Xiaojin Zhu, Rong Jin, Andrew Ng, Jian Zhang, Zoubin Ghahramani, John D. Lafferty and Fernando C.N. Pereira.] The feedback steers the subgraph away from the entire ‘Text’ connection and brings in more connections through ‘Statistics’.

Automatic Image Caption. [Figure: image-region-keyword graph with a test image; keywords include Sea, Sun, Sky, Wave, Cat, Forest, Tiger and Grass.] Q: How do we assign keywords to the test image?

Semi-automatic image caption (precision). The 5 keywords most relevant to the test image are returned for the user's yes/no confirmation. [Figure: precision vs. predict length for our method, the baseline, linear combination, and removing negative nodes.]

Semi-automatic image caption (recall). [Figure: recall vs. predict length for our method, the baseline, linear combination, and removing negative nodes.]
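The slides do not spell out how precision and recall are computed. Assuming the usual per-image definitions (the fraction of predicted keywords that are correct, and the fraction of true keywords that are predicted), a sketch with hypothetical keyword sets and a hypothetical function name:

```python
def caption_precision_recall(predicted, truth):
    """Per-image precision and recall of a predicted keyword list."""
    hits = len(set(predicted) & set(truth))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: 5 predicted keywords vs. 4 true keywords.
p, r = caption_precision_recall(["sea", "sky", "sun", "tiger", "grass"],
                                ["sea", "sun", "sky", "wave"])
print(p, r)
```

Three of the five predictions are correct, giving precision 0.6 and recall 0.75; a longer predicted caption typically trades precision for recall.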

Fast-ProSIN: Quality-Speed Trade-off. [Figure: precision, recall and running time.] Fast-ProSIN preserves 93.0%+ of the quality with up to 49x speed-up.

Conclusion
Goal: incorporate users' feedback (like/dislike) into proximity measurement on graphs.
Q: How to customize Tom's applications? A: ProSIN
– Basic idea: bias the random walk
– Wide applicability, easy to use
Q: How to reflect Tom's real-time interest? A: Fast-ProSIN
– Basic idea: exploit smoothness
– Significant speed-up (minutes to seconds)

Q & A. Thank you!