Measure Proximity on Graphs with Side Information. Joint work by Hanghang Tong, Huiming Qu, Hani Jamjoom. Speaker: Mary McGlohon. ICDM 2008, Pisa, Italy, 15-19.


1 Measure Proximity on Graphs with Side Information. Joint work by Hanghang Tong, Huiming Qu, Hani Jamjoom. Speaker: Mary McGlohon. ICDM 2008, Pisa, Italy, 15-19 December 2008

2 Cyano: Process Collaboration Wiki
Q: How to enable social recommendation in Cyano?

3 Scoop: current recommendation system [Qu+ SCC 2008]
Given a node in a graph (e.g., a user node in a user-to-process graph), find:
– 1. [Ranking List] a list of recommended nodes that are most related to the query node (what to recommend)
– 2. [Connection Subgraph] a connection subgraph that best explains the relationship between the query node and the recommended node(s) (why to recommend)
Proximity is the core of Scoop!

4 Challenges in Scoop
How to incorporate users' feedback (like/dislike)?
– Feedback on the ranking list: how to automatically adjust the ranking for the query node 1?
– Feedback on the connection subgraph: how to modify our subgraph to weaken the links between 1 and 10 that involve node 5?
[Figure: current subgraph between nodes 1 and 10; node labels 1, 2, 3, 4, 5, 10]
Q: How to incorporate such side information in measuring node proximity on graphs?

5 Isomorphic Settings of Scoop
Proximity is the main tool for:
– Neighborhood search
– Anomaly detection
– Pattern matching
– Image captioning
– …
The source of side information is rich:
– Ratings in recommendation systems
– Opinion/sentiment in blog analysis
– Clickthrough data
– …

6 Roadmap
– Motivations
– Proximity w/o Side Information
– Proximity w/ Side Information
  – ProSIN: Method
  – Fast-ProSIN: Fast Solution
– Experimental Results
– Conclusion

7 Proximity on Graph: What?
A.k.a. relevance, closeness, 'similarity', …

8 What is a "good" proximity?
– Multiple connections
– Quality of connection
  – Direct & indirect connections
  – Length, degree, weight, …
– …

9 Sol: Random walk with restart [Pan+ KDD 2004]
[Figure: example graph with nodes 1-12 and query node 4, shown next to its ranking vector (one score per node, ranging from 0.22 down to 0.02). More red, more relevant.]
Nearby nodes, higher scores.

10 Why is RWR a good score?
[Equation lost in extraction. Its terms are annotated "all paths from i to j with length 1", "all paths from i to j with length 2", "all paths from i to j with length 3", and so on. Legend: adjacency matrix (symbol lost); c: damping factor.]
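The formula on this slide did not survive extraction. One reading that is consistent with its annotations is the series expansion of the RWR solution. This is a hedged reconstruction, not the slide's own notation: it assumes the usual fixed point $\mathbf{r}_i = c\,\tilde{W}\mathbf{r}_i + (1-c)\,\mathbf{e}_i$, where $\tilde{W}$ is the normalized adjacency matrix and $\mathbf{e}_i$ is the starting vector for query $i$.

```latex
\mathbf{r}_i
  = (1-c)\,(I - c\,\tilde{W})^{-1}\,\mathbf{e}_i
  = (1-c)\sum_{k=0}^{\infty} c^{k}\,\tilde{W}^{k}\,\mathbf{e}_i
```

Under this reading, the $j$-th entry of $\tilde{W}^{k}\mathbf{e}_i$ aggregates all weighted paths of length $k$ from $i$ to $j$, and the factor $c^{k}$ discounts longer paths. The score therefore rewards many short, heavily weighted connections, which matches the criteria for a good proximity listed on the earlier slide.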

11 Proximity in Current Scoop
– Ranking list: [Figure: user-process graph with users U1-U4 and processes P1-P5] Initial result: P2, P3, P1
– Connection subgraph: [Figure: example graph with nodes 1-10 and a subgraph extracted from it]

12 Roadmap
– Motivations
– Proximity w/o Side Information
– Proximity w/ Side Information
  – ProSIN: Method
  – Fast-ProSIN: Fast Solution
– Experimental Results
– Conclusion

13 ProSIN: Challenges
[Figure: example graph with the query node marked]
We want to:
– Boost the neighbors of node 4
– Penalize the neighbors of node 6

14 ProSIN: How to?
Use side information to refine the graph!
[Figure: refined example graph with the query node marked]

15 ProSIN: Detailed Algorithm
Input:
– A weighted directed graph A
– Source node s and target node t
– Side information: the positive set P and the negative set N
Output:
– Proximity score from the source to the target
Method:
1. Add a link from the source node to each of the positive nodes x.
2. Introduce a sink node into the graph.
3. For each of the negative nodes y:
   – find its neighboring nodes;
   – add a link from node y to the sink;
   – add a link from each neighboring node of node y to the sink.
4. Perform random walk with restart for the source node s on the refined graph.
5. Output the proximity score as the steady-state probability that the random particle will finally stay at the target node t.
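The five steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the slide does not give the weights of the added links, so the hypothetical parameters `w_pos` and `w_neg` default to 1; a "neighbor" of y is taken to be any node with a link to or from y; transition probabilities are obtained by row-normalizing the out-links; the sink has no out-links; and `c = 0.9` is an arbitrary choice. The function names `rwr` and `prosin` are made up for this example.

```python
import numpy as np

def rwr(A, source, c=0.9, iters=200):
    """Random walk with restart. A[i, j] is the weight of the link i -> j."""
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    # Row-normalize; nodes without out-links (the sink) keep an all-zero row,
    # so probability mass that reaches them is simply absorbed.
    P = np.divide(A, out, out=np.zeros_like(A, dtype=float), where=out > 0)
    e = np.zeros(n)
    e[source] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = c * (P.T @ r) + (1 - c) * e
    return r

def prosin(A, s, t, positive, negative, w_pos=1.0, w_neg=1.0, c=0.9):
    """Proximity from s to t on the graph refined by side information."""
    n = A.shape[0]
    B = np.zeros((n + 1, n + 1))
    B[:n, :n] = A
    sink = n                                   # step 2: extra sink node
    for x in positive:                         # step 1: source -> positive nodes
        B[s, x] += w_pos
    for y in negative:                         # step 3: route mass to the sink
        neighbors = np.nonzero((A[y] > 0) | (A[:, y] > 0))[0]
        B[y, sink] += w_neg
        for z in neighbors:
            B[z, sink] += w_neg
    return rwr(B, s, c)[t]                     # steps 4 and 5
```

On a symmetric toy graph (a path 3-1-0-2-4 queried from node 0), nodes 3 and 4 start with equal scores; marking node 1 as negative lowers the score of node 3 relative to node 4, and marking node 4 as positive raises its score.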

16 Process management
Given a user-process graph with 'U2' as the query, which are the top 3 most related processes?
– Initial result (no feedback): P2, P3, P1
– Updated result ('no' to 'P2'): P3, P4, P5
[Figure: user-process graph with users U1-U4 and processes P1-P5]

17 Roadmap
– Motivations
– Proximity w/o Side Information
– Proximity w/ Side Information
  – ProSIN: Method
  – Fast-ProSIN: Fast Solution
– Experimental Results
– Conclusion

18 Computing RWR
[Figure: 12-node example graph next to the RWR equation. The equation's terms are labeled: ranking vector (n x 1), adjacency matrix (n x n), restart probability, starting vector (n x 1).]
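The equation itself is missing from the transcript. Going by the surviving labels, it is presumably the standard RWR fixed point, written here with assumed symbols: $\mathbf{r}_i$ for the ranking vector, $\tilde{W}$ for the normalized adjacency matrix, $\mathbf{e}_i$ for the starting vector of query $i$, and $1-c$ for the restart probability.

```latex
\underbrace{\mathbf{r}_i}_{n \times 1}
  \;=\; c\,\underbrace{\tilde{W}}_{n \times n}\,\mathbf{r}_i
  \;+\; (1-c)\,\underbrace{\mathbf{e}_i}_{n \times 1}
```

Read this way, at each step the random particle follows a link with probability $c$ and jumps back to the query node with probability $1-c$; the ranking vector is the steady state of that process.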

19 Q: Given query i, how to solve it?
[Figure: the RWR equation with the adjacency matrix and the starting vector for the query given, and the ranking vector unknown]

20 OnTheFly:
[Figure: 12-node example graph; the ranking vector is still unknown]

21 OnTheFly:
– No pre-computation / light storage
– Slow on-line response: O(mE)
[Figure: 12-node example graph with the computed ranking vector]
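A minimal sketch of an on-the-fly computation, assuming it means plain power iteration on the RWR fixed point at query time. Each iteration makes one pass over the edge list, so the cost is proportional to m x E if m is read as the number of iterations and E as the number of edges (an interpretation of the slide's O(mE), not something it spells out). The function name, the edge-list format, and the defaults are illustrative.

```python
import numpy as np

def rwr_on_the_fly(edges, n, query, c=0.9, m=100):
    """Power iteration for RWR.

    edges: list of (u, v, weight) directed links u -> v.
    Assumes every node has at least one out-link.
    """
    src = np.array([u for u, _, _ in edges])
    dst = np.array([v for _, v, _ in edges])
    w = np.array([wt for _, _, wt in edges], dtype=float)

    out = np.zeros(n)
    np.add.at(out, src, w)            # total out-weight per node
    p = w / out[src]                  # transition probability of each edge

    e = np.zeros(n)
    e[query] = 1.0
    r = e.copy()
    for _ in range(m):                # m iterations ...
        nxt = np.zeros(n)
        np.add.at(nxt, dst, p * r[src])   # ... each touching all E edges once
        r = c * nxt + (1 - c) * e
    return r
```

Nothing is stored between queries, which is why storage is light, but every new query repeats all m passes over the graph.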

22 NB_Lin [Tong+ ICDM06]
Pre-compute stage:
– Step 1: [formula lost in extraction]
– Step 2: [formula lost in extraction]
On-line stage:
– 2 matrix-vector multiplications
[Figure: 12-node example graph partitioned into clusters C1, C2, C3, and a decomposition of the matrix W~ into the product U~ x S~ x V~]
Fast response if …
The desired graph is unknown.
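The formulas for the two pre-compute steps are lost, so the following is only a sketch of how a scheme of this shape can work, not necessarily the authors' exact procedure. It assumes a truncated SVD as the low-rank approximation W~ = U S V and applies the Sherman-Morrison-Woodbury identity, (I - c U S V)^-1 = I + c U (S^-1 - c V U)^-1 V, so that the on-line stage reduces to two matrix-vector products. Function names and the rank parameter `k` are hypothetical.

```python
import numpy as np

def nb_lin_precompute(W, k, c=0.9):
    """Pre-compute stage for a column-normalized adjacency matrix W."""
    # Step 1 (assumed): rank-k approximation W ~= U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U, s, Vt = U[:, :k], s[:k], Vt[:k, :]
    # Step 2 (assumed): small k x k inverse from the Woodbury identity.
    Lam = np.linalg.inv(np.diag(1.0 / s) - c * (Vt @ U))
    # Fold Lam into U so that a query needs only two mat-vec products.
    return U @ Lam, Vt

def nb_lin_query(UL, Vt, i, c=0.9):
    """On-line stage: approximate ranking vector for query node i."""
    e = np.zeros(UL.shape[0])
    e[i] = 1.0
    return (1 - c) * (e + c * (UL @ (Vt @ e)))
```

With `k` equal to the full rank the result matches the exact solution; smaller `k` trades accuracy for storage and speed. The catch noted on the slide still applies: the factors are tied to one fixed graph, so they cannot be computed in advance for a graph that is only known once the feedback arrives.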

23 How to rescue: Fast-ProSIN
[Figure: the graph before and after refinement] A lot of overlap!
– Pre-compute on the original graph
– Update in the on-line stage

24 Roadmap
– Motivations
– Proximity w/o Side Information
– Proximity w/ Side Information
  – ProSIN: Method
  – Fast-ProSIN: Fast Solution
– Experimental Results
– Conclusion

25 Experimental Setup
Data sets:
– DBLP-AC: author-conference bipartite graph; 400K authors, 3.5K conferences, 1M edges
– DBLP-ML: co-authorship graph from ICML and NIPS; 4.5K nodes, 20K edges
– Coral: image-region-keyword graph; 52K nodes, 350K edges
We want to check:
– The effectiveness of ProSIN
– The efficiency of Fast-ProSIN

26 Interactive Neighborhood Search
What are the most related conferences w.r.t. KDD? (DBLP author-conference bipartite graph)
– Initial results: ICDM, ICML, SDM, VLDB, ICDE, SIGMOD, NIPS, PKDD, IJCAI, PAKDD
– No to 'ICML': ICDM, SDM, PKDD, ICDE, VLDB, SIGMOD, PAKDD, CIKM, SIGIR, WWW
– Yes to 'SIGIR': SIGIR, TREC, CIKM, ECIR, CLEF, ICDM, JCDL, VLDB, ACL, ICDE
The initial results show two main sub-communities in KDD: DBs (green) vs. Stat (red). Negative feedback on ICML excludes the other statistics conferences (NIPS, IJCAI). Positive feedback on SIGIR brings in more IR (brown) conferences.


29 Connection Subgraph: Initial Result (between "Andrew McCallum" and "Yiming Yang")
[Figure: connection subgraph with weighted edges among Andrew McCallum, Yiming Yang, Tom M. Mitchell, Seán Slattery, Rayid Ghani, Xuerui Wang, Rebecca Hutchinson, Jian Zhang, Zoubin Ghahramani, and John D. Lafferty; regions are labeled Text Mining, Information Retrieval, and Statistics]
There are two main connections between "McCallum" and "Yang".

30 Connection Subgraph: After Feedback (between "Andrew McCallum" and "Yiming Yang", but avoiding "Tom M. Mitchell")
[Figure: connection subgraph with weighted edges among Andrew McCallum, Yiming Yang, Michael I. Jordan, Xiaojin Zhu, Rong Jin, Andrew Ng, Jian Zhang, Zoubin Ghahramani, John D. Lafferty, and Fernando C.N. Pereira]
The feedback steers the subgraph away from the entire 'Text' connection and brings in more connections through 'Statistics'.

31 Automatic Image Caption
[Figure: image-region-keyword graph linking images (including the test image) to their regions and to keywords: Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass]
Q: How to assign keywords to the test image?

32 Semi-automatic image caption (precision)
The 5 keywords that are most relevant to the test image are returned for users' yes/no confirmation.
[Figure: precision vs. predict length; legend: Our method, Baseline, Linear Combination, Remove Negative Nodes]

33 Semi-automatic image caption (recall)
[Figure: recall vs. predict length; legend: Our method, Baseline, Linear Combination, Remove Negative Nodes]

34 Fast-ProSIN: Quality-Speed Trade-off
[Figure: three panels: Precision, Recall, Time]
– 93.0%+ quality preserved
– Up to 49x speed-up

35 Conclusion
Goal: incorporate users' feedback (like/dislike) in proximity measurement on graphs
Q: How to customize Tom's applications?
A: ProSIN
– Basic idea: bias the random walk
– Wide applicability, easy to use
Q: How to reflect Tom's real-time interest?
A: Fast-ProSIN
– Basic idea: explore smoothness
– Significant speedup (minutes to seconds)

36 Q & A
Thank you!
htong@cs.cmu.edu, hqu@us.ibm.com, jamjoom@us.ibm.com

