Hongbo Deng, Michael R. Lyu and Irwin King


1 A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs
Hongbo Deng, Michael R. Lyu and Irwin King
Department of Computer Science and Engineering, The Chinese University of Hong Kong
July 1st, 2009

2 Introduction
Many kinds of data can be modeled as bipartite graphs.
- IR Models for Content (relevance): VSM, Language Model, etc.
- Link Analysis for Graph (semantic relations): HITS, PageRank, etc.
- Incorporate Content with Graph: Personalized PageRank (PPR), Linear Combination, etc.

3 An Illustration
Query suggestion for query “map”, comparing HITS and PPR:
- Noisy link data
- Lack of relevance constraints
- More reasonable
Suggestions listed on the slide: mapquest united states map map of florida us map world map google mapquest google.com map quest weather mapquest map quest google united states map mapquest.com

4 Outline
- Introduction
- Preliminaries
- Generalized Co-HITS
  - Iterative Framework
  - Regularization Framework
- Experiments
- Conclusion

5 Preliminaries Content Graph X Y Explicit links: Hidden links:

6 Generalized Co-HITS: Basic idea
- Incorporate the bipartite graph with the content information from both sides
- Initialize the vertices with the relevance scores x0, y0
- Propagate the scores (mutual reinforcement)
Figure labels: initial scores, score propagation

7 Generalized Co-HITS Iterative framework
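The slide's equations did not survive in this transcript. As a minimal sketch of the idea described above (initialize both sides with relevance scores, then propagate them across the bipartite graph), the following assumes a linear mixing rule between each vertex's initial score and the scores arriving from the other side. The function name `co_hits_iterate`, the mixing weights `lam_u` and `lam_v`, and the toy matrices are illustrative assumptions, not taken from the slides.

```python
import numpy as np


def co_hits_iterate(w_uv, w_vu, x0, y0, lam_u=0.5, lam_v=0.5, n_iter=50):
    """Propagate scores back and forth across a bipartite graph.

    w_uv[i, k]: transition probability from vertex u_i to vertex v_k.
    w_vu[k, i]: transition probability from vertex v_k to vertex u_i.
    x0, y0: initial relevance scores of the U-side and V-side vertices.
    lam_u, lam_v: assumed mixing weights in [0, 1]; 0 keeps the initial
    scores, 1 relies only on scores propagated from the other side.
    """
    x0 = np.asarray(x0, dtype=float)
    y0 = np.asarray(y0, dtype=float)
    x, y = x0.copy(), y0.copy()
    for _ in range(n_iter):
        # U-side update: mix initial scores with scores arriving from V.
        x = (1 - lam_u) * x0 + lam_u * (w_vu.T @ y)
        # V-side update: mix initial scores with scores arriving from U.
        y = (1 - lam_v) * y0 + lam_v * (w_uv.T @ x)
    return x, y


# Toy graph (made-up numbers): 3 U-vertices, 2 V-vertices, rows sum to 1.
w_uv = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
w_vu = np.array([[2 / 3, 1 / 3, 0.0], [0.0, 1 / 3, 2 / 3]])
x, y = co_hits_iterate(w_uv, w_vu, x0=[0.6, 0.3, 0.1], y0=[0.5, 0.5])
```

With `lam_u = lam_v = 0` the initial scores are returned unchanged, so the two weights control how much the link structure is allowed to override the content-based scores.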

8 Iterative → Regularization Framework
Consider the vertices on one side:
- Smoothness
- Fit initial scores

9 Generalized Co-HITS: Regularization Framework
Figure labels: R1, R2, R3, Wuu, Wvv
Intuition: the highly connected vertices are most likely to have similar relevance scores.

10 Generalized Co-HITS Regularization Framework The cost function:
Optimization problem: Solution:
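The cost function and its solution are missing from this transcript. The sketch below is not the authors' formulation: it shows a standard single-sided graph regularization with the two ingredients named earlier, a smoothness term over connected vertices and a term that fits the initial scores. It assumes a symmetric, non-negative weight matrix `w` with no isolated vertices; the names `smoothness_cost`, `regularized_scores`, and the trade-off parameter `mu` are illustrative.

```python
import numpy as np


def smoothness_cost(w, f, f0, mu):
    """0.5 * sum_ij w_ij (f_i/sqrt(d_i) - f_j/sqrt(d_j))^2 + mu * ||f - f0||^2."""
    d = w.sum(axis=1)                      # vertex degrees
    g = f / np.sqrt(d)                     # degree-normalized scores
    diff = g[:, None] - g[None, :]
    return 0.5 * np.sum(w * diff ** 2) + mu * np.sum((f - f0) ** 2)


def regularized_scores(w, f0, mu):
    """Closed-form minimizer of smoothness_cost for symmetric w.

    Setting the gradient to zero gives ((1 + mu) I - S) f = mu f0,
    where S = D^{-1/2} W D^{-1/2}.
    """
    d = w.sum(axis=1)
    s = w / np.sqrt(np.outer(d, d))
    n = len(f0)
    return np.linalg.solve((1 + mu) * np.eye(n) - s, mu * f0)


# Toy example (made-up numbers): a triangle graph, relevance on one vertex.
w = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
f0 = np.array([1.0, 0.0, 0.0])
f = regularized_scores(w, f0, mu=0.5)
```

A larger `mu` keeps the result closer to the initial scores; a smaller `mu` lets strongly connected vertices pull each other's scores together, which matches the intuition stated on slide 9.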

11 Application to Query-URL Bipartite Graphs
- Bipartite graph construction
  - Edge weighted by the click frequency
  - Normalize to obtain the transition matrix
- Overall Algorithm
In this paper, we apply the proposed method to the query-URL bipartite graph for query suggestion. Please refer to our paper for more details.
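The construction steps above can be sketched as follows, assuming the click log is available as `(query, url, click_count)` triples and that "normalize" means row normalization, so each row of a transition matrix sums to 1. The helper names and the toy click counts are made up for illustration.

```python
import numpy as np


def row_normalize(m):
    """Divide each row by its sum; rows that sum to zero stay all-zero."""
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)


def transition_matrices(clicks, queries, urls):
    """Build query->URL and URL->query transition matrices from click counts."""
    q_index = {q: i for i, q in enumerate(queries)}
    u_index = {u: j for j, u in enumerate(urls)}
    counts = np.zeros((len(queries), len(urls)))
    for query, url, n in clicks:
        counts[q_index[query], u_index[url]] += n   # edge weight = click frequency
    return row_normalize(counts), row_normalize(counts.T)


# Toy click log (made-up counts).
clicks = [("map", "mapquest.com", 3), ("map", "google.com", 1),
          ("mapquest", "mapquest.com", 4)]
w_qu, w_uq = transition_matrices(clicks, ["map", "mapquest"],
                                 ["mapquest.com", "google.com"])
```

Here `w_qu[i, j]` is the fraction of query i's clicks that went to URL j, and `w_uq[j, i]` is the fraction of URL j's clicks that came from query i.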

12 Outline
- Introduction
- Preliminaries
- Generalized Co-HITS
  - Iterative Framework
  - Regularization Framework
- Experiments
- Conclusion

13 Experimental Evaluation
Data collection
- AOL query log data
Cleaning the data
- Removing the queries that appear fewer than 2 times
- Combining the near-duplicate queries
Data set
- 883,913 queries and 967,174 URLs
- 4,900,387 edges
- 250,127 unique terms
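The slides do not say how near-duplicate queries were detected or in which order the cleaning steps ran. The following is only a sketch under two assumptions: near-duplicates are queries that become identical after lowercasing and collapsing whitespace, and merging happens before the frequency filter. The function names are hypothetical.

```python
from collections import Counter


def normalize_query(query):
    """Assumed near-duplicate rule: lowercase and collapse whitespace."""
    return " ".join(query.lower().split())


def clean_queries(raw_queries, min_count=2):
    """Merge near-duplicates, then drop queries seen fewer than min_count times."""
    counts = Counter(normalize_query(q) for q in raw_queries)
    return {q: n for q, n in counts.items() if n >= min_count}


# Toy log (made-up): two spellings of one query plus a singleton.
kept = clean_queries(["Map Quest", "map  quest", "us map"])
```

In this toy log the two spellings are merged into one query with count 2, and the singleton is dropped.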

14 Evaluation: ODP Similarity
A simple measure of similarity among queries using ODP categories (query → category)
Definition:
Example:
- Q1: “United States” → “Regional > North America > United States”
- Q2: “National Parks” → “Regional > North America > United States > Travel and Tourism > National Parks and Monuments”
- Similarity: 3/5
Precision at rank n
300 distinct queries
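The definition itself is missing from this transcript. One reading that reproduces the 3/5 in the example is the length of the longest common prefix of the two category paths divided by the length of the longer path. The sketch below implements that reading as an assumption; the function name and the `" > "` separator handling are illustrative.

```python
def odp_similarity(category_a, category_b, sep=" > "):
    """Shared-prefix length of two category paths over the longer path length."""
    path_a = category_a.split(sep)
    path_b = category_b.split(sep)
    common = 0
    for level_a, level_b in zip(path_a, path_b):
        if level_a != level_b:
            break                      # paths diverge at this level
        common += 1
    return common / max(len(path_a), len(path_b))


q1 = "Regional > North America > United States"
q2 = ("Regional > North America > United States > "
      "Travel and Tourism > National Parks and Monuments")
similarity = odp_similarity(q1, q2)   # 3 shared levels out of 5
```

Identical categories score 1, and categories that differ at the top level score 0.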

15 Experimental Results: Comparison of Iterative Framework
Methods compared: personalized PageRank (PPR), one-step propagation (OSP), general Co-HITS (CoIter)
Result 1: The improvements of OSP and CoIter over the baseline (the dashed line) are promising when compared to the PPR. The initial relevance scores from both sides provide valuable information.

16 Experimental Results: Comparison of Regularization Framework
Methods compared: single-sided regularization (SiRegu), double-sided regularization (CoRegu)
Result 2: SiRegu improves the performance over the baseline. CoRegu performs better than SiRegu, owing to the newly developed cost function R3. Moreover, CoRegu is relatively robust.

17 Experimental Results: Detailed Results
Result 3: CoRegu-0.5 achieves the best performance. Considering the double-sided regularization framework for the bipartite graph is essential and promising.

18 Conclusions
- We propose the Co-HITS algorithm to incorporate the bipartite graph with the content information from both sides.
- The Co-HITS algorithm is more general: it includes HITS and personalized PageRank as special cases.
- CoRegu is more robust with the newly developed cost function and achieves the best performance, with consistent and promising improvements.

19 Q&A Thanks!

