1 Random Walks on the Click Graph. Nick Craswell and Martin Szummer, Microsoft Research Cambridge. SIGIR 2007.

2 Introduction (1/2)
A search engine can track which of its search results were clicked for which query
Click records of query-document pairs can be viewed as a weak indication of relevance
– The user decided to at least view the document, based on its description in the search results
We can use the clicks of past users to improve the current search results
– The clicked set of documents is likely to differ from the current user’s relevance set

3 Introduction (2/2)
From the perspective of a user conducting a search:
– Documents that are clicked but not relevant constitute noise in the click data
– Documents that are relevant but not clicked constitute sparsity in the click data
Power-law distribution: most queries in the click log have only a small number of clicked documents
This paper focuses on the sparsity problem, addressing it with a Markov random walk model; the model also has noise-reduction properties

4 Algorithms on the Click Graph
The current model uses click data alone, without considering document content or query content
The click graph:
– Bipartite
– Two types of nodes: queries and documents
– An edge connects a query and a document if a click for that query-document pair is observed
– Each edge may be weighted by the total number of clicks on that pair from all users
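The click graph above can be sketched in Python as an adjacency map. This is a minimal sketch, assuming a hypothetical input of `(query, url)` pairs with one pair per observed click; `build_click_graph` is an illustrative name, not part of the paper. Nodes are tagged `"q"` or `"d"` so that a query string can never collide with a URL string.

```python
from collections import defaultdict


def build_click_graph(click_log):
    """Build a weighted bipartite click graph.

    click_log: iterable of (query, url) pairs, one pair per observed click
    (hypothetical input format). Returns {node: {neighbor: click count}},
    where nodes are ("q", query) or ("d", url).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for query, url in click_log:
        q, d = ("q", query), ("d", url)
        # The graph is undirected: record the click in both directions.
        graph[q][d] += 1
        graph[d][q] += 1
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {node: dict(nbrs) for node, nbrs in graph.items()}


log = [
    ("jaguar", "cat.jpg"),
    ("jaguar", "cat.jpg"),
    ("jaguar", "car.jpg"),
    ("big cat", "cat.jpg"),
]
g = build_click_graph(log)
print(g[("q", "jaguar")])   # {('d', 'cat.jpg'): 2, ('d', 'car.jpg'): 1}
print(g[("d", "cat.jpg")])  # {('q', 'jaguar'): 2, ('q', 'big cat'): 1}
```

Summing clicks per pair gives the edge weight directly; an unweighted graph would simply ignore the counts.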

5 Click Graph Example
Figure: an example click graph

6 Application Areas for Algorithms on the Click Graph
Query-to-document ‘search’
– Given a query, find relevant documents, as in ad hoc search
Query-to-query ‘suggestion’
– Given a query, find other queries that the user might like to run
Document-to-query ‘annotation’
– Given a document, attach related queries to it
Document-to-document ‘relevance feedback’
– Given an example document that is relevant to the user, find additional relevant documents

7 Random Walk Model
A basic query formulation model:
1. Imagine a document (the information need)
2. Think of a query associated with the document
3. Issue the query, or imagine another document related to the query
4. Repeat this iterative thought process (the noise process)
– A Markov random walk, which describes a probability distribution over queries
The retrieval model is obtained by inverting the query formulation model:
– It starts from an observed query and attempts to undo the noise, inferring the underlying information need
– This inversion is a backward walk

8 Random Walk Computation
$C_{jk}$: click counts associating node j with node k
Define transition probabilities $P_{t+1|t}(k \mid j)$ from j to k
– s is the self-transition probability, which corresponds to the user favoring the current query or document
Transition matrix $[A]_{jk} = P_{t+1|t}(k \mid j) \Rightarrow P_{t|0}(k \mid j) = [A^t]_{jk}$
– A measure of the volume of paths between j and k
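The transition matrix can be sketched as follows. The slide's formula is not reproduced in the transcript, so this sketch assumes the form $P_{t+1|t}(k \mid j) = (1-s)\,C_{jk} / \sum_i C_{ji}$ for $k \neq j$ and $P_{t+1|t}(j \mid j) = s$: the walk stays put with probability s and otherwise moves to a neighbor in proportion to click counts. The helper names are hypothetical, and plain lists are used instead of a matrix library to keep the sketch dependency-free.

```python
def transition_matrix(nodes, edges, s):
    """Row-stochastic matrix A with A[j][k] = P_{t+1|t}(k | j).

    nodes: list of node ids (queries and documents).
    edges: {(query, doc): click count}; the graph is treated as undirected.
    s:     self-transition probability.
    Assumed form: A[j][j] = s and A[j][k] = (1 - s) * C_jk / sum_i C_ji.
    Every node is assumed to have at least one click.
    """
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    C = [[0.0] * n for _ in range(n)]
    for (q, d), c in edges.items():
        C[idx[q]][idx[d]] += c  # C is symmetric: a click links q and d both ways
        C[idx[d]][idx[q]] += c
    A = []
    for j in range(n):
        total = sum(C[j])
        A.append([s if k == j else (1 - s) * C[j][k] / total for k in range(n)])
    return A


def matmul(X, Y):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matrix_power(A, t):
    """A^t by repeated multiplication, so that A^t[j][k] = P_{t|0}(k | j)."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = matmul(result, A)
    return result


nodes = ["q1", "d1", "d2"]
A = transition_matrix(nodes, {("q1", "d1"): 3, ("q1", "d2"): 1}, s=0.5)
print(A[0])                   # [0.5, 0.375, 0.125]
print(matrix_power(A, 3)[1])  # where a 3-step walk starting at d1 ends up
```

Each row of $A^t$ stays a probability distribution, which is what lets $[A^t]_{jk}$ be read as a t-step transition probability.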

9 Random Walk Model for Retrieval
Backward random walk for retrieval:
– Given that we ended a t-step walk at node j, we find the probability of having started at node k, $P_{0|t}(k \mid j)$
– Bayes' rule: $P_{0|t}(k \mid j) = P_{t|0}(j \mid k)\,P_0(k) / P_t(j)$, assuming a uniform prior $P_0(k) = 1/N$, so that $P_t(j) = \frac{1}{N}\sum_i [A^t]_{ij}$
– The 1/N factors cancel, giving $P_{0|t}(k \mid j) = [A^t Z^{-1}]_{kj}$, where Z is diagonal with $Z_{jj} = \sum_i [A^t]_{ij}$
Forward random walk:
– $P_{t|0}(k \mid j) = [v_j A^t]_k$, where $v_j$ is the row vector with a 1 in position j and 0 elsewhere
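Both walks on this slide can be computed without forming $A^t$ explicitly. In this hedged sketch (helper names hypothetical, transition form assumed as $A_{jj} = s$ and $A_{jk} = (1-s)\,C_{jk}/\sum_i C_{ji}$), the forward walk propagates the row vector $v_j$ through A t times, while the backward walk propagates column j of $A^t$ and divides it by its sum $Z_{jj}$, which is Bayes' rule under the uniform prior. The toy graph is invented: document d3 was never clicked for q1 but is reachable through q2.

```python
def transition_matrix(nodes, edges, s):
    """A[j][k] = P_{t+1|t}(k | j), assuming A[j][j] = s and
    A[j][k] = (1 - s) * C_jk / sum_i C_ji for k != j (undirected click counts)."""
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    C = [[0.0] * n for _ in range(n)]
    for (q, d), c in edges.items():
        C[idx[q]][idx[d]] += c
        C[idx[d]][idx[q]] += c
    return [[s if k == j else (1 - s) * C[j][k] / sum(C[j]) for k in range(n)]
            for j in range(n)]


def forward_walk(A, j, t):
    """P_{t|0}(k | j) for every k: the row vector v_j A^t."""
    n = len(A)
    v = [1.0 if i == j else 0.0 for i in range(n)]
    for _ in range(t):
        v = [sum(v[i] * A[i][k] for i in range(n)) for k in range(n)]
    return v


def backward_walk(A, j, t):
    """P_{0|t}(k | j) for every k under a uniform prior P_0(k) = 1/N.

    Column j of A^t holds [A^t]_kj = P_{t|0}(j | k); dividing by its sum
    Z_jj = sum_i [A^t]_ij applies Bayes' rule (the 1/N factors cancel).
    """
    n = len(A)
    x = [1.0 if i == j else 0.0 for i in range(n)]
    for _ in range(t):
        x = [sum(A[k][i] * x[i] for i in range(n)) for k in range(n)]
    z = sum(x)
    return [xk / z for xk in x]


nodes = ["q1", "q2", "d1", "d2", "d3"]
edges = {("q1", "d1"): 5, ("q1", "d2"): 1, ("q2", "d2"): 1, ("q2", "d3"): 5}
A = transition_matrix(nodes, edges, s=0.0)
q1 = nodes.index("q1")

print(forward_walk(A, q1, 1)[2:])   # d1, d2, d3: about [0.833, 0.167, 0.0]
print(backward_walk(A, q1, 1)[2:])  # d1, d2, d3: about [0.667, 0.333, 0.0]
print(backward_walk(A, q1, 3)[4])   # d3, never clicked for q1: about 0.0556
```

With s = 0 the bipartite structure means a walk from a query is on a document only after an odd number of steps, so an even t gives documents zero probability; a positive s removes that restriction. At t = 1 the backward walk also ranks d2 higher than the forward walk does, because half of d2's clicks come from q1.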

10 Forward vs. Backward Walks
PageRank: a query-independent forward random walk on the link graph, which proceeds to its stationary distribution
In statistics, the backward walk model is referred to as diagnostic, whereas the forward walk model is predictive
When t → ∞:
– The forward random walk approaches the stationary distribution, which gives high probability to nodes with a large number of clicks
– The backward random walk approaches the prior starting distribution, which we have taken to be uniform
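The two limits can be checked numerically on a toy graph. This sketch uses an invented graph, hypothetical helper names, and the assumed transition form $A_{jj} = s$, $A_{jk} = (1-s)\,C_{jk}/\sum_i C_{ji}$. It sets s = 0.5 because with s = 0 a walk on a bipartite graph alternates between queries and documents forever and has no limit. Under this transition form the stationary distribution is proportional to each node's total click count, which is why the forward limit favors heavily clicked nodes.

```python
def transition_matrix(nodes, edges, s):
    """A[j][k] = P_{t+1|t}(k | j), assuming A[j][j] = s and
    A[j][k] = (1 - s) * C_jk / sum_i C_ji for k != j (undirected click counts)."""
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    C = [[0.0] * n for _ in range(n)]
    for (q, d), c in edges.items():
        C[idx[q]][idx[d]] += c
        C[idx[d]][idx[q]] += c
    return [[s if k == j else (1 - s) * C[j][k] / sum(C[j]) for k in range(n)]
            for j in range(n)]


def forward_walk(A, j, t):
    """P_{t|0}(k | j) for every k: the row vector v_j A^t."""
    n = len(A)
    v = [1.0 if i == j else 0.0 for i in range(n)]
    for _ in range(t):
        v = [sum(v[i] * A[i][k] for i in range(n)) for k in range(n)]
    return v


def backward_walk(A, j, t):
    """P_{0|t}(k | j) for every k: column j of A^t normalized (uniform prior)."""
    n = len(A)
    x = [1.0 if i == j else 0.0 for i in range(n)]
    for _ in range(t):
        x = [sum(A[k][i] * x[i] for i in range(n)) for k in range(n)]
    z = sum(x)
    return [xk / z for xk in x]


nodes = ["q1", "q2", "d1", "d2"]
edges = {("q1", "d1"): 4, ("q1", "d2"): 1, ("q2", "d1"): 1, ("q2", "d2"): 2}
A = transition_matrix(nodes, edges, s=0.5)

# Total clicks per node; the stationary distribution is proportional to it.
clicks = [sum(c for e, c in edges.items() if v in e) for v in nodes]
stationary = [c / sum(clicks) for c in clicks]

t = 100
for j in range(len(nodes)):
    fwd = forward_walk(A, j, t)
    bwd = backward_walk(A, j, t)
    # Forward: the same limit from every start node.
    assert all(abs(f - p) < 1e-9 for f, p in zip(fwd, stationary))
    # Backward: the uniform prior 1/N over start nodes.
    assert all(abs(b - 1 / len(nodes)) < 1e-9 for b in bwd)
print(stationary)  # [0.3125, 0.1875, 0.3125, 0.1875]

# With s = 0 the walk alternates between queries and documents instead:
A0 = transition_matrix(nodes, edges, s=0.0)
print(sum(forward_walk(A0, 0, 100)[2:]), sum(forward_walk(A0, 0, 101)[2:]))  # 0.0, then about 1.0
```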

11 Clustering Effect
Given an end node that is part of a cluster, we have similar probabilities of having started the walk from any node in the cluster

12 Walk Parameters
Figure: Probability distribution of non-self transitions under different combinations of t and s
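Because each step is a self-transition with probability s regardless of the current node, the number of non-self transitions in a t-step walk follows a Binomial(t, 1 - s) distribution, which is presumably what the figure plots. A minimal sketch (the function name is hypothetical):

```python
from math import comb


def non_self_transition_pmf(t, s):
    """P(exactly m non-self transitions in a t-step walk), for m = 0..t.

    Each step stays on the current node with probability s, independently of
    which node that is, so the number of real moves is Binomial(t, 1 - s).
    """
    p = 1 - s
    return [comb(t, m) * p**m * s**(t - m) for m in range(t + 1)]


for t, s in [(11, 0.0), (101, 0.9)]:
    pmf = non_self_transition_pmf(t, s)
    mean = sum(m * q for m, q in enumerate(pmf))
    mode = max(range(t + 1), key=lambda m: pmf[m])
    print(f"t={t}, s={s}: mean {mean:.1f} moves, most likely {mode}")
# t=11, s=0.0: mean 11.0 moves, most likely 11
# t=101, s=0.9: mean 10.1 moves, most likely 10
```

Both settings average about ten or eleven real moves, which is consistent with the conclusion that 11 steps, or 101 steps with a high self-transition probability, gave the best results; the 101-step walk spreads its moves over a wider range of path lengths.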

13 Experiment Data
A 14-day click log from web image search
– Judged images at distance 1 from the query had a precision of 75%
– Pruning: remove URLs connected to only one query, and remove queries connected to only one URL
– After pruning: 505,000 URLs, 202,000 queries and 1.1 million edges
– 45 queries sampled uniformly for evaluation
– TREC-style pooling of relevance judgments to depth 20: 2278 relevance judgments identify 818 relevant images
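The pruning rule can be sketched as a filter over edges. This sketch assumes degrees are counted once on the unpruned graph and the rule is applied in a single pass; the slide does not say whether pruning is repeated until no such nodes remain. `prune_click_graph` and the toy data are hypothetical.

```python
from collections import Counter


def prune_click_graph(edges):
    """Drop URLs clicked for only one query and queries with only one clicked URL.

    edges: {(query, url): click count}. Degrees are counted on the input graph
    and the rule is applied once (an assumption; it could be iterated).
    """
    query_degree = Counter(q for q, _ in edges)
    url_degree = Counter(u for _, u in edges)
    # An edge survives only if neither of its endpoints is removed.
    return {(q, u): c for (q, u), c in edges.items()
            if query_degree[q] > 1 and url_degree[u] > 1}


edges = {
    ("jaguar", "cat.jpg"): 3,
    ("jaguar", "car.jpg"): 2,
    ("big cat", "cat.jpg"): 1,
    ("big cat", "lion.jpg"): 4,
    ("xkcd", "comic.png"): 1,
}
print(prune_click_graph(edges))
# {('jaguar', 'cat.jpg'): 3, ('big cat', 'cat.jpg'): 1}
```

A single pass can leave new degree-1 nodes behind (here both surviving queries now connect to only one URL), so an iterated variant is a plausible alternative reading of the rule.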

14 Experiment Result-1
Table 1. The furthest node from any of our test queries is at distance 41 (‘101-0.9-backward’). ‘dist’ and ‘1-0-forward’ are the baselines. Walk names give the number of steps t, the self-transition probability s and the walk direction.
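Assuming that distance means the number of edges on the shortest path through the click graph (so directly clicked images are at distance 1), it can be computed with breadth-first search. In a bipartite graph every image is then at an odd distance from a query, consistent with the distance of 41 in Table 1. The function name and data are hypothetical.

```python
from collections import defaultdict, deque


def hop_distances(edges, source):
    """Shortest-path hop count from source to every reachable node (BFS).

    edges: iterable of (query, url) pairs; the click graph is undirected.
    """
    adjacency = defaultdict(set)
    for q, u in edges:
        adjacency[q].add(u)
        adjacency[u].add(q)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in dist:  # first visit is the shortest path in BFS
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


edges = [("jaguar", "cat.jpg"), ("big cat", "cat.jpg"), ("big cat", "lion.jpg")]
print(hop_distances(edges, "jaguar"))
# {'jaguar': 0, 'cat.jpg': 1, 'big cat': 2, 'lion.jpg': 3}
```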

15 Experiment Result-2
Figure: The number of images retrieved at different distances from the query for each method. The 101-step walk with zero self-transition possibly goes too far, returning too few distance-1 images.

16 Experiment Result-3
Figure: The precision at different distances from the query for each method.

17 Experiment Result-4
Figure: Precision-recall curves of forward and backward walks, with zero self-transition probability (1000 URLs retrieved).
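A precision-recall curve for one query can be traced from the (recall, precision) pair after each rank. This sketch uses hypothetical names and data, and omits interpolation and averaging over queries, which the slide does not specify.

```python
def precision_recall_points(ranking, relevant):
    """(recall, precision) after each position of a ranked list.

    ranking:  retrieved documents, best first.
    relevant: set of documents judged relevant for the query.
    """
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    return points


for recall, precision in precision_recall_points(["a", "b", "c", "d"], {"a", "c", "e"}):
    print(f"recall {recall:.2f}  precision {precision:.2f}")
```

Recall never reaches 1 here because the relevant document "e" was not retrieved, just as a retrieval depth of 1000 URLs caps the recall a real run can show.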

18 Experiment Result-5
Figure: Parameter sensitivity for a backward walk. Each contour shows a 0.01 variation in MAP@20. Grid intersections indicate the parameter combinations tried. The large plateau has the highest MAP@20 (0.56-0.57).
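MAP@20 is the mean, over test queries, of average precision computed on the top 20 results. Conventions for the denominator differ; this sketch assumes division by min(number of relevant documents, k), which may not match the paper's exact definition. Names and data are hypothetical.

```python
def average_precision_at_k(ranking, relevant, k=20):
    """Average precision over the top k results.

    Sums precision@i at each relevant position i <= k and divides by
    min(|relevant|, k), an assumed normalization.
    """
    hits, total = 0, 0.0
    for i, doc in enumerate(ranking[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def map_at_k(runs, k=20):
    """Mean of AP@k over (ranking, relevant_set) pairs, one pair per test query."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


runs = [
    (["a", "b", "c"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y"], {"y"}),            # AP = (1/2) / 1
]
print(round(map_at_k(runs), 4))  # 0.6667
```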

19 Conclusion
We have applied a Markov random walk model to the click graph, giving a high-quality ranking of documents for a given query, including documents not yet clicked for that query
A backward walk was more effective than a forward walk, which supports the query formulation model that motivates the backward walk
We got the best results from a walk of 11 steps, or of 101 steps with a high self-transition probability
We have studied ad hoc retrieval in this paper; the model could also be applied easily, and potentially effectively, to the other application areas listed
Another possible next step is to incorporate document content and query content, for example through a language model, aiming to find documents that are not yet part of the click graph

