1 Quicklink Selection for Navigational Query Results Deepayan Chakrabarti Ravi Kumar Kunal Punera

1 Quicklink Selection for Navigational Query Results Deepayan Chakrabarti (deepay@yahoo-inc.com) Ravi Kumar (ravikuma@yahoo-inc.com) Kunal Punera (kpunera@yahoo-inc.com)

2 What are quicklinks? (Figure labels: Quicklinks, Result, Website)

3 Quicklinks = URLs within the search result website. They enable fast navigation to important parts of the website. Which URLs should be QLs? (Figure labels: Quicklinks, Result, Website)

4 Quicklink Selection: Some obvious strategies don’t work very well. Top clicked URLs in search engine: (a) URL may have low relevance in the QL context: lib.utexas.edu/maps is popular for searches on “maps” and not for searches on “Univ. of Texas”. (b) URL may be too specific: automobiles.honda.com/civic-hybrid/exterior-photos.aspx for honda.com. (c) URL popularity may be time sensitive: nytimes.com/election-guide/2008/ for nytimes.com.

5 Quicklink Selection: Some obvious strategies don’t work very well. Top clicked URLs in search engine. Top visited URLs in toolbar data: may not relate to search activity, e.g., for nytimes.com, #3 is nytimes.com/mem/emailthis.html, #6 is nytimes.com/auth/login, and #8 is nytimes.com/gst/regi.html.

6 Quicklink Selection: Some obvious strategies don’t work very well. Top clicked URLs in search engine. Top visited URLs in toolbar data. Top URLs from analysis of hyperlink graph: ignores preferences of search users; toolbar data is more representative. Heavily tagged URLs (e.g., del.icio.us/digg): low coverage, too few websites.

7 Quicklink Selection: Need a combined approach: search logs, toolbar data, web-server logs, website hyperlink graph, user tags. (Figure label: This paper)

8 Related Work: Sitemap generation [Perkowitz+/00]; detection of hard-to-find URLs [Srikant+/01]; improving website navigability [Doerr+/07]; mining Web usage patterns [Buchner/99, Cadez+/03]; BrowseRank [Liu+/08]; post-search browsing behavior [Bilenko+/08]. We focus on QLs in the context of Search.

9 Outline: Motivation and Related Work; Problem Formulation; Proposed Solution; Experiments; Conclusions

10 Problem Formulation: Which k URLs should be QLs? “The greatest good for the greatest number”: QLs save clicks. Maximize the total number of clicks saved using at most k QLs. But when exactly is a click “saved”?

11 Problem Formulation: When does a QL get clicked by the user? Graph of click trails (Toolbar data): say we pick this node as a QL. (Figure nodes: nasa.gov, Hubble telescope, Photos)

12 Problem Formulation: Say we pick this node as a QL. Assumption: The user recognizes if SearchResult → QL → Destination. Graph of click trails (Toolbar data). (Figure nodes: nasa.gov, Hubble telescope, Photos)

13 Problem Formulation: Say we pick this node as a QL (saves 1 click each). Assumption: The user recognizes if SearchResult → QL → Destination. Graph of click trails (Toolbar data). (Figure node: nasa.gov)

14 Problem Formulation: Say we pick this node as a QL. Trail labels: (saves 1 click each), (saves 2 clicks each), (saves 0). Total savings = 1*3 + 2*2 = 7 clicks. Assumption: The user recognizes if SearchResult → QL → Destination. Graph of click trails (Toolbar data). (Figure node: nasa.gov)
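
The arithmetic on this slide can be reproduced with a small sketch. It assumes (this is not stated on the slides) that a trail is an ordered list of pages starting at the search result's landing page, and that a QL sitting i clicks into a trail saves i clicks. The page names and trails below are hypothetical, chosen so that three trails save 1 click each and two trails save 2 clicks each.

```python
from typing import List, Sequence


def clicks_saved(trail: Sequence[str], quicklink: str) -> int:
    """Clicks saved on one trail (assumed model: depth of the QL on the trail)."""
    # trail[0] is the landing page, so index i means i clicks past it.
    return trail.index(quicklink) if quicklink in trail else 0


def total_savings(trails: List[Sequence[str]], quicklink: str) -> int:
    """Sum of clicks saved over all trails for a single QL."""
    return sum(clicks_saved(t, quicklink) for t in trails)


# Hypothetical toolbar trails for nasa.gov (illustrative data only).
HUBBLE = "nasa.gov/hubble"
trails = [
    ["nasa.gov", HUBBLE, "nasa.gov/hubble/photos"],
    ["nasa.gov", HUBBLE, "nasa.gov/hubble/news"],
    ["nasa.gov", HUBBLE, "nasa.gov/hubble/team"],
    ["nasa.gov", "nasa.gov/missions", HUBBLE, "nasa.gov/hubble/photos"],
    ["nasa.gov", "nasa.gov/missions", HUBBLE, "nasa.gov/hubble/news"],
    ["nasa.gov", "nasa.gov/jobs"],  # never passes through the QL: saves 0
]

print(total_savings(trails, HUBBLE))  # 1*3 + 2*2 = 7
```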

15 Problem Formulation: However… Unknown pages might become QLs. (Figure: lyrics.com with pages A, B, C, …, Z.) These could become the “best” QLs.

16 Problem Formulation: However… Unknown pages might become QLs. Automatic-redirect pages might become QLs: nytimes.com forces logging in; aaa.com forces zipcode entry. We need QLs that are “noticeable” in a search context.

17 Problem Formulation: How can we estimate noticeability? Via Search click-logs. Noticeability of a URL u: the user notices a useful QL with probability α(u). (Formula labels: tuning param (≈ 2); fraction of search clicks for u on website)

18 Problem Formulation: Assumption: The user picks the best QL that he/she notices. (Figure labels: nasa.gov, QL1, QL2, (saves 0).) Expected savings per group of trails, as # trails × prob × # clicks: 2 × α_1 × 2; 1 × α_1 × 1; 2 × (1-α_2)α_1 × 1; 2 × α_2 × 2. Total = 5α_1 + 4α_2 + 2(1-α_2)α_1

19 Problem Formulation: Same two-QL example: Total = 5α_1 + 4α_2 + 2(1-α_2)α_1. If only QL1 is perfectly noticeable (α_1 = 1, α_2 = 0): Total = 7 clicks (as if 1 QL only). If both QLs are perfectly noticeable (α_1 = 1, α_2 = 1): Total = 9 clicks.
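
A minimal sketch of the expected-savings computation follows, under two assumptions that go beyond the slides: each QL is noticed independently with its own probability, and a QL at depth i on a trail saves i clicks. The trail data and page names are hypothetical, arranged so that the sketch reproduces the two totals quoted on this slide (7 and 9 clicks).

```python
from typing import Dict, List, Sequence


def expected_trail_savings(trail: Sequence[str], alpha: Dict[str, float]) -> float:
    """Expected clicks saved on one trail when the user takes the best QL noticed."""
    # (clicks saved, noticeability) for every QL that lies on this trail.
    on_trail = [(trail.index(u), a) for u, a in alpha.items() if u in trail]
    on_trail.sort(reverse=True)  # best (deepest) QL first
    expected, p_missed_better = 0.0, 1.0
    for saved, a in on_trail:
        # This QL is used only if noticed and every better QL was missed.
        expected += p_missed_better * a * saved
        p_missed_better *= 1.0 - a
    return expected


def expected_savings(trails: List[Sequence[str]], alpha: Dict[str, float]) -> float:
    return sum(expected_trail_savings(t, alpha) for t in trails)


# Hypothetical trails: QL1 saves 2 on two trails and 1 on one trail;
# on the last two trails QL2 saves 2 and QL1 saves 1.
trails = [
    ["home", "x", "QL1", "d1"],
    ["home", "x", "QL1", "d1"],
    ["home", "QL1", "d2"],
    ["home", "QL1", "QL2", "d3"],
    ["home", "QL1", "QL2", "d3"],
]

print(expected_savings(trails, {"QL1": 1.0, "QL2": 0.0}))  # 7.0
print(expected_savings(trails, {"QL1": 1.0, "QL2": 1.0}))  # 9.0
```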

20 Problem Formulation: Which k URLs should be QLs? Maximize the expected number of clicks saved using at most k QLs, while incorporating “noticeability”.

22 Algorithms: Maximize expected number of saved clicks using k QLs: NP-Hard. Theorem: This objective is non-decreasing submodular: 1. Non-negative. 2. Adding QLs never hurts. 3. “Diminishing Returns”: the marginal improvement from adding u to a set S is at least the marginal improvement from adding u to a superset S’.

23 Algorithms: Greedy algorithm: Iteratively pick QLs that increase the number of saved clicks the most. Within a factor (1-1/e) of OPT [Nemhauser+/’78].
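
The three properties can be written out in symbols. The notation f(S) for the expected number of clicks saved by a QL set S is an assumption made here for illustration; it does not appear on the slides.

```latex
\begin{align*}
&\text{(1) Non-negative:} && f(S) \ge 0 \\
&\text{(2) Non-decreasing:} && f(S) \le f(S') && \text{for all } S \subseteq S' \\
&\text{(3) Diminishing returns:} && f(S \cup \{u\}) - f(S) \;\ge\; f(S' \cup \{u\}) - f(S') && \text{for all } S \subseteq S',\ u \notin S'
\end{align*}
```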
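
The greedy step can be sketched as below. This is an illustrative sketch, not the authors' implementation: the objective used in the demo ignores noticeability (every QL is treated as always noticed), assumes a QL at depth i on a trail saves i clicks, and runs on hypothetical trail data.

```python
from typing import Callable, FrozenSet, List, Sequence


def greedy_select(candidates: Sequence[str],
                  objective: Callable[[FrozenSet[str]], float],
                  k: int) -> List[str]:
    """Pick up to k items, each time adding the one with the largest marginal gain."""
    chosen: List[str] = []
    for _ in range(k):
        base = objective(frozenset(chosen))
        best, best_gain = None, 0.0
        for u in candidates:
            if u in chosen:
                continue
            gain = objective(frozenset(chosen) | {u}) - base
            if gain > best_gain:
                best, best_gain = u, gain
        if best is None:  # no remaining candidate adds any value
            break
        chosen.append(best)
    return chosen


# Hypothetical trails; trail[0] is the landing page.
trails = [
    ["home", "x", "A", "d1"],
    ["home", "x", "A", "d1"],
    ["home", "A", "d2"],
    ["home", "A", "B", "d3"],
    ["home", "A", "B", "d3"],
]


def saved_clicks(qls: FrozenSet[str]) -> int:
    """Assumed objective: on each trail the user takes the deepest QL present."""
    return sum(max((t.index(u) for u in qls if u in t), default=0) for t in trails)


pages = sorted({p for t in trails for p in t[1:]})
picked = greedy_select(pages, saved_clicks, k=2)
print(picked, saved_clicks(frozenset(picked)))  # ['A', 'd3'] 11
```

Note that the second pick is a deep destination page, which is the kind of result the noticeability term in the full objective is meant to discount.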

24 Algorithms: However… Inhomogeneous results: QLs for ea.com are fifa08.ea.com and battlefield.ea.com (two games made by EA), plus 6 webpages deep inside thesim2.ea.com. Redundant results: QLs for senate.gov include obama.senate.gov, obama.senate.gov/about, obama.senate.gov/contact, and obama.senate.gov/votes; the parent URL makes the child URLs redundant.

25 Algorithms: Both can be specified as pairwise constraints on URLs allowed to belong to a QL set. Pairwise-constrained QL selection is NP-hard. Two-step process: (1) heuristically find a large subset of trails that form a tree; (2) enforce constraints on the tree with a dynamic program, which is optimal on the tree.
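
As an illustration of what a pairwise constraint might look like, the sketch below encodes only the redundancy case, using a simple path-prefix test for "parent URL makes the child URL redundant". The function names and the prefix rule are assumptions made here; the slides do not say how the constraints are encoded, and this check does not cover subdomain relationships or the inhomogeneity case.

```python
from itertools import combinations
from typing import Iterable


def is_ancestor(parent: str, child: str) -> bool:
    """True if `child` lies strictly below `parent` in the URL path hierarchy."""
    return child.rstrip("/").startswith(parent.rstrip("/") + "/")


def conflicts(u: str, v: str) -> bool:
    """Assumed pairwise constraint: a URL and one of its ancestors cannot both be QLs."""
    return is_ancestor(u, v) or is_ancestor(v, u)


def is_feasible(ql_set: Iterable[str]) -> bool:
    """A QL set is feasible if no pair of its URLs conflicts."""
    return not any(conflicts(u, v) for u, v in combinations(list(ql_set), 2))


print(is_feasible(["obama.senate.gov", "obama.senate.gov/about"]))          # False
print(is_feasible(["obama.senate.gov/about", "obama.senate.gov/contact"]))  # True
```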

27 Experiments: Baseline Methods. TopClicked: URL score = # search clicks on URL. TopVisited: URL score = # occurrences on toolbar trails. PageRank: Build a weighted graph on URLs, where weight(i,j) = # trails using the i → j edge; URL score = PageRank on this graph.

28 Experiments: Live Traffic dataset: Computed CTRs on QLs currently displayed by Yahoo! (1043 website subset). Measure: Pick two equal-sized subsets of QLs; use sum-of-scores and sum-of-CTRs to predict the better subset; measure how often the predictions match.
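
Two of these baselines can be sketched from trail data alone. The sketch below is illustrative: the damping factor (0.85), the iteration count, the uniform handling of pages with no outgoing edges, and the trail data are all assumptions not given on the slides. TopClicked is omitted because it needs search click logs rather than trails.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def top_visited(trails: List[Sequence[str]]) -> Counter:
    """TopVisited: score = number of occurrences of the URL on toolbar trails."""
    return Counter(u for t in trails for u in t)


def trail_pagerank(trails: List[Sequence[str]],
                   damping: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """PageRank on a graph where weight(i, j) = number of trails using edge i -> j."""
    weight = defaultdict(Counter)
    for t in trails:
        for i, j in zip(t, t[1:]):
            weight[i][j] += 1
    nodes = sorted({u for t in trails for u in t})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = sum(weight[u].values())
            if out == 0:
                # Dangling page: spread its rank uniformly over all pages.
                for v in nodes:
                    nxt[v] += damping * rank[u] / n
            else:
                for v, w in weight[u].items():
                    nxt[v] += damping * rank[u] * w / out
        rank = nxt
    return rank


# Hypothetical trails.
trails = [
    ["home", "x", "A", "d1"],
    ["home", "x", "A", "d1"],
    ["home", "A", "d2"],
    ["home", "A", "B", "d3"],
    ["home", "A", "B", "d3"],
]
visits = top_visited(trails)
ranks = trail_pagerank(trails)
print(visits.most_common(3))
print(sorted(ranks, key=ranks.get, reverse=True)[:3])
```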
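
The agreement measure can be sketched as follows. How subset pairs were sampled is not stated on the slides, so this sketch assumes uniform random sampling of two equal-sized subsets from one website's QLs and skips tied pairs; the function name, the scores, and the CTR values are hypothetical.

```python
import random
from typing import Dict


def agreement_rate(scores: Dict[str, float], ctrs: Dict[str, float],
                   size: int, n_pairs: int = 1000, seed: int = 0) -> float:
    """Fraction of subset pairs where sum-of-scores and sum-of-CTRs pick the same winner."""
    rng = random.Random(seed)
    urls = sorted(scores)
    agree = total = 0
    for _ in range(n_pairs):
        a, b = rng.sample(urls, size), rng.sample(urls, size)
        score_diff = sum(scores[u] for u in a) - sum(scores[u] for u in b)
        ctr_diff = sum(ctrs[u] for u in a) - sum(ctrs[u] for u in b)
        if score_diff == 0 or ctr_diff == 0:
            continue  # ties carry no preference, so they are skipped
        total += 1
        agree += (score_diff > 0) == (ctr_diff > 0)
    return agree / total if total else float("nan")


# Hypothetical method scores and live CTRs (as percentages) for five QLs.
scores = {"q1": 40, "q2": 25, "q3": 10, "q4": 8, "q5": 1}
ctrs = {"q1": 12, "q2": 9, "q3": 2, "q4": 5, "q5": 1}
print(agreement_rate(scores, ctrs, size=2))
```

A value near 1 means the method's scores rank subsets the way live traffic does; a value near 0.5 means the scores are no better than chance under this sampling.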

29 Experiments: Live Traffic Data. (Plot axes: Subset sizes; Fraction of subset-pairs where predictions agree with live traffic.) QL-ALG > TopVisited > PageRank > TopClicked

30 Experiments: Tree-structured trails. Most dropped trails are very short. Tree-structured trails improve accuracy. (Plots: “Live Traffic prediction quality comparison”; “Distribution of dropped trails”, with axes Length of trail and Number of trails dropped.)

32 Conclusions: Proposed a formulation for the QL selection problem: both toolbar and search logs are used intuitively. Proposed two algorithms: Greedy, (1-1/e)-optimal; Tree-structured, empirically better. Improvement of 22% over competing baselines.
