Quicklink Selection for Navigational Query Results. Deepayan Chakrabarti, Ravi Kumar, Kunal Punera


1 Quicklink Selection for Navigational Query Results. Deepayan Chakrabarti (deepay@yahoo-inc.com), Ravi Kumar (ravikuma@yahoo-inc.com), Kunal Punera (kpunera@yahoo-inc.com)

2 What are quicklinks? (Figure labels: Quicklinks; Result Website)

3 Quicklinks = URLs within the search result website. They enable fast navigation to important parts of the website. Which URLs should be QLs? (Figure labels: Quicklinks; Result Website)

4 Quicklink Selection. Some obvious strategies don’t work very well. Top clicked URLs in search engine: (1) URL may have low relevance in the QL context: lib.utexas.edu/maps is popular for searches on “maps” and not for searches on “Univ. of Texas”. (2) URL may be too specific: automobiles.honda.com/civic-hybrid/exterior-photos.aspx for honda.com. (3) URL popularity may be time sensitive: nytimes.com/election-guide/2008/ for nytimes.com.

5 Quicklink Selection. Some obvious strategies don’t work very well. Top clicked URLs in search engine. Top visited URLs in toolbar data: may not relate to search activity. E.g., for nytimes.com, #3 is nytimes.com/mem/emailthis.html, #6 is nytimes.com/auth/login, and #8 is nytimes.com/gst/regi.html.

6 Quicklink Selection. Some obvious strategies don’t work very well. Top clicked URLs in search engine. Top visited URLs in toolbar data. Top URLs from analysis of hyperlink graph: ignores preferences of search users; toolbar data is more representative. Heavily tagged URLs (e.g., del.icio.us/digg): low coverage, too few websites.

7 Quicklink Selection. Need a combined approach: search logs; toolbar data; web-server logs; website hyperlink graph; user tags. (Slide label: “This paper”)

8 Related Work. Sitemap generation [Perkowitz+/00]; detection of hard-to-find URLs [Srikant+/01]; improving website navigability [Doerr+/07]; mining Web usage patterns [Buchner/99, Cadez+/03]; BrowseRank [Liu+/08]; post-search browsing behavior [Bilenko+/08]. We focus on QLs in the context of search.

9 Outline: Motivation and Related Work; Problem Formulation; Proposed Solution; Experiments; Conclusions

10 Problem Formulation. Which k URLs should be QLs? “The greatest good for the greatest number”: QLs save clicks. Maximize the total number of clicks saved using at most k QLs. But when exactly is a click “saved”?

11 Problem Formulation. When does a QL get clicked by the user? Graph of click trails (toolbar data). Say we pick this node as a QL. (Figure node labels: nasa.gov; Hubble telescope; Photos)

12 Problem Formulation. Say we pick this node as a QL. Assumption: The user recognizes if SearchResult → QL → Destination. Graph of click trails (toolbar data). (Figure node labels: nasa.gov; Hubble telescope; Photos)

13 Problem Formulation. Say we pick this node as a QL (saves 1 click each). Assumption: The user recognizes if SearchResult → QL → Destination. Graph of click trails (toolbar data), rooted at nasa.gov.

14 Problem Formulation. Say we pick this node as a QL. Some trails save 1 click each, some save 2 clicks each, and trails that do not pass through the QL save 0. Total savings = 1*3 + 2*2 = 7 clicks. Assumption: The user recognizes if SearchResult → QL → Destination. Graph of click trails (toolbar data), rooted at nasa.gov.
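
The click count on this slide can be reproduced with a small sketch. It assumes that a trail is a list of pages starting at the search-result homepage, and that a QL saves as many clicks as its position in the trail (position 0 being the homepage). The page names "missions" and "jobs" and the helper names are hypothetical, chosen only to give 3 trails that save 1 click and 2 trails that save 2.

```python
def clicks_saved(trail, ql):
    """Clicks saved on one trail if `ql` is shown as a quicklink.

    Assumption: the saving equals the QL's position in the trail, since the
    user jumps straight to it instead of clicking down from the homepage.
    """
    return trail.index(ql) if ql in trail else 0


def total_saved(trails, ql):
    """Total clicks saved over all trails by a single quicklink."""
    return sum(clicks_saved(trail, ql) for trail in trails)


# Hypothetical toolbar trails for nasa.gov.
trails = (
    [["nasa.gov", "hubble", "photos"]] * 3      # QL at depth 1: saves 1 each
    + [["nasa.gov", "missions", "hubble"]] * 2  # QL at depth 2: saves 2 each
    + [["nasa.gov", "jobs"]] * 4                # QL not on trail: saves 0
)

print(total_saved(trails, "hubble"))  # 1*3 + 2*2 = 7
```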

15 Problem Formulation. However… Unknown pages might become QLs. (Figure: lyrics.com linking to pages A, B, C, …, Z) These could become the “best” QLs.

16 Problem Formulation. However… Unknown pages might become QLs. Automatic-redirect pages might become QLs: nytimes.com forces logging in; aaa.com forces zipcode entry. We need QLs that are “noticeable” in a search context.

17 Problem Formulation. How can we estimate noticeability? Via search click-logs. Noticeability of a URL u: the user notices a useful QL with probability α(u). (The formula for α(u) is not preserved in this transcript; its annotations are “Tuning param (≈ 2)” and “Fraction of search clicks for u on website”.)

18 Problem Formulation. Assumption: The user picks the best QL that he/she notices. Two QLs, QL1 and QL2, in the nasa.gov trail graph; trails through neither save 0. Expected savings, as # trails × probability × # clicks saved: 2 × α₁ × 2; 1 × α₁ × 1; 2 × (1-α₂)α₁ × 1; 2 × α₂ × 2. Total = 5α₁ + 4α₂ + 2(1-α₂)α₁.

19 Problem Formulation. Two QLs, QL1 and QL2, in the nasa.gov trail graph; trails through neither save 0. Expected savings, as # trails × probability × # clicks saved: 2 × α₁ × 2; 1 × α₁ × 1; 2 × (1-α₂)α₁ × 1; 2 × α₂ × 2. Total = 5α₁ + 4α₂ + 2(1-α₂)α₁. If only QL1 is perfectly noticeable (α₁=1, α₂=0): Total = 7 clicks (as if 1 QL only). If both QLs are perfectly noticeable (α₁=1, α₂=1): Total = 9 clicks.
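
The two totals on this slide (7 and 9 clicks) can be checked with a minimal sketch of the "user picks the best QL that he/she notices" assumption. It assumes a QL's saving is its position in the trail and that QLs are noticed independently. The trail contents, the page names "other" and "leaf", and the function names are hypothetical, laid out so that the trail counts match the four terms on the slide.

```python
def expected_saved(trail, alpha):
    """Expected clicks saved on one trail.

    `alpha` maps each quicklink URL to its noticeability. The user is assumed
    to take the highest-saving QL among those noticed, so a QL only counts if
    every better QL on the trail went unnoticed.
    """
    candidates = sorted(
        ((trail.index(url), a) for url, a in alpha.items() if url in trail),
        reverse=True,  # best saving first
    )
    expected, p_missed_better = 0.0, 1.0
    for saving, a in candidates:
        expected += p_missed_better * a * saving
        p_missed_better *= 1.0 - a
    return expected


def total_expected(trails, alpha):
    return sum(expected_saved(trail, alpha) for trail in trails)


# Hypothetical trails matching the slide's counts.
trails = (
    [["nasa.gov", "other", "QL1"]] * 2   # QL1 saves 2
    + [["nasa.gov", "QL1", "leaf"]]      # QL1 saves 1
    + [["nasa.gov", "QL1", "QL2"]] * 2   # QL2 saves 2, else QL1 saves 1
)

print(total_expected(trails, {"QL1": 1.0, "QL2": 0.0}))  # 7.0
print(total_expected(trails, {"QL1": 1.0, "QL2": 1.0}))  # 9.0
```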

20 Problem Formulation. Which k URLs should be QLs? Maximize the expected number of clicks saved using at most k QLs, while incorporating “noticeability”.

21 Outline: Motivation and Related Work; Problem Formulation; Proposed Solution; Experiments; Conclusions

22 Algorithms. Maximize expected number of saved clicks using k QLs: NP-hard. Theorem: This objective is non-decreasing submodular: 1. Non-negative. 2. Adding QLs never hurts. 3. “Diminishing returns”: the marginal improvement from adding a URL u to a set S is at least the marginal improvement from adding u to a superset S’.
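
For reference, the three properties named in the theorem can be written out as follows. The symbol f for the expected number of saved clicks is an assumption introduced here; the slide does not name the objective.

```latex
% f(S): expected clicks saved by quicklink set S (notation assumed)
\begin{align*}
  &\text{Non-negative:}         && f(S) \ge 0 \\
  &\text{Non-decreasing:}       && f(S) \le f(S')
      && \text{for all } S \subseteq S' \\
  &\text{Diminishing returns:}  && f(S \cup \{u\}) - f(S) \ge f(S' \cup \{u\}) - f(S')
      && \text{for all } S \subseteq S',\ u \notin S'
\end{align*}
```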

23 Algorithms. Greedy algorithm: Iteratively pick QLs that increase the number of saved clicks the most. Within a factor (1-1/e) of OPT [Nemhauser+/’78].
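
A minimal sketch of such a greedy selection is given below. It is not the authors' implementation. It assumes an objective in which a QL saves as many clicks as its position in the trail and the user takes the best QL noticed; the trails, noticeability values, and function names are hypothetical.

```python
def objective(trails, alpha, chosen):
    """Expected clicks saved by the QL set `chosen` (assumed objective)."""
    total = 0.0
    for trail in trails:
        on_trail = sorted(
            ((trail.index(u), alpha[u]) for u in chosen if u in trail),
            reverse=True,  # best saving first
        )
        p_missed_better = 1.0
        for saving, a in on_trail:
            total += p_missed_better * a * saving
            p_missed_better *= 1.0 - a
    return total


def greedy_quicklinks(trails, alpha, k):
    """Pick up to k QLs, each time adding the URL with the largest gain."""
    chosen = []
    candidates = sorted(alpha)  # sorted for deterministic tie-breaking
    for _ in range(k):
        remaining = [u for u in candidates if u not in chosen]
        if not remaining:
            break
        base = objective(trails, alpha, chosen)
        best = max(
            remaining,
            key=lambda u: objective(trails, alpha, chosen + [u]) - base,
        )
        chosen.append(best)
    return chosen


# Hypothetical trails and noticeabilities (all QLs perfectly noticeable).
trails = (
    [["nasa.gov", "other", "QL1"]] * 2
    + [["nasa.gov", "QL1", "leaf"]]
    + [["nasa.gov", "QL1", "QL2"]] * 2
)
alpha = {"QL1": 1.0, "QL2": 1.0, "other": 1.0, "leaf": 1.0}

picked = greedy_quicklinks(trails, alpha, k=2)
print(picked, objective(trails, alpha, picked))  # ['QL1', 'QL2'] 9.0
```

Each round re-evaluates the objective for every remaining candidate, which is simple but quadratic in the number of candidates; a production version would likely cache marginal gains.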

24 Algorithms. However… Inhomogeneous results: QLs for ea.com are fifa08.ea.com and battlefield.ea.com (two games made by EA), plus 6 webpages deep inside thesim2.ea.com. Redundant results: QLs for senate.gov include obama.senate.gov, obama.senate.gov/about, obama.senate.gov/contact, and obama.senate.gov/votes; the parent URL makes the child URLs redundant.

25 Algorithms. Both can be specified as pairwise constraints on URLs allowed to belong to a QL set. Pairwise-constrained QL selection is NP-hard. Two-step process: (1) heuristically find a large subset of trails that form a tree; (2) enforce constraints on the tree with a dynamic program, which is optimal on the tree.
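
To make the notion of a pairwise constraint concrete, here is a small sketch of the parent/child redundancy rule applied to a ranked candidate list. This is a simple filter for illustration only, not the tree-based dynamic program described on the slide; the function names and the prefix-based definition of "parent" are assumptions.

```python
def is_parent(u, v):
    """True if URL `u` is a path prefix (parent) of URL `v`."""
    return v.startswith(u.rstrip("/") + "/")


def redundant(u, v):
    """Pairwise constraint: a parent and its child may not both be QLs."""
    return is_parent(u, v) or is_parent(v, u)


def pick_compatible(ranked, k):
    """Walk a ranked list, keeping URLs that violate no pairwise constraint."""
    chosen = []
    for url in ranked:
        if len(chosen) == k:
            break
        if not any(redundant(url, c) for c in chosen):
            chosen.append(url)
    return chosen


# The senate.gov example, in an assumed ranking order.
ranked = [
    "obama.senate.gov",
    "obama.senate.gov/about",
    "obama.senate.gov/contact",
    "obama.senate.gov/votes",
]
print(pick_compatible(ranked, k=3))  # ['obama.senate.gov']
```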

26 Outline: Motivation and Related Work; Problem Formulation; Proposed Solution; Experiments; Conclusions

27 Experiments. Baseline methods. TopClicked: URL score = # search clicks on URL. TopVisited: URL score = # occurrences on toolbar trails. PageRank: build a weighted graph on URLs, where weight(i,j) = # trails using the i → j edge; URL score = PageRank on this graph.
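
The TopVisited and PageRank baselines can be sketched directly from toolbar trails, as below. The damping factor of 0.85, the fixed iteration count, and the uniform handling of pages without out-links are assumptions, since the slide does not specify them; the trails are hypothetical.

```python
from collections import Counter, defaultdict


def top_visited(trails):
    """TopVisited: score = number of occurrences of a URL on toolbar trails."""
    return Counter(url for trail in trails for url in trail)


def trail_pagerank(trails, damping=0.85, iters=50):
    """PageRank on the graph with weight(i, j) = # trails using edge i -> j."""
    weight = defaultdict(Counter)
    nodes = set()
    for trail in trails:
        nodes.update(trail)
        for src, dst in zip(trail, trail[1:]):
            weight[src][dst] += 1
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Rank held by pages with no out-links is spread uniformly (assumed).
        dangling = sum(rank[u] for u in nodes if not weight[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for src in nodes:
            out = sum(weight[src].values())
            for dst, w in weight[src].items():
                new[dst] += damping * rank[src] * w / out
        rank = new
    return rank


# Hypothetical toolbar trails.
trails = (
    [["nasa.gov", "hubble", "photos"]] * 3
    + [["nasa.gov", "missions", "hubble"]] * 2
)
visits = top_visited(trails)
ranks = trail_pagerank(trails)
print(visits.most_common(2))
print(sorted(ranks, key=ranks.get, reverse=True))
```

TopClicked would be computed the same way as TopVisited, but over search click logs rather than toolbar trails.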

28 Experiments. Live Traffic dataset: computed CTRs on QLs currently displayed by Yahoo! (1043 website subset). Measure: pick two equal-sized subsets of QLs; use sum-of-scores and sum-of-CTRs to predict the better subset; measure how often the predictions match.
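
One way to read this measure is sketched below: enumerate pairs of equal-sized QL subsets and count how often the method's scores and the live CTRs prefer the same subset. Enumerating all pairs (rather than sampling) and skipping ties are assumptions, and the score and CTR values are made up for illustration.

```python
from itertools import combinations


def agreement(scores, ctrs, size):
    """Fraction of subset pairs where sum-of-scores and sum-of-CTRs agree."""
    subsets = list(combinations(sorted(scores), size))
    agree = total = 0
    for a, b in combinations(subsets, 2):
        score_diff = sum(scores[u] for u in a) - sum(scores[u] for u in b)
        ctr_diff = sum(ctrs[u] for u in a) - sum(ctrs[u] for u in b)
        if score_diff == 0 or ctr_diff == 0:
            continue  # ties carry no preference (assumed handling)
        total += 1
        agree += (score_diff > 0) == (ctr_diff > 0)
    return agree / total if total else float("nan")


# Hypothetical method scores and CTRs (in percent) for three QLs.
scores = {"a": 3, "b": 2, "c": 1}
ctrs_same_order = {"a": 30, "b": 20, "c": 10}
ctrs_reversed = {"a": 10, "b": 20, "c": 30}

print(agreement(scores, ctrs_same_order, size=2))  # 1.0
print(agreement(scores, ctrs_reversed, size=2))    # 0.0
```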

29 Experiments. Live Traffic Data. (Figure: x-axis, subset sizes; y-axis, fraction of subset-pairs where predictions agree with live traffic.) QL-ALG > TopVisited > PageRank > TopClicked.

30 Experiments. Tree-structured trails: most dropped trails are very short; tree-structured trails improve accuracy. (Figures: “Live Traffic prediction quality comparison”; “Distribution of dropped trails”, with axes “Length of trail” and “Number of trails dropped”.)

31 Outline: Motivation and Related Work; Problem Formulation; Proposed Solution; Experiments; Conclusions

32 Conclusions. Proposed a formulation for the QL selection problem: both toolbar and search logs are used intuitively. Proposed two algorithms: Greedy, which is (1-1/e)-optimal; Tree-structured, which is empirically better. Improvement of 22% over competing baselines.

