1 Quicklink Selection for Navigational Query Results Deepayan Chakrabarti Ravi Kumar Kunal Punera



2 What are quicklinks?
[Figure labels: Quicklinks; Result Website]

3 Quicklinks = URLs within the search result website
- Enable fast navigation to important parts of the website
- Which URLs should be QLs?
[Figure labels: Quicklinks; Result Website]

4 Quicklink Selection
Some obvious strategies don’t work very well:
- Top clicked URLs in search engine
  - URL may have low relevance in the QL context: lib.utexas.edu/maps is popular for searches on “maps” and not for searches on “Univ. of Texas”
  - URL may be too specific: automobiles.honda.com/civic-hybrid/exterior-photos.aspx for honda.com
  - URL popularity may be time sensitive: nytimes.com/election-guide/2008/ for nytimes.com

5 Quicklink Selection
Some obvious strategies don’t work very well:
- Top clicked URLs in search engine
- Top visited URLs in toolbar data
  - May not relate to search activity, e.g., for nytimes.com:
    - #3 is nytimes.com/mem/ this.html
    - #6 is nytimes.com/auth/login
    - #8 is nytimes.com/gst/regi.html

6 Quicklink Selection
Some obvious strategies don’t work very well:
- Top clicked URLs in search engine
- Top visited URLs in toolbar data
- Top URLs from analysis of hyperlink graph
  - Ignores preferences of search users
  - Toolbar data is more representative
- Heavily tagged URLs (e.g., del.icio.us/digg)
  - Low coverage: too few websites

7 Quicklink Selection
Need a combined approach:
- Search logs
- Toolbar data
- Web-server logs
- Website hyperlink graph
- User tags
[Slide annotation: “This paper”]

8 Related Work
- Sitemap generation [Perkowitz+/00]
- Detection of hard-to-find URLs [Srikant+/01]
- Improving website navigability [Doerr+/07]
- Mining Web usage patterns [Buchner/99, Cadez+/03]
- BrowseRank [Liu+/08]
- Post-search browsing behavior [Bilenko+/08]
We focus on QLs in the context of Search

9 Outline: Motivation and Related Work; Problem Formulation; Proposed Solution; Experiments; Conclusions

10 Problem Formulation
Which k URLs should be QLs?
- “The greatest good for the greatest number”
- QLs save clicks
- Maximize the total number of clicks saved using at most k QLs
  - But when exactly is a click “saved”?

11 Problem Formulation
When does a QL get clicked by the user?
[Figure: graph of click trails (toolbar data) rooted at nasa.gov; one node is marked “Say we pick this node as a QL”; node labels: Hubble telescope, Photos]

12 Problem Formulation
Assumption: The user recognizes if SearchResult → QL → Destination
[Figure: graph of click trails (toolbar data) rooted at nasa.gov; one node is marked “Say we pick this node as a QL”; node labels: Hubble telescope, Photos]

13 Problem Formulation
Assumption: The user recognizes if SearchResult → QL → Destination
[Figure: graph of click trails (toolbar data) rooted at nasa.gov; one node is marked “Say we pick this node as a QL”; trail annotation: (saves 1 click each)]

14 Problem Formulation
Assumption: The user recognizes if SearchResult → QL → Destination
[Figure: graph of click trails (toolbar data) rooted at nasa.gov; one node is marked “Say we pick this node as a QL”; trail annotations: (saves 1 click each), (saves 2 clicks each), (saves 0)]
Total savings = 1*3 + 2*2 = 7 clicks
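The arithmetic on this slide can be reproduced with a minimal sketch. It assumes, as an illustration only, that a trail is an ordered tuple of pages starting at the search result, and that a quicklink saves as many clicks as its position in the trail. The page names and trail counts are hypothetical, chosen so that three trails save 1 click each and two trails save 2 clicks each.

```python
def clicks_saved(trail, quicklink):
    """Clicks saved on one trail: the quicklink's position in the trail,
    or 0 when the trail never visits the quicklink (assumed model)."""
    return trail.index(quicklink) if quicklink in trail else 0


def total_savings(trail_counts, quicklink):
    """Sum the savings over all trails, weighted by how often each trail occurs."""
    return sum(n * clicks_saved(trail, quicklink)
               for trail, n in trail_counts.items())


# Hypothetical click trails (toolbar data) rooted at nasa.gov, with counts.
trail_counts = {
    ("nasa.gov", "hubble", "photos"): 3,              # QL one click away: saves 1 each
    ("nasa.gov", "missions", "hubble", "photos"): 2,  # QL two clicks away: saves 2 each
    ("nasa.gov", "news"): 4,                          # never reaches the QL: saves 0
}

print(total_savings(trail_counts, "hubble"))  # 7
```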

15 Problem Formulation
However…
- Unknown pages might become QLs
[Figure: lyrics.com linking to pages A, B, C, …, Z; annotation: These could become the “best” QLs]

16 Problem Formulation
However…
- Unknown pages might become QLs
- Automatic-redirect pages might become QLs:
  - nytimes.com forces logging in
  - aaa.com forces zipcode entry
We need QLs that are “noticeable” in a search context

17 Problem Formulation
How can we estimate noticeability?
- Via search click-logs
- Noticeability of a URL u: α(u), computed from the fraction of search clicks for u on the website and a tuning parameter (≈ 2) [formula not recoverable from the transcript]
- User notices a useful QL with probability α(u)

18 Problem Formulation
Assumption: The user picks the best QL that he/she notices
[Figure: click trails rooted at nasa.gov with two quicklinks, QL1 and QL2; trails through neither are marked (saves 0)]
Savings = # trails × probability × # clicks saved:
  2 × α₁ × 2
  1 × α₁ × 1
  2 × (1-α₂)α₁ × 1
  2 × α₂ × 2
Total = 5α₁ + 4α₂ + 2(1-α₂)α₁

19 Problem Formulation
[Figure: click trails rooted at nasa.gov with two quicklinks, QL1 and QL2; trails through neither are marked (saves 0)]
Savings = # trails × probability × # clicks saved:
  2 × α₁ × 2
  1 × α₁ × 1
  2 × (1-α₂)α₁ × 1
  2 × α₂ × 2
Total = 5α₁ + 4α₂ + 2(1-α₂)α₁
- If only QL1 is perfectly noticeable (α₁=1, α₂=0): Total = 7 clicks (as if 1 QL only)
- If both QLs are perfectly noticeable (α₁=1, α₂=1): Total = 9 clicks
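A minimal sketch of the expected-savings calculation may help check the two special cases on this slide (7 and 9 clicks). It assumes the user notices each quicklink independently with probability α and takes the noticed quicklink that saves the most clicks; the page names and trail counts are hypothetical, arranged so that "hubble" plays the role of QL1 and "photos" the role of QL2.

```python
def expected_trail_savings(trail, alpha):
    """Expected clicks saved on one trail.

    alpha maps each chosen quicklink to its noticeability. A quicklink at
    position i saves i clicks (assumed model); the user takes the best
    quicklink noticed, each being noticed independently.
    """
    # Quicklinks on this trail, best (deepest) first.
    candidates = sorted(
        ((i, alpha[u]) for i, u in enumerate(trail) if i > 0 and u in alpha),
        reverse=True,
    )
    expected, p_no_better_noticed = 0.0, 1.0
    for saving, a in candidates:
        expected += saving * a * p_no_better_noticed
        p_no_better_noticed *= 1.0 - a
    return expected


def expected_savings(trail_counts, alpha):
    return sum(n * expected_trail_savings(trail, alpha)
               for trail, n in trail_counts.items())


# Hypothetical trails: QL1 = "hubble", QL2 = "photos".
trail_counts = {
    ("nasa.gov", "missions", "hubble"): 2,  # 2 trails, QL1 saves 2
    ("nasa.gov", "hubble"): 1,              # 1 trail, QL1 saves 1
    ("nasa.gov", "hubble", "photos"): 2,    # 2 trails, QL2 saves 2, else QL1 saves 1
}

print(expected_savings(trail_counts, {"hubble": 1.0, "photos": 0.0}))  # 7.0
print(expected_savings(trail_counts, {"hubble": 1.0, "photos": 1.0}))  # 9.0
```

With both noticeabilities set to 0.5 the same function gives 5.0 expected clicks.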

20 Problem Formulation
Which k URLs should be QLs?
- Maximize the expected number of clicks saved using at most k QLs
  - while incorporating “noticeability”

21 Outline: Motivation and Related Work; Problem Formulation; Proposed Solution; Experiments; Conclusions

22 Algorithms
Maximize expected number of saved clicks using k QLs
- NP-hard
Theorem: This objective is non-decreasing submodular
1. Non-negative
2. Adding QLs never hurts
3. “Diminishing returns”: the marginal improvement from adding a URL u to a set S is at least the marginal improvement from adding u to a superset S’
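The three properties can be stated formally using the standard definitions. The symbol f(S), denoting the expected number of clicks saved by a quicklink set S, is introduced here for illustration and does not appear on the slide.

```latex
\begin{align*}
&\text{(1) Non-negative:} && f(S) \ge 0 \\
&\text{(2) Non-decreasing:} && f(S) \le f(S') \quad \text{for all } S \subseteq S' \\
&\text{(3) Submodular:} && f(S \cup \{u\}) - f(S) \;\ge\; f(S' \cup \{u\}) - f(S')
  \quad \text{for all } S \subseteq S',\ u \notin S'
\end{align*}
```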

23 Algorithms
Greedy algorithm: Iteratively pick QLs that increase the number of saved clicks the most
- Within a factor (1-1/e) of OPT [Nemhauser+/’78]
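The greedy procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a quicklink at position i of a trail saves i clicks, quicklinks are noticed independently with probability α, and the trails, page names, and α values are hypothetical.

```python
def expected_savings(trail_counts, chosen, alpha):
    """Expected clicks saved by the quicklink set `chosen` (assumed model)."""
    total = 0.0
    for trail, n in trail_counts.items():
        p_no_better_noticed = 1.0
        # Walk from the deepest page back toward the search result, so that
        # better (deeper) quicklinks are considered first.
        for i in range(len(trail) - 1, 0, -1):
            u = trail[i]
            if u in chosen:
                a = alpha.get(u, 0.0)
                total += n * i * a * p_no_better_noticed
                p_no_better_noticed *= 1.0 - a
    return total


def greedy_quicklinks(trail_counts, alpha, k):
    """Pick up to k quicklinks, each time adding the URL with the largest gain."""
    candidates = {u for trail in trail_counts for u in trail[1:]}
    chosen = []
    for _ in range(k):
        current = expected_savings(trail_counts, set(chosen), alpha)
        best_url, best_gain = None, 0.0
        for u in sorted(candidates - set(chosen)):  # sorted for determinism
            gain = expected_savings(trail_counts, set(chosen) | {u}, alpha) - current
            if gain > best_gain:
                best_url, best_gain = u, gain
        if best_url is None:  # no remaining URL improves the objective
            break
        chosen.append(best_url)
    return chosen


# Hypothetical trails and noticeabilities.
trail_counts = {
    ("nasa.gov", "missions", "hubble"): 2,
    ("nasa.gov", "hubble"): 1,
    ("nasa.gov", "hubble", "photos"): 2,
}
alpha = {"hubble": 1.0, "photos": 1.0, "missions": 1.0}

print(greedy_quicklinks(trail_counts, alpha, k=2))  # ['hubble', 'photos']
```

Each round re-evaluates the objective for every remaining candidate, which is simple but quadratic in the number of candidates; the sketch favors clarity over speed.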

24 Algorithms
However…
- Inhomogeneous results: QLs for ea.com are
  - fifa08.ea.com and battlefield.ea.com (two games made by EA)
  - 6 webpages deep inside thesim2.ea.com
- Redundant results: QLs for senate.gov include
  - obama.senate.gov
  - obama.senate.gov/about
  - obama.senate.gov/contact
  - obama.senate.gov/votes
  Parent URL makes the child URLs redundant

25 Algorithms
Both can be specified as pairwise constraints on URLs allowed to belong to a QL set
Pairwise-constrained QL selection is NP-hard.
Two-step process:
- Heuristically find a large subset of trails that form a tree
- Enforce constraints on tree
  - Dynamic program → optimal on tree

26 Outline: Motivation and Related Work; Problem Formulation; Proposed Solution; Experiments; Conclusions

27 Experiments
Baseline Methods
- TopClicked: URL score = # search clicks on URL
- TopVisited: URL score = # occurrences on toolbar trails
- PageRank: Build a weighted graph on URLs, where weight(i,j) = # trails using the i → j edge; URL score = PageRank on this graph
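The three baseline scores can be sketched as below. The slide does not give PageRank details, so the damping factor of 0.85, the fixed iteration count, and the uniform redistribution of rank from pages without outgoing edges are assumptions made for this illustration; the trails and search clicks are hypothetical data.

```python
from collections import Counter, defaultdict


def top_clicked(search_clicks):
    """TopClicked: number of search clicks on each URL."""
    return Counter(search_clicks)


def top_visited(trail_counts):
    """TopVisited: number of occurrences of each URL on toolbar trails."""
    visits = Counter()
    for trail, n in trail_counts.items():
        for u in trail:
            visits[u] += n
    return visits


def edge_weights(trail_counts):
    """weight(i, j) = number of trails using the i -> j edge."""
    weights = defaultdict(float)
    for trail, n in trail_counts.items():
        for a, b in zip(trail, trail[1:]):
            weights[(a, b)] += n
    return weights


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration (damping value is an assumption)."""
    nodes = sorted({u for edge in weights for u in edge})
    out_total = defaultdict(float)
    for (a, _), w in weights.items():
        out_total[a] += w
    rank = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_total[u] == 0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {u: base for u in nodes}
        for (a, b), w in weights.items():
            new_rank[b] += damping * rank[a] * w / out_total[a]
        rank = new_rank
    return rank


# Hypothetical toolbar trails (with counts) and search clicks.
trail_counts = {
    ("nasa.gov", "missions", "hubble"): 2,
    ("nasa.gov", "hubble"): 1,
    ("nasa.gov", "hubble", "photos"): 2,
}
search_clicks = ["hubble", "hubble", "photos"]

clicked = top_clicked(search_clicks)
visits = top_visited(trail_counts)
rank = pagerank(edge_weights(trail_counts))
print(clicked.most_common(1), visits.most_common(2))
print(sorted(rank, key=rank.get, reverse=True))
```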

28 Experiments
Live Traffic dataset
- Computed CTRs on QLs currently displayed by Yahoo! (1043 website subset)
- Measure:
  - Pick two equal-sized subsets of QLs
  - Use sum-of-scores and sum-of-CTRs to predict the better subset
  - Measure how often the predictions match
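One way to read this measure is sketched below. The slide does not say how subset pairs are sampled or how ties are handled, so this illustration enumerates every pair of distinct equal-sized subsets and skips pairs where either sum is tied; both choices are assumptions, and the scores and CTRs are made-up numbers.

```python
from itertools import combinations


def agreement_rate(scores, ctrs, subset_size):
    """Fraction of subset pairs where the sum of scores and the sum of CTRs
    pick the same subset as the better one."""
    urls = sorted(scores)
    subsets = list(combinations(urls, subset_size))
    agree = total = 0
    for a, b in combinations(subsets, 2):
        score_diff = sum(scores[u] for u in a) - sum(scores[u] for u in b)
        ctr_diff = sum(ctrs[u] for u in a) - sum(ctrs[u] for u in b)
        if score_diff == 0 or ctr_diff == 0:
            continue  # assumption: tied pairs carry no preference and are skipped
        total += 1
        agree += (score_diff > 0) == (ctr_diff > 0)
    return agree / total if total else float("nan")


# Hypothetical method scores and live-traffic CTRs for three quicklinks.
scores = {"a": 3, "b": 2, "c": 1}
ctrs = {"a": 0.3, "b": 0.2, "c": 0.1}
reversed_ctrs = {"a": 0.1, "b": 0.2, "c": 0.3}

print(agreement_rate(scores, ctrs, subset_size=1))           # 1.0
print(agreement_rate(scores, reversed_ctrs, subset_size=1))  # 0.0
```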

29 Experiments
Live Traffic Data
[Plot: x-axis: subset sizes; y-axis: fraction of subset-pairs where predictions agree with live traffic]
QL-ALG > TopVisited > PageRank > TopClicked

30 Experiments
Tree-structured trails
- Most dropped trails are very short
- Tree-structured trails improve accuracy
[Plot: distribution of dropped trails; x-axis: length of trail; y-axis: number of trails dropped]
[Plot: live traffic prediction quality comparison]

31 Outline: Motivation and Related Work; Problem Formulation; Proposed Solution; Experiments; Conclusions

32 Conclusions
- Proposed a formulation for the QL selection problem
  - Both toolbar and search logs are used intuitively
- Proposed two algorithms:
  - Greedy: (1-1/e)-optimal
  - Tree-structured: empirically better
- Improvement of 22% over competing baselines