1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research.

Presentation transcript:

1 Learning User Interaction Models for Predicting Web Search Result Preferences Eugene Agichtein Eric Brill Susan Dumais Robert Ragno Microsoft Research

2 User Interactions
Goal: Harness rich user interactions with search results to improve the quality of search
Millions of users submit queries daily and interact with the search results
– Clicks, query refinement, dwell time
User interactions with search engines are plentiful, but require careful interpretation
We will predict user preferences for results

3 Related Work
Linking implicit interactions and explicit judgments
– Fox et al. [TOIS 2005]: predict explicit satisfaction rating
– Joachims [SIGIR 2005]: predict preference (gaze studies, interpretation strategies)
Broader overview of analyzing implicit interactions: Kelly & Teevan [SIGIR Forum 2003]

4 Outline
Distributional model of user interactions
– User Behavior = Relevance + “Noise”
Rich set of user interaction features
Learning framework to predict user preferences
Large-scale evaluation

5 Interpreting User Interactions
Clickthrough and subsequent browsing behavior of individual users is influenced by many factors
– Relevance of a result to a query
– Visual appearance and layout
– Result presentation order
– Context, history, etc.
General idea:
– Aggregate interactions across all users and queries
– Compute “expected” behavior for any query/page
– Recover the relevance signal for a given query

6 Case Study: Clickthrough
Clickthrough frequency for all queries in sample
Clickthrough(query q, document d, result position p) = expected(p) + relevance(q, d)

7 Clickthrough for Queries with Known Position of Top Relevant Result
Relative clickthrough for queries whose top relevant result is known to be at position 1

8 Clickthrough for Queries with Known Position of Top Relevant Result
Relative clickthrough for queries with known relevant results in positions 1 and 3 respectively
Higher clickthrough at the top non-relevant document than at the top relevant document

9 Deviation from Expected
Relevance component: deviation from “expected”:
Relevance(q, d) = observed - expected(p)
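The decomposition above can be sketched in code. This is a minimal illustration with a toy click log and hypothetical function names, not the paper's actual pipeline: estimate the expected clickthrough rate per result position over all queries, then recover the relevance component of a specific (query, document) pair as the deviation from that expectation.

```python
from collections import defaultdict

def expected_ctr_by_position(click_log):
    """Average clickthrough rate at each result position, over all queries.

    click_log: iterable of (query, doc, position, clicked) tuples.
    """
    clicks, views = defaultdict(int), defaultdict(int)
    for _query, _doc, pos, clicked in click_log:
        views[pos] += 1
        clicks[pos] += int(clicked)
    return {pos: clicks[pos] / views[pos] for pos in views}

def relevance_deviation(observed_ctr, position, expected):
    """Relevance(q, d) = observed - expected(p)."""
    return observed_ctr - expected[position]
```

A document clicked far more often than its position alone predicts gets a positive deviation; a top-ranked document clicked only as often as position bias predicts gets a deviation near zero.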

10 Beyond Clickthrough: Rich User Interaction Space
Observed and distributional features
– Observed features: aggregated values over all user interactions for each query and result pair
– Distributional features: deviations from the “expected” behavior for the query
Represent user interactions as vectors in “Behavior Space”
– Presentation: what a user sees before a click
– Clickthrough: frequency and timing of clicks
– Browsing: what users do after the click

11 Some User Interaction Features
Presentation
– ResultPosition: position of the URL in current ranking
– QueryTitleOverlap: fraction of query terms in result title
Clickthrough
– DeliberationTime: seconds between query and first click
– ClickFrequency: fraction of all clicks landing on page
– ClickDeviation: deviation from expected click frequency
Browsing
– DwellTime: result page dwell time
– DwellTimeDeviation: deviation from expected dwell time for query
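As a rough sketch of what one point in "behavior space" might look like: field names follow the feature names on the slide, but the class itself, its layout, and the vector ordering are illustrative assumptions, not the paper's actual instrumentation.

```python
from dataclasses import dataclass

@dataclass
class BehaviorFeatures:
    """One point in 'behavior space' for a (query, result) pair (illustrative)."""
    # Presentation
    result_position: int         # position of the URL in current ranking
    query_title_overlap: float   # fraction of query terms in result title
    # Clickthrough
    deliberation_time: float     # seconds between query and first click
    click_frequency: float       # fraction of all clicks landing on this page
    click_deviation: float       # deviation from expected click frequency
    # Browsing
    dwell_time: float            # result page dwell time (seconds)
    dwell_time_deviation: float  # deviation from expected dwell time for query

    def to_vector(self):
        """Flatten into the fixed-order list a learner would consume."""
        return [self.result_position, self.query_title_overlap,
                self.deliberation_time, self.click_frequency,
                self.click_deviation, self.dwell_time,
                self.dwell_time_deviation]
```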

12 Outline
Distributional model of user interactions
Rich set of user interaction features
Models for predicting user preferences
Experimental results

13 Predicting Result Preferences
Task: predict pairwise preferences
– A user will prefer Result A > Result B
Models for preference prediction
– Current search engine ranking
– Clickthrough
– Full user behavior model

14 Clickthrough Model
SA+N: “Skip Above” and “Skip Next”
– Adapted from Joachims et al. [SIGIR 2005]
– Motivated by gaze tracking
Example
– Click on results 2, 4
– Skip Above: 4 > (1, 3), 2 > 1
– Skip Next: 4 > 5, 2 > 3
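The two SA+N rules are simple enough to state directly in code. This is a sketch with an assumed data layout (1-based click positions, preference pairs as tuples), not the authors' implementation.

```python
def skip_above_and_next(clicked_positions, num_results):
    """SA+N preference pairs, adapted from Joachims et al. [SIGIR 2005].

    clicked_positions: 1-based positions the user clicked, e.g. {2, 4}.
    Returns a set of (preferred, over) position pairs.

    Skip Above: a clicked result is preferred over every unclicked
    result ranked above it.
    Skip Next: a clicked result is preferred over the unclicked result
    immediately below it.
    """
    clicked = set(clicked_positions)
    prefs = set()
    for c in clicked:
        for above in range(1, c):
            if above not in clicked:
                prefs.add((c, above))  # Skip Above
        nxt = c + 1
        if nxt <= num_results and nxt not in clicked:
            prefs.add((c, nxt))        # Skip Next
    return prefs
```

On the slide's example (clicks on results 2 and 4 out of 5), this yields exactly 4 > (1, 3), 2 > 1, 4 > 5, and 2 > 3.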

15 Distributional Model
CD: distributional model, extends SA+N
– A clickthrough is counted only if its observed frequency exceeds the expected frequency by more than ε
Click on result 2 likely happened “by chance”
4 > (1, 2, 3, 5), but not 2 > (1, 3)
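One possible reading of the CD rule as code, under assumptions of my own: click frequencies arrive as per-position dicts, the SA+N rules are then applied only to "significant" clicks, and ε = 0.1 is an illustrative threshold rather than the paper's tuned value.

```python
def cd_preferences(click_freq, expected_freq, num_results, epsilon=0.1):
    """CD: distributional variant of SA+N.

    A click at position p counts only if its observed frequency exceeds
    the position's expected frequency by more than epsilon; clicks that
    look like chance (e.g. the habitual click on result 2) are dropped.

    click_freq, expected_freq: dicts mapping position -> frequency.
    Returns a set of (preferred, over) position pairs.
    """
    significant = {p for p, f in click_freq.items()
                   if f - expected_freq.get(p, 0.0) > epsilon}
    prefs = set()
    for c in significant:
        for above in range(1, c):          # Skip Above, significant clicks only
            if above not in significant:
                prefs.add((c, above))
        nxt = c + 1                        # Skip Next
        if nxt <= num_results and nxt not in significant:
            prefs.add((c, nxt))
    return prefs
```

With a strong click on result 4 and a near-expected click on result 2, this reproduces the slide's outcome: 4 is preferred over 1, 2, 3, and 5, while no preferences are derived from the click on 2.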

16 User Behavior Model
Full set of interaction features
– Presentation, clickthrough, browsing
Train the model with explicit judgments
– Input: behavior feature vectors for each query-page pair in rated results
– Use RankNet (Burges et al. [ICML 2005]) to discover model weights
– Output: a neural net that can assign a “relevance” score to a behavior feature vector

17 RankNet for User Behavior
RankNet: general, scalable, robust neural net training algorithm and implementation
Optimized for ranking: predicting an ordering of items, not a score for each
Trains on pairs, where the first point is to be ranked higher than or equal to the second
– Extremely efficient
– Uses cross-entropy cost (probabilistic model)
– Uses gradient descent to set weights
– Restarts to escape local minima
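The core of the pairwise cross-entropy training step can be illustrated with a linear scorer standing in for the neural net. This is a deliberate simplification: RankNet itself (Burges et al. [ICML 2005]) trains a neural net with backpropagation, and only the gradient of the pairwise cost is shown here.

```python
import math

def ranknet_update(w, x_i, x_j, lr=0.1):
    """One gradient step on a pair where x_i should outrank x_j.

    Uses the RankNet cross-entropy cost C = log(1 + exp(-(s_i - s_j)))
    on a linear scorer s(x) = w . x; returns the updated weight vector.
    """
    s_i = sum(wk * xk for wk, xk in zip(w, x_i))
    s_j = sum(wk * xk for wk, xk in zip(w, x_j))
    # dC/d(s_i - s_j) = -1 / (1 + exp(s_i - s_j))
    lam = -1.0 / (1.0 + math.exp(s_i - s_j))
    # Gradient descent on w: dC/dw = lam * (x_i - x_j)
    return [wk - lr * lam * (xi - xj)
            for wk, xi, xj in zip(w, x_i, x_j)]
```

Note the cost depends only on the score *difference*: the model is pushed to order the pair correctly, not to hit any particular score, which is the "ordering, not scores" point on the slide.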

18 Outline
Distributional model of user interactions
Rich set of user interaction features
Models for predicting user preferences
Experimental evaluation

19 Evaluation Metrics
Task: predict user preferences
Pairwise agreement:
– For comparison with previous work
– Useful for ranking and other applications
Precision for a query:
– Fraction of predicted pairs that agree with preferences derived from human ratings
Recall for a query:
– Fraction of human-rated preferences predicted correctly
Average precision and recall across all queries
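These per-query metrics are straightforward to compute once both predicted and judged preferences are represented as sets of pairs; a minimal sketch (hypothetical function name, assumes (preferred, over) tuples):

```python
def preference_precision_recall(predicted, judged):
    """Per-query precision and recall over preference pairs.

    predicted: set of (preferred, over) pairs output by a model.
    judged: set of pairs derived from explicit human relevance ratings.
    Precision = fraction of predicted pairs agreeing with judgments.
    Recall = fraction of judged pairs recovered by the prediction.
    """
    if not predicted or not judged:
        return 0.0, 0.0
    agree = len(predicted & judged)
    return agree / len(predicted), agree / len(judged)
```

Averaging these two numbers over all evaluation queries gives the average precision and recall reported in the results slides.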

20 Datasets
Explicit judgments
– 3,500 queries, top 10 results; relevance ratings converted to pairwise preferences for each query
User behavior data
– Opt-in client-side instrumentation
– Anonymized UserID, time, visited page
– Detect queries submitted to the MSN Search engine
– Subsequent visited pages
120,000 instances of these 3,500 queries, each submitted at least 2 times over 21 days

21 Methods Compared
Preferences inferred by:
Current search engine ranking: Baseline
– Result i > Result j iff i is ranked above j
Clickthrough model: SA+N
Clickthrough distributional model: CD
Full user behavior model: UserBehavior

22 Results: Predicting User Preferences
Baseline < SA+N < CD << UserBehavior
Rich user behavior features result in dramatic improvement

23 Contribution of Feature Types
Presentation features not helpful
Browsing features: higher precision, lower recall
Clickthrough features > CD: due to learning

24 Amount of Interaction Data
Prediction accuracy for varying amounts of user interaction per query
Slight increase in recall, substantial increase in precision

25 Learning Curve
Minimum precision of 0.7
Recall increases substantially with more days of user interactions

26 Experiments Summary
Clickthrough distributional model: more accurate than previously published work
Rich user behavior features: dramatic accuracy improvement
Accuracy increases for frequent queries and longer observation periods

27 Some Applications
Web search ranking (next talk):
– Can use preference predictions to re-rank results
– Can integrate features into ranking algorithms
Identifying and answering navigational queries
– Can tune model to focus on top 1 result
– Supports classification or ranking methods
– Details in Agichtein & Zheng [KDD 2006]
Automatic evaluation: augment explicit relevance judgments

28 Conclusions
General framework for training rich user interaction models
Robust techniques for inferring user relevance preferences
High-accuracy preference prediction in a large-scale evaluation

29 Thank You
Text Mining, Search, and Navigation group
Adaptive Systems and Interaction group
Microsoft Research

30 Presentation Features
Query terms in Title, Summary, URL
Position of result
Length of URL
Depth of URL
…

31 Clickthrough Features
Fraction of clicks on URL
Deviation from “expected” given result position
Time to click
Time to first click in “session”
Deviation from average time for query

32 Browsing Features
Time on URL
Cumulative time on URL (CuriousBrowser)
Deviation from average time on URL
– Averaged over the “user”
– Averaged over all results for the query
Number of subsequent non-result URLs

33 An Intelligent Baseline