
Adish Singla, Microsoft Bing
Ryen W. White, Microsoft Research
Jeff Huang, University of Washington

IR Focused on Document Retrieval
Search engines usually return lists of documents
Documents may be sufficient for known-item tasks
Documents may only be starting points for exploration in complex tasks
See research on orienteering, berrypicking, etc.

Beyond Document Retrieval
Log data lets us study the search activity of many users
Harness the wisdom of crowds
Search engines already use result clicks extensively
Toolbar logs also provide non-search-engine activity
Trails from these logs might help future users
Trails comprise queries and post-query navigation
IR systems can return documents and/or trails
The "trailfinding" challenge

Trailfinding
Trails can provide guidance to users beyond the results
Trails can be shown on the search result page, e.g.:
[Screenshot of trails interface]
How to select the best trail(s) for each query-result pair?
We present a log-based method and investigation

Outline for Remainder of Talk
Related work
Trails: mining trails, finding trails
Study: methods, metrics
Findings
Implications

Related Work
Trails as evidence for search engine ranking
  e.g., Agichtein et al., 2006; White & Bilenko, 2008; …
Step-by-step guidance for Web navigation
  e.g., Joachims et al., 1997; Olston & Chi, 2003; Pandit & Olston, 2007
Guided tours (mainly in the hypertext community)
  Tours are first-class objects, found and presented
  Human-generated, e.g., Trigg, 1988; Zellweger, 1989
  Automatically-generated, e.g., Guinan & Smeaton, 1993; Wheeldon & Levene, 2003

Trail Mining
Trails sourced from nine months of MSN toolbar logs
Search trails are initiated by search queries
Trails terminate after 10 actions or 30 minutes of inactivity
Trails can be represented as Web behavior graphs
Graph properties used for trailfinding
[Diagram: example trail graph rooted at a search result page]
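A minimal sketch of how this kind of trail segmentation might look, assuming a simplified log schema (per-user, time-ordered events with a timestamp, URL, and an is_query flag); the 10-action and 30-minute cut-offs come from the slide, everything else here is illustrative rather than the authors' actual pipeline:

```python
from datetime import timedelta

MAX_ACTIONS = 10                   # trail-length cut-off from the slide
MAX_IDLE = timedelta(minutes=30)   # inactivity cut-off from the slide

def extract_trails(events):
    """Segment one user's time-ordered toolbar events into search trails.

    `events` is an iterable of objects with .timestamp, .url, and
    .is_query attributes (a hypothetical log schema for illustration).
    A new trail starts at each query; a trail ends after MAX_ACTIONS
    post-query page views or MAX_IDLE of inactivity.
    """
    trails, current, last_time = [], None, None
    for event in events:
        timed_out = (current is not None and last_time is not None
                     and event.timestamp - last_time > MAX_IDLE)
        maxed_out = current is not None and len(current["pages"]) >= MAX_ACTIONS
        if event.is_query or timed_out or maxed_out:
            if current:
                trails.append(current)
            # only a query can open a new trail; other events are dropped
            current = {"query": event.url, "pages": []} if event.is_query else None
        elif current is not None:
            current["pages"].append(event.url)
        last_time = event.timestamp
    if current:
        trails.append(current)
    return trails
```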

Trailfinding Algorithms
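The body of this slide is not captured in the transcript. Purely as an illustration of the kind of selectors compared later in the findings (e.g., most-frequent trail, most-relevant trail), here is a hypothetical sketch; the scoring functions and data shapes are assumptions, not the algorithms actually evaluated in the paper:

```python
from collections import Counter

def best_trail_by_frequency(trails):
    """Pick the trail (a sequence of page URLs) that occurs most often
    in the logs for a given query-result pair. Assumes `trails` is non-empty."""
    counts = Counter(tuple(t) for t in trails)
    best, _ = counts.most_common(1)[0]
    return list(best)

def best_trail_by_relevance(trails, relevance):
    """Pick the trail whose pages have the highest mean judged relevance.
    `relevance` maps a URL to a human relevance score; unlabeled pages
    default to 0 here purely for illustration."""
    def mean_rel(trail):
        return sum(relevance.get(url, 0) for url in trail) / len(trail)
    return max(trails, key=mean_rel)
```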

Study: Research Qs
RQ1: Of the trails and origins, which source:
  (i) provides more relevant information?
  (ii) provides more coverage and diversity of the query topic?
  (iii) provides more useful information?
RQ2: Among trailfinding algorithms:
  (i) how does the value of the best trails chosen differ?
  (ii) what is the impact of origin relevance on best-trail value and selection?
  (iii) what are the effects of query characteristics on best-trail value and selection?
RQ3: In associating trails to unseen queries:
  (i) how does the value of trails found through query-term matching compare to trails with exact query matches found in logs?
  (ii) how robust is term matching for longer queries (which may be noisy)?


Study: Data Preparation
Large random sample of queries from Bing logs
Queries normalized, etc.
Trail pages labeled based on the Open Directory Project (ODP)
  Classification is automatic, based on URL with back-off
  Coverage of pages is 65%; partial trail labeling is allowed
Interest models were constructed for queries & trails
E.g., for the query [triathlon training]:

  Label                                        Norm. Freq.
  Top/Sports/Multi_Sports/Triathlon/Training   0.58
  Top/Sports/Multi_Sports/Triathlon/Events     0.21
  Top/Shopping/Sports/Triathlon                0.11
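One plausible reading of how an interest model like the [triathlon training] example could be built: classify each page to an ODP label and normalize the label frequencies into a distribution. The classifier here is a placeholder, not the actual back-off URL classifier used in the study:

```python
from collections import Counter

def build_interest_model(urls, classify):
    """Build a normalized ODP-label distribution for a set of pages.

    `classify` stands in for the automatic URL classifier with back-off
    mentioned on the slide; it returns an ODP label, or None when a page
    cannot be labeled (the slide reports ~65% page coverage).
    """
    labels = [classify(u) for u in urls]
    labels = [l for l in labels if l is not None]   # partial labeling is allowed
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```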

Study: Metrics
Coverage
  Query interest model built from top Google/Yahoo!/Bing results
  Fraction of query interest model covered by the trail
Diversity
  Fraction of unique query interest model labels in the trail
Relevance
  Query-URL relevance scores from human judges (6-point scale)
  Average relevance score of trail page(s)
Utility
  One if a trail page has a dwell time of 30 seconds or more
  Fox et al. (2005) showed dwell time ≥ 30 seconds is indicative of utility
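Reading the interest models as label-to-weight dictionaries, the coverage, diversity, relevance, and utility metrics described above can be sketched as follows; this is an interpretation of the slide's definitions, not the study's exact implementation:

```python
def coverage(query_model, trail_model):
    """Fraction of the query interest model's weight covered by labels
    that also appear in the trail's interest model."""
    return sum(w for label, w in query_model.items() if label in trail_model)

def diversity(query_model, trail_model):
    """Fraction of the query model's unique labels that appear in the trail."""
    if not query_model:
        return 0.0
    return len(set(query_model) & set(trail_model)) / len(query_model)

def avg_relevance(scores):
    """Mean of the human query-URL relevance judgments for the trail pages."""
    return sum(scores) / len(scores) if scores else 0.0

def utility(dwell_times, threshold=30):
    """1 if any trail page was dwelt on for >= 30 seconds (Fox et al., 2005)."""
    return int(any(t >= threshold for t in dwell_times))
```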

Study: Method
For each query-result pair:
  Select the best trail using each trailfinding algorithm
  Compute each of the metrics
Split findings by origin relevance:
  Best: origin results with high relevance ratings
  Worst: origin results with low relevance ratings
Metrics micro-averaged within each query and macro-averaged across all queries
Obtain a single value for each source-metric pair
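The micro-then-macro averaging step could look roughly like this, assuming per-trail metric values are first grouped by query (the grouping structure is an assumption for illustration):

```python
def macro_average(per_query_values):
    """Micro-average a metric within each query, then macro-average the
    per-query means across all queries.

    `per_query_values` maps a query to the list of metric values for the
    best trails selected for that query's results (assumed structure).
    """
    per_query_means = [sum(vals) / len(vals)
                       for vals in per_query_values.values() if vals]
    return sum(per_query_means) / len(per_query_means) if per_query_means else 0.0
```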

Findings: Coverage/Diversity
All differences between algorithms were statistically significant (p < .01)
Chart callouts:
  Avg. additional coverage and diversity from trails over the result only
  Frequent trails are short and may not cover much of the query
  Relevant trails may only cover one aspect of the query topic

Findings: Avg. Relevance Scores
Average relevance decreases rather than increases along the trail
Relevance is defined in relation to the original query
Needs may evolve during trail following
Needs may change most during long trails

Findings: Vary Origin Relevance
Divided trail data into two buckets:
  Best origins: trails with the highest origin relevance
  Worst origins: trails with the lowest origin relevance
Trails help most when initial search results are poor
Trails may not be appropriate for all search results

Implications
Approach has provided insight into which trailfinding algorithms perform best, and when
Next step: compare trail presentation methods
Trails can be presented as:
  An alternative to result lists
  Popups shown on hover over results
  In each caption, in addition to the snippet and URL
  Shown on the toolbar as the user is browsing
More work also needed on when to present trails:
  Which queries? Which results? Which query-result pairs?

Summary
Presented a study of trailfinding algorithms
Compared relevance, coverage, diversity, and utility of trails selected by the algorithms
Showed:
  Best trails outperform the average across all trails
  Differences attributable to algorithm and origin relevance
Follow-up user studies and large-scale flights planned
See paper for other findings related to the effect of query length, trails vs. origins, and term-based variants