Navigation-Aided Retrieval. Shashank Pandit & Christopher Olston, Carnegie Mellon & Yahoo!

Similar presentations
Haystack: Per-User Information Environment. 1999 Conference on Information and Knowledge Management, Eytan Adar et al. Presented by Xiao Hu, CS491CXZ.
Evaluation. Rong Jin. Evaluation is key to building effective and efficient search engines; usually carried out in controlled experiments.
Optimizing search engines using clickthrough data
Crawling, Ranking and Indexing. Organizing the Web The Web is big. Really big. –Over 3 billion pages, just in the indexable Web The Web is dynamic Problems:
SEARCHING QUESTION AND ANSWER ARCHIVES Dr. Jiwoon Jeon Presented by CHARANYA VENKATESH KUMAR.
VisualRank: Applying PageRank to Large-Scale Image Search Yushi Jing, Member, IEEE, and Shumeet Baluja, Member, IEEE.
Search Engines Information Retrieval in Practice All slides ©Addison Wesley, 2008.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Introduction Information Management systems are designed to retrieve information efficiently. Such systems typically provide an interface in which users.
Recommender Systems Aalap Kohojkar Yang Liu Zhan Shi March 31, 2008.
Evaluating Search Engine
Search Engines and Information Retrieval
Basic IR: Queries Query is statement of user’s information need. Index is designed to map queries to likely to be relevant documents. Query type, content,
Web Mining Research: A Survey. Raymond Kosala and Hendrik Blockeel, ACM SIGKDD, July 2000. Presented by Shan Huang, 4/24/2007.
Presented by Li-Tal Mashiach Learning to Rank: A Machine Learning Approach to Static Ranking Algorithms for Large Data Sets Student Symposium.
Modern Information Retrieval
Recent Results in Automatic Web Resource Discovery. Soumen Chakrabarti. Presentation by Cui Tao.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
INFO 624 Week 3 Retrieval System Evaluation
Retrieval Evaluation. Brief Review Evaluation of implementations in computer science often is in terms of time and space complexity. With large document.
Web Mining Research: A Survey. By Raymond Kosala & Hendrik Blockeel, Katholieke Universiteit Leuven, July 2000. Presented 4/18/2002.
Web Mining Research: A Survey
Web Mining Research: A Survey. Raymond Kosala and Hendrik Blockeel, ACM SIGKDD, July 2000. Presented by Shan Huang, 4/24/2007. Revised.
Sigir’99 Inside Internet Search Engines: Search Jan Pedersen and William Chang.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Online Learning for Web Query Generation: Finding Documents Matching a Minority Concept on the Web Rayid Ghani Accenture Technology Labs, USA Rosie Jones.
Problem Addressed The Navigation –Aided Retrieval tries to provide navigational aided query processing. It claims that the conventional Information Retrieval.
Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.
Modeling (Chap. 2) Modern Information Retrieval Spring 2000.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Personalization in Local Search Personalization of Content Ranking in the Context of Local Search Philip O’Brien, Xiao Luo, Tony Abou-Assaleh, Weizheng.
Research paper: Web Mining Research: A survey SIGKDD Explorations, June Volume 2, Issue 1 Author: R. Kosala and H. Blockeel.
Search Engines and Information Retrieval Chapter 1.
A Comparative Study of Search Result Diversification Methods Wei Zheng and Hui Fang University of Delaware, Newark DE 19716, USA
PageRank for Product Image Search. Kevin Jing (Google Inc.; GVU, College of Computing, Georgia Institute of Technology), Shumeet Baluja (Google Inc.). WWW 2008.
Scent Trails: Integrating Browsing and Searching on the Web. Christopher Olston et al. Blake Adams, November 4, 2003.
Improving Web Search Ranking by Incorporating User Behavior Information Eugene Agichtein Eric Brill Susan Dumais Microsoft Research.
Can People Collaborate to Improve the Relevance of Search Results? Florian Eiteljörge, June 11, 2013.
Web Search. Structure of the Web: the Web is a complex network (graph) of nodes & links that has the appearance of a self-organizing structure.
Xiaoying Gao Computer Science Victoria University of Wellington Intelligent Agents COMP 423.
Implicit User Feedback Hongning Wang Explicit relevance feedback 2 Updated query Feedback Judgments: d 1 + d 2 - d 3 + … d k -... Query User judgment.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Information Retrieval Effectiveness of Folksonomies on the World Wide Web P. Jason Morrison.
GUIDED BY DR. A. J. AGRAWAL Search Engine By Chetan R. Rathod.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Ranking Clusters for Web Search. Gianluca Demartini, Paul-Alexandru Chirita, Ingo Brunkhorst, Wolfgang Nejdl. L3S Info Lunch, Hannover.
Chapter 8: Evaluating Search Engines. Evaluation is key to building effective and efficient search engines. Measurement usually carried out.
Adish Singla, Microsoft Bing Ryen W. White, Microsoft Research Jeff Huang, University of Washington.
A Word Clustering Approach for Language Model-based Sentence Retrieval in Question Answering Systems. Saeedeh Momtazi, Dietrich Klakow, University of Saarland, Germany.
Measuring How Good Your Search Engine Is. Information System Evaluation: before 1993, evaluations were done using a few small, well-known corpora of.
INTEGRATING BROWSING AND SEARCHING WebGlimpse and ScentTrails -Rajesh Golla.
Mining Dependency Relations for Query Expansion in Passage Retrieval Renxu Sun, Chai-Huat Ong, Tat-Seng Chua National University of Singapore SIGIR2006.
Visualization in Text Information Retrieval Ben Houston Exocortex Technologies Zack Jacobson CAC.
Concept-based P2P Search How to find more relevant documents Ingmar Weber Max-Planck-Institute for Computer Science Joint work with Holger Bast Torino,
Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.
Query Suggestions in the Absence of Query Logs Sumit Bhatia, Debapriyo Majumdar,Prasenjit Mitra SIGIR’11, July 24–28, 2011, Beijing, China.
Augmenting (personal) IR Readings Review Evaluation Papers returned & discussed Papers and Projects checkin time.
A code-centric cluster-based approach for searching online support forums for programmers Christopher Scaffidi, Christopher Chambers, Sheela Surisetty.
KAIST TS & IS Lab. CS710 Know your Neighbors: Web Spam Detection using the Web Topology SIGIR 2007, Carlos Castillo et al., Yahoo! 이 승 민.
Information Retrieval Quality of a Search Engine.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Knowledge and Information Retrieval Dr Nicholas Gibbins 32/4037.
Using Blog Properties to Improve Retrieval Gilad Mishne (ICWSM 2007)
Text Based Information Retrieval
Evaluation of IR Systems
John Lafferty, Chengxiang Zhai School of Computer Science
Navigation-Aided Retrieval
Presentation transcript:

Navigation-Aided Retrieval. Shashank Pandit & Christopher Olston, Carnegie Mellon & Yahoo!

Search & Navigation Trends. Users often search and then supplement the search by navigating extensively beyond the search results page to locate relevant information. Why? Query formulation problems, open-ended search tasks, and a preference for orienteering.

Search & Navigation Trends. User behaviour in IR tasks is not fully exploited by search engines: content-based ranking uses words, PageRank uses in- and out-links for popularity, and collaborative methods use clicks on results. Search engines do not examine post-query navigation patterns (the authors do not mention SearchGuide, Coyle et al., which does).

NAR: Navigation-Aided Retrieval. A new retrieval paradigm that incorporates post-query user navigation as an explicit component. A query is seen as a means to identify starting points for further navigation by users. The starting points are presented to the user in a result list and permit easy navigation to many documents that match the user's query.

Existing Context Navigators. Synthetic structure for navigation-aided retrieval: serves as a contextual backdrop for query results and provides semantically meaningful avenues for exploration. Does not rely on the presence of good starting points in the corpus.

NAR: navigation-aided retrieval with organic structure, i.e. structure naturally present in pre-existing web documents. Advantages: human oversight (human-generated categories etc.), a familiar user interface (a list of documents, i.e. a result list), a single view of the document collection, and robust implementation (no semantic knowledge required).

The Model. D is the set of documents in the corpus, T the user's search task, S_T the answer set for the task, and Q_T the set of valid queries for task T. Query submodel: a belief distribution over the answer set given a query, i.e. the likelihood that document d solves the task (relevance). Navigation submodel: the likelihood that a user starting at a particular document will be able to navigate (under guidance) to a document that solves the task.

Conventional Probabilistic IR Model. No outward navigation is considered. The probability of solving the task depends on whether some document in the collection solves the task, and the probability of a document solving the task is based on its "relevance" to the query.

Navigation-Conscious Model. Considers browsing as part of the search task. Query submodel: any probabilistic IR relevance-ranking model. Navigation submodel: a stochastic model of user navigation, WUFIS (Chi et al.).

WUFIS. W(N, d_1, d_2) is the probability that a user with need N will navigate from d_1 to d_2. Scent is provided by anchor and surrounding text. The probability of a link being followed is related to how well the user's need matches the scent: the similarity between a weighted vector of need terms and a vector of scent terms.
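The slide only gestures at how scent is matched to need. Below is a minimal sketch, assuming a toy bag-of-words cosine similarity between need terms and the anchor/surrounding scent terms, with per-link probabilities normalised over a page's outlinks (my simplification, not the exact WUFIS formulation):

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_follow_probabilities(need_terms, outlinks):
    """Estimate P(follow link | need N) for each outlink of a page.

    need_terms : list[str]            -- terms describing the user's need N
    outlinks   : dict[str, list[str]] -- target doc id -> anchor/surrounding scent terms
    Probabilities are the scent similarities normalised over the page's outlinks
    (a simplifying assumption for this sketch).
    """
    need_vec = Counter(need_terms)
    scores = {d: cosine(need_vec, Counter(scent)) for d, scent in outlinks.items()}
    total = sum(scores.values())
    return {d: (s / total if total else 0.0) for d, s in scores.items()}
```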

Final Model. A document's starting-point score = query submodel × navigation submodel.
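Read as a formula (my notation, one plausible reading that is consistent with the Volant algorithm on the following slides), a document d's starting-point score for query q sums relevance over the documents reachable from d:

    score(d, q) = sum over d' in D of R(d', q) * W(N(d'), d, d')

where R comes from the query submodel and W from the navigation submodel.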

Volant: Prototype

Volant: Preprocessing. Content engine: R(d, q), estimated by the Okapi BM25 scoring function. Connectivity engine: estimates the probability W(N(d_2), d_1, d_2) of a user with need N(d_2) navigating from d_1 to d_2; Dijkstra's algorithm is used to generate the stored tuples.
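One way the ⟨d_1, d_2, W⟩ tuples might be generated, assuming (my assumption, not stated on the slide) that multi-step navigation probability is approximated by the single most likely path, so Dijkstra over -log link probabilities applies:

```python
import heapq
import math

def connectivity_tuples(edges, target, threshold=1e-4):
    """Approximate W(N(target), d, target) for every document d.

    edges     : dict[(src, dst)] -> P(follow src->dst | need N(target))
    target    : document d' whose need model N(d') defines the edge probabilities
    threshold : drop tuples whose path probability falls below this value

    Assumption (mine, for this sketch): W is approximated by the probability of
    the single most likely navigation path, found by Dijkstra over -log(p)
    edge weights, walking backwards from the target along reversed links.
    """
    # Reverse adjacency list so we can run Dijkstra from the target backwards.
    radj = {}
    for (src, dst), p in edges.items():
        if p > 0:
            radj.setdefault(dst, []).append((src, -math.log(p)))

    dist = {target: 0.0}                 # -log of the best path probability so far
    heap = [(0.0, target)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist.get(node, math.inf):
            continue
        for prev, w in radj.get(node, []):
            cand = cost + w
            if cand < dist.get(prev, math.inf):
                dist[prev] = cand
                heapq.heappush(heap, (cand, prev))

    # Emit (start, target, W) tuples for documents with non-negligible probability.
    return [(d, target, math.exp(-c)) for d, c in dist.items()
            if d != target and math.exp(-c) >= threshold]
```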

Volant: Starting Points. A query entered produces a ranked list of starting points:
1. Retrieve from the content engine all documents d' that are relevant to the query.
2. For each d' retrieved in step 1, retrieve from the connectivity engine all documents d for which W(N(d'), d, d') > 0.
3. For each unique d, compute the starting-point score.
4. Sort in decreasing order of starting-point score.
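A compact sketch of these four steps. The content_engine and connectivity_engine interfaces are hypothetical stand-ins for Volant's components; only the combination logic follows the slide:

```python
from collections import defaultdict

def rank_starting_points(query, content_engine, connectivity_engine):
    """Return documents ranked by starting-point score for `query`.

    content_engine(query)        -> iterable of (d_prime, R) relevance pairs      (step 1)
    connectivity_engine(d_prime) -> iterable of (d, W) with W(N(d_prime), d, d_prime) > 0  (step 2)
    Both engines are assumed interfaces, not the paper's actual API.
    """
    scores = defaultdict(float)
    for d_prime, relevance in content_engine(query):           # step 1
        for d, w in connectivity_engine(d_prime):               # step 2
            scores[d] += relevance * w                          # step 3: accumulate score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)  # step 4
```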

Volant: Navigation Guidance. While a user is navigating, Volant intercepts the requested document and highlights links that lead towards documents relevant to the query q:
1. Retrieve from the content engine all documents d' that are relevant to q.
2. For each d' retrieved, get from the connectivity engine the documents d that can lead to d', i.e. those with W(N(d'), d, d') > 0.
3. For each tuple retrieved in step 2, highlight the links in the current document that point to such a d.
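The guidance pass can be sketched against the same hypothetical interfaces: for the page being served, highlight the outgoing links whose targets still have a non-zero chance of reaching some relevant document:

```python
def links_to_highlight(current_doc, query, content_engine, connectivity_engine, outlinks):
    """Return the outlinks of `current_doc` worth highlighting for `query`.

    outlinks(doc) -> iterable of (link_target, link_id) for the page being served
                     (a hypothetical interface, like the engines above).
    A link is highlighted if its target can still reach some relevant document d',
    i.e. W(N(d'), link_target, d') > 0 (or the target is d' itself).
    """
    relevant = [d_prime for d_prime, _ in content_engine(query)]                 # step 1
    reachable = {d_prime: {d for d, _ in connectivity_engine(d_prime)} | {d_prime}
                 for d_prime in relevant}                                        # step 2
    highlighted = []
    for target, link_id in outlinks(current_doc):
        if any(target in docs for docs in reachable.values()):                   # step 3
            highlighted.append(link_id)
    return highlighted
```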

Evaluation Hypotheses.
1. In query-only scenarios, Volant does not perform significantly worse than conventional approaches.
2. In combined query/navigation scenarios, Volant selects high-quality starting points.
3. In a significant fraction of query/navigation scenarios, the best organic starting point is of higher quality than one that can be synthesized using existing techniques.

Search Task Test Sets. Navigation-prone scenarios are difficult to predict, so the Simplified Clarity Score was used to split queries into unambiguous and ambiguous sets. Unambiguous: the 20 search tasks with highest clarity from TREC 2000. Ambiguous: 48 randomly selected tasks from TREC 2003.
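For context, the Simplified Clarity Score is usually computed as a KL-style divergence between the query's term distribution and the collection's term distribution; a small sketch under that standard definition (not taken from these slides):

```python
import math
from collections import Counter

def simplified_clarity_score(query_terms, collection_tf, collection_tokens):
    """Simplified clarity score:
    sum over query terms w of P(w|q) * log2( P(w|q) / P(w|collection) ).

    query_terms       : list[str]      -- tokenised query
    collection_tf     : dict[str, int] -- term frequencies over the whole collection
    collection_tokens : int            -- total number of tokens in the collection
    Higher scores indicate clearer (less ambiguous) queries.
    """
    qtf = Counter(query_terms)
    qlen = len(query_terms)
    score = 0.0
    for w, tf in qtf.items():
        p_q = tf / qlen
        p_c = collection_tf.get(w, 0) / collection_tokens
        if p_c > 0:
            score += p_q * math.log2(p_q / p_c)
    return score
```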

Performance on Unambiguous Queries. Measured by mean average precision; no significant difference. Why? Relevant documents tended not to be siblings or close cousins, so Volant deemed that the best starting points were the relevant documents themselves.

Performance on Ambiguous Queries. User study: 48 judges rated the suitability of documents as starting points. 30 starting points were generated: 10 from the TREC 2003 winner (CSIRO), 10 from Volant with user guidance, and the same 10 Volant documents evaluated without user guidance.

Performance on Ambiguous Queries. Rating criteria: breadth (serves a spectrum of people with different interests), accessibility (how easy it is to navigate and find information), appeal (presentation of the material), and usefulness (would people be able to complete their task from this point). Each judge spent 5 hours on the task.

Results

Summary & Future Work. Effectiveness: responds to users' queries by positioning them at a suitable starting point for their task and guiding them to further information in a query-driven fashion. Relationship to conventional IR: generalizes the conventional probabilistic IR model and succeeds in scenarios where conventional IR techniques fail, e.g. ambiguous queries.

Discussion Cold Start Problem Scalability Bias in Evaluation